on Math, Art and Prose.

Statistics: A love story

Remember that feeling? That feeling you get when you have something stuck between your teeth. You claw at it; you use a toothpick which breaks. Useless efforts. Annoying. You hope it goes away. It does eventually; the world feels right again. You don’t even notice the moment it disappeared. Yet, those minutes, hours were awful.

That was physics for me. It was mainly the experiments. We read these big, important books about fancy theories. We then spent a few hours every week running experiments. Experiments measuring gravity or some other bullshit. The big books said that free fall acceleration, gravity ($g$), was $9.8$ $m/s^2$. That wasn’t the case in those experiments. I thought it was because of the way we did things. You were given shitty equipment. You repeated each experiment ten times, you wrote down the numbers, you found the average. You hoped the value was close to the book value. It never was; that worried me.

It also was an academic problem. Mainly because of her. I still remember her. She was obese, violent with a temper – the physics lab teacher. She would pick up my lab notebook, circle a few numbers with that angry red pen. She would always fling the notebook on the floor. Sometimes, she would call me a “duffer”; she would point at the “good kids”. They got the right values. She might even cuff me on the head. Then, I would have to go repeat those experiments, the “bad ones”.

I rarely repeated the experiments. After a while, I rarely even did them the first time around. I made up numbers. Numbers such that the mean was close to the “true value”. We were both wrong. I believed in a magical world where amazing tools existed to get perfect results. She believed that some results were bad and could be thrown away. We both believed that not getting a correct value for $g$ was a bad thing. It is okay, though. We were repeating a cycle. A cycle of mistakes that smarter people had made, for centuries, far away from that cargo-cult physics class in a high school in India.

Until today, I didn’t realize that awful feeling had left me. Left me long ago. I am reading a very odd book. The Lady Tasting Tea explains statistics for the layman; yet it addresses deep philosophical issues about the nature of science. Things that trigger the old memory.

… all experiments are sloppy and that very seldom does even the most careful scientist get their number right.

… we do not look upon experimental results as carefully measured numbers in their own right. Instead, they are examples of a scatter of numbers, a distribution of numbers that can be written as a mathematical formula that tells us the probability that an observed number will be a given value.

In Bridget Jones’s Diary, Bridget Jones weighs herself every day. She records this in her diary. Depending on the day, she is either ecstatic or devastated. The book meanders on and on about this. Unfortunately for her, she is looking at random reflections of her true weight. Even if she weighed herself a hundred times a day, all she would get would be a sample from a probability distribution. A mathematical abstraction that has hidden knowledge about her weight. This is like being given a few solved chunks of a big jigsaw puzzle. She doesn’t have the full puzzle. She never will. All she can do is make a good guess at the big picture. This is where Statistics helps. Sometimes, I can find out which jigsaw puzzle I am working with. Every probability distribution has clues that identify it. The mean tells me the center of this distribution; the central number around which all of Bridget’s weights spread out. The variance tells me how far the numbers vary from this center of the distribution. Is it possible that Bridget will wake up someday and see 25 pounds? 300 pounds?
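Here is a rough sketch of that idea in Python. The numbers are my assumptions, not anything from the book: a true weight of 130 pounds, a day-to-day scatter of 2 pounds, and a Gaussian shape for the distribution. The helper names are made up for this example. It draws a hundred readings, computes the sample mean and variance, then asks how far 25 or 300 pounds would sit from the center.

```python
import random
import statistics

# Assumed numbers, purely for illustration: a "true" weight of 130 pounds
# and day-to-day scatter with a standard deviation of 2 pounds.
TRUE_WEIGHT = 130.0
DAILY_SCATTER = 2.0


def weigh(rng, days):
    """Return one simulated scale reading per day, drawn from a Gaussian."""
    return [rng.gauss(TRUE_WEIGHT, DAILY_SCATTER) for _ in range(days)]


def summarize(readings):
    """Return the sample mean and sample variance of the readings."""
    return statistics.mean(readings), statistics.variance(readings)


def deviations_from_center(value, mean, variance):
    """Return how many standard deviations a value sits from the mean."""
    return abs(value - mean) / variance ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    readings = weigh(rng, 100)
    mean, variance = summarize(readings)
    print(f"mean: {mean:.1f} pounds, variance: {variance:.1f}")
    for odd_reading in (25, 300):
        distance = deviations_from_center(odd_reading, mean, variance)
        print(f"{odd_reading} pounds is {distance:.0f} standard deviations away")
```

Under these assumptions, both 25 and 300 pounds land dozens of standard deviations from the mean. Possible in principle, but not something the scale will ever show her.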

On Random Experiments, Polio and Paleo

In Ham on Rye, Bukowski talks about his acne. A teenager with acne has a terrible fate. This was Depression-era America. Bukowski was poor. He went to a hospital, I imagine one for poor people. The doctors suggested the electric needle. A hot needle was used to drill his boils, squeeze them. This was done for weeks. It was useless. He was left scarred.

They experimented on the poor and if that worked they used the treatment on the rich. And if it didn’t work, there would still be more poor left over to experiment upon.

Bukowski is an ugly man. Scarred in his photos. The story could be real. I don’t know. However, humans are powerfully susceptible to anecdotes. Suppose the story was true. Does the fact that the treatment failed mean it wouldn’t work for the chosen few, the rich? Suppose it had succeeded, would it still work for rich people?

I don’t know. Maybe he didn’t have acne. Maybe he got sick from the cans of free hash they got. Maybe he lived in a polluted neighborhood. Maybe Bukowski, as he said it, was just selected to have Acne. I don’t know. Anatole France was however right.

To die for an idea is to set a rather high price on conjecture.

Sometimes, conjecture is all we have. How do we reduce the number that die then? It is hard. There is an art to it. An art of making numerical conjectures. Statistics. According to Freedman, anyway.

Say I have a bag of marbles. They are identical. Blue, round, made in the same sweatshop. I sell them in a shop. An angry customer comes by. She was in a ringer competition in Alaska. Her marble cracked; due to the cold, she says. Her life is ruined. She wants to sue. One way to find out for sure what happened is to run an experiment. I could take a marble, fly to Alaska, see if it cracks. What if I cheat? Choose a good marble. We decide the marble will be chosen randomly. Since I may keep the marble in a marble cozy, someone else who doesn’t know about the experiment will take it to Alaska. What if the marble does crack? Does that mean it was the cold? Some marbles could just crack. Maybe a bad batch from the sweatshop. Maybe some of the marbles collided when the truck carrying them ran over a pothole. So we pick some marbles at random. We flip a coin; separate the marbles into two groups. Now, the probability that a marble cracks due to some other reason should be the same in both groups. We fly one to Alaska; the other we keep here. A randomized controlled experiment.
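The marble experiment can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: a 5% chance that any marble cracks for reasons unrelated to weather, and an extra 20% from the Alaskan cold. The function names are mine. The coin flip decides which marble travels, so bad batches and pothole damage should spread evenly across both groups.

```python
import random

# Assumed for illustration: every marble has a 5% chance of cracking for
# reasons unrelated to the weather, and Alaska's cold adds another 20%.
BASELINE_CRACK_RATE = 0.05
COLD_EXTRA_CRACK_RATE = 0.20


def split_by_coin_flip(marbles, rng):
    """Flip a fair coin for each marble: heads goes to Alaska, tails stays."""
    alaska, home = [], []
    for marble in marbles:
        if rng.random() < 0.5:
            alaska.append(marble)
        else:
            home.append(marble)
    return alaska, home


def crack_rate(group, extra_rate, rng):
    """Return the fraction of a group that cracks under baseline plus extra risk."""
    if not group:
        return 0.0
    cracked = sum(rng.random() < BASELINE_CRACK_RATE + extra_rate for _ in group)
    return cracked / len(group)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    alaska, home = split_by_coin_flip(list(range(2000)), rng)
    print("Alaska crack rate:", crack_rate(alaska, COLD_EXTRA_CRACK_RATE, rng))
    print("Home crack rate:  ", crack_rate(home, 0.0, rng))
```

Since the baseline risk is the same in both groups, any gap between the two crack rates beyond chance is the cold's doing. In this toy version the gap is built in by assumption; in the real experiment it is the thing I would be trying to find out.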

That was hard. It gets harder, especially when human lives are involved. Twenty years after Bukowski’s tryst with Acne, Polio, the disease that left Roosevelt crippled, had a possible vaccine. Lab experiments looked promising. Again, another conjecture was needed. One way of testing this conclusively is to inoculate everyone. Compare against a previous year. Historical controls, they are called. The problem was that this could be a good year or a really bad year for Polio. Can we compare children whose parents give vaccination permission against those whose parents do not? Not quite. The two groups differ. Children of the poor were, oddly, less vulnerable to Polio. The rich are typically more educated, more willing to have their children vaccinated. An ideal experiment is one which picks children whose parents allow them to be vaccinated. Those children are split at random into two groups. One is vaccinated. The other is given a placebo. This was what was finally done. A controversial experiment with tough ethical issues to solve.

What of experiments where the researcher can’t play god? After all, not many are willing to engage in a dangerous activity long term just to make a statistician happy. So we run observational studies. Compare groups that are as identical as possible. Male smokers aged 40 to 50 against male non-smokers in the same age group. It is not easy. It is not perfect.

It also explains why nutrition today is such a clusterfuck. My friends are into paleo. Everyone tells me anecdotes about hunter-gatherer civilizations and how they were healthy. It is not clear how long they lived. Or whether their behavioral patterns were identical to the modern day hipster’s. Another danger is to interpret too much from animal trials. Our society is fascinated by the lives of mice and fruit flies. Probably more than cats. Today’s BBC news article is relevant. They talked about the long lives of male mice that were on diabetes pills. No clinical studies on humans, yet. Just like a pack of cigarettes with a warning hidden somewhere, they said at the very end:

it is unclear what the study might mean for human health.

Yet, the article was in the health section and not in the wildlife section.

I wrote this in the UCSF medical library, Parnassus. The medical nature of this article was probably not causal.

The map is not the territory

Alfred Korzybski said that. Reality is not the model. The model however can change. It can be changed. I realized that very recently. It is a powerful idea. It directs responsibility inwards, completely. I am responsible for my emotions, my reactions to the external world.

We collaborate through models. Your model interacting with mine. In relationships, art, cooperative learning, etc. We see this in art especially. Eric Kandel talks about the share of the beholder. The beholder’s brain acts as a ‘creativity machine’. Each such interaction is different. This makes sense to me. My neuroscience professor talked about this in visual perception. The data stream fed into the eyes is mostly stripped. Some hints go to the brain. The brain reconstructs the rest. It is an amazing idea. We all have a collaborative social agreement on symbols to describe ‘red’. However, do we all perceive red the same way?

Some models are more powerful, I feel. Prose is more effective at bringing magical worlds across time and space. Movies, not so much.

This is true of creating with computers. I spent three days learning SASS, Jekyll and Octopress to build this blog. I am not satisfied with the end result. Yet, this is one of the best of the worst ways of doing this. Bret Victor talks about programming being a ‘blindly manipulative task’. I agree. It is apparent when I want to change colors in this blog. Ideally, I could get a crayon, paint maybe. Paint on the mockup. The mockup would be turned into code. I could deploy the code. Instead, I pick a color from a color picker. Change the color in SASS. Regenerate the blog. See how it looks. Redo. Another analogy? Fonts. I can write in my handwriting. Unique to me. Yet online, I pick a font that is closest to what I want. I put the font in my blog. I see how it looks. I repeat. The model is not completely bad. It lets me connect with someone across the world. However, it is not good either. A newsletter written with a pen would be easier to make.

Bret has another analogy that applies to data science. When you want to visualize data, say in R or Matlab, you put your data in it. The program builds a bar chart or a pie chart or something from a template. Yet, when you write a research paper, you don’t pick a template and fill in the blanks.

This is not to say that only old models are good. Mathematics is a great example. I don’t like symbols. I don’t find numbers that interesting. I hate silly problems about trains moving back and forth. Why is the mechanics of moving locomotives supposed to be relevant to my life? Neither do I particularly enjoy puzzles. Yet, I am deeply in love with math. You think Lolita works on several different levels? Statistics is incredibly rich. A Gaussian probability model underpins so many different narratives: Bayesian, frequentist, even probability theory. These are not immediately apparent, however. The symbols, even some of the drawings are limited. Ceci n’est pas une pipe; indeed.

I was lucky. I found someone who stripped away symbols, shared with me their way of perceiving math. There needs to be a more systematic way of doing this. How? I don’t know yet. I want to find out. I want to build better maps. I want to use old maps more effectively. So I meditate on Math, Art and Prose.

Crafted by esh © 2016