Approximately causal
In 2008, Chris Anderson, the editor of Wired at the time, published a brief article that caused a large stir. Titled “The End of Theory,” it argued that big data and advanced statistics were rendering theories and hypotheses obsolete. If we can use these new tools to establish correlations, then why do we need theories and hypotheses to explain them?
The article was undoubtedly meant as a provocation and, in that regard, it was quite successful. Yet most of Anderson’s examples were not controversial. Early in the article, for example, he pointed to Google searches. Google doesn’t need to know why its algorithms are placing this or that page first or 5,000th in its listings. Its search algorithms may be considering thousands of different parameters, but as long as the system works, meaning it satisfies its users’ needs, that’s good enough. But, Anderson said, the same is happening in physics, biology, and other scientific fields: if the correlations work, we don’t need hypotheses that explain them. “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all,” he wrote.
Causal models and machine learning
The pushback against this was strong, of course. Some responded that science exists to explain things, and explaining requires having hypotheses and models that hold generally and can be applied to a particular situation—or that what makes a correlation real and not spurious is that it comes about through causal relationships. Otherwise, how do we differentiate the correlation between stubbed toes and toe pain from the correlation between the number of films that the actor Nicolas Cage makes in a year and the number of people who drown in swimming pools?
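To make the worry about spurious correlations concrete, here is a minimal sketch of one way they can arise: compare a short series against enough unrelated series and something will line up by chance. Everything in it is an illustrative assumption rather than real data: the series are random noise, and the series length, number of candidates, and seed are arbitrary.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(42)   # fixed seed so the run is repeatable
n_years = 11              # assumed: a short yearly series
n_candidates = 2000       # assumed: how many unrelated series we search

# A "target" series and many candidates, all independent random noise,
# so no candidate has any causal link to the target.
target = [rng.gauss(0, 1) for _ in range(n_years)]
candidates = [
    [rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_candidates)
]

# Keep only the strongest correlation found among the unrelated series.
best_r = max(abs(pearson(target, series)) for series in candidates)
print(f"strongest correlation among unrelated series: {best_r:.2f}")
```

Nothing here explains anything, yet the best match looks impressive. The correlation alone cannot tell us whether it belongs with stubbed toes or with Nicolas Cage.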
Now, I don’t dispute the value of introducing causal models into machine learning, as Judea Pearl argues for in The Book of Why. But I also wouldn’t want to limit machine learning to what can be explained causally. That’s not because I think there are exceptions to causality. We don’t have to deny causality to promote a statistical view over purely causal accounts. We just have to acknowledge that even simple causal examples are necessarily beyond our ability to fully comprehend.
Machine learning’s success in putting so many premises of chaos theory into practice is making us more comfortable with that thought. For example, where every scrap of confetti lands during the Thanksgiving Day Parade in New York City is rigorously determined by causal factors. But we cannot determine ahead of time where any one piece will fall. Any particular piece of confetti’s fall is determined by the summation of many small, unpredictable causes that we can’t in any practical sense monitor. They could include the exhalation of nearby children, the static electric attraction of the coats that two parade-watchers happen to be wearing, the micro draft caused by a newspaper box planted next to a revolving door, and the motion of the air stirred by a cat ducking out from under a majorette’s foot, not to mention the gravitational tug of every other piece of confetti and every star in the universe.
Each of these things may affect precisely where a scrap of confetti falls. Each is causally determined. But together they mean the universe is unpredictable at its core even for larger events, such as the fall of a coin from the Leaning Tower of Pisa, that are also affected by every small cause near and far. We only think we can predict the coin’s fall better than the confetti’s because, beyond a millimeter or so, we don’t care how precise our coin prediction is.
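The contrast between the coin and the confetti can be sketched with a toy model, offered only as an illustration. It assumes a falling object’s sideways drift is the sum of many small, independent nudges, and that a coin responds to each nudge far less than confetti does. The sensitivity figures, the number of nudges, and the function name are all made up for the example; none comes from measurement.

```python
import random
import statistics


def landing_spread(sensitivity_mm, n_nudges=500, n_trials=200, seed=0):
    """Spread (standard deviation, in mm) of landing positions.

    Each trial sums n_nudges independent pushes, each drawn uniformly
    from [-sensitivity_mm, +sensitivity_mm]. This is a toy model of
    "many small causes", not a physical simulation.
    """
    rng = random.Random(seed)
    landings = [
        sum(rng.uniform(-sensitivity_mm, sensitivity_mm)
            for _ in range(n_nudges))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(landings)


# Hypothetical sensitivities: confetti is pushed around a thousand
# times more easily than a coin. The same seed gives both the same
# sequence of nudges, differing only in scale.
confetti_spread = landing_spread(sensitivity_mm=10.0)
coin_spread = landing_spread(sensitivity_mm=0.01)

print(f"confetti landing spread: {confetti_spread:.1f} mm")
print(f"coin landing spread:     {coin_spread:.4f} mm")
```

Under these assumptions both objects are buffeted by the same unpredictable causes. The coin’s uncertainty simply stays below the millimeter or so at which we stop caring, which is why its fall feels predictable.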