"All models are wrong, but some are useful."
So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.
The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.
It strikes stag staff that Wired writer Chris Anderson (of Long Tail fame) is right to imply that this shift has profound implications for the human condition. As the acclaimed economist Tyler Cowen (someone who makes a living off the interplay between correlation and causation) points out, the ascription of meaning, the search for "A Reason," and the drive to connect dots with narrative are hallmarks of the human condition:
I think of a few major problems when we think too much in terms of narrative. First, narratives tend to be too simple. The point of a narrative is to strip it away, not just into 18 minutes, but most narratives you could present in a sentence or two. So when you strip away detail, you tend to tell stories in terms of good vs. evil, whether it's a story about your own life or a story about politics. Now, some things actually are good vs. evil. We all know this, right? But I think, as a general rule, we're too inclined to tell the good vs. evil story. As a simple rule of thumb, just imagine every time you're telling a good vs. evil story, you're basically lowering your IQ by ten points or more.
With these invariants in mind, though, this sort of unthinking techno-futurist bravado devolves into little more than nonsense:
Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?
Perhaps. Considering that a non-trivial number of stag staff are so-called "quants," we would be loath to discard the virtues of applied mathematics. But let us remember that technological progress is decidedly not normatively neutral:
Technology is a tool that does what we make of it. Left with more questions than answers, stag staff can only wonder: what would our dear boy Hume have to say about all of this?