The Errors of Big Data
We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level.
Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information.
In other words: Big data may mean more information, but it also means more false information…
Big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.
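To make the sampling point concrete, here is a minimal simulation sketch. It is not from the original article: the function names, the Gaussian noise, and the choice of 30 observations per variable are all assumptions made for illustration. Every variable is pure noise, so any correlation the search turns up is spurious by construction.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def largest_spurious_correlation(n_variables, n_observations, seed=0):
    """Hypothetical helper: generate independent noise variables and
    return the largest absolute pairwise correlation found among them."""
    rng = random.Random(seed)
    # Each column is an unrelated variable: there is no signal anywhere.
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    # Cherry-picking step: scan every pair and keep the most extreme one.
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    for n_variables in (5, 20, 80):
        n_pairs = n_variables * (n_variables - 1) // 2
        best = largest_spurious_correlation(n_variables, n_observations=30)
        print(f"{n_variables:>3} variables, {n_pairs:>5} pairs, "
              f"largest |r| among pure noise: {best:.2f}")
```

The number of pairs grows roughly with the square of the number of variables, while the data per variable stays fixed at 30 observations. Because the seed is fixed, the smaller runs use a subset of the columns in the larger runs, so the largest correlation found can only stay the same or grow as variables are added.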
We used to have protections in place for this kind of thing, but big data makes spurious claims even…