Privacy is a wonderful and complex thing. To my mind, it should operate on a sliding scale under the individual’s control: total privacy for those who want to research information for themselves or communicate in confidence with others, through partial privacy for those willing to exchange personal data for convenient services, down to zero privacy for those who want to strut their stuff in public.
The partial or total surrender of privacy is familiar to us through our transactions with the likes of Google (s goog) and our use of platforms such as Twitter. That’s fine, as long as the individual chooses to surrender their personal data. But I’d like to dwell for a moment on the concept of total privacy, and why it should be an option even in the online age.
Privacy means different things in different cultures, and the western understanding of privacy is largely a…
Why do some news items go viral and others don’t? Why are people willing to share certain stories with their friends but not others? While we may be led to believe that interesting products are key to such viral spread, nothing could be further from the truth. Enter Big Data analysis…
Analyses of data by Jonah Berger (Marketing professor at the Wharton School and author of the New York Times bestseller Contagious: Why Things Catch On) and his colleague shed more light on why things go viral. They investigated word-of-mouth data for over 10,000 products and brands and over 7,000 pieces of online content and here’s what they came up with…
1. Big Data analyses have revealed that there’s actually a science behind “virality”. However, Berger is quick to point out that sticking with the deductions made from their analyses won’t guarantee a home run every time. Just…
Around the world, the health care system is rife with inefficiencies, and General Electric (s ge) thinks it can help solve the problem using data. Only it’s not talking about bureaucrats looking at reports: GE has built an artificial intelligence system called Corvix that uses historical data to predict the future, including everything from how diseases will spread to the cities where hospitals will be needed the most.
It might sound futuristic, but the techniques behind Corvix have actually been around for a while. The platform uses agent-based modeling to build, essentially, a reasonable facsimile of some sort of complex system and then simulate its evolution over time. The “agents” represent the atomic units of those systems, such as individual people in the case of human populations or perhaps cells in the case of a biological simulation. They act according to a set of rules in any given situation, which…
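To make the idea of agents following rules concrete, here is a minimal sketch of an agent-based simulation of disease spread. It is not Corvix's method, which the post does not detail. The `Person` class, the three states, and every parameter (`contacts_per_day`, `p_transmit`, `days_to_recover`) are assumptions made up for illustration.

```python
import random

# Hypothetical states and parameters, chosen only for illustration.
SUSCEPTIBLE, INFECTED, RECOVERED = "S", "I", "R"


class Person:
    """One agent: the atomic unit of the simulated population."""

    def __init__(self):
        self.state = SUSCEPTIBLE
        self.days_infected = 0


def step(people, rng, contacts_per_day=3, p_transmit=0.1, days_to_recover=5):
    """Advance the population by one day using simple per-agent rules."""
    newly_infected = []
    for person in people:
        if person.state != INFECTED:
            continue
        # Rule 1: an infected agent meets a few random agents and may infect them.
        for other in rng.sample(people, contacts_per_day):
            if other.state == SUSCEPTIBLE and rng.random() < p_transmit:
                newly_infected.append(other)
        # Rule 2: an infected agent recovers after a fixed number of days.
        person.days_infected += 1
        if person.days_infected >= days_to_recover:
            person.state = RECOVERED
    # Apply infections after the loop so nobody infected today also spreads today.
    for other in newly_infected:
        other.state = INFECTED


def simulate(n_people=1000, n_seed=5, days=60, seed=0):
    """Run the model and return one dict of state counts per simulated day."""
    rng = random.Random(seed)
    people = [Person() for _ in range(n_people)]
    for person in people[:n_seed]:
        person.state = INFECTED
    history = []
    for _ in range(days):
        step(people, rng)
        counts = {SUSCEPTIBLE: 0, INFECTED: 0, RECOVERED: 0}
        for person in people:
            counts[person.state] += 1
        history.append(counts)
    return history


if __name__ == "__main__":
    for day, counts in enumerate(simulate(), start=1):
        print(day, counts)
```

The system-level curve (how many people are infected each day) is never written down anywhere in the code. It emerges from the per-agent rules, which is the appeal of the approach: changing a rule or a parameter and re-running the simulation shows how the whole system might evolve differently.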
One of the first lessons learned in the Data Journalism MOOC, taught by Sandra Crucianelli at the Knight Center, is how to extract data from closed files.
The Argentine instructor showed participants several free tools available on the web to journalists interested in working with data trapped in PDF format. Some have more features than others, but through trial and error you will find the ones that best fit your working needs. For example, some only offer the option of extracting data from the first page, or from files of a certain size.
Today, I did a test run of parallel computing with the snow and multicore packages in R and compared the parallel versions with the single-threaded lapply() function.
In the test code below, a data.frame with 20M rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is used to calculate the aggregation by groups. For comparison, the parLapply() function in the snow package and mclapply() in the multicore package are also used to generate the identical aggregated data.
To illustrate the CPU usage, multiple screenshots have also been taken to show the difference between parallel and single-threaded execution.
The first screenshot shows that only 1 out of 8 CPUs is used…