Archive | April 2013

Predicting Twitter popularity is all about probability


Tweets have the power to decimate markets, but they also have users and companies seeing dollar signs. With huge marketing, political, and social mobilization potential, how can you predict which tweets will get more views, and which retweets will go viral? A new study developed a statistical model that attempts to estimate the popularity of tweets, and thus how memes spread.

Starting with 52 “root” tweets from users both famous and obscure, the researchers first analyzed the dynamics of retweeting, like the speed and spread of a tweet from a user to followers and then their followers. The researchers, from the University of Washington, MIT, and Penn, used the Twitter API to collect all the retweet information and found that most retweets occurred within one hour of the original tweet. Not surprisingly, they also found that root tweets are retweeted more than the retweets themselves.

They then plugged the important…



Visualization startup Datahero opens its doors and delivers data analysis for the masses


When I first met Datahero Co-founder Chris Neumann a year ago, I was pretty excited about what he claimed his new company was going to do. Essentially, he told me, it was going to offer a simple, cloud-based data analysis and visualization service that anyone could use. About a month later, in late May, I got a demo of a very-early-stage Datahero and was impressed with the vision. On Tuesday, the company is officially opening its service to a public beta, and the more-finished product still strikes the right chord.

Before evaluating Datahero, though, it’s important to know what it’s not. Namely: it’s not enterprise software, it’s not even business intelligence software and it’s not designed for people who hope to run complex analyses. Neumann nicely summed up what Datahero is during a recent call: “We’re gonna make it usable by the masses,” he said, which means there are going…


Data Science for Social Good and Humanitarian Action


My (new) colleagues at the University of Chicago recently launched a new and exciting program called “Data Science for Social Good”. The program, which launches this summer, will bring together dozens of top-notch data scientists, computer scientists and social scientists to address major social challenges. Advisors for this initiative include Eric Schmidt (Google), Raed Ghani (Obama Administration) and my very likable colleague Jake Porway (DataKind). Think of “Data Science for Social Good” as a “Code for America” but broader in scope and application. I’m excited to announce that QCRI is looking to collaborate with this important new program given the strong overlap with our Social Innovation Vision, Strategy and Projects.

My team and I at QCRI are hoping to mentor and engage fellows throughout the summer on key humanitarian & development projects we are working on in partnership with the United Nations, Red Cross, World Bank and…


How energy harvesting tech could power wearables and the internet of things


It’s all very well talking about the evolution of wearable computing and the internet of things, but something has to power these thin and/or tiny devices. For that reason, it’s a good thing that so many ideas are popping up in the field of energy harvesting and storage.

Some of these ideas were on display this week at the Printed Electronics Europe 2013 event in Berlin, which took in a variety of sub-events including the Energy Harvesting & Storage Europe show. The concepts ranged from the practical to the experimental, so let’s start with the practical.

Here’s Perpetuum‘s Vibration Energy Harvester (VEH), being carried around (appropriately) on a model train.

Perpetuum train sensor

The VEH is a wireless sensor that gets attached to rotating components, such as wheel bearings, on trains. Cleverly, the device both measures and is powered by mechanical vibration. It also measures temperature, and it wirelessly transmits the results…


On big data, the Boston Marathon and civil liberties


For all the concerns over mobile phone logs, video footage and other data collection that could potentially be used to surveil American citizens, it’s times like this that I think we see their real value.

According to a Los Angeles Times article about Monday’s bomb attack at the Boston Marathon, the FBI has collected 10 terabytes that it’s sifting through in order to seek out clues about what exactly happened and who did it. Maybe I’m just a techno-optimist, but I find this very reassuring.

According to the Times, “The data include call logs collected by cellphone towers along the marathon route and surveillance footage collected by city cameras, local businesses, gas stations, media outlets and spectators who volunteered to provide their videos and snap shots.”

Lots of data means lots of potential value

It’s reassuring because I’ve spoken with so many smart people over the years who can do amazing…


Blab predicts what people will tweet, blog and report on


It’s one thing to monitor social statements on Twitter and other social networks as they happen. It’s another thing to predict what will happen over the next three days.

Blab, a Seattle-based company, has emerged with a tool that lets companies do just that, with visualizations of where conversations will pop up from more than 50,000 sources, including Facebook, Tumblr, Twitter, YouTube, blogs and news outlets. It does this by paying close attention to where a conversation is now and then predicting where it will go based on which earlier conversations it resembles. For example, if people started talking about a previous Amazon Web Services outage on Twitter and then the conversation moved to blogs and then to mainstream media outlets, that same pattern could happen in the case of another AWS outage. That’s why measuring the trajectory of each conversation and storing it for…


iRhythm raises $16M for wearable cardiac monitoring patch


A wearable patch that can monitor a patient’s heart activity for two weeks straight has won a $16 million investment. iRhythm, a startup spun out of Stanford’s biodesign program, plans to announce on Wednesday that it has raised a Series D round that brings its total amount raised to $68 million. The round was led by Norwest Venture Partners (NVP), with participation from existing funders New Leaf Ventures, Synergy Life Science Partners and Kaiser Permanente Ventures.

“What the company has done is do a good job integrating consumer electronics and ergonomics,” said Casper de Clercq, a partner at NVP and a new member of iRhythm’s board. He said the company was attractive because it’s addressing an unmet need with innovative technology and its Zio device (not to be confused with the now-defunct sleep monitor Zeo) is already reimbursed by health plans as a diagnostic test.

If a doctor suspects…


The Big Data Convergence

My missives

As we scan the concepts, technologies, products and practices in the big data space, a lot of things get muddier.

Neither the progression nor the boundaries are clear. We are still in the descriptive stage in terms of the application of the analytics technologies.

I had a good conversation with Bob Friday yesterday – his question was “What prevents us from answering 80% of the questions via automatic inferences?” And that is the “Adaptive” stage we need to be …

I think a diagram is much better than me writing 100,000 words. So here it is:


In many ways, a lot of the underlying technologies are converging.

For example, A(rtificial) I(ntelligence) = NLP + N(atural) L(anguage) U(nderstanding) + ML + K(nowledge) R(epresentation) + Reasoning
Are Amazing Intelligent Machines in the works?


apply vs for

The stupidest thing...

It’s widely understood that, in R programming, one should avoid for loops and always try to use apply-type functions.

But this isn’t entirely true. It may have been true for Splus, back in the day: As I recall, that had to do with the entire environment from each iteration being retained in memory.

Here’s a simple example:
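A minimal sketch of the kind of comparison meant here, not the code from the original post: it squares each element of a vector once with a preallocated for loop and once with sapply, checks that the two agree, and then times both. The vector size and the names `n`, `x`, `squares_loop` and `squares_apply` are made up for illustration, and timings depend on the machine and R version, so none are quoted.

```r
# Illustrative only: names and sizes are assumptions, not from the original post.
n <- 1e5
x <- seq_len(n)

# for loop, with the result vector allocated up front
squares_loop <- numeric(n)
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# the same computation written with sapply
squares_apply <- sapply(x, function(v) v^2)

# both approaches should produce identical values
stopifnot(isTRUE(all.equal(squares_loop, squares_apply)))

# compare elapsed time of the two styles on your own machine
system.time(for (i in seq_along(x)) squares_loop[i] <- x[i]^2)
system.time(sapply(x, function(v) v^2))
```

Preallocating `squares_loop` matters for a fair comparison: growing a vector inside the loop would penalize the for version for a reason unrelated to looping itself.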

There’s a great commentary on this point by Uwe Ligges and John Fox in the May, 2008, issue of R News (see the “R help desk”, starting on page 46, and note that R News is now the R Journal).

Also see the related discussion at stackoverflow.

They say that apply can be more readable. It can certainly be more compact, but I usually find a for loop to be more readable, perhaps because I’m a C programmer first and an R programmer second.

A key point, from Ligges and…
