Test Drive of Parallel Computing with R

Yet Another Blog in Statistical Computing

Today, I did a test run of parallel computing with the snow and multicore packages in R and compared the parallel versions against the single-threaded lapply() function.

In the test code below, a data.frame with 20M rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is used to calculate an aggregation by group. For comparison, the parLapply() function in the snow package and mclapply() in the multicore package are used to generate the identical aggregated data.
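The original test code is not reproduced in this excerpt, so here is a minimal sketch of what such a comparison might look like. It rests on several assumptions: it uses the parallel package bundled with R, which provides both parLapply() and mclapply(), rather than loading snow and multicore directly; the row count is scaled down from 20M; and the column names, the worker count of 2, and the agg() helper are hypothetical stand-ins for whatever aggregation the post actually ran.

```r
library(parallel)  # bundled with R; provides parLapply() and mclapply()

set.seed(1)
n  <- 200000L  # assumption: scaled down from the post's 20M rows
df <- data.frame(id = sample.int(8L, n, replace = TRUE), x = rnorm(n))

# One numeric vector per group, so each list element is an independent task
groups <- split(df$x, df$id)

# Hypothetical aggregation: row count and mean for each group
agg <- function(v) c(n = length(v), mean = mean(v))

# Baseline: single-threaded lapply()
t_lapply <- system.time(r1 <- lapply(groups, agg))

# Socket cluster in the style of snow: workers are separate R processes
cl <- makeCluster(2L)
t_snow <- system.time(r2 <- parLapply(cl, groups, agg))
stopCluster(cl)

# Forked workers in the style of multicore; forking is unavailable on
# Windows, so fall back to a single core there
cores  <- if (.Platform$OS.type == "windows") 1L else 2L
t_fork <- system.time(r3 <- mclapply(groups, agg, mc.cores = cores))

# All three approaches should produce the same aggregated result
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(r1, r3)))

# Elapsed seconds for each approach
timings <- c(lapply    = t_lapply[["elapsed"]],
             parLapply = t_snow[["elapsed"]],
             mclapply  = t_fork[["elapsed"]])
print(timings)
```

On a workload this small, the cost of starting workers and shipping data to them may outweigh any gain, so the timings printed here say little about the 20M-row case. The sketch is meant to show the shape of the comparison, not to reproduce the post's numbers.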

Below is the benchmark output. As shown, the parallel solutions, i.e. SNOW and MULTICORE, are about 3 times more efficient than the baseline solution, i.e. LAPPLY, in terms of user time.

To illustrate the CPU usage, several screenshots have also been taken to show the difference between parallel and single-threaded execution.

In the first screenshot, it is shown that only 1 out of 8 CPUs is used…
