Test Drive of Parallel Computing with R
Today, I did a test run of parallel computing with the snow and multicore packages in R and compared the parallel runs with the single-threaded lapply() function.
In the test code, a data.frame with 20M rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is used to calculate the aggregation by group. For comparison, the parLapply() function in the snow package and mclapply() in the multicore package are used to generate the identical aggregated data.
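The comparison described above can be sketched roughly as follows. This is not the original test code, only a minimal illustration under stated assumptions: it uses the parallel package that ships with R (which provides both parLapply() and mclapply()) as a stand-in for snow and multicore, a much smaller data.frame than the 20M rows mentioned above, and hypothetical column names (id, x), a hypothetical aggregation function (agg), and an assumed worker count of 2.

```r
# Sketch only: 'parallel' is used here as a stand-in for snow/multicore.
library(parallel)

set.seed(1)
n  <- 1e5                                   # assumed size; the post simulates 20M rows
df <- data.frame(id = sample(1:10, n, replace = TRUE),  # hypothetical grouping column
                 x  = rnorm(n))                         # hypothetical value column

# One piece of the data.frame per group
pieces <- split(df, df$id)

# Hypothetical per-group aggregation: row count and mean of x
agg <- function(d) data.frame(id = d$id[1], n = nrow(d), mean_x = mean(d$x))

# Baseline: single-threaded lapply()
t_serial <- system.time(
  res_serial <- do.call(rbind, lapply(pieces, agg))
)

# snow-style socket cluster: workers are separate R processes
n_workers <- 2                              # assumed; the VM in the post has 8 cores
cl <- makeCluster(n_workers)
t_snow <- system.time(
  res_snow <- do.call(rbind, parLapply(cl, pieces, agg))
)
stopCluster(cl)

# multicore-style forking: not available on Windows, so fall back to 1 core there
fork_cores <- if (.Platform$OS.type == "windows") 1L else n_workers
t_fork <- system.time(
  res_fork <- do.call(rbind, mclapply(pieces, agg, mc.cores = fork_cores))
)

# Elapsed times for the three approaches
print(rbind(serial = t_serial, snow = t_snow, fork = t_fork)[, "elapsed"])
```

With data this small, the overhead of starting workers and shipping data to them may outweigh any gain, so the timings are only meaningful at sizes closer to the one used in the post.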
To illustrate the CPU usage, I also took several screenshots showing the difference between the parallel and single-threaded runs.
The first screenshot shows that only 1 of the 8 CPUs is used…