Parallel framework for statistical analysis package "R"


Yes, I know that R doesn’t have quotes, but I thought that the non-R users out there might think it was a typo. Good news if you use R and yearn for easier access to parallel goodness: SPRINT.

One way to meet the computational demands of large-scale data analysis is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis, and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on them. Existing modules already enable R to use multiple processors, but they are either difficult for the HPC novice to use or cannot solve certain classes of problems. A way to exploit HPC systems from R, without having to master parallel programming paradigms, is therefore needed to analyse genomic data to the fullest.

We have designed and built a prototype framework that allows parallelised functions to be added to R, enabling easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Using them requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example, we created a function that computes a pairwise correlation matrix, and it performs well with SPRINT: executed through SPRINT on an HPC resource with eight processors, the computation completes more than three times faster than it does in R on a single processor.
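The pairwise correlation matrix mentioned above parallelises naturally, because every entry can be computed independently. The following is a minimal sketch of one common approach, row-block decomposition, written in Python with only the standard library. It is a hypothetical illustration, not SPRINT's implementation or API (SPRINT itself is an R package); the function names `pearson`, `correlate_block` and `parallel_correlation` are invented for this example, and it assumes the rows of `data` are the variables being correlated (e.g. genes), each with non-zero variance.

```python
"""Hypothetical sketch of a parallel pairwise correlation matrix.

Not SPRINT's code or API: a row-block decomposition using only the
Python standard library, for illustration.
"""
import math
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors.

    Assumes neither vector is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_block(args: Tuple[Matrix, int, int]) -> Tuple[int, Matrix]:
    """One worker's share: rows [start, stop) of the correlation matrix."""
    data, start, stop = args
    rows = [[pearson(data[i], data[j]) for j in range(len(data))]
            for i in range(start, stop)]
    return start, rows


def parallel_correlation(data: Matrix, workers: int = 8,
                         executor_cls: type = ProcessPoolExecutor) -> Matrix:
    """Split the rows into contiguous blocks, one per worker, and gather them.

    Each task carries the full dataset, which keeps the sketch simple but
    costs memory and transfer time for large inputs.
    """
    n = len(data)
    if n == 0:
        return []
    step = math.ceil(n / workers)  # rows per block (at least 1)
    tasks = [(data, s, min(s + step, n)) for s in range(0, n, step)]
    result: Matrix = [[] for _ in range(n)]
    with executor_cls(max_workers=workers) as pool:
        # Blocks are independent, so workers never need to communicate;
        # results are only combined once each block is finished.
        for start, rows in pool.map(correlate_block, tasks):
            result[start:start + len(rows)] = rows
    return result
```

A call such as `parallel_correlation(genes, workers=8)` returns the full n×n matrix. On platforms where worker processes are spawned rather than forked, the call must sit under an `if __name__ == "__main__":` guard. The sketch also recomputes both halves of the symmetric matrix and copies the whole dataset to every task; a production framework would likely avoid both.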
