Parallel framework for statistical analysis package “R” 12.30.2008
Yes, I know that R doesn’t have quotes, but I thought that the non-R users out there might think it was a typo. Good news if you use R and yearn for easier access to parallel goodness: SPRINT
A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest.
We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor.
Like what you're reading? Come back every day for HPC news, or subscribe to email or RSS updates.



Most Popular News
Most Commented
Comments