OrangeFS, the user-friendly, open source parallel file system for high performance computing, has a lot of endearing qualities. Heading up the list is the fact that it just works – download it to your existing commodity hardware and realize immediate and substantial boosts in the performance of your HPC and storage clusters.
This is not just another file system. Like the everyday file systems found on PCs, OrangeFS is a tool for storing, retrieving and updating computer files. But it is designed to run on large HPC clusters composed of many networked storage nodes that hold massive amounts of information.
OrangeFS is the next generation of PVFS (Parallel Virtual File System), developed at Clemson University, S.C. and Argonne National Lab. PVFS was used primarily for discrete, large scale scientific workloads. OrangeFS is much more of a workhorse, handling a variety of compute intensive applications such as high performance computing (HPC), genomics, bioinformatics and other Big Data applications.
The file system is designed to run on large parallel cluster computer systems that typically consist of dozens to thousands of compute nodes connected with a high performance communication network fabric such as InfiniBand or high speed Ethernet.
OrangeFS supports the parallel reading and writing of a file’s objects, allowing large computational problems to be divided into smaller pieces that run on many different nodes at the same time. This reduces the time-to-solution for computationally intensive work in science, engineering, and business. OrangeFS’s support for a global namespace also allows users to access a single file instance at the same time for multiple purposes.
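The idea of dividing one file into pieces that are read at the same time can be illustrated with a minimal Python sketch. It does not use OrangeFS or any of its interfaces: it reads an ordinary local file with a thread pool, and the names `split_ranges` and `parallel_read` are hypothetical helpers introduced only for this illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, num_workers):
    """Divide [0, total_size) into contiguous (offset, length) pieces."""
    base, extra = divmod(total_size, num_workers)
    ranges = []
    offset = 0
    for i in range(num_workers):
        # The first `extra` workers take one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_read(path, num_workers=4):
    """Read a file in pieces, one piece per worker, and rejoin them in order."""
    total_size = os.path.getsize(path)

    def read_piece(piece):
        offset, length = piece
        # Each worker opens its own handle so seeks do not interfere.
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the order of the input ranges.
        pieces = pool.map(read_piece, split_ranges(total_size, num_workers))
        return b"".join(pieces)
```

On a parallel file system the pieces would typically live on different storage servers, so the reads would not compete for a single disk; on a local file this sketch shows only the decomposition, not the speedup.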
OrangeFS at Work
Here are just a few examples of OrangeFS applications:
- Replacing HDFS with OrangeFS – OrangeFS can replace and improve upon HDFS as the file system for Hadoop MapReduce. This is a one-for-one swap, which requires no code modification. OrangeFS increases MapReduce performance in remote mode by about 25% and also provides good results when clients significantly overcommit the storage servers. It provides a solid solution for converged HPC and Big Data Storage.
- HPC Parallel jobs – OrangeFS interfaces with MPI, allowing HPC parallel jobs to use the OrangeFS system seamlessly.
- Big Data Aggregate Storage – OrangeFS builds on the success of PVFS’s ability to handle large data sets. It is ideal to meet the growing demands of commercial Big Data, where the ability to capture, manage and process huge data sets in reasonable time has made major contributions to information management.
- Guest virtualization – Guest VMs, in the cloud or on premises, can use OrangeFS’s flexibility and high performance to achieve I/O similar to a dedicated system.
OrangeFS interfaces with a diverse set of clients for easy access to data. Its flexible interface allows use from several operating systems, including Linux, Windows and Mac. Compatible client interfaces include Direct Interface, WebDAV, S3, Hadoop, ROMIO and REST.
This highly scalable storage solution supports high performance SSDs for metadata and improves small file performance. It also supports distributed metadata for directory entries and client-side caching via Direct Interface.
Performance and Capacity
OrangeFS just works – installation is hassle free and requires minimal system administrator involvement.
This is why the Human Language Technology Center of Excellence (HLTCOE) at Johns Hopkins University chose OrangeFS to help handle its next generation storage requirements.
A report issued by HLTCOE states: “During our evaluation we found OrangeFS extremely easy to setup, configure and administer. Additional performance is gained by storing the file system metadata on SSDs. In our limited testing, we found that the performance scaled nearly linearly with each additional server, which means that when we add a server we obtain both additional disk space and I/O performance.”
Because it is an object-based file system, OrangeFS is able to achieve the high levels of performance demanded by HPC and Big Data applications. Each file and directory consists of two or more objects – one containing metadata and the others the file data. This division and distribution of data to the servers is imperceptible to the user while, behind the scenes, OrangeFS provides significantly improved scalability in performance and capacity.
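The split into one metadata object plus several data objects can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OrangeFS code: the round-robin striping, the `MetadataObject` fields, and the `stripe` and `reassemble` helpers are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class MetadataObject:
    """Describes a file; the file's bytes live in separate data objects."""
    name: str
    size: int
    stripe_size: int
    data_object_ids: list  # one id per data object, in server order


def stripe(name, data, num_servers, stripe_size):
    """Split data round-robin into num_servers data objects plus metadata."""
    data_objects = [bytearray() for _ in range(num_servers)]
    for i in range(0, len(data), stripe_size):
        server = (i // stripe_size) % num_servers
        data_objects[server] += data[i:i + stripe_size]
    meta = MetadataObject(name, len(data), stripe_size, list(range(num_servers)))
    return meta, [bytes(obj) for obj in data_objects]


def reassemble(meta, data_objects):
    """Rebuild the original bytes by visiting the data objects in turn."""
    out = bytearray()
    cursors = [0] * len(data_objects)
    server = 0
    while len(out) < meta.size:
        start = cursors[server]
        chunk = data_objects[server][start:start + meta.stripe_size]
        out += chunk
        cursors[server] += len(chunk)
        server = (server + 1) % len(data_objects)
    return bytes(out)
```

A caller sees only the file name and its bytes. The metadata object records how the bytes were distributed, which is what lets the layout stay hidden from the user while capacity and bandwidth grow with each added server.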
OrangeFS version 2.9 supports two new security modes in addition to its standard security. The first mode, key-based security, uses cryptographic key pairs for servers and clients. The second mode, certificate-based security, uses certificates uniquely associated with each OrangeFS user and integrates with an LDAP directory for identity mapping. Another new feature allows directory entries to be distributed among multiple metadata servers according to attributes assigned to the directories.
OrangeFS is in the cloud as well. The solution is available in the AWS Marketplace, providing a fast, efficient way to launch a cloud storage cluster via AWS CloudFormation templates.
Omnibond Support Services
Despite its ease of use and administration, OrangeFS is not cast in stone. The solution is constantly changing as HPC and Big Data workloads grow in both complexity and data volume. Omnibond provides the support and development required to ensure that OrangeFS keeps pace with today’s and tomorrow’s computational and storage requirements.
This includes a team of professional software engineers dedicated to the solution’s development, including turning customer feedback into new features. Omnibond provides all the requisite customer support, including issue escalation and quick turnaround for patches when needed. Other services include design, performance tuning and targeted development.
See for Yourself
Find out what OrangeFS can do for you. Just go to the download page of the OrangeFS website to download a tarball, or obtain the latest changes to the system from the OrangeFS repository. Instructions on how to install the system are found in the documentation section of the website.
You’ll discover why OrangeFS has the reputation of being one of the best performing file systems available today for HPC and Big Data applications. And you will appreciate how easy it is to build, install and run. OrangeFS just works.