“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
“The pharmaceutical industry trend toward joint ventures and collaborations has created a need for new platforms in which to work together. We’ll dive into architectural decisions for building collaborative systems. Examples include how such a platform allowed Human Longevity, Inc. to accelerate software deployment to production in a fast-paced research environment, and how Celgene uses AWS for research collaboration with outside universities and foundations.”
“We took the Aries system interconnect from our supercomputers, the industry-standard architecture of our clusters, the scalable graph engine from the Urika-GD appliance, and the pre-integrated, open infrastructure of our Urika-XA system and combined them into one agile analytics platform. The Urika-GX gives our customers the tool they need to overcome their most advanced analytics challenges today, and the platform to bridge to tomorrow.”
Hadoop and Spark clusters have a reputation for being extremely difficult to configure, install, and tune, but help is on the way. The good folks at Cluster Monkey are hosting a crash course entitled Apache Hadoop with Spark in One Day. “After completing the workshop attendees will be able to use and navigate a production Hadoop cluster and develop their own projects by building on the workshop examples.”
In this special guest feature from Scientific Computing World, Andrew Jones from NAG looks ahead at what 2016 has in store for HPC and finds people, not technology, to be the most important issue. “A disconcertingly large proportion of the software used in computational science and engineering today was written for friendlier and less complex technology. An explosion of attention is needed to drag software into a state where it can effectively deliver science using future HPC platforms.”
“What we’re previewing here today is a capability to have an overarching software, resource scheduler and workflow manager that takes all of these disparate sources and unifies them into a single view, making hundreds or thousands of computers look like one, and allowing you to run multiple instances of Spark. We have a very strong Spark multitenancy capability, so you can run multiple instances of Spark simultaneously, and you can run different versions of Spark, so you don’t obligate your organization to upgrade in lockstep.”
Today LBNL announced that a team of scientists from Berkeley Lab’s Computational Research Division has been awarded a grant by Intel to support their goal of enabling data analytics software stacks—notably Spark—to scale out on next-generation high performance computing systems.
Today Intel Corporation and BlueData announced a broad strategic technology and business collaboration, as well as an additional equity investment in BlueData from Intel Capital. BlueData is a Silicon Valley startup that makes it easier for companies to install Big Data infrastructure, such as Apache Hadoop and Spark, in their own data centers or in the cloud.
In this RCE podcast, Brock Palen and Jeff Squyres speak with Matei Zaharia about Apache Spark, a fast engine for large-scale data processing.
“Using the publicly available software packages in the High-Performance Big Data (HiBD) project, we will provide case studies of the new designs for several Hadoop/Spark/Memcached components and their associated benefits. Through these case studies, we will also examine the interplay between high performance interconnects, storage systems (HDD and SSD), and multi-core platforms to achieve the best solutions for these components.”