Entries filed under “Enterprise HPC”

Applications of high performance computing in the business or in support of business operations.

Whamcloud aims to make sure Lustre has a future in HPC

Brent Gorda

insideHPC had a chance this week to sit down with the executives of the newly minted Whamcloud, Brent Gorda [CEO] and Eric Barton [CTO]. Many of you probably know Brent from his work within the US Department of Energy supercomputing circles. He’s also very active in organizing various technical and community events for the IEEE/ACM Supercomputing conference series. Eric Barton brings 25 years of development experience in supercomputing to the Whamcloud team. He has been working on Lustre since he was brought in to stabilize its network stack when the project first received DOE funding. Most recently he was a Principal Engineer at Sun/Oracle where he served as Chief Architect of the Lustre group.

As you may know, Whamcloud’s business model is centered on the Lustre parallel file system. But what exactly does this mean? Lustre is an open source project, managed and held by the Oracle Corporation via their acquisition of Sun Microsystems. Given that Oracle’s core business isn’t dependent upon Lustre, many folks with large-scale Lustre deployments have been worried about the progression of the code base. We wanted to dig a little deeper and find out exactly what Whamcloud is up to with respect to our little friend Lustre.

During the interview, Brent Gorda summed up their intentions best: “Reduce the complexity and increase the community.”  Whamcloud intends to pour their own efforts into developing, hardening and improving what has become a real asset to the high performance computing community.  They plan on doing so via code contributions to the root Lustre source tree.  Unlike many other open source efforts that have become commercial products, they will not fork the source tree for their own endeavors.  This is extremely important in building and maintaining their idea of community: Lustre is everyone’s Lustre.

Eric Barton

So how does this affect their view of development?  I asked Eric Barton what their three top goals were with respect to development.  First, he said that Whamcloud is committed to working to improve the quality and stability of the code.  Without a stable code base to work from, scalability is simply a pipe dream.  This also implies de-prioritizing several of the features requested for the initial Lustre 2.0 release.

The second major development goal is to begin preparing for exascale deployments.  This one really threw me for a loop.  However, Eric is very grounded in his thinking when he explains why.  Given that they want to always maintain the quality and stability of the file system, they need to begin to think intelligently about how to address systems with hundreds of thousands of nodes in the future.  They want to ensure that these features make it into the code base gracefully, as opposed to dropping the features in the community’s cage all at once.  Finally, he wants to make sure that the proper health and monitoring features gracefully make it into the source.  Exascale means nothing if the platform can’t be kept stable long enough to run an application.  A healthy system is a happy system.

So where is Oracle in all of this?  Brent and Eric were very adamant that they do not intend to directly compete with Oracle.  Oracle, via their inherited Sun support contracts, receives revenue based on the service and support of the Lustre file system.  They both indicated that Whamcloud will carefully manage its relationship and impact on Oracle. Whamcloud’s focus is Lustre on Linux for HPC — particularly the high end — whereas Oracle is more focused on commercial deployments. Whamcloud would rather be good stewards of the community and garner revenue through non-recurring engineering.

All in all, Whamcloud seems to be off to a raging start.  They’re growing on a daily basis [up to 10 employees at the time of the interview] and they’ve already had significant interest from partners and potential customers.  Lustre, recently a damsel in distress, now has its knight in shining armor in Whamcloud.

Also posted in Business of HPC, Featured Stories, HPC, HPC Hardware, HPC Software, Storage | 1 Comment

HPC gives scientific computing on your cell phone a boost

Reader Jay Blair sent me a pointer to this story from TACC about an Android app that runs a reduced model locally on the cell phone based on results computed over a long series of runs on Ranger.

The team performed a series of expensive high-fidelity simulations on the Ranger supercomputer to generate a small “reduced model” which was transferred to a Google Android smart phone. They were then able to solve problems on the phone and visualize the results on the fly.

The project proved the potential for reduced order methods to perform real-time and reliable simulations for complicated problems on handheld devices.

This approach is already used operationally in a variety of civilian and defense scenarios to allow professionals ranging from bridge fatigue assessment teams to rapid crisis response forces to trade off some accuracy for an answer right now. Typically these reduced models have run on laptops or larger portable computers, but today’s mobile devices are becoming quite powerful in their own right.

This is not the first time that model reduction algorithms have been used to ameliorate the complexities of large-scale physical simulations.  The advantage of the system designed by Knezevic and his colleagues is its rigorous error bounds, which tell a user the range of possible solutions, and provide a metric of whether an answer is accurate or not. The error bounds are based on mathematical theory developed in Prof. Patera’s research group at MIT over a number of years.

“We have a bound on how much accuracy we’re losing with our reduced model, so we can say with rigor that we’re doing supercomputing on a phone,” Knezevic said.
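To make the idea of a rigorous error bound concrete, here is a minimal sketch in Python. It is not the algorithm used in the TACC/MIT app, whose details the story does not give. It assumes a toy diagonal system (1 + mu*k_i) u_i = f_i standing in for the expensive simulation, a single precomputed snapshot as the reduced basis, and the standard residual-based bound (error norm at most residual norm divided by the smallest eigenvalue of the system matrix). The names K, F, full_solve, reduced_solve and error_bound are all hypothetical.

```python
import math

# Hypothetical toy problem: a diagonal system A(mu) u = f with
# A(mu) = diag(1 + mu * k_i). K and F are made-up illustration data.
K = [1.0, 2.0, 3.0, 4.0]
F = [1.0, 1.0, 1.0, 1.0]

def diagonal(mu):
    """Diagonal entries of A(mu); all positive for mu >= 0."""
    return [1.0 + mu * k for k in K]

def full_solve(mu):
    """Stand-in for the expensive high-fidelity solve."""
    return [f / a for f, a in zip(F, diagonal(mu))]

def reduced_solve(mu, basis):
    """Galerkin projection onto one basis vector: solve (v^T A v) c = v^T f."""
    a = diagonal(mu)
    vAv = sum(v * ai * v for v, ai in zip(basis, a))
    vf = sum(v * f for v, f in zip(basis, F))
    c = vf / vAv
    return [c * v for v in basis]

def error_bound(mu, u_red):
    """Residual-based bound: ||u - u_red|| <= ||f - A u_red|| / lambda_min(A)."""
    a = diagonal(mu)
    residual_norm = math.sqrt(sum((f - ai * ui) ** 2
                                  for f, ai, ui in zip(F, a, u_red)))
    return residual_norm / min(a)  # lambda_min of a diagonal matrix

def true_error(mu, u_red):
    """Actual error norm, available here only because the toy is cheap."""
    u = full_solve(mu)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, u_red)))

# "Offline" stage: one expensive solve at mu = 1.0 provides the basis.
snapshot = full_solve(1.0)

# "Online" stage: a new parameter value is handled by the reduced model only.
u_phone = reduced_solve(1.2, snapshot)
bound = error_bound(1.2, u_phone)
actual = true_error(1.2, u_phone)
print(f"bound = {bound:.6f}, actual error = {actual:.6f}")
```

The point of the sketch is that the bound is computed without ever running the full solve for the new parameter value, which is what lets a small device report how far off its answer could be.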

The quantitative understanding of the error bounds is very important, and it’s a nice addition in this work.

Also posted in HPTC | 2 Comments

X-ISS Beefs Up HPC Administration Products and Services

X-ISS made two big announcements today regarding some new HPC-centric products and services.  Those of you with reasonably sized Dell clusters might know X-ISS as the integration and services company that, on occasion, assists Dell in performing HPC deployments.  The first press release officially announces what they call the DecisionHPC monitoring suite.  The web-based monitoring and analytics package helps collect and report data on heterogeneous computing systems.  The goal of DecisionHPC is to provide information to systems administrators so that they can make more reliable decisions.

“With more than 500 HPC cluster system installations and support services since 1993, X-ISS has observed that most users of HPC systems struggle to maximize system productivity, aligning computing resource usage with organizational goals, and plan for future needs; these problems stem from what until now, has been a lack of detailed and insightful system statistics and analytics, and is the reason we have developed DecisionHPC. The ability to monitor, report on and provide immediate, historical and forecasted system data and analytics, even on heterogeneous and geographically separate systems, provides HPC users new capabilities that are critical to optimizing ROI on HPC investments.” [Deepak Khosla]

The second announcement involves managed HPC services.  Not only can X-ISS install your machine now, they can also manage it remotely.  ManagedHPC from X-ISS is the outsourced HPC system management service that allows customers without systems administration expertise in house to purchase HPC resources.

“As a part of this significant investment into the Engineering Department at the University of Wisconsin, we were able to procure a 142-node cluster computer from Dell and funding for a cluster system administrator. After an extensive search for this position, we only received a small handful of candidates, of which only a few were qualified. We didn’t have the timeframe to hire and train the right person for the job, which is where X-ISS and its ManagedHPC program came into play. X-ISS has been able to handle getting the system up and running, software installations, and proactively handle technical issues so we could focus on working with the Engineering staff on specific code needs and department usage of this shared resource. It has allowed us to focus our attention where it was most needed.” [David Crass, University of Wisconsin Director of Research Computing]

ManagedHPC services include:

  • Turnkey outsource system management service
  • On-site installation & setup
  • Secure remote system monitoring, management & support
  • Seasoned X-ISS Team w/ more than 500 HPC installations
  • Proactive management & reporting process
  • Uses DecisionHPC to help guide growth

For more info on DecisionHPC, check out its website here.

Also posted in Business of HPC, System Management | Leave a comment

Verari Changes the Sign Out Front

Verari has announced that they have changed the sign on the front of the building.  Why, you ask?  They’re focusing their business model specifically on providing hardware for cloud-like environments.  You mean big datacenters?  Yeah, those too.

“Being able to base our cloud storage and compute products on Verari’s world class BladeRack® 2 Series technology and FOREST containerized data center infrastructure puts us at the front of the pack to serve the demanding cloud customer,” said Marc Brown, President and COO, Cirrascale. “These products, based on Verari’s patented Vertical Cooling Technology, generated over $500 Million in installed systems in the high performance computing and enterprise markets; these customer segments are the foundation of the burgeoning cloud market of today. This technology is a winning formula for the cloud customer.”

Cirrascale was actually organized under the “Verari Technologies” name while acquiring the intellectual property and other assets of Verari Systems back in January 2010.

“Technology innovation is only half the story at Cirrascale; we must also innovate with our business model,” said Dave Driggers, Chairman and CEO, Cirrascale. “Cloud and Web 2.0 businesses are placing new demands on their suppliers. Unlike the enterprise data center customer served by traditional computer companies with established product lines and large IT consulting businesses, the agile, self-sufficient cloud and web 2.0 customers want to collaborate to define their platforms and create a purpose-built data center infrastructure that addresses their unique requirements.”

Quoting their release: “Cirrascale will focus on customers buying at the data center and rack infrastructure level, across a range of storage and computing models including low-power micro-servers, high density storage, scale-out multi-core, HPC cluster and GP/GPU computing. Customers are served by the same physical rack infrastructure that accommodates the customer-defined power, density and cooling requirements.”  This sounds surprisingly like the previous Verari business model. It also sounds very much like the business model of Rackable (now SGI) and portions of Dell’s business.  Ultimately, this is a very tough market niche.

For more info, read their full press release here.

Also posted in Datacenter operations | 1 Comment

Industry experts form new Lustre startup

Following the official acquisition of Sun Microsystems by Oracle Corporation, there have been quite a few HPC industry pundits debating the eventual fate of the famed parallel file system Lustre.  Lustre made its name by anchoring super-scale computational centers such as Oak Ridge National Lab.  Considering Oracle’s core business model does not rely on technologies such as Lustre, the many folks who depend on Lustre for their high performance parallel file system have question marks beside support and continued development. Well, the skies have cleared: let’s give a round of applause to Whamcloud.

What’s Whamcloud? Whamcloud is a new venture-backed startup that emerged from stealth mode this morning dedicated to filling the gap for future Lustre development and support.  Their business model is clear, concise and quite refreshing from a startup company in HPC.  As a company, they have three goals:

  1. Whamcloud will combine the world’s leading HPC and storage talent to evolve the state of parallel storage with a strategic focus on the most scalable applications, specifically high performance and cloud computing
  2. Whamcloud will contribute and evolve open source file storage technologies, including the Lustre file system, upon an open-source Linux foundation using Linux storage technology
  3. Whamcloud will focus on enabling open source Lustre storage technology in the industry by opening up file system support to the whole industry, with a hardware-agnostic storage certification and support program

So why the enthusiasm? Whamcloud has assembled a serious team of industry experts.  Not the kind with the typical “CEO of Foo” resumes.  These experts are real HPC gurus.  So who’s lurking in the halls of Whamcloud?  Brent Gorda will hold the title of CEO.  Those of you familiar with the Department of Energy know that Brent has been around big HPC for quite some time.  He’s also a former contributor to the Supercomputing Cluster Challenge.  Eric Barton, CTO, was most recently a Principal Engineer at Sun/Oracle and Chief Architect with the Lustre group.  Robert Read, Whamcloud’s Principal Engineer, was also formerly at Sun/Oracle leading the charge for Lustre 2.0 development.

What’s not to like? You have two of the leading visionaries behind recent development efforts in Lustre and one of the thought leaders in Lustre implementation and operations.

“There is tremendous demand for leadership from a professional engineering organization that is focused on evolving Lustre for the next 10 years of HPC and cloud storage,” said Brent Gorda, Whamcloud CEO. “History has proven that hardware-oriented purchases of open-platform file storage technologies are disruptive to the growth of scale-out storage technology. First and foremost, Whamcloud will ensure broad and continued international adoption of these technologies through a hardware-agnostic customer approach, across a broad array of data-hungry markets.”

Folks, this is one to keep an eye on.  Lustre is and will continue to be a vital piece of the HPC puzzle.  As larger systems and scalable applications begin to become the norm in HPC, the pressures of I/O and storage will continue to increase.  Whamcloud is well positioned to take Lustre to the next stage of scalability and performance.

Also posted in Business of HPC, Featured Stories, HPC, HPC Software | 1 Comment

OSC partners with Moldex3D to bring industrial simulation software to Blue Collar Computing

In mid-July OSC let me know about a new development with their Blue Collar Computing program, which is seeing something of a resurgence these days (then I went on vacation and forgot to publish it).

OSC has partnered with Moldex3D to demonstrate the performance of its pioneering 3-D simulations for efficient verifications of part/mold designs for educational use. As part of this partnership, Moldex3D is donating 30 eDesign licenses over a three-year period with a cost value of $1,050,000 in support of OSC’s Ralph Regula School of Computational Science education program.

…As part of its Blue Collar Computing™ offerings, OSC will provide manufacturers with the training and computational resources needed to use advanced modeling and simulation to test processes and product design. Industries participating in the OSC’s Blue Collar Computing program gain access to its advanced modeling and simulation resources and services in order to reduce the time and expense involved in determining proof of concept and designing new products, as well as to improve production efficiency. The program also uses custom-designed web portals to give businesses secure, easy access to processing power, and mass storage systems without the need for in-house infrastructure or computational science expertise.

More at the link above. This announcement is part of OSC’s partnership with PolymerOhio, a statewide effort to increase use of modeling and simulation in the polymer industry in Ohio. This kind of effort is key to catalyzing the adoption of HPC to solve “everyday” problems in industry and manufacturing in order to build out the “missing middle” in the HPC marketplace.

Also posted in Collaborations, HPC Education and Training | Leave a comment

NVIDIA-based cloud service offers GPUs for rent

PEER1 Hosting announced from SIGGRAPH yesterday in Los Angeles that their GPU-powered public rendering cloud is up and going. From the press release

The system is running the RealityServer 3D web application service platform, developed by mental images, a wholly owned subsidiary of NVIDIA. The RealityServer platform is a powerful combination of NVIDIA Tesla GPUs and 3D web services software that delivers interactive, photorealistic applications over the web using the iray renderer, enabling animators, product designers, architects and consumers to easily visualize 3D scenes with remarkable realism.

With the use of massively parallel NVIDIA Tesla GPUs PEER 1 Hosting can now offer customers flexible and reliable access to a system capable of delivering high computational performance across demanding applications such as graphics rendering, complex quantitative processing, video compression and large-model 3D web services for access by mobile clients.

We’ve talked about these technologies before. According to NVIDIA, “more than 128” Tesla S1070 and Tesla M2050 (Fermi architecture-based) cards are in the system (the exact number has not been disclosed). GPUs as well as RealityServer are now available for purchase worldwide and will be hosted as a managed hosting offering at PEER 1 data centers in Toronto, Canada and London, UK.

According to PEER1, pricing is structured on a per GPU per month basis for starters. A server with 4x s1070 GPU and 2x Intel processors starts at around $2000 a month fully managed.  Discounts are available for long term contracts and premiums are applied if you need less than one month. That isn’t cheap, so only users with a business model (or research funding) need apply.

Why mix the Tesla 10- and 20-series? Some of PEER1’s customers may want the 4GB memory of S1070 GPUs vs the 3GB of M2050. Also, PEER1 bought their S1070s before the M2050s became available.

Also posted in Cloud HPC, GPUs, HPC Hardware | 1 Comment

State of the Union: Modeling and Simulation

NCSA has posted another of their recorded presentations by interesting visitors. This time Cynthia McIntyre, senior vice president of the Council on Competitiveness, describes how high-performance computing can transform industry, and what the Council is doing to expand the use of modeling and simulation in the private sector.

Video here.

Also posted in National and Legislative Action | Leave a comment

Digipede Network 2.4 release

Digipede is one of those companies quietly working to make HPC easier for the vast majority of the world that doesn’t currently, and doesn’t want to, use anything other than Windows. They launched at DEMO in 2005 but we haven’t written about them in a while, so here’s a refresher if they are new to you (by the way, if you’d like us to run a 411 on your company, send me an email).

Their latest news is about the launch of version 2.4 of their flagship product, Digipede Network. In a nutshell, by automatically deploying .NET assemblies (and related files), then distributing and executing .NET objects natively, the Digipede Network adds support for high-performance .NET applications to Windows HPC Server.

“This is a very customer-driven release,” said John Powers, President of Digipede. “We’ve spent a lot of time listening to our most demanding customers, the folks who really push the envelope on grid computing projects. We’ve been pouring over support cases, replicating customer configurations, and really focusing on features and performance improvements that help out with the most extreme cases. As a result, Version 2.4 now handles many of the most difficult grid scenarios more smoothly. For example, customers can handle a huge number of very short tasks more smoothly, and can get greater throughput from I/O-intensive distributed applications. This greatly expands the class of applications that are good candidates for grid computing.”

Digipede’s most recent release also includes the capability to host .NET 4 applications, and has earned Windows 7 certification. “It’s important for us to keep current on Microsoft’s technologies,” continued Powers. “It’s surprising how little work is being done by other vendors to take advantage of the platform Microsoft provides to develop true high-performance distributed applications. We continue to win converts from former UNIX and Linux cluster users when they see how much easier the development experience is with the Digipede Network on Windows.”


Also posted in HPC Software, Tools | Leave a comment

Amazon adds support for traditional HPC workloads with Cluster Compute instance

Today Amazon CTO Werner Vogels announced on his blog that Amazon EC2 has added what it is calling Cluster Compute instances specifically to support the kinds of closely coupled workloads that traditional HPC users often run. This is an important step in growing the relevance of EC2 resources to high performance computing given the (unsurprising) benchmark results that have indicated that Amazon’s traditional highly virtualized servers underperform on these types of applications (lots of writing on this, but see here and here for examples). Vogels acknowledges this in his post

As much as Amazon EC2 and Elastic Map Reduce have been successful in freeing some HPC customers with highly parallelized workloads from the typical challenges of HPC infrastructure in capital investment and the associated heavy operation lifting, there were several classes of HPC workloads for which the existing instance types of Amazon EC2 have not been the right solution. In particular this has been true for applications based on algorithms – often MPI-based – that depend on frequent low-latency communication and/or require significant cross sectional bandwidth. Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. There has been no easy way for developers to do this in Amazon EC2… until today.

The new offering gives users the ability to get at higher performance networks and to specify exactly the hardware they need to run on (though as far as I can tell your networking options don’t include IB)

Cluster Computer Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking. Cluster Compute Instances can be grouped as cluster using a “cluster placement group” to indicate that these are instances that require low-latency, high bandwidth communication. When instances are placed in a cluster they have access to low latency, non-blocking 10 Gbps networking when communicating the other instances in the cluster.

Next, Cluster Compute Instances are specified down to the processor type so developers can squeeze optimal performance out of them using compiler architecture-specific optimizations. At launch Cluster Computer Instances for Amazon EC2 will have 2 Intel Xeon X5570 (also known as quad core i7 or Nehalem) processors.

Amazon has also issued an official press release about the new offering. NERSC has been among those exploring the use of EC2 resources for scientific computing as we reported earlier this summer, and they’ve seen positive results

“Many of our scientific research areas require high-throughput, low-latency, interconnected systems where applications can quickly communicate with each other, so we were happy to collaborate with Amazon Web Services to test drive our HPC applications on Cluster Compute Instances for Amazon EC2,” said Keith Jackson, a computer scientist at the Lawrence Berkeley National Lab. “In our series of comprehensive benchmark tests, we found our HPC applications ran 8.5 times faster on Cluster Compute Instances for Amazon EC2 than the previous EC2 instance types.”

Since NERSC was reporting slowdowns of “over a factor of 10” (quote from Kathy Yelick in that NERSC story linked above), this puts Amazon notionally within striking distance of what you could do with your own cluster. When you factor in things like not having to have your own admins, floor space, and power and cooling, you get to an equation that starts to look like it’s worth seriously investigating.
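As a rough back-of-the-envelope check on “striking distance,” the two quoted figures can be combined. This is only a notional sketch: it assumes the factor-of-10 slowdown and the 8.5x speedup are comparable, even though they may come from different benchmark sets, and it treats “over a factor of 10” as exactly 10. The function name is made up for illustration.

```python
def residual_slowdown(original_slowdown, speedup):
    """Slowdown left over versus an in-house cluster after a speedup is applied."""
    return original_slowdown / speedup

# Assumption: "over a factor of 10" is taken as exactly 10.
remaining = residual_slowdown(10.0, 8.5)
print(f"Notional remaining slowdown: {remaining:.2f}x")  # about 1.18x
```

In other words, under these assumptions the new instances would run within roughly 20% of a dedicated cluster, and more than that if the original slowdown was well above 10.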

There is only a single offering in the Cluster Compute product line right now; here are the specs according to Amazon’s product page

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Oddly, there is a default usage limit of 8 instances (64 cores), but the web page says if you need more you can send them an email.

The press release includes a Linpack performance measurement

“For perspective, in one of our pre-production tests, an 880 server sub-cluster achieved 41.82 TFlops on a LINPACK test run – we’re very excited that Amazon EC2 customers now have access to this type of HPC performance with the low per-hour pricing, elasticity, and functionality they have come to expect from Amazon EC2.” (Peter De Santis, General Manager of Amazon EC2)

Assuming 2.93GHz processors, that’s an Rmax of 41.82 TFLOPS on an Rpeak of 82.51 TFLOPS, or about 51% efficiency. For comparison, system number 162 on the Top500 is a 6400 core GigE connected Xeon 5570 (2.93 GHz) system that achieves 39.77 TFLOPS (Rpeak 75.01 TFLOPS) at an efficiency of 53%.
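For anyone who wants to check the arithmetic, here is a small Python sketch of the calculation. It assumes 4 double-precision floating point operations per core per clock cycle for these Nehalem-class processors and 8 cores per server (2 sockets x 4 cores); the function and variable names are just for illustration.

```python
FLOPS_PER_CYCLE = 4  # assumption: double-precision flops per core per cycle

def rpeak_tflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in TFLOPS: cores x GHz x flops per cycle, scaled from GFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

def efficiency(rmax_tflops, rpeak):
    """Fraction of theoretical peak achieved on LINPACK."""
    return rmax_tflops / rpeak

# Amazon's pre-production run: 880 servers, 2 sockets x 4 cores each
ec2_cores = 880 * 2 * 4
ec2_rpeak = rpeak_tflops(ec2_cores, 2.93)
ec2_eff = efficiency(41.82, ec2_rpeak)

# Top500 system number 162: 6400 cores at 2.93 GHz
top500_rpeak = rpeak_tflops(6400, 2.93)
top500_eff = efficiency(39.77, top500_rpeak)

print(f"EC2:    Rpeak {ec2_rpeak:.2f} TFLOPS, efficiency {ec2_eff:.0%}")
print(f"Top500: Rpeak {top500_rpeak:.2f} TFLOPS, efficiency {top500_eff:.0%}")
```

The same two functions work for any cluster where you know the core count, clock rate, and measured Rmax.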

Also posted in Cloud HPC | 7 Comments

OSC’s Blue Collar Computing effort goes international

This week the Ohio Supercomputer Center announced that the French consulting company Sciences Computers Consultants (SCC) is partnering with the Blue Collar Computing project to bring new applications and expertise to the polymer industry

As part of its Blue Collar Computing offerings, OSC will provide SCC with computational infrastructure and services to test and scale advanced modeling and simulation software for polymer extrusion and mixing on its supercomputers with the intent of developing web portals for polymer industry process modeling.  SCC numerical simulations applications are used by companies in high technology fields within the polymer, energy, automotive and food industries.

SCC has procured from OSC a startup package that consists of 2,500 production-level compute cycles and advanced technical support.  As part of the biannual agreement, SCC will receive up to 150K CPU hours and 250GB of storage per year, as well as 20 user accounts for each project, outside network connectivity and technical support.  SCC intends to install its flagship software product, XimeX, on OSC’s systems for scalability testing and small pilot projects.

The two are now on the lookout, along with trade group PolymerOhio,  for companies in the polymer industry to work with them on a pilot project demonstrating the potential of the partnership.

“Partnering with OSC allows us to develop a significant toehold in the U.S. to answer industrial needs for process analysis and validation, material behavior analysis, and other engineering studies,” said Philippe David, general manager of SCC.

Part of the problem with radically growing the user base for HPC (a move that would transform modern society, inundating it with everything from new drugs to cheap ways to get clean water to sub-Saharan African villages) is that most people’s world view doesn’t include any connection to high performance computing apart from some vague ideas from Jurassic Park. A sensible way to evangelize the solution is to partner with software providers, the people who already know what customers could benefit from HPC, and have them reach out to their customers. Building this partnership enables SCC to not only suggest that HPC might help, but to be able to offer its customers a low-barrier path to computing in a short timeframe.

Also posted in Applied HPC, Business of HPC, Collaborations | Leave a comment

Oracle Announces New HPC Cluster Servers

…but HPC is not what you think it is.  Oracle today announced a series of new additions to its Sun x86 blade servers with integrated network fabrics.  True to Larry Ellison’s words, the new machines are focused on the high-end server market.  However, given the technical aspects highlighted in the release, it seems the target market is really high performance enterprise computing.

“We claim we can manage a full blade ecosystem without requiring any network skills, because network virtualization is done in the silicon and through Oracle middleware technology,” Dimitris Dovas, director of product management for Sun hardware at Oracle, said on a videoconference announcing the new hardware.

According to the release info, the new Sun Fire x86 Clustered Systems are designed for customers running a mix of Oracle and non-Oracle enterprise workloads.  “They are optimized for Oracle Solaris, Oracle Enterprise Linux and Oracle VM, which supports Red Hat and Suse Linux along with the KVM hypervisor.”

From the hardware perspective, the new blades contain either Intel 5600 or 7500 series processors.  The latter is targeted at the market currently occupied by the Sun UltraSparc-based platforms.  Other improvements include lights-out firmware and BIOS management that dials home to Oracle for updates and integrated 10GbE fabrics.

For more info on the new enterprise server digs from Oracle, check out the info here.

Also posted in Business of HPC, Datacenter operations | 1 Comment

Software gap keeps manufacturers out of HPC

NCSA has just posted video of a talk given in May at their 2010 Private Sector Program Annual Meeting by Paul Fussell, the senior manager of the Applied Mathematical Modeling group at Boeing. In the video Fussell argues that it’s software that’s slowing the adoption of HPC in the manufacturing base.


Also posted in Events | Leave a comment

QLogic Aims for the Fences with Infiniband Fabric Suite

Before you mark this as “just another Infiniband press release,” you might want to reconsider.  I had the pleasure of speaking with Phil Murphy this week, VP of QLogic’s Network Solutions Group.  The Network Solutions Group heads up the goodness that is QLogic’s TrueScale Infiniband product suite.  Those who have been around the Infiniband block before remember that this group was formerly their own company called PathScale.  QLogic acquired the startup and pumped them full of funding and corporate clout with the fabs.  After several years of work, what they have is a high bandwidth, low latency interconnect that looks like Infiniband, smells like Infiniband but runs like a scalded cat.

Our conversation got off to a quick start with a bit of Infiniband history.  Infiniband was originally designed as a data center consolidation product.  Ethernet, fibre channel and even PCI carried over the same phy was the ambitious dream of the early adopters.  As such, the early protocol stacks reflected the idea of encapsulating multiple frame or packetized network layers over a single interconnect.  Exactly the sort of design that most HPC network gurus cringe at.

Fast forward to 2010.  QLogic has decided to change the face of their Infiniband network stack.  Rather than barreling down the path of “queue-pair” style Infiniband communication [Verbs for those in the know], they have implemented a new connection-less and state-less communication primitive.  The new software layer allows applications to send millions [literally] of concurrent messages without paying a terrible amount of setup penalty.  How many millions?  According to Phil, traditional Infiniband products will peak at around 7 million messages per second.  QLogic’s new stack will hit 30 million messages per second.
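To put those two numbers side by side, here is a quick back-of-the-envelope sketch in Python. The two rates are the figures Phil quoted; the function and constant names are made up for illustration. The "nanoseconds per message" value is simply the reciprocal of the sustained rate, not a latency measurement.

```python
# Message rates as quoted in the conversation (messages per second).
TRADITIONAL_RATE = 7_000_000   # "traditional Infiniband products"
QLOGIC_RATE = 30_000_000       # QLogic's new stack


def speedup(new_rate: float, old_rate: float) -> float:
    """Ratio of the new message rate to the old one."""
    return new_rate / old_rate


def ns_per_message(rate: float) -> float:
    """Average interval between messages, in nanoseconds, at a sustained rate."""
    return 1e9 / rate


if __name__ == "__main__":
    print(f"speedup: {speedup(QLOGIC_RATE, TRADITIONAL_RATE):.1f}x")
    print(f"traditional: {ns_per_message(TRADITIONAL_RATE):.1f} ns per message")
    print(f"new stack:   {ns_per_message(QLOGIC_RATE):.1f} ns per message")
```

In other words, the quoted figures work out to a bit more than a fourfold increase in message rate, with the average budget per message shrinking from roughly 143 ns to roughly 33 ns.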

QLogic accomplishes all this by going down into the guts of Infiniband routing and QoS metrics in order to tune the fabric for a myriad of different message classes.  Hammering a disk subsystem with large blocks?  They can do that.  Hitting a neighboring node with billions of small messages?  They do that too.  With IFS 6.0 they’ve wrapped up the following additional features:

  • Virtual Fabrics combined with application-specific CoS, which automatically dedicates classes of service within the fabric to ensure the desired level of bandwidth and appropriate priority is applied to each application. In addition, the virtual fabrics capability helps eliminate manual provisioning of application services across the fabric, significantly reducing management time and costs.
  • Adaptive Routing continually monitors application messaging patterns and selects the optimum path for each traffic flow, eliminating slowdowns caused by pathway bottlenecks.
  • Dispersive Routing, which load-balances traffic among multiple pathways and uses QLogic® Performance Scaled Messaging (PSM) to automatically ensure that packets arrive at their destination for rapid processing. Dispersive Routing leverages the entire fabric to ensure maximum communications performance for all jobs, even in the presence of other messaging-intensive applications.
  • Full leverage of vendor-specific message passing interface (MPI) libraries to maximize MPI application performance. All supported MPIs can take advantage of IFS’s pipelined data transfer mechanism, which was specifically designed for MPI communication semantics, as well as additional enhancements such as Dispersive Routing.
  • Full support for additional HPC network topologies, including torus and mesh as well as fat tree, with enhanced capabilities for failure handling. Alternative topologies like torus and mesh help users reduce networking costs as clusters scale beyond a few hundred nodes, and IFS 6.0 ensures that these users have full access to advanced traffic management features in these complex networking environments.

QLogic has gone well out of their way to make Infiniband even more HPC-friendly.  So much so that Dell, IBM, HP and SGI have already signed up to resell/OEM the new gear.  Keep an eye on the continued change in the QLogic Infiniband landscape.  This could prove to change HPC interconnects as we know them.

Correction: SGI remains a Voltaire customer for Infiniband products.

Also posted in Featured Stories, HPC, HPC Hardware, Network | 1 Comment

ANSYS HPC simulation appliance, ‘engineer ready’ supercomputing

We’re building an appliance theme in today’s news. I missed this when it was announced last month, but UK-based Dezineforce has announced that they are selling an HPC appliance designed for the “missing middle” of the HPC market.

The appliance comes with ANSYS simulation software installed and ready to run on hardware from Dell, all pre-integrated into a hardware+software package that customers can order ready to plug in.

ANSYS develops engineering simulation software used to predict how products will behave in real-world environments. The Dezineforce HPC simulation appliance for ANSYS is delivered truly engineer-ready, requiring no local HPC management or configuration expertise. Pre-integrated ANSYS software enables engineers to work with familiar tools with the added benefits of HPC performance and scalability. Additionally, Dezineforce simulation manager tools optimise simulation scheduling and provide transparent control over multiple parallel simulations, making the most efficient use of available hardware and license resources.

…Dezineforce CEO George Shanks adds: “We developed the Dezineforce HPC simulation appliance for companies who need to improve design team productivity by doing more simulation work faster or developing more efficient designs. Pre-integration of ANSYS in our HPC environment allows engineers to be performing faster solves within an hour of installation. The compute power at their fingertips means complex simulations are executed quickly and efficiently allowing the design team to focus on improving product performance, which can yield enormous business benefits.”

Note that the web site mentions that Microsoft is involved too, and the only way I can think of for that to be true is that the cluster runs Windows HPC Server, but the web site and supplementary materials available on this system aren’t specific on that point (also Dezineforce makes you give up your email address to download the product datasheet: boo!).

The hardware is in an enclosure designed to be used in an office and comes in 16-, 32-, or 64-core versions with Intel Xeon X5570 processors and either 48, 96, or 192 GB of memory. There is also a rack-mount version that lets you grow up to 128 cores and 384 GB of memory.
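A quick sanity check on those configurations, sketched in Python. One assumption to flag loudly: the announcement lists core counts and memory sizes separately, so pairing them in the order given (16 cores with 48 GB, and so on) is my reading, not something the vendor spells out. The names below are illustrative only.

```python
# Core count -> memory in GB, ASSUMING the options pair up in the order listed.
# The last entry is the maximum rack-mount configuration.
CONFIGS = {16: 48, 32: 96, 64: 192, 128: 384}


def gb_per_core(cores: int, memory_gb: int) -> float:
    """Memory available per core, in GB."""
    return memory_gb / cores


if __name__ == "__main__":
    for cores, memory_gb in CONFIGS.items():
        print(f"{cores:>3} cores, {memory_gb:>3} GB -> {gb_per_core(cores, memory_gb):.1f} GB per core")
```

Under that pairing every configuration works out to the same memory per core, which suggests the appliance scales by adding identical nodes rather than changing the node design.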

Also posted in Collaborations, HPC Software | Leave a comment

insideHPC.com is a production of insideHPC, LLC. © 2006-2013