DataRush RC1


Back in March I wrote a feature about Pervasive’s DataRush for HPCwire. The article combined what I had heard at the Rhode Island conference with some of the data growth figures I’d been hearing about.

Pervasive Software is one of the companies working on the software side of data-intensive computing, developing software architectures to support intensive analysis of large data stores. Pervasive’s DataRush product is designed primarily for single address space environments of the kind you’ll find in multi-socket, multicore nodes on today’s hardware. The framework is written in Java, is based on a dataflow model, and provides high-level primitives that mask the complexity and details of the parallel implementation. According to Pervasive CTO Mike Hoskins, DataRush is a “next generation massively parallel data pump.”

At the time the product worked like this:

Data flows and processing steps are described in an XML scripting language that moves data through the system, and transforms it by the application of “operators” such as sort, join, average, and merge. (As of later this year the XML description can be replaced by a Java description of the dataflow.) The framework includes basic operators, and users add new operators to support their specific needs through an SDK. DataRush dynamically assembles the bits of code it needs at runtime and, if desired, users can help the software adapt to varying amounts of available processing power and varying problem sets by binding in operators and operator implementations that are better suited for the situation at hand. This is reminiscent of the poly- or multi-algorithmic work that has been going on in traditional HPTC for some time, and has the potential to offer real advantages.
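To make the operator idea a little more concrete, here is a minimal sketch in plain Java. It does not use the DataRush SDK, and nothing in it should be read as Pervasive's API: the class `DataflowSketch`, the methods `merge`, `average`, `bindSort`, and `mergeAndSort`, and the 10,000-record threshold are all names and values I made up for illustration. It only shows the general pattern of chaining operators and picking an operator implementation at runtime based on the available cores and the problem size.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names come from DataRush.
final class DataflowSketch {

    // Two interchangeable implementations of a "sort" operator.
    static List<Integer> sequentialSort(List<Integer> in) {
        return in.stream().sorted().collect(Collectors.toList());
    }

    static List<Integer> parallelSort(List<Integer> in) {
        return in.parallelStream().sorted().collect(Collectors.toList());
    }

    // Bind a sort implementation at runtime. The threshold is an assumption:
    // small inputs rarely repay the overhead of going parallel.
    static Function<List<Integer>, List<Integer>> bindSort(int cores, int size) {
        return (cores > 1 && size >= 10_000)
                ? DataflowSketch::parallelSort
                : DataflowSketch::sequentialSort;
    }

    // "merge" operator: concatenates two input flows into one.
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }

    // "average" operator: reduces a flow to its mean (NaN for an empty flow).
    static double average(List<Integer> in) {
        return in.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    // A tiny dataflow: merge two inputs, then sort with whichever
    // implementation suits the machine and the data volume.
    static List<Integer> mergeAndSort(List<Integer> a, List<Integer> b) {
        List<Integer> merged = merge(a, b);
        int cores = Runtime.getRuntime().availableProcessors();
        return bindSort(cores, merged.size()).apply(merged);
    }

    public static void main(String[] args) {
        List<Integer> sorted = mergeAndSort(List.of(5, 1, 9), List.of(4, 2));
        System.out.println(sorted + " avg=" + average(sorted));
    }
}
```

Both sort implementations return the same result, so the rest of the flow does not care which one was bound. That interchangeability is what makes the multi-algorithmic approach workable.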

This is now “later this year,” and Steve Hochschild of Pervasive has emailed to let me know that RC1 of the new DataRush, sans XML, is up on their website.

While I was at their website I also noticed a challenge they are running: Pervasive guarantees that users will get a 30x performance boost on their data-intensive apps, or the product is free. If you are using DataRush for business or science, add a comment. I’d like to know what readers are doing with it.

I wish Apple would add the same kind of guarantee for their iPhone 3G…