Interview: How the APAX Profiler Accelerates Memory, Storage & Bandwidth


Samplify is a relatively new company in the HPC market, but its APAX compression technology is already making waves with both software and hardware approaches. To learn more, I caught up with Samplify’s CTO, Al Wegener, author of a new white paper that details how users can apply the APAX Profiler to increase application performance.

insideHPC: What is the APAX white paper about?

Al Wegener: The APAX white paper describes the Memory Wall problem in high-performance computing (HPC), where additional CPU and GPU cores don’t generate faster results because you can’t “feed the beasts” (HPC processors) with operands from memory quickly enough. The paper describes a novel solution (encoding of numerical operands) that delivers a measured Memory Wall reduction between 3:1 and 8:1 on HPC applications as diverse as multi-physics, climate modeling, and k-means clustering. The APAX encoder works with the APAX Profiler tool to give HPC users new insight into the uncertainty of their input datasets. By encoding operands in software (today) and in memory controller hardware (soon), APAX numerical encoding gives HPC users an adaptive, controllable, and flexible way to reduce DDR, PCIe, Ethernet, InfiniBand, and SAS/SATA bottlenecks by 3x to 10x.

insideHPC: What is the APAX Profiler?

Al Wegener: HPC datasets contain both uncertainty and redundancy. While HPC scientists may think their sensor-derived 32-bit or 64-bit data is perfect, typical HPC datasets pick up a lot of noise between the analog sensor and the multi-core CPU or GPU. The APAX Profiler software tool (also available on the Samplify website) lets HPC users upload their datasets to quantify those uncertainties and to determine the Profiler-recommended APAX encoding operating point that results in “five nines” (0.99999) of correlation between the original dataset and the decoded dataset. For many HPC datasets, “five nines” of decoded quality comes with encoding ratios above 3:1, thus reducing the HPC Memory Wall while delivering identical HPC simulation results.
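
That quality target is easy to illustrate in code. The sketch below is a minimal Python example, not Samplify’s implementation: it measures the correlation between an original array and a decoded copy and checks it against the “five nines” threshold, with a hypothetical `lossy_roundtrip` placeholder standing in for an APAX encode/decode pass.

```python
import numpy as np

FIVE_NINES = 0.99999   # the "five nines" quality target described above

def correlation(original: np.ndarray, decoded: np.ndarray) -> float:
    """Pearson correlation between the original and decoded datasets."""
    return float(np.corrcoef(original.ravel(), decoded.ravel())[0, 1])

def lossy_roundtrip(x: np.ndarray, fraction_bits: int = 16) -> np.ndarray:
    """Placeholder codec (NOT the APAX algorithm): round values to a fixed
    grid so the quality check has something to measure."""
    scale = 2.0 ** fraction_bits
    return np.round(x * scale) / scale

if __name__ == "__main__":
    original = np.random.randn(1_000_000).astype(np.float32)
    decoded = lossy_roundtrip(original)
    rho = correlation(original, decoded)
    print(f"correlation = {rho:.7f}, meets five nines: {rho >= FIVE_NINES}")
```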

insideHPC: What is the “overcasting” problem and how does APAX help?

Al Wegener: Many HPC simulations, including climate, multi-physics, earthquake, genetic sequencing, and finite element analysis, begin and/or end with real-world sensor measurements. HPC simulations use sensor input to make predictions about the future, but those predictions must be compared to the “real world” via subsequent sensor measurements. Sensors generate integer values, but HPC simulations usually use 32-bit and 64-bit floats for computation. “Overcasting” is the tendency in HPC to cast integer values (often with 12 or fewer bits of quality) into floating-point values without recognizing that the resulting float has been “overcast,” i.e. it contains uncertainty that the 32-bit format does not reflect. The APAX Profiler quantifies the degree of overcasting in HPC datasets using spectral techniques (FFTs). After recommending an appropriate level of accuracy (uncertainty) for each dataset, the Profiler lets APAX users fine-tune the accuracy of each dataset while significantly reducing bandwidth and storage requirements.
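
Samplify has not published the Profiler’s exact spectral method, but the idea can be sketched with a standard FFT-based effective-bits estimate. The hypothetical Python below simulates a 12-bit sensor tone, “overcasts” it to float32, and estimates how many of the float’s roughly 24 significant bits actually carry signal.

```python
import numpy as np

def effective_bits(samples: np.ndarray, signal_bin: int) -> float:
    """Rough effective-bits estimate from an FFT: power in the signal bin
    versus all other bins (DC excluded), converted with the standard
    SNR = 6.02*ENOB + 1.76 dB rule for quantization noise."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    signal_power = spectrum[signal_bin]
    noise_power = spectrum[1:].sum() - signal_power   # skip the DC bin
    snr_db = 10.0 * np.log10(signal_power / noise_power)
    return (snr_db - 1.76) / 6.02

if __name__ == "__main__":
    n, k, bits = 4096, 101, 12              # coherently sampled test tone
    t = np.arange(n)
    clean = np.sin(2.0 * np.pi * k * t / n)
    # Simulate a 12-bit sensor, then "overcast" the integer codes to float32.
    codes = np.round(clean * (2 ** (bits - 1) - 1)).astype(np.int16)
    overcast = codes.astype(np.float32)
    print(f"float32 carries ~24 significant bits; estimated useful bits: "
          f"{effective_bits(overcast, k):.1f}")
```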

insideHPC: How is APAX technology a potential enabler for Exascale computing?

Al Wegener: According to the US DARPA Exascale study (2008), Exascale has memory, network, and disk problems, not compute problems. According to DARPA, in order to deliver 10^18 floating-point operations per second (Exascale), DDR3 memory would have to get 16x faster, while disk drives would have to get 100x faster. By encoding HPC operands (numbers) as they are transferred between multi-core CPU and GPU sockets and DDR memory, networks, and disk drives, APAX reduces the DDR, network, and disk drive bottlenecks of Exascale by user-controllable factors between 3x and 8x.
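
As a back-of-the-envelope illustration (using only the DARPA shortfall figures and the encoding ratios quoted above, not measured system data), dividing each shortfall by the encoding ratio shows how much of the Exascale gap such encoding could close:

```python
# Back-of-the-envelope arithmetic using the DARPA shortfalls quoted above
# and the 3x-8x APAX encoding ratios; not measured system data.
ddr_shortfall, disk_shortfall = 16.0, 100.0   # required Exascale speed-ups

for ratio in (3.0, 8.0):
    print(f"{ratio:.0f}:1 encoding leaves a {ddr_shortfall / ratio:.1f}x DDR gap "
          f"and a {disk_shortfall / ratio:.1f}x disk gap")
```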

insideHPC: How does APAX encoding save energy and reduce cloud computing costs?

Al Wegener: Cloud computing depends on cloud-based hardware, but cloud users have to upload their data to the cloud and then download the results. By reducing both upload and download costs for users of HPC-on-demand services like Amazon EC2 and Microsoft Azure, APAX saves cloud users both time and money. In addition, experienced cloud users know that CPUs draw only about 40% of server power, while the other 60% is dissipated by DDR memory and disk drives. When APAX reduces DDR and disk bottlenecks, HPC users get their results faster, which reduces cloud-based energy usage. In one memory-bound HPC application, APAX 4:1 encoding resulted in a 3.8x speed-up in “time to results,” and thus a 3.8x reduction in server energy consumption.
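
A minimal sketch of that energy arithmetic, assuming a hypothetical server power draw and run time (only the 3.8x speed-up comes from the measurement cited above):

```python
# Illustrative energy arithmetic: the 3.8x speed-up is the figure quoted
# above; the power draw and baseline run time are made-up placeholders.
server_power_w = 500.0      # hypothetical average server power draw
baseline_hours = 10.0       # hypothetical run time without encoding
speedup = 3.8               # cited speed-up for APAX 4:1 encoding

baseline_kwh = server_power_w * baseline_hours / 1000.0
encoded_kwh = server_power_w * (baseline_hours / speedup) / 1000.0
print(f"energy per run: {baseline_kwh:.2f} kWh -> {encoded_kwh:.2f} kWh "
      f"({baseline_kwh / encoded_kwh:.1f}x less)")
```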

insideHPC: How is APAX effectively lossless?

Al Wegener: Since sensor samples often comprise the source material for HPC simulations, it’s important to recognize that floating-point numbers typically use more bits than required to represent the dynamic range of integer samples. The APAX Profiler quantifies the degree to which HPC datasets have been overcast and encodes those datasets into “simply the bits that matter.” As APAX beta-testers in HPC climate, multi-physics, and earthquake simulations have verified, their HPC simulation results are identical, but the results come out faster. That’s what Samplify calls “effectively lossless” encoding – the sizes of HPC input and intermediate datasets are reduced by 3x to 8x, but the results remain the same.
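
As a hypothetical illustration of why integer-derived floats carry spare bits, the sketch below (a crude mantissa-truncation stand-in, not the APAX codec) stores 12-bit sensor samples as float32, discards the low-order mantissa bits, and shows that a downstream statistic is unchanged:

```python
import numpy as np

def drop_mantissa_bits(x: np.ndarray, bits_to_drop: int) -> np.ndarray:
    """Zero the low-order mantissa bits of float32 values -- a crude
    stand-in for keeping only 'the bits that matter' (not the APAX codec)."""
    mask = np.uint32((0xFFFFFFFF << bits_to_drop) & 0xFFFFFFFF)
    return (x.view(np.uint32) & mask).view(np.float32)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 12-bit sensor samples "overcast" to 32-bit floats
    sensor = rng.integers(0, 2 ** 12, size=1_000_000).astype(np.float32)
    reduced = drop_mantissa_bits(sensor, bits_to_drop=11)

    # Genuine 12-bit samples never occupy the low mantissa bits of a float32,
    # so the downstream statistic is bit-identical after the reduction.
    print("original mean:", sensor.mean(), "reduced mean:", reduced.mean())
    print("identical:", np.array_equal(sensor, reduced))
```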

Samplify will demonstrate APAX next week at SC12 booth #4151.