Parallel Multiway Methods for Compression of Massive Data and Other Applications


In this Invited Talk from SC16, Tamara Kolda from Sandia presents: Parallel Multiway Methods for Compression of Massive Data and Other Applications.

“Scientists are drowning in data. The scientific data produced by high-fidelity simulations and high-precision experiments are far too massive to store. For instance, a modest simulation on a 3D grid with 500 grid points per dimension, tracking 100 variables for 100 time steps yields 5TB of data. Working with this massive data is unwieldy and it may not be retained for future analysis or comparison. Data compression is a necessity, but there are surprisingly few options available for scientific data.

We propose to exploit the 5-way structure (3D spatial grid x time x variable) of the data by applying Tucker tensor decomposition to reveal a latent low-dimensional representation. By taking advantage of multiway structure, we are able to compress combustion science data by a factor of 10-1000 with negligible loss in accuracy. Additionally, we need not reconstruct the entire data set to extract subparts or down-sampled versions. However, compressing such massive data requires a parallel implementation of the Tucker tensor decomposition. We explain the data distribution and algorithm and accompanying analysis. We apply the algorithm to real-world data sets to demonstrate the speed, compression performance, and accuracy of the method. We also consider extensions of this work into functional representations (useful for hierarchical/irregular grids and reduced order models) as well as acceleration via randomized computations. This talk will highlight work by collaborators Woody Austin, Grey Ballard, Alicia Klinvex, Hemanth Kolla, and others.”
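To make the idea concrete, here is a minimal serial sketch in NumPy of a truncated higher-order SVD, one common way to compute a Tucker decomposition. It is not the parallel implementation described in the talk, and the tensor shape, ranks, and function names below are illustrative assumptions rather than values from the work.

```python
import numpy as np

def hosvd_compress(X, ranks):
    """Truncated higher-order SVD: a small core tensor plus one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        # Mode-n unfolding: rows indexed by this mode, columns by all remaining modes.
        unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        # The leading left singular vectors span the dominant subspace of this mode.
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for U in factors:
        # Contract the current leading mode with its factor; the rank-r axis is appended
        # last, so after all modes the axes cycle back into their original order.
        core = np.tensordot(core, U, axes=([0], [0]))
    return core, factors

def reconstruct(core, factors):
    """Expand the compressed (core, factors) representation back to a full tensor."""
    X = core
    for U in factors:
        X = np.tensordot(X, U, axes=([0], [1]))
    return X

# Toy 5-way tensor (x, y, z, variable, time), built to have exact multilinear rank
# so the truncated decomposition reconstructs it almost perfectly.
rng = np.random.default_rng(0)
shape, ranks = (20, 20, 20, 8, 8), (5, 5, 5, 4, 4)
true_core = rng.standard_normal(ranks)
true_factors = [rng.standard_normal((n, r)) for n, r in zip(shape, ranks)]
X = reconstruct(true_core, true_factors)

core, factors = hosvd_compress(X, ranks)
Xhat = reconstruct(core, factors)
stored = core.size + sum(U.size for U in factors)
print(f"compression ratio ~{X.size / stored:.0f}x, "
      f"relative error {np.linalg.norm(X - Xhat) / np.linalg.norm(X):.2e}")

# For scale: the example in the abstract stores
# 500**3 * 100 * 100 * 4 bytes = 5e12 bytes = 5 TB in single precision.
```

The toy tensor here is constructed to be exactly low rank, so the reconstruction error is near machine precision; on real simulation data the achievable compression depends on how quickly the singular values of each unfolding decay and on the error tolerance chosen.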

Tamara G. Kolda is a Distinguished Member of the Technical Staff at Sandia National Laboratories in Livermore, CA. She holds a Ph.D. in applied mathematics from the University of Maryland at College Park and is a past Householder Postdoctoral Fellow in Scientific Computing at Oak Ridge National Laboratory.

She has received several awards for her work including a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), an R&D 100 Award, and three best paper prizes. She is a Distinguished Scientist of the Association for Computing Machinery (ACM) and a Fellow of the Society for Industrial and Applied Mathematics (SIAM).

She is currently a member of the SIAM Board of Trustees, Section Editor for the Software and High Performance Computing section of the “SIAM Journal on Scientific Computing,” and Associate Editor of the “SIAM Journal on Matrix Analysis and Applications.”

Comments

  1. Mark L. Stone says

    500^3 * 100^2 * 4 = 5 TB, so apparently, the data is only (to be stored as) single precision? Is double precision useless, or is something lost by saving in single precision? Extra precision is irrelevant when compressed by Tucker tensor decomposition?