“Modern Cosmology and Plasma Physics codes are capable of simulating trillions of particles on petascale systems. Each time step generated from such simulations is on the order of 10s of TBs. Summarizing and analyzing raw particle data is challenging, and scientists often focus on density structures for follow-up analysis. We develop a highly scalable version of the clustering algorithm DBSCAN and apply it to the largest particle simulation datasets. Our system, called BD-CATDS, is the first one to perform end-to-end clustering analysis of trillion particle simulation output. We demonstrate clustering analysis of a 1.4 Trillion particle dataset from a plasma physics simulation, and a 10,240^3 particle cosmology simulation utilizing ~100,000 cores in 30 minutes. BD-CATS has enabled scientists to ask novel questions about acceleration mechanisms in particle physics, and has demonstrated qualitatively superior results in cosmology. Clustering is an example of one scientific data analytics problem. This talk will conclude with a broad overview of other leading data analytics challenges across scientific domains, and joint efforts between NERSC and Intel Research to tackle some of these challenges.”