Lorena Barba Presents: Data Science for All

“In this new world, every citizen needs data science literacy. UC Berkeley is leading the way on broad curricular immersion with data science, and other universities will soon follow suit. The definitive data science curriculum has not been written, but the guiding principles are computational thinking, statistical inference, and making decisions based on data. “Bootcamp” courses don’t take this approach, focusing mostly on technical skills (programming, visualization, using packages). At many computer science departments, on the other hand, machine-learning courses with multiple prerequisites are only accessible to majors. The key to Berkeley’s model is that it truly aims to be “Data Science for All.””

Introduction to Data Science with Spark

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Participants will learn how to prototype with Spark and how to exploit large HPC machines like Piz Daint, the CSCS flagship system.

EPA Joins National Consortium for Data Science

“The work we do involves capturing and analyzing huge environmental data sets so that the government can make informed policy decisions that protect humans and the environment,” said Ron Hines, Associate Director for Health at the EPA’s National Health and Environmental Effects Research Laboratory in Research Triangle Park, N.C. “We have collaborated with the NCDS on some of its initiatives in the past and having a seat at its leadership table will help us connect with leading data researchers, access data resources and infrastructure, and contribute to the development of future NCDS strategies.”

Video: A Computational Future for Science Education

“This talk will describe one new effort to embed best practices for reproducible scientific computing into traditional university curriculum. In particular, a set of open source, liberally licensed, IPython (now Jupyter) notebooks are being developed and tested to accompany a book “Effective Computation in Physics.” These interactive lecture materials lay out in-class exercises for a project-driven upper-level undergraduate course and are accordingly intended to be forked, modified and reused by professors across universities and disciplines.”

Video: Democratizing Data Science

“CDSW’s organizers are professional programmers and data scientists and several of us have experience teaching data science in more traditional university and corporate settings. Our talk will describe how “democratized” data science is similar to — and sometimes extremely different from — these more traditional approaches. We will talk about some of the challenges we have faced and highlight some of our most inspirational successes.”