RCE-Podcast Looks at Project Jupyter for Interactive Data Science

In this RCE Podcast, Brock Palen and Jeff Squyres discuss Jupyter with Dr. Brian Granger from Cal Poly State University. “Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.”

Video: Revolution in Computer and Data-enabled Science and Engineering

Ed Seidel from the University of Illinois gave this talk at the 2017 Argonne Training Program on Extreme-Scale Computing. His talk centers on the need for interdisciplinary research. “Interdisciplinary research (IDR) is a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice.”

Argonne’s Data Science Program Doubles Down with New Projects

Today Argonne announced that the ALCF Data Science Program (ADSP) has awarded computing time to four new projects, bringing the total number of ADSP projects for 2017-2018 to eight. All four of the program’s inaugural projects were also renewed. “The new project award recipients include an industry-based deep learning project; a national laboratory-based cosmology workflow project; and two university-based projects: one that uses machine-learning for materials discovery, and a deep-learning computer science project.”

IBM Moves Data Science Forward with Integrated Analytics System

Today IBM announced the Integrated Analytics System, a new unified data system designed to give users fast, easy access to advanced data science capabilities and the ability to work with their data across private, public or hybrid cloud environments. “Today’s announcement is a continuation of our aggressive strategy to make data science and machine learning more accessible than ever before and to help organizations like AMC begin harvesting their massive data volumes – across infrastructures – for insight and intelligence,” said Rob Thomas, General Manager, IBM Analytics.

NSF Announces $17.7 Million Funding for Data Science Projects

Today the National Science Foundation (NSF) announced $17.7 million in funding for 12 Transdisciplinary Research in Principles of Data Science (TRIPODS) projects, which will bring together the statistics, mathematics and theoretical computer science communities to develop the foundations of data science. Conducted at 14 institutions in 11 states, these projects will promote long-term research and training activities in data science that transcend disciplinary boundaries. “Data is accelerating the pace of scientific discovery and innovation,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering (CISE). “These new TRIPODS projects will help build the theoretical foundations of data science that will enable continued data-driven discovery and breakthroughs across all fields of science and engineering.”

Lorena Barba Presents: Data Science for All

“In this new world, every citizen needs data science literacy. UC Berkeley is leading the way on broad curricular immersion with data science, and other universities will soon follow suit. The definitive data science curriculum has not been written, but the guiding principles are computational thinking, statistical inference, and making decisions based on data. ‘Bootcamp’ courses don’t take this approach, focusing mostly on technical skills (programming, visualization, using packages). At many computer science departments, on the other hand, machine-learning courses with multiple pre-requisites are only accessible to majors. The key of Berkeley’s model is that it truly aims to be ‘Data Science for All.’”

Introduction to Data Science with Spark

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Participants will learn how to prototype with Spark and how to exploit large HPC machines like the Piz Daint CSCS flagship system.

EPA Joins National Consortium for Data Science

“The work we do involves capturing and analyzing huge environmental data sets so that the government can make informed policy decisions that protect humans and the environment,” said Ron Hines, Associate Director for Health at the EPA’s National Health and Environmental Effects Research Laboratory in Research Triangle Park, N.C. “We have collaborated with the NCDS on some of its initiatives in the past and having a seat at its leadership table will help us connect with leading data researchers, access data resources and infrastructure, and contribute to the development of future NCDS strategies.”

Video: A Computational Future for Science Education

“This talk will describe one new effort to embed best practices for reproducible scientific computing into traditional university curriculum. In particular, a set of open source, liberally licensed, IPython (now Jupyter) notebooks are being developed and tested to accompany a book, ‘Effective Computation in Physics.’ These interactive lecture materials lay out in-class exercises for a project-driven upper-level undergraduate course and are accordingly intended to be forked, modified and reused by professors across universities and disciplines.”

Video: Democratizing Data Science

“CDSW’s organizers are professional programmers and data scientists and several of us have experience teaching data science in more traditional university and corporate settings. Our talk will describe how ‘democratized’ data science is similar to, and sometimes extremely different from, these more traditional approaches. We will talk about some of the challenges we have faced and highlight some of our most inspirational successes.”