IBM’s Plan to Bring Machine Learning Capabilities to Data Scientists Everywhere

Over at the IBM Blog, IBM Fellow Hillary Hunter writes that the company expects the world’s volume of digital data to exceed 44 zettabytes, an astounding number. “IBM has worked to build the industry’s most complete data science platform. Integrated with NVIDIA GPUs and software designed specifically for AI and the most data-intensive workloads, IBM has infused AI into offerings that clients can access regardless of their deployment model. Today, we take the next step in that journey in announcing the next evolution of our collaboration with NVIDIA. We plan to leverage their new data science toolkit, RAPIDS, across our portfolio so that our clients can enhance the performance of machine learning and data analytics.”

Big 3 Cloud Providers Join with NSF to Support Data Science

“NSF’s participation with major cloud providers is an innovative approach to combining resources to better support data science research,” said Jim Kurose, assistant director of NSF for Computer and Information Science and Engineering (CISE). “This type of collaboration enables fundamental research and spurs technology development and economic growth in areas of mutual interest to the participants, driving innovation for the long-term benefit of our nation.”

RCE-Podcast Looks at Project Jupyter for Interactive Data Science

In this RCE Podcast, Brock Palen and Jeff Squyres discuss Jupyter with Dr. Brian Granger from Cal Poly State University. “Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.”

Video: Revolution in Computer and Data-enabled Science and Engineering

Ed Seidel from the University of Illinois gave this talk at the 2017 Argonne Training Program on Extreme-Scale Computing. His talk centers on the need for interdisciplinary research. “Interdisciplinary research (IDR) is a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice.”

Argonne’s Data Science Program Doubles Down with New Projects

Today Argonne announced that the ALCF Data Science Program (ADSP) has awarded computing time to four new projects, bringing the total number of ADSP projects for 2017-2018 to eight. All four of the program’s inaugural projects were also renewed. “The new project award recipients include an industry-based deep learning project; a national laboratory-based cosmology workflow project; and two university-based projects: one that uses machine-learning for materials discovery, and a deep-learning computer science project.”

IBM Moves Data Science Forward with Integrated Analytics System

Today IBM announced the Integrated Analytics System, a new unified data system designed to give users fast, easy access to advanced data science capabilities and the ability to work with their data across private, public or hybrid cloud environments. “Today’s announcement is a continuation of our aggressive strategy to make data science and machine learning more accessible than ever before and to help organizations like AMC, begin harvesting their massive data volumes – across infrastructures – for insight and intelligence,” said Rob Thomas, General Manager, IBM Analytics.

NSF Announces $17.7 Million Funding for Data Science Projects

Today the National Science Foundation (NSF) announced $17.7 million in funding for 12 Transdisciplinary Research in Principles of Data Science (TRIPODS) projects, which will bring together the statistics, mathematics and theoretical computer science communities to develop the foundations of data science. Conducted at 14 institutions in 11 states, these projects will promote long-term research and training activities in data science that transcend disciplinary boundaries. “Data is accelerating the pace of scientific discovery and innovation,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering (CISE). “These new TRIPODS projects will help build the theoretical foundations of data science that will enable continued data-driven discovery and breakthroughs across all fields of science and engineering.”

Lorena Barba Presents: Data Science for All

“In this new world, every citizen needs data science literacy. UC Berkeley is leading the way on broad curricular immersion with data science, and other universities will soon follow suit. The definitive data science curriculum has not been written, but the guiding principles are computational thinking, statistical inference, and making decisions based on data. ‘Bootcamp’ courses don’t take this approach, focusing mostly on technical skills (programming, visualization, using packages). At many computer science departments, on the other hand, machine-learning courses with multiple pre-requisites are only accessible to majors. The key of Berkeley’s model is that it truly aims to be ‘Data Science for All.’”

Introduction to Data Science with Spark

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Participants will learn how to prototype with Spark and how to exploit large HPC machines such as Piz Daint, the CSCS flagship system.

EPA Joins National Consortium for Data Science

“The work we do involves capturing and analyzing huge environmental data sets so that the government can make informed policy decisions that protect humans and the environment,” said Ron Hines, Associate Director for Health at the EPA’s National Health and Environmental Effects Research Laboratory in Research Triangle Park, N.C. “We have collaborated with the NCDS on some of its initiatives in the past and having a seat at its leadership table will help us connect with leading data researchers, access data resources and infrastructure, and contribute to the development of future NCDS strategies.”