Introduction to Data Science with Spark

Antonietta Mira

In this video, Antonietta Mira from the InterDisciplinary Institute of Data Science presents: Introduction to Data Science with Spark.

The Data Science with Spark Workshop addresses high-level parallelization for data analytics workloads using the Apache Spark framework. Learning objectives include:

  • Understand the value of parallelization
  • Understand the value of a high-level framework like Apache Spark
  • Understand the MapReduce paradigm, which is central to Spark
  • Get hands-on experience in applying the MapReduce paradigm for various applications, ranging from statistical analysis to machine learning

Additionally, participants will learn how to prototype with Spark and how to exploit large HPC machines such as Piz Daint, the CSCS flagship system.
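As a rough illustration of the MapReduce paradigm named in the learning objectives, here is a minimal sketch in plain Python. It is not taken from the workshop and does not use Spark itself. The input strings and the function names map_partition and merge_counts are hypothetical, chosen only to show the shape of the idea: an independent map step over each partition, followed by a reduce step that merges the partial results.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each string stands in for one partition of a larger dataset.
partitions = [
    "spark makes parallel data analysis simple",
    "parallel map and reduce steps scale to large data",
]


def map_partition(text):
    """Map step: count the words of a single partition, independently of the others."""
    return Counter(text.split())


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# The map step shares no state between partitions, so a framework
# could run it on many workers at once. Here it runs sequentially.
partial_counts = [map_partition(p) for p in partitions]

# The reduce step combines the partial results pairwise. Adding counts
# is associative, so the merge order does not affect the result.
word_counts = reduce(merge_counts, partial_counts, Counter())

print(word_counts["parallel"])
print(word_counts["data"])
```

In PySpark, a computation of this shape is commonly written with RDD operations such as flatMap, map and reduceByKey, with the framework distributing the partitions across workers.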

Antonietta Mira is a professor of statistics and co-founder and co-director of the InterDisciplinary Institute of Data Science at USI, where she served as Vice-Dean in the Faculty of Economics (2013-2015). She is also a part-time professor of statistics at Università dell’Insubria, a fellow of the Istituto Lombardo Accademia di Scienze e Lettere, a fellow of the International Society for Bayesian Analysis (ISBA), and a visiting fellow of the Isaac Newton Institute for Mathematical Sciences at Cambridge University (2014 and 2016). She has been a visiting professor at Université Paris-Dauphine, University of Western Australia, Queensland University of Technology, Brisbane, and University of Bristol, UK. She has won awards for excellence in both research and teaching. She is the principal investigator on several projects at the Swiss National Science Foundation and a member of multiple scientific committees representing her areas of expertise: Bayesian statistical models and efficient Monte Carlo simulation algorithms and theory. Her current research focuses on data science and on methodological and computational statistics, both of which have a clear interdisciplinary scope across social science, finance, economics and industry. She is often invited to speak at international scientific conferences, where she also organizes sessions on topics related to her research interests.

Mira has served on the editorial boards of high-impact scientific journals, including Statistica Sinica (2005-8), Journal of Computational and Graphical Statistics (2006-8) and Bayesian Analysis (2008-16), and as guest editor of special issues (2014-15-16). She has been involved in public engagement (such as EXPO Milano 2015), has delivered public lectures (Festival of the Swiss Academy of Sciences 200-year anniversary, 2015; opening lecture of the USI academic year 2011-12; Istituto Lombardo Accademia di Scienze e Lettere, Milano, 2012 and 2016), and is the scientific lead for the exhibit Numbed by Numbers! She is often interviewed in the media on topics related to Data Science and Big Data. Within IDIDS she organizes a series of public lectures (Data and Society: Opportunities and Fears, 2015-16) and scientific seminars (Directions in Data Science, 2015-16). Antonietta holds a PhD in Computational Statistics (1998) and a Master’s in Statistics (1996) from the University of Minnesota in Minneapolis, US. She also has a Doctorate in Methodological Statistics from the University of Trento, Italy (1995), and earned her Bachelor’s in Economics, summa cum laude, from the University of Pavia, Italy. Her work has been published in over 50 scientific articles and books.
