Big data is such a hot topic that it has finally outgrown the descriptor ‘big’. From scientific journals to the popular press, so much has been said about big data, and about the challenges and opportunities it presents, that sorting through the data on big data has itself become a challenge.
Discussing the issues is a good start, but action is even better.
In 2013, a handful of academic researchers and business professionals, located mainly in the Research Triangle Park area of North Carolina, put their heads together to develop an action strategy, along with practical projects that would use data for maximum impact in science, business, and education. That effort is now known as the National Consortium for Data Science, or NCDS.
Back in 2012 and early 2013, I was one of the main proponents of the NCDS. I spent many hours talking to people who create data in their work; those who use it to develop products, conduct research, and understand their customers; and the technology experts who build tools for collecting, sharing, analyzing, and managing data. My message was simple: making the most of our data-rich world will require more than focused, domain-specific research projects and isolated product development efforts. Barriers that separate science domains and isolate the worlds of research, business, and government must be toppled. Taming the data deluge and gleaning real knowledge from data must be a broad-based and strategic effort, one that addresses everything from educating data specialists to finding ways to quickly translate data research into breakthrough products and services.
That message rang true for many people who view data challenges from very different perspectives. As a result, the NCDS is today an active and growing organization whose members include US research universities (including three University of North Carolina campuses, Drexel University, Texas A&M, and UNC General Administration), major corporations (Cisco, Deloitte, EMC, GE, and IBM), and government agencies and non-profit organizations (RTI International, MCNC, and the US Environmental Protection Agency).
Our success probably has something to do with our impatience. We didn’t have all the answers for harnessing big data, but we knew action was essential or the data juggernaut would steamroll right over us. So the NCDS, led by a dedicated team at the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, opted for action. In that first year, it was learn-by-doing in the extreme. Not everything turned out exactly as we expected, but we did successfully stand up an organization of diverse members with different agendas. We also came to understand how the NCDS could make the most impact on important data challenges. Through hard work and the development of programs and events aimed at members, students, the data workforce, and data researchers, we learned the following:
Finding solutions to the challenges of data sharing, analysis, management, and long-term curation requires recognizing data science as a science on par with any other domain. The NCDS defines data science as the systematic study of the flow, curation, and analysis of digital data to enable research discoveries, decision-making, and a data-driven economy.
The need for data scientists is acute. As Google chief economist Hal Varian said back in 2009: “The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that’s going to be a hugely important skill in the next decades.”
The McKinsey Global Institute estimates that by 2018, the US alone could face a shortage of 140,000 to 190,000 people with deep analytical skills. But despite a recent proliferation of advanced degree programs in data science and analytics, there are as yet no standards for data science curricula, and it is still unclear whether these programs will meet the commercial world’s need for data specialists. That’s why the NCDS sponsors events that bring talented students in analytics, information sciences, and data-driven domain sciences together with representatives of the organizations that need them. It’s why we’ve formed teams of faculty and corporate NCDS members to develop plans for data science curricula, and why we’re working to build a data observatory that will give students the chance to work with real data sets, and very large ones.
Data challenges are ubiquitous, which means solutions must break down barriers among scientific domains and between the public and private sectors. It’s relatively simple for corporate sponsors to fund university researchers interested in investigating data science questions. It’s tougher to bring together university researchers, software and hardware specialists, and professionals in multiple business sectors to work side by side on broad-based efforts to translate data into knowledge and products that enable a better quality of life. That kind of work requires bridging cultural barriers, finding common ground, learning new lexicons, adapting, and sometimes compromising.
Within the NCDS, our working groups all include members from industry, academia, and the non-profit sector. Our Data Fellows are early-career faculty members who address interesting data science questions and also want to translate their work into commercial settings. This kind of collaboration is still relatively rare, but it is essential to finding big data solutions that will lead to personalized healthcare, informed product development, and real-time decision-making based on the latest data.
For the NCDS, the last two years have been busy, sometimes frustrating, always interesting, and often inspiring. We don’t claim to have all the answers to daunting data management problems, nor have we figured out how to ensure the world will have the data-literate workforce it will need for the future.
But we have learned that data challenges must be confronted by people from all backgrounds working together, and we believe we have created a framework for those critical, action-oriented collaborations.
Dr Stanley Ahalt is director of the Renaissance Computing Institute (RENCI), professor of computer science at UNC-Chapel Hill, and chair of the steering committee for the National Consortium for Data Science (NCDS).