The Berkeley Institute for Data Science: “Everyone is going to have to become a data scientist”
On Thursday, December 12th, 2013, hundreds of Berkeley scientists, students and industry folk gathered to celebrate the opening of the Berkeley Institute for Data Science (BIDS), a multidisciplinary center that will provide incentives, infrastructure and support for scientists from all departments hoping to jump on the big data bandwagon. Funded to the tune of $37.8 million by the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation, BIDS represents the future of science. It is also part of a larger push to advance technology in the United States: the White House recently pledged $200 million to its own “big data initiative” at its “Data to Knowledge to Action” event last month, where the BIDS partnership was announced.
BIDS is part of a collaborative project with New York University and the University of Washington, and is led by a team that includes Berkeley astrophysicist and Nobel Laureate Saul Perlmutter. The new center at Berkeley will be housed at 190 Doe Library, a national landmark known for its history; it is a space that will now be known for moving UC Berkeley into the future.
What is big data, and why do we need an institute for it? Big data refers to the massive amounts of information that we are now able to obtain thanks to advances in technology, from smartphones and web clicks to global positioning systems to genomic data and beyond! Harnessing this type of data takes statistical knowledge and computer programming skills, and it requires people who can cross fields. Academia also needs to stay relevant at a time when industry may lure tech-savvy individuals away. BIDS is in a key position to address these current and upcoming changes in how we think about science.
Nicholas Dirks kicked off the ceremony by pledging that innovation in research was a core focus for him as the new chancellor of UC Berkeley, and that big data will “bring departments and programs together in unprecedented ways.” Vicky Chandler, a geneticist and chief program officer from the Moore Foundation, stated that this project is a bold set of experiments that will change the culture at Berkeley and will revolutionize how science is practiced.
Saul Perlmutter, the new director of BIDS, underscored the importance of bridging the gap for underrepresented populations (important food for thought at a heavily Caucasian and male event) and discovering what is slowing down scientists who are less “data science savvy.” Programming environments should not be an obstacle for scientists, he said, noting, “I don’t think God wrote in C.”
Peter Norvig from Google addressed how technology has increased income inequality. With half of current jobs subject to automation, he said, jobs are disappearing rapidly. “Everyone is going to have to become a data scientist,” he warned. Rather than fearing replacement by computers, though, we need to recognize that humans and computers working together can make “the best team of all.”
We also heard from Tim O’Reilly, the founder of O’Reilly Media and director of Maker Media and the online journal PeerJ. He discussed the importance of applying lessons from how consumers use technology to science and public policy. In a world where software has become a commodity, data has become the more valuable asset. “Tools will make people better at what they do. Others will be left behind.”
Joshua Bloom, one of the members of the BIDS team, emphasized the practical applications of this new center. There will be more than a dozen data science Fellows employed at all levels (staff, post-docs and faculty). The center will offer enhanced training for undergraduate and graduate students, including bootcamps and hackathons.
During the opening session, we also heard from UC Berkeley scientists Solomon Hsiang, Rosemary Gillespie and Richard Allen on their own work in the context of big data and its possible applications (more on that in part two of this post!). The four-hour event continued with a data faire and a panel discussion moderated by Joshua Bloom with Deborah Agarwal from the Lawrence Berkeley National Laboratory, Cathryn Carson from the D-Lab and Mike Franklin, director of the AMP (Algorithms, Machines, People) Lab at UC Berkeley.
At the data faire, visitors were able to interact directly with scientists, including staff at the D-Lab, graduate students and representatives from iPython. I will tell you more about how Berkeley scientists are using big data in my next blog post!
And as for what will be happening starting in February at the Berkeley Institute for Data Science? As Vicky Chandler from the Moore Foundation declared in her welcoming remarks: “Let the fun begin.”
Want to know more?
More information about the Berkeley Institute for Data Science:
Read the Berkeley announcement
A great blog post about BIDS from Fernando Perez, creator of iPython and Research Scientist at the Henry H. Wheeler Jr. Brain Imaging Center
The White House’s press release about the Big Data Initiative