A new grant will allow researchers at St. Olaf to move data around at “Big Data” speeds.
The digital revolution has swept the globe and brought with it the Age of Information, where data is more valuable than gold. The $327,640 grant from the National Science Foundation will go toward updating St. Olaf’s cyberinfrastructure, allowing the college to remain at the cutting edge of Big Data research.
Big Data capabilities have helped companies like Facebook and Google become technology giants by allowing them to accrue tremendous amounts of data about their users. Google Analytics, a popular tool used by webmasters to track views on their websites, records not only how many people visited a website, but also their location, age, gender and even their interests.
The amazing thing is that Google knows these facts about Internet users even though users never explicitly state them. It’s all purely based on what users do on the Web: what they search for, how much time they spend on websites and what they click on. Imagine keeping track of all this information, all the time, for every single user on the Internet.
That is what the term “Big Data” refers to: data sets so large that they cannot be understood, analyzed or transferred using traditional computing and storage methods.
St. Olaf Professor of Statistics Julie Legler and Director of Information Systems Craig Rice are in charge of putting this grant to good use.
“St. Olaf students will be getting incredible experience in Big Data analysis,” Legler said. “There are not a lot of liberal arts schools with this data capacity.”
The upgrades from this grant will increase data transfer speeds tenfold. Transfer speed matters because immense databases cannot be processed on ordinary machines; they require specialized hardware and powerful computers. The faster network lets researchers move data from their personal computers to the powerful cluster computers for analysis and read back the results, a transfer that would otherwise take days.
Students are already working on projects that will benefit from this powerful new upgrade. A joint Medical Economics project led by faculty members Ashley Hodgson and Jessica Musselman, with help from students in the computer science department, aims to comb through millions of patients’ data to find patterns of occurrences of diseases and their respective treatments.
The hospital and patient data had to be copied onto many CDs and shipped across the country to get to St. Olaf.
“In this case, it was transferred like that because of the confidentiality terms of the data, but in the future, if we wanted access to a large data set like that, we could just beam it over with this new network,” said Richard Brown, a professor of computer science. “Once we get on this new network, we’ll be connected to every other institution and university using the same network. We could have a joint research project with Macalester or beam data over from Stanford and do our tests.”
The newly installed “InfiniBand” wires in the computer clusters allow data transfer speeds four times the tenfold increase that the new network provides, a 40-fold increase over the traditional campus network, but only within the cluster.
“This means that a computer in our cluster can access a file on another computer faster than it can access a file on its own hard drive,” Brown said.
Brown added that it is very affirming to know that students are learning to work with the very same technologies and techniques that the biggest companies use to handle billions of data points.
“[The students] used a framework called Hadoop to analyze the [hospital] data. That’s the very same framework Facebook uses to manage its billion users on the petabyte scale,” Brown said.
Brown has seen firsthand the enormous impact these Big Data techniques can have on research. When Associate Professor of Biology and Environmental Studies John Schade was researching the effects of a certain plant species on the nitrogen cycle in the environment, it took a lot of time and effort to create a suitable model for his system. Because the model was extremely complex, however, he could carry out only a very limited number of computations on it.
“He was only able to run maybe a dozen or so simulations. He gave it to [the computer science department], and we ran a few million,” Brown said. This resulted in new findings that would not have been possible without the work of the computer science department.
Big Data techniques are employed even in the humanities. When Doug Casson, Associate Professor of Political Science, was studying the works of philosopher John Locke and how much of his writing was drawn from the Bible, he sought assistance from the computer science department to comb through all of Locke’s works and match them against the Bible.
“We realized that once we had the base program working, it was very easy to apply Big Data techniques to it to have it match any body of work to every other known work in our library’s database,” Brown said.
It will be exciting to see what other benefits emerge from this data transfer upgrade as more and more fields and departments use computer science to solve their problems and make innovative breakthroughs.
Graphic Credit: ETHAN BOOTE/MANITOU MESSENGER