The amount of raw data in the world is expanding exponentially. Not just words and numbers in databases, but YouTube videos, MP3 audio files and metadata—data about the data itself. The world’s data pile has now reached the zettabyte scale (trillions of gigabytes), and there is no sign of it slowing.
The challenge for researchers and the world isn’t simply finding a place to store all this stuff, says Richard Fujimoto, Regents professor in the School of Computational Science and Engineering and interim director of the Georgia Tech Institute for Data and High Performance Computing. Now that everything is documented, new insights are buried within the morass of data, waiting to be discovered. But changing the world through big data requires having enough computing power to process the information, smart algorithms and data visualization techniques to find patterns among the noise, and a knack for asking the right questions—all Georgia Tech specialties.
“We’ve been interested for quite some time, well before people started calling this big data,” Fujimoto says. “You need to put together teams of scientists and engineers and computer science professionals to attack these problems.”
Georgia Tech now finds itself at the forefront of a groundswell of interest in massive sets of information, with Tech researchers using big data to make warehouse deliveries more efficient, build better climate models, thwart computer malware, pinpoint possible cancer warning signs and more.
Consider the explosive growth of social media. Facebook and Twitter are of special interest to David Bader, a professor in the School of Computational Science and Engineering and executive director of the new Center for High Performance Computing. He imagines Facebook as a mass of millions of points (representing the users) and connection lines (representing friendships or shared content); his team has devised algorithms to identify the most important actor in a network—not necessarily the one with the most friends, but the one who wields the most influence. The team’s work is about more than figuring out who makes a particular news story popular on Facebook. “Those same algorithms can work in other domains,” Bader says. “That type of [approach] has been applied to how HIV is transmitted within jail populations, to how jazz musicians find the most influential musicians, and to how transportation networks try to find congestion points.” He’s also looked at the U.S. power grid, and how cascading failures spread.
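One classic way to formalize "influence" in a network is betweenness centrality: a node matters if many shortest paths between other nodes pass through it. Bader's group works on massively parallel versions of such analytics; the sketch below is only a brute-force toy, using a hypothetical graph of two friend groups joined by a single "bridge" user, to show why the most connected node isn't always the most influential.

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(graph, s, t):
    """Enumerate every shortest path from s to t via breadth-first search."""
    paths, best = [], None
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            continue  # longer than a known shortest path; skip
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for neighbor in graph[node]:
            if neighbor not in path:
                queue.append(path + [neighbor])
    return paths

def betweenness(graph):
    """Score each node by the share of shortest paths passing through it."""
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        paths = all_shortest_paths(graph, s, t)
        for path in paths:
            for v in path[1:-1]:  # intermediate nodes only
                score[v] += 1.0 / len(paths)
    return score

# Toy network: two tight friend groups joined by one "bridge" user.
edges = [(1, 2), (2, 3), (1, 3), (3, "bridge"),
         ("bridge", 4), (4, 5), (5, 6), (4, 6)]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = betweenness(graph)
```

Here the "bridge" user has only two friends, fewer than anyone inside either triangle, yet every shortest path between the two groups runs through it, so it earns the top score. Production systems use far faster algorithms (such as Brandes') on graphs with millions of nodes.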
Big data also plays a significant role in Tech’s new Institute for Materials, which is developing new materials for industry. “The human genome spawned a revolution in the biomedical industry to try out new approaches to developing cures for diseases,” Fujimoto says. “With the materials genome, the idea is to have a similar transformative project focus on the atomic structures of these different materials.” With a smart catalog for engineers to work from, Fujimoto hopes to cut the development time for a new material from prototype to market—currently about 15 years—in half.
Even the next generation of video games could benefit from Tech’s big data research. This year, doctoral student Alexander Zook, under the guidance of principal investigator Mark Riedl, used big data to push dynamic difficulty adjustment in games. “You want to change the game based on the players,” Zook says. “Most games as they exist right now … are a one-size-fits-all experience.” In a small-scale experiment, Zook showed that by gathering data on how a host of gamers handled a particular problem, he could design a game to adjust on the fly, preventing it from becoming too easy or too hard.
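Zook's actual system isn't described here, but the core idea of dynamic difficulty adjustment can be sketched simply: track recent player outcomes and nudge a difficulty knob toward a target success rate. Everything below (the class name, the 70 percent target, the update rule) is an illustrative assumption, not the published method.

```python
from collections import deque

class DifficultyAdjuster:
    """Illustrative sketch: nudge difficulty toward a target success rate.

    The parameters and update rule are assumptions for demonstration,
    not Zook and Riedl's actual system.
    """

    def __init__(self, target=0.7, window=10, step=0.1):
        self.target = target                 # desired player success rate
        self.results = deque(maxlen=window)  # rolling record of outcomes
        self.step = step                     # how quickly difficulty moves
        self.difficulty = 0.5                # 0.0 = trivial, 1.0 = brutal

    def record(self, success):
        """Log one attempt and adjust difficulty on the fly."""
        self.results.append(bool(success))
        rate = sum(self.results) / len(self.results)
        if rate > self.target:      # player is cruising: make it harder
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif rate < self.target:    # player is struggling: ease off
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty
```

A run of successes pushes the difficulty up; a run of failures pulls it back down, so the game drifts toward the sweet spot rather than staying one-size-fits-all.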
In Bader’s estimation, the future of big data—the ability to employ these approaches to solve big, national-scale problems—depends upon researchers’ ability to unify multiple enormous data sets. To analyze how residents fled from the devastating Colorado floods in September, for example, a researcher would need to access weather sensor data, weather forecasts, perhaps satellite imagery, even tweets from those who evacuated. “There are so many aspects that we need to bring together,” Bader says.
Part of the problem is that these data can exist in disparate parts of the cloud. Part of it lies in the challenge of assembling the computing power, smart computing architectures and advanced algorithms needed to handle the data.
Whether scientists are trying to revolutionize personalized medicine, bolster national security, or help people evacuate a disaster area, the key to making the world a better place through big data is finding the meaningful connections.
“All of these are problems where we have data—massive data,” Bader says. “But we don’t know yet how to put it together and make sense of it in near-real time.”