Illinois researchers build Dropbox-like storage, analytical system for scientific data

As an Illinois graduate student who specializes in fabricating new semiconductor lasers, Tommy O’Brien is like a chef, making new wafers out of his own special recipe of materials, temperature, and baking time. But unlike the technology he’s designing, his method for preserving and analyzing his recipe is anything but cutting edge: he copies each recipe into a notebook by hand.


“I basically have to write down everything step by step, and it’s incredibly time-consuming if I have to go back and search for information later,” said O’Brien, who is pursuing his doctorate in electrical and computer engineering. “It’s equally cumbersome to share information about successes or failures with my lab mates. It often doesn’t happen at all.”

 

Studies suggest that it can take up to 20 years from the discovery of new materials to the fabrication of next-generation devices based on these materials. The delay is due in part to slow data processes, which lengthen the time it takes to conduct research, and in part to the loss of knowledge that occurs when vital information is discarded or is inaccessible to researchers who seek to build upon earlier work.

Researchers at the University of Illinois at Urbana-Champaign are looking to speed up the materials-to-device process through a novel framework called “4CeeD: Real-Time Data Acquisition and Analysis Framework for Material-related Cyber-Physical Environments.” 4CeeD connects microscopes and other scientific instruments to a cloud infrastructure through a high-speed University of Illinois campus network. The interface works much like Dropbox – with easy drag-and-drop uploading – but offers much more advanced data management, annotation, and analytics capabilities, along with a higher level of semantic understanding.


Klara Nahrstedt

“We have developed a cloud architecture that makes it easy for scientists to not only upload their data, but also curate and manage the data, as well as get real-time search results,” said Principal Investigator Klara Nahrstedt, the Ralph and Catherine Fisher Professor of Computer Science and director of the Coordinated Science Laboratory. “4CeeD enables researchers to search for experiments with specific parameters and receive insights into their own work.”

 

After researchers conduct an experiment using a scientific instrument, such as a microscope, they can upload the data, including image files, to the cloud. 4CeeD allows the researchers to tag the files with metadata, which helps researchers search for information about the experiment later. In addition, it enables faculty members or senior graduate students to create a template that junior students can use when annotating files for similar experiments, saving time and ensuring the correct data is captured.
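The tagging, template, and search workflow described above can be pictured with a small sketch. This is not 4CeeD's actual code or API, which the article does not describe. The class names, field names, and sample values below are all hypothetical, chosen only to illustrate how a template might flag missing metadata and how a parameter search might filter uploads.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical annotation template: the fields every upload should fill in."""
    name: str
    required_fields: tuple


@dataclass
class Upload:
    """Hypothetical record for one uploaded file and its metadata tags."""
    filename: str
    metadata: dict = field(default_factory=dict)


def missing_fields(upload, template):
    """Return the template fields that the upload has not been tagged with."""
    return [f for f in template.required_fields if f not in upload.metadata]


def search(uploads, **params):
    """Return uploads whose metadata matches every given parameter exactly."""
    return [
        u for u in uploads
        if all(u.metadata.get(k) == v for k, v in params.items())
    ]


# Illustrative data only; the field names and values are made up.
bake = Template("polymer_bake", ("material", "temperature_c", "bake_minutes"))
uploads = [
    Upload("run1.tif", {"material": "polymer", "temperature_c": 150, "bake_minutes": 30}),
    Upload("run2.tif", {"material": "polymer", "temperature_c": 200, "bake_minutes": 30}),
    Upload("run3.tif", {"material": "polymer", "temperature_c": 150}),
]

# Files that do not yet satisfy the template, and files matching a parameter query.
incomplete = [u.filename for u in uploads if missing_fields(u, bake)]
matches = [u.filename for u in search(uploads, temperature_c=150, bake_minutes=30)]
print(incomplete)
print(matches)
```

In this toy version, a template is just a list of required field names, so a senior lab member could define it once and every later upload could be checked against it before it is saved.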

Patrick Su, a first-year electrical and computer engineering graduate student, recently used 4CeeD to help troubleshoot an “overbaked” polymer.

“I typed in my parameters and got information that helped me see where I went wrong,” said Su, who, along with O’Brien, is advised by ECE Associate Professor John Dallesasse. “In the past, I would have had to call a vendor rep to figure out what’s wrong for something that turned out to be a simple error. The best thing about it is that it allows me to easily access all my information that was previously hard to get to.”

The cloud is private and secure, and has strong access controls, says Nahrstedt, giving groups complete authority over who sees their data. In addition, the research team is creating a mirror storage system to keep back-up data in the event one server fails. The system will eventually be maintained by the College of Engineering’s information technology group.

4CeeD recently won Best Paper at the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. The paper’s authors include Phuong Nguyen, a PhD student in computer science, and programmers Steve Konstanty and Todd Nicholson, along with other researchers from the Coordinated Science Laboratory, the Micro and Nanotechnology Laboratory, the Materials Research Laboratory, and the National Center for Supercomputing Applications. The work is funded through a Data Infrastructure Building Blocks grant sponsored by the National Science Foundation.

Nahrstedt hopes to extend 4CeeD’s research to other universities and labs in the future. The software is already available on GitHub and is being stress-tested by the National Institute of Standards and Technology (NIST).

“The groups are telling us they are able to conduct experiments 30% more quickly, which translates into at least $30,000 in savings per student over his/her PhD time when it comes to lab time,” Nahrstedt said. “As a result, it’s a much more efficient research environment.”

 

Editor's Note: 
The paper’s authors are: 
Bioengineering: Aaron Schwartz-Duval
Computer Science: Phuong Nguyen, Klara Nahrstedt, Roy Campbell, Indranil Gupta
CSL: Klara Nahrstedt, Steven Konstanty, Todd Nicholson, Normand Paquin
Electrical and Computer Engineering: Thomas O'Brien 
MRL: Timothy Spila
MNTL: Aaron Schwartz-Duval and Thomas O'Brien
Engineering IT: Michael Chan
NCSA: Kenton McHenry
 
 
Principal Investigators of the overall project are:
Klara Nahrstedt - CS and CSL (PI)
Paul Braun - MRL (Materials Science Department)
Brian Cunningham - MNTL (ECE/Bioengineering Departments)
David Nicol - ITI (ECE Department)
Narayan Aluru - CSE (Mechanical Engineering Department)