Middleware and High Performance Analytics Libraries for Scalable Data Science
The National Science Foundation (NSF) announced awards to develop tools, cyberinfrastructure and best practices for data science in its press release “Laying the groundwork for data-driven science”. Its Data Infrastructure Building Blocks (DIBBs) program funds innovative projects that develop building blocks essential for advancing scientific discovery through data.
One of only two US$5M early implementation grants will support the SPIDAL project, whose research team is led by Geoffrey Fox in the School of Informatics and Computing at Indiana University. The SPIDAL project (full title: Middleware and High Performance Analytics Libraries for Scalable Data Science) aims to create middleware and analytics libraries that allow data science to work at large scale on high-performance computing systems (also known as supercomputers). Oliver Beckstein’s research group in the Center for Biological Physics is a partner on the proposal. The group will use the libraries developed by the partners to analyze data-intensive biomolecular simulations. Algorithms for efficiently analyzing large simulations will be implemented as part of the MDAnalysis package. By bringing “BigData” tools to the computational biophysics community, the group aims to answer questions about how proteins function in healthy conditions and how they malfunction in disease, ultimately leading to physics-inspired answers on how to improve human health.
Other institutions collaborating on the project include Emory University, Rutgers University, University of Kansas, University of Utah and Virginia Tech.