Center for Nonlinear Studies

Wednesday, September 06, 2017
11:00 AM - 12:00 PM
CNLS Conference Room (TA-3, Bldg 1690)
Seminar
A Roadmap to Data-Driven Discovery and Rational Design in Chemical and Materials Research
Johannes Hachmann, University at Buffalo
Trial-and-error approaches are increasingly ill-equipped to meet the complex challenges involved in the discovery and design of next-generation chemicals and materials. Our work builds on the opportunities arising from the shift toward data-driven in silico research and a rational design paradigm. These approaches are poised to mitigate the inefficiencies, shortcomings, and limitations of traditional trial-and-error research. However, the idea of applying modern data science in chemistry is so recent that much of the basic infrastructure either has not yet been developed or is still in its infancy. The existing tools and expertise tend to be in-house, specialized, or otherwise unavailable to the community at large. In practice, data science therefore remains beyond the reach of most researchers in the field.

Our work aims to chart new paths in this area by creating an open, general-purpose software ecosystem designed to overcome this situation, fill the prevailing infrastructure gap, and thus make data-driven research a viable and widely accessible proposition. This ecosystem fuses in silico modeling (in particular, computational quantum chemistry), high-throughput screening techniques, and Big Data analytics into an integrated research infrastructure. We have been developing the necessary methods, algorithms, protocols, and codes, and have assembled them into three loosely connected program suites: ChemHTPS provides an automated platform for the virtual high-throughput screening of compound and material candidate libraries as well as reaction networks; ChemBDDB offers a database and data model template for the massive volumes of information created by data-intensive projects; and ChemML is a machine learning and informatics toolbox for the validation, analysis, mining, and modeling of such data sets.

Host: Bryan Moore