Complex chemical processes, such as the decomposition of energetic materials and the chemistry of planetary interiors, are typically studied using large-scale molecular dynamics simulations that run for weeks on high-performance parallel machines. These computations may involve thousands of atoms forming hundreds of molecular species and undergoing thousands of reactions. It is natural to ask whether this wealth of data can be used to build more efficient, interpretable, and predictive models. In this talk, we use techniques from statistical learning to develop a framework for constructing Kinetic Monte Carlo (KMC) models from molecular dynamics data [1]. We show that our KMC models can not only extrapolate the behavior of the chemical system by as much as an order of magnitude in time, but can also be used to study the dynamics of entirely different chemical trajectories with a high degree of fidelity. We then discuss three methods for reducing our learned KMC models, including a new and efficient data-driven algorithm based on L1 regularization. We demonstrate our framework on a system of high-temperature, high-pressure liquid methane, thought to be a major component of gas giant planetary interiors. Finally, we discuss how our L1-regularization-based algorithm can also be applied to complex systems of reaction rate equations, such as those studied in the combustion community, providing a novel data-driven method for reducing nonlinear dynamical systems.

[1] Q. Yang, C. A. Sing-Long, and E. J. Reed, “L1 Regularization-Based Model Reduction of Complex Chemistry Molecular Dynamics for Statistical Learning of Kinetic Monte Carlo Models,” *MRS Advances*, vol. 1, no. 24, pp. 1767–1772, 2016.