Abstract:
The past decade has seen an extraordinary effort to apply machine learning, a branch of artificial intelligence, to materials research: guiding materials discovery [1], searching multidimensional 'big data' for unknown order parameters [2], and much more [3]. Recently, a collaboration between the research groups of Cornell Physics Professor Eun-Ah Kim and QM2, and researchers in the Materials Science Division at Argonne National Laboratory, demonstrated and deployed XTEC, an unsupervised machine learning approach that extracts order parameters, detects subtle intra-unit-cell order, and maps the temperature- and doping-dependent phase diagrams of quantum materials [2].
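The cited work [2] clusters the temperature dependence of diffraction intensities without labels. The sketch below illustrates only that general idea and is not the XTEC implementation. It assumes the data arrive as an array with one intensity trajectory I(T) per reciprocal-space point, rescales each trajectory, and clusters the results with scikit-learn's `GaussianMixture`. The function name `cluster_trajectories` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def cluster_trajectories(intensities, n_clusters=2, seed=0):
    """Assign a cluster label to each reciprocal-space point.

    intensities: array of shape (n_points, n_temperatures), one intensity
    trajectory I(T) per point (an assumed input layout).
    """
    # Rescale each trajectory to zero mean and unit variance so the
    # clustering compares temperature dependence, not absolute intensity.
    mean = intensities.mean(axis=1, keepdims=True)
    std = intensities.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero for perfectly flat trajectories
    rescaled = (intensities - mean) / std
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                          random_state=seed)
    return gmm.fit_predict(rescaled)


# Synthetic data: 50 order-parameter-like points whose intensity grows
# below a made-up transition temperature, and 50 smooth background points.
rng = np.random.default_rng(0)
temps = np.linspace(10, 300, 30)
t_c = 150.0
order = np.clip(1 - temps / t_c, 0, None)
peaks = 5 * order + 0.05 * rng.standard_normal((50, temps.size))
background = 1 + 0.01 * temps + 0.05 * rng.standard_normal((50, temps.size))
labels = cluster_trajectories(np.vstack([peaks, background]))
```

On this toy input, each synthetic group ends up with its own label. Real diffraction data would call for more care in choosing the number of clusters and in filtering out low-intensity points.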
In this project, the student will integrate XTEC, an existing community-driven ML algorithm, into the Machine Learning as a Service (MLaaS) framework [4]. MLaaS is a cloud-based system that offers machine learning tools, algorithms, and models, together with fast access to the large QM2 user datasets that the X-ray community needs to create labeled data. The MLHub@CHEXS infrastructure will provide a range of reference datasets and pre-built ML algorithms. It will also allow developers, data scientists, and domain scientists to analyze QM2 big data through common API-driven workflows that share a uniform interface, protocol, and data format.
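A small client helper can illustrate what a uniform interface and data format might look like. Every detail here is an assumption for illustration. The service address, the `/predict` route, the JSON field names, and the model name `xtec` are hypothetical and do not describe the actual MLHub@CHEXS API. The helper only builds the HTTP request, so you can inspect it without a running server.

```python
import json
import math
import urllib.request

MLHUB_URL = "http://localhost:8080"  # hypothetical service address


def build_predict_request(model_name, values, shape):
    """Build (but do not send) a POST request to a hypothetical /predict route.

    Every model receives the same JSON layout: a flat list of values plus the
    array shape, so clients need no model-specific code.
    """
    if math.prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    payload = {"model": model_name, "shape": list(shape), "data": list(values)}
    return urllib.request.Request(
        url=f"{MLHUB_URL}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=30):
    """Send a request built above and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_predict_request("xtec", [0.1, 0.2, 0.3, 0.4], (2, 2))
```

Because every model receives the same payload layout, you could switch to a different model by changing only `model_name`, not the client code.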
Through this project, the student will learn the basics of scikit-learn, matplotlib, seaborn, and the Flask framework. In the first phase, the student will convert existing Jupyter notebooks into stand-alone Python programs, separating the ML code into training and inference parts. The student will then write an inference server, along with scripts to interact with it. The second phase will focus on integrating the newly created inference servers into the CHESS MLHub infrastructure. The preferred candidate has a computer science background, experience with Python programming and Jupyter notebooks, basic knowledge of UNIX/Linux environments, and familiarity with the HTTP protocol.
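The following sketch shows how training might be separated from inference, with a Flask inference server on top. Several choices here are assumptions for illustration: scikit-learn's `GaussianMixture` stands in for the real notebook code, and the `/predict` route, the JSON layout, and the use of `pickle` for the saved model are all hypothetical. In practice, `train` and `create_app` would live in separate scripts. They appear together here so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from flask import Flask, jsonify, request
from sklearn.mixture import GaussianMixture


# Training part (e.g. train.py): fit once and save the model to disk.
def train(data, model_path, n_clusters=2):
    model = GaussianMixture(n_components=n_clusters, random_state=0).fit(data)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model


# Inference part (e.g. server.py): load the saved model, serve it over HTTP.
def create_app(model_path):
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not payload or "data" not in payload:
            return jsonify(error="expected a JSON body with a 'data' field"), 400
        x = np.asarray(payload["data"], dtype=float)
        if x.ndim != 2:
            return jsonify(error="'data' must be a 2-D list of samples"), 400
        return jsonify(labels=model.predict(x).tolist())

    return app


# Usage: train on two synthetic blobs, then query the server in-process.
rng = np.random.default_rng(0)
train_data = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                        rng.normal(5.0, 0.1, (20, 2))])
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
train(train_data, model_path)
client = create_app(model_path).test_client()
resp = client.post("/predict", json={"data": [[0.0, 0.0], [5.0, 5.0]]})
```

Flask's built-in test client exercises the endpoint without starting a server. Outside of tests, `create_app(path).run()` would serve it on Flask's default development port. Load pickle files only from trusted sources, because unpickling can execute arbitrary code.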
References:
1) Kusne, A. G., Yu, H., Wu, C., et al. "On-the-fly closed-loop materials discovery via Bayesian active learning." Nature Communications 11, 5966 (2020).
2) Venderley, J., et al. "Harnessing interpretable and unsupervised machine learning to address big data from modern X-ray diffraction." Proceedings of the National Academy of Sciences 119(24), e2109665119 (2022).
3) McDannald, A., et al. "ANDiE: The Autonomous Neutron Diffraction Explorer." Neutron News 34(2) (2023).
4) Kuznetsov, V., Giommi, L., and Bonacorsi, D. "MLaaS4HEP: Machine Learning as a Service for HEP." Computing and Software for Big Science 5, 1-16 (2021).