Chemical compound space (CS), the set of all possible molecules and materials, is vast, which makes its exploration a challenging task. Coarse-grained (CG) models are a suitable tool for exploring CS because they reduce the number of degrees of freedom by aggregating groups of atoms into interaction centres called beads. This reduction in complexity enables faster and cheaper exploration of CS, since a single bead in a transferable CG model can represent multiple chemical compounds. However, a limiting step in using CG models to explore CS is the inverse operation, known as backmapping (i.e., going from a bead representation back to atomistic molecules). Because the CG mapping is non-injective, backmapping is a one-to-many problem: a single bead representation corresponds to many possible molecules. One way to overcome this challenge is to use generative machine learning methods, which learn an underlying distribution from training data and generate new samples that resemble those in the training distribution.
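To make the one-to-many nature of backmapping concrete, here is a minimal toy sketch. It is not the MARTINI mapping. It assumes a hypothetical rule that labels a bead only by its heavy-atom count and number of heteroatoms (loosely echoing MARTINI's convention of roughly four heavy atoms per bead), and the names `FRAGMENTS`, `bead_type` and `preimages` are illustrative. Inverting this mapping shows that several distinct molecules share the same bead label.

```python
from collections import defaultdict

# Hypothetical heavy-atom compositions (hydrogens omitted) of four small molecules.
FRAGMENTS = {
    "n-butane": ("C", "C", "C", "C"),
    "isobutane": ("C", "C", "C", "C"),
    "propan-1-ol": ("C", "C", "C", "O"),
    "methoxyethane": ("C", "C", "O", "C"),
}


def bead_type(atoms):
    """Toy CG mapping (an assumption, not MARTINI): a bead is labelled only by
    its heavy-atom count and number of heteroatoms, so connectivity is lost."""
    n_hetero = sum(1 for a in atoms if a != "C")
    return (len(atoms), n_hetero)


def preimages(fragments):
    """Invert the mapping: collect every molecule that maps to each bead type."""
    groups = defaultdict(list)
    for name, atoms in fragments.items():
        groups[bead_type(atoms)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for bead, molecules in preimages(FRAGMENTS).items():
        print(bead, "->", molecules)
```

Each bead label has more than one preimage, so a backmapping model cannot simply invert the mapping. It has to choose among (or sample from) the compatible molecules, which is why a learned generative distribution is a natural fit.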
In this talk, we will review our work on using a graph diffusion model to backmap bead representations of the transferable MARTINI CG force field and thereby explore chemical space. First, we will introduce the theory of generative diffusion models. We will then show results for the generation of molecules conditioned on multiple bead types. In contrast to previous work, our model does not assume a predefined correspondence between beads and chemical moieties; instead, this correspondence is inferred from the training distribution. The model can also generate molecules with different numbers of heavy atoms (C, N, O, F), with excellent validity, diversity and uniqueness (>80%) of the generated molecules.
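As background for the diffusion-model theory mentioned above, the following sketch shows the forward (noising) process of one common formulation, the Gaussian denoising diffusion probabilistic model (DDPM). In this process, a sample at step t can be drawn in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. This is an assumption for illustration only. The abstract does not specify the noise process, and graph diffusion models for molecules may instead use discrete diffusion over atom and bond types. The linear schedule values and the function names are illustrative choices, not the authors' implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form
    (t is a 0-based index into the schedule)."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [1.0, -0.5, 2.0]  # toy continuous features, e.g. for one graph node

if __name__ == "__main__":
    print(q_sample(x0, 0, alpha_bars, rng))    # almost identical to x0
    print(q_sample(x0, 999, alpha_bars, rng))  # almost pure Gaussian noise
```

A generative model is then trained to reverse this process step by step. For conditional generation, the reverse model would additionally receive the conditioning information (here, the bead types) as input.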
Dr. Luis Itza Vazquez-Salazar