<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Chem.</journal-id>
<journal-title>Frontiers in Chemistry</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Chem.</abbrev-journal-title>
<issn pub-type="epub">2296-2646</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fchem.2020.00162</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Chemistry</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Toward the Prediction of Multi-Spin State Charges of a Heme Model by Random Forest Regression</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhao</surname> <given-names>Wei</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/896842/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Qing</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Huang</surname> <given-names>Xian-Hui</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bie</surname> <given-names>Li-Hua</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Gao</surname> <given-names>Jun</given-names></name>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/222230/overview"/>
</contrib>
</contrib-group>
<aff><institution>Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University</institution>, <addr-line>Wuhan</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yong Wang, Ningbo University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Tong Zhu, East China Normal University, China; Chaoyuan Zhu, National Chiao Tung University, Taiwan</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Li-Hua Bie <email>biebie&#x00040;mail.hzau.edu.cn</email></corresp>
<corresp id="c002">Jun Gao <email>gaojun&#x00040;mail.hzau.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Theoretical and Computational Chemistry, a section of the journal Frontiers in Chemistry</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>31</day>
<month>03</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>8</volume>
<elocation-id>162</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>01</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>02</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Zhao, Li, Huang, Bie and Gao.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Zhao, Li, Huang, Bie and Gao</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>The random forest regression (RFR) model was introduced to predict the multiple spin state charges of a heme model, which is important for the molecular dynamics simulation of the spin crossover phenomenon. In this work, a multiple spin state structure data set with 39,368 structures of the simplified heme&#x02013;oxygen binding model was built from non-adiabatic dynamics simulation trajectories. The ESP charges of each atom were calculated and used as the real-valued response. The conformation-adapted charge (CAC) model of three spin states was constructed by an RFR model using symmetry functions. The results show that our RFR model can effectively predict on-the-fly atomic charges for varying conformations as well as the atomic charges of different spin states in the same conformation, thus balancing accuracy and efficiency. The average mean absolute error of the predicted charges of each spin state is &#x0003C;0.02 e. The comparison studies on descriptors showed a maximum 0.06 e improvement in the prediction of the charge of <italic>Fe</italic><sup>2&#x0002B;</sup> by using 11 manually selected structural parameters. We hope that this model can not only provide variable parameters for developing the force field of the multi-spin state but also facilitate automation, thus enabling large-scale simulations of atomistic systems.</p></abstract>
<kwd-group>
<kwd>spin crossover</kwd>
<kwd>heme model</kwd>
<kwd>force field</kwd>
<kwd>machine learning</kwd>
<kwd>ESP charge</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="2"/>
<equation-count count="7"/>
<ref-count count="57"/>
<page-count count="10"/>
<word-count count="6511"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Coordination compounds of transition metal ions can exhibit a switching phenomenon under changes in temperature, pressure, light, or magnetic field: the central metal ion changes its spin state (between the so-called high-spin, HS, and low-spin, LS, configurations), which is known as the spin transition (ST) or spin crossover (SCO) (Bousseksou et al., <xref ref-type="bibr" rid="B9">2011</xref>; Gutlich et al., <xref ref-type="bibr" rid="B23">2013</xref>). Since Cambi et al. first reported the thermally induced change of spin states in 1931 (Cambi and Szeg&#x000F6;, <xref ref-type="bibr" rid="B12">1931</xref>), many more SCO complexes have been synthesized and applied in various domains, including molecular switches, memory elements (Jureschi et al., <xref ref-type="bibr" rid="B35">2014</xref>; Shao et al., <xref ref-type="bibr" rid="B50">2015</xref>), temperature sensors (G&#x000FC;tlich and Goodwin, <xref ref-type="bibr" rid="B24">2004</xref>; Doukov et al., <xref ref-type="bibr" rid="B18">2011</xref>), nanomaterials (Nagl et al., <xref ref-type="bibr" rid="B40">2008</xref>; Hauser, <xref ref-type="bibr" rid="B28">2013</xref>), and so on (Bousseksou et al., <xref ref-type="bibr" rid="B9">2011</xref>; Cong et al., <xref ref-type="bibr" rid="B14">2018</xref>; Yuan et al., <xref ref-type="bibr" rid="B57">2018</xref>; Meyer et al., <xref ref-type="bibr" rid="B39">2019</xref>).</p>
<p>In the switching phenomenon, the change of spin state is accompanied by a switch of the electron configuration of the central ion, which often leads to marked changes in the physical and chemical properties of the entire complex (G&#x000FC;tlich and Goodwin, <xref ref-type="bibr" rid="B24">2004</xref>; Habenicht and Prezhdo, <xref ref-type="bibr" rid="B25">2012</xref>; Gutlich et al., <xref ref-type="bibr" rid="B23">2013</xref>). Meanwhile, the reorganization of electrons among atoms and the formation of molecules are complex and multifaceted processes, and their full description is only possible within the boundaries of quantum mechanics (QM) (Bristow et al., <xref ref-type="bibr" rid="B11">2014</xref>; Sanvito, <xref ref-type="bibr" rid="B48">2019</xref>). Density functional theory (DFT) is the most common choice for routine ground-state calculations; however, its computational cost scales cubically with the number of valence electrons and therefore grows rapidly with system size (Engler et al., <xref ref-type="bibr" rid="B20">2019</xref>). DFT is thus unsuitable when one needs to sample extended size and time scales.</p>
<p>Molecular dynamics (MD) simulation can handle system sizes of typically 10<sup>7</sup> atoms and above, and it has been used for decades to explore chemical and biochemical problems at an atomic level (Liu et al., <xref ref-type="bibr" rid="B38">2017</xref>; Riniker, <xref ref-type="bibr" rid="B43">2018</xref>). Classical MD predominantly uses simplified atomistic models called force fields (FFs) to approximate the ground-state potential energy surface (PES) of a system. The bonded parameters are represented in terms of equilibrium bond distances, bond and dihedral angles, force constants, and rotation barriers; the non-bonded interactions are typically described by atom-centered point charges and a Lennard-Jones potential (Ivanov et al., <xref ref-type="bibr" rid="B34">2015</xref>) while disregarding the explicit treatment of electronic polarizability (De et al., <xref ref-type="bibr" rid="B17">2018</xref>; Sahoo and Nair, <xref ref-type="bibr" rid="B46">2018</xref>; Heid et al., <xref ref-type="bibr" rid="B29">2019</xref>). Classical MD therefore cannot capture a small but essential set of chemical features, including spin crossover, wherein the molecular system is required to &#x0201C;hop&#x0201D; from the PES of the initial spin state onto that of the product state.</p>
<p>In order to better understand the effect of molecular properties on their electronic ground or excited states, the potential parameter set needs to be extended to multiple spin states, for which at least two issues should be taken into account. Firstly, the geometric configuration at the energy minimum of the excited state is different from that of the ground state in most cases. This issue can be addressed by adjusting the parameters in the bonding terms. For example, Meyer&#x00027;s group modified the force constants for bond stretching and bending terms according to DFT calculations for atomistic molecular dynamics simulations of the HS and LS states of an <italic>Fe</italic><sup>2&#x0002B;</sup>-containing model (Meyer et al., <xref ref-type="bibr" rid="B39">2019</xref>). Secondly, it is well-known that the charge distribution in the excited state is different from that in the ground state and that it changes with the molecular structure; it is therefore important for the force field to provide the charges of both spin states. In this regard, an increasing number of schemes have been proposed in addition to the polarized force field, such as the SSAPs method (Xu et al., <xref ref-type="bibr" rid="B55">2018</xref>).</p>
<p>In recent years, many efforts have been directed to the efficient improvement of force fields. In particular, machine learning combined with molecular simulation has been verified by many groups to be effective for developing force fields, including inferring charges based on a set of reference molecules (Botu et al., <xref ref-type="bibr" rid="B8">2016</xref>; Chen et al., <xref ref-type="bibr" rid="B13">2018</xref>; Inokuchi et al., <xref ref-type="bibr" rid="B33">2018</xref>; Engler et al., <xref ref-type="bibr" rid="B20">2019</xref>; Hu et al., <xref ref-type="bibr" rid="B30">2019</xref>; Roman et al., <xref ref-type="bibr" rid="B44">2019</xref>; Sanvito, <xref ref-type="bibr" rid="B48">2019</xref>; Unke and Meuwly, <xref ref-type="bibr" rid="B53">2019</xref>; Ye et al., <xref ref-type="bibr" rid="B56">2019</xref>). Among these, the random forest regression (RFR) method has been proven to be feasible for the prediction of atomic charges without expending much effort on parameter tuning or descriptor selection. As a classification and regression tool, the Random Forest algorithm was first introduced by Breiman (<xref ref-type="bibr" rid="B10">2001</xref>), inspired by the earlier work of Amit and Geman (<xref ref-type="bibr" rid="B1">1997</xref>). It uses bootstrap samples of the training data and random feature selection in tree induction. Each tree in the ensemble produces an output according to the molecular descriptors or properties, and the outputs from all trees are averaged to produce the final prediction (Breiman, <xref ref-type="bibr" rid="B10">2001</xref>; Cutler et al., <xref ref-type="bibr" rid="B15">2011</xref>). 
This procedure can reduce overfitting and offers some unique features, including built-in performance assessment and measures of variable importance (Svetnik, <xref ref-type="bibr" rid="B52">2003</xref>; Klusowski, <xref ref-type="bibr" rid="B36">2018</xref>), which make it suitable for quantitative structure-activity relationship (QSAR) tasks (Svetnik, <xref ref-type="bibr" rid="B52">2003</xref>; D Richard et al., <xref ref-type="bibr" rid="B16">2007</xref>; Statnikov et al., <xref ref-type="bibr" rid="B51">2008</xref>; Genuer et al., <xref ref-type="bibr" rid="B22">2010</xref>). For instance, Rai and Bakken (<xref ref-type="bibr" rid="B41">2013</xref>) combined random forest regression with ESP charges from high-level QM calculations to predict the partial atomic charges of H, C, N, O, F, S, and Cl. Building on their work, Bleiziffer et al. (<xref ref-type="bibr" rid="B7">2018</xref>) further applied a conformationally robust charge extraction scheme, DDEC, to predict partial charges and achieved accuracy beyond a HF/6-31G* setup. Our group developed a conformational adaptive charges (CAC) model based on the atom type symmetry function (ATSF), which was, in turn, based on the RFR method (Wang and Gao, <xref ref-type="bibr" rid="B54">2020</xref>). These machine learning approaches in tandem with quantum mechanics have many merits in developing flexible and adaptive force fields, including low cost, accuracy, and versatility. Yet, they are mainly used to predict charges on the single potential energy surface of the equilibrium configuration of the molecule. The performance of these methods on multi-spin state charges remains unreported.</p>
<p>In our previous work (Liu et al., <xref ref-type="bibr" rid="B38">2017</xref>; Du et al., <xref ref-type="bibr" rid="B19">2018</xref>), the spin-forbidden dioxygen binding dynamics in a simplified heme model were investigated by the non-adiabatic trajectory surface-hopping dynamics, and this involved the coupled singlet, triplet, and quintuplet states. The results revealed that there existed dominant long-lived, kinetically meta-stable states during the dynamics trajectories, and each meta-stable pattern showed a distinct partial charge population. Based on this geometric dependence of the partial charge population on the excited state, we proposed to extend the conformation adapted charge (CAC) model and RFR method to the multi-spin state charges of the heme model. The fixed-point charge in the traditional force field can be modified according to the conformation on the fly, and thus the key to the multi-spin state is transformed into the change of charge in the multi-spin state. We hope that this model can not only provide variable parameters for constructing the force field of the multi-spin state but also facilitate automation, thus enabling large-scale simulations of atomistic systems.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and Methods</title>
<p>In this work, we targeted the simplified heme model (see <xref ref-type="fig" rid="F1">Figure 1</xref>), introduced a random forest regression (RFR) algorithm using Behler-Parrinello symmetry functions as descriptors (Behler et al., <xref ref-type="bibr" rid="B6">2007</xref>; Hagai et al., <xref ref-type="bibr" rid="B26">2010</xref>; Behler, <xref ref-type="bibr" rid="B4">2011a</xref>), performed model training by fitting ESP charges of different spin states, and achieved high-quality predictions. The key steps of the workflow are shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. A sufficient sample of molecular conformations was obtained from the <italic>ab initio</italic> dynamics trajectories of previous work (Du et al., <xref ref-type="bibr" rid="B19">2018</xref>), which covered a wide range of conformations related to the spin crossover. Different descriptors were then extracted, and the ESP charges of each atom in each conformation were calculated for three spin states using the density functional theory method; together these constitute the initial dataset. After this preprocessing was completed, half of the data were selected randomly as the training set to build the RFR model, and the remaining half were used to test the model&#x00027;s ability to reproduce the atomic partial charges under different spin states and thereby to assess the performance of the model.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The molecular model of this work. The simplified heme model <inline-formula><mml:math id="M1"><mml:mi>F</mml:mi><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>N</mml:mi><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> complex with <italic>O</italic><sub>2</sub> binding was adopted.</p></caption>
<graphic xlink:href="fchem-08-00162-g0001.tif"/>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The workflow for constructing the data set and the prediction process of the RFR model.</p></caption>
<graphic xlink:href="fchem-08-00162-g0002.tif"/>
</fig>
<sec>
<title>2.1. Data Set Preparation</title>
<p>A total of 33 stable trajectories of the open-shell singlet state were selected from a non-adiabatic trajectory surface-hopping dynamics simulation from our previous work. The B3LYP/6-31G* level of theory (Reiher et al., <xref ref-type="bibr" rid="B42">2001</xref>; Salomon et al., <xref ref-type="bibr" rid="B47">2002</xref>) was used to calculate the ESP atomic charges of each structure in the singlet, triplet, and quintuplet states. We finally obtained 39,368 structures for which the calculations converged. The data preparation was time consuming: it took about 2 weeks to complete the calculations of the 39,368 structures for each spin state with four computer nodes, each with dual Intel 2683v3 CPUs. All the electronic structure calculations were performed with the Gaussian 16 package (Frisch et al., <xref ref-type="bibr" rid="B21">2016</xref>), and the detailed charge distribution of each atom in the different spin states is analyzed in section 3.</p>
</sec>
<sec>
<title>2.2. Random Forest Regression Model Training</title>
<p>The raw dataset was first preprocessed to extract appropriate features, namely the structural descriptors that serve as the input of the model. Specifically, a separate RFR model was constructed for each atom in each spin state according to the flow shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. Since there were 14 atoms in the simplified heme model, 14 independent RFR models were trained for each spin state, giving 42 models in total for the three spin states.</p>
<p>Let <italic>D</italic> &#x0003D; {(<italic>x</italic><sub>1</sub>, <italic>y</italic><sub>1</sub>), &#x022EF;&#x022EF;, (<italic>x</italic><sub><italic>N</italic></sub>, <italic>y</italic><sub><italic>N</italic></sub>)} denote the training data, with <italic>N</italic> &#x0003D; 39368/2 &#x0003D; 19684, <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> representing the information relative to atom <italic>i</italic> in each structure described with <italic>p</italic> features, and <italic>y</italic><sub><italic>i</italic></sub> denoting the ESP charge. During training, a bootstrap sample <italic>D</italic><sub><italic>j</italic></sub> was first drawn from the <italic>N</italic> training structures for each decision tree in the forest. Starting with all observations of <italic>D</italic><sub><italic>j</italic></sub> in the root node, <italic>m</italic> predictors were selected at random from the <italic>p</italic> predictors (<italic>m</italic> &#x0003C; <italic>p</italic>) at each node, and the node was split into two descendant nodes using the best split among these <italic>m</italic> predictors.
This process was repeated until no further splits were possible, which completed one tree, and the whole procedure was repeated until all the trees were grown.</p>
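<p>The node-splitting step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the helper names <italic>sse</italic> and <italic>best_split</italic>, the synthetic data, and the squared-error split criterion are all assumptions made for illustration, since the split criterion is not specified here.</p>
<preformat><![CDATA[
```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_ids):
    """Return (feature, threshold, cost) of the best binary split,
    searching only the randomly selected features in feature_ids."""
    best = (None, None, float("inf"))
    for f in feature_ids:
        levels = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            t = 0.5 * (lo + hi)
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            cost = sse(left) + sse(right)
            if cost < best[2]:
                best = (f, t, cost)
    return best


# Synthetic stand-in data (hypothetical): N samples with p features each.
random.seed(0)
N, p, m = 60, 12, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(N)]
y = [0.5 * row[3] + random.gauss(0.0, 0.05) for row in X]

# Bootstrap sample D_j: N indices drawn with replacement.
boot = [random.randrange(N) for _ in range(N)]
# m predictors selected at random (without replacement) from the p predictors.
candidates = random.sample(range(p), m)

feature, threshold, cost = best_split(
    [X[i] for i in boot], [y[i] for i in boot], candidates
)
```
]]></preformat>
<p>Growing a full tree would apply the same step recursively to the two descendant nodes, drawing a fresh random subset of <italic>m</italic> predictors at every node.</p>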
<p>Although random forests give good results with the default parameters in most cases, appropriate parameters can further improve the accuracy in particular situations. Random forests are somewhat sensitive to only one parameter, <italic>m</italic>, which denotes the number of randomly selected predictor variables at each node. The default value of <italic>m</italic> is often set to <italic>p</italic>/3. In the RFR model combined with symmetry functions, different values of <italic>m</italic> were tested, and <italic>m</italic> &#x0003D; 5 was finally chosen by comparing the Pearson correlation coefficient (r) between the predicted charges and the ESP charges of <italic>Fe</italic><sup>2&#x0002B;</sup>. Another parameter, <italic>B</italic>, which represents the number of trees in the forest, can be chosen to be as large as desired; Breiman (<xref ref-type="bibr" rid="B10">2001</xref>) showed that the generalization error for random forests converges almost surely to a limit as <italic>B</italic> increases. Here, <italic>B</italic> was set to 200.</p>
<p>When the training is completed, the predicted charge of a given atom <italic>i</italic> in a new geometry is given by the average prediction of all individual trees. Thus, the predicted charge is given by Equation (1):</p>
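<p>As a hedged illustration of these settings, the sketch below trains one forest with scikit-learn on synthetic data. Mapping <italic>m</italic> to the <italic>max_features</italic> argument and <italic>B</italic> to <italic>n_estimators</italic> is our reading of the scikit-learn interface rather than a record of the original scripts; the feature matrix, the target values, and the random seeds are hypothetical stand-ins for the symmetry-function descriptors and ESP charges.</p>
<preformat><![CDATA[
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one atom in one spin state (hypothetical data):
# rows are structures, columns are p descriptor values.
rng = np.random.default_rng(0)
n_structures, p = 400, 15
X = rng.normal(size=(n_structures, p))
y = 0.3 * np.tanh(X[:, 0]) + 0.1 * X[:, 1] + 0.01 * rng.normal(size=n_structures)

# Random 50/50 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# B = 200 trees, m = 5 randomly selected predictors per split.
model = RandomForestRegressor(
    n_estimators=200, max_features=5, bootstrap=True, random_state=0
)
model.fit(X_train, y_train)

# Evaluate with the mean absolute error and the Pearson correlation coefficient.
y_pred = model.predict(X_test)
mae = float(np.mean(np.abs(y_pred - y_test)))
r = float(np.corrcoef(y_pred, y_test)[0, 1])
```
]]></preformat>
<p>Repeating such a fit for each atom and each spin state would give one model per (atom, spin state) pair; scanning <italic>max_features</italic> and comparing <italic>r</italic> is one way to reproduce the kind of parameter test described above.</p>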
<disp-formula id="E1"><label>(1)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The standard deviation of the charges predicted for atom <italic>i</italic> by the individual trees is defined in Equation (2):</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>q</italic><sub><italic>i</italic></sub>(<italic>T</italic><sub><italic>j</italic></sub>) is the partial charge predicted by tree <italic>T</italic><sub><italic>j</italic></sub>. The RFR algorithm was implemented using the scikit-learn module in Python.</p>
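<p>Equations (1) and (2) amount to the mean and the population standard deviation (division by <italic>B</italic>) of the per-tree predictions. The following sketch evaluates both for a hypothetical list of per-tree charges; the function name and the numbers are illustrative only. With a fitted scikit-learn forest, the per-tree values could be collected by calling <italic>predict</italic> on each member of its <italic>estimators_</italic> attribute.</p>
<preformat><![CDATA[
```python
import math


def forest_mean_and_std(tree_predictions):
    """Mean (Equation 1) and population standard deviation (Equation 2)
    of the charges predicted for one atom by the B individual trees."""
    B = len(tree_predictions)
    q_bar = sum(tree_predictions) / B
    sigma = math.sqrt(sum((q - q_bar) ** 2 for q in tree_predictions) / B)
    return q_bar, sigma


# Hypothetical per-tree charges (in e) for one atom in one new geometry.
tree_charges = [0.98, 1.02, 1.00, 1.04, 0.96]
q_bar, sigma = forest_mean_and_std(tree_charges)
```
]]></preformat>
<p>A large value of the standard deviation relative to the mean indicates that the trees disagree for this geometry, which can serve as a simple warning sign for conformations that are poorly covered by the training set.</p>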
</sec>
<sec>
<title>2.3. Descriptor Selection</title>
<p>To encode the physical features and the mandatory symmetries of the problem, many descriptors have been introduced (Imbalzano et al., <xref ref-type="bibr" rid="B32">2018</xref>). For example, Huan et al. (<xref ref-type="bibr" rid="B31">2017</xref>) utilized a d-dimensional vector <italic>V</italic><sub><italic>i</italic>, &#x003B1;</sub> representing the atomic environment of atom <italic>i</italic> viewed along the Cartesian &#x003B1; direction. Heid et al. (<xref ref-type="bibr" rid="B29">2019</xref>) used the type of each atom and its connectivity as the input for the neural network. Schutt et al. (<xref ref-type="bibr" rid="B49">2017</xref>) introduced a vector of nuclear charges and a matrix of atomic distances to describe the molecular structures. In addition, molecules can be represented as Coulomb matrices (Rupp et al., <xref ref-type="bibr" rid="B45">2012</xref>; Lilienfeld, <xref ref-type="bibr" rid="B37">2015</xref>), scattering transforms (Hansen et al., <xref ref-type="bibr" rid="B27">2015</xref>), bags of bonds (Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B3">2010</xref>; Bart&#x000F3;k et al., <xref ref-type="bibr" rid="B2">2013</xref>), and so on. Among these various descriptors, the atom-based symmetry function, which was first proposed by Behler et al. (<xref ref-type="bibr" rid="B6">2007</xref>), has been widely used in machine learning (Behler et al., <xref ref-type="bibr" rid="B6">2007</xref>; Behler, <xref ref-type="bibr" rid="B4">2011a</xref>,<xref ref-type="bibr" rid="B5">b</xref>). Here, we adopted this method to describe the molecular structure.</p>
<p>Atom-based symmetry functions describe the chemical environment of atom <italic>i</italic> in terms of radial and angular terms. Therefore, each atom&#x00027;s Cartesian coordinate <italic>R</italic><sub><italic>i</italic></sub> &#x0003D; (<italic>x</italic><sub><italic>i</italic></sub>, <italic>y</italic><sub><italic>i</italic></sub>, <italic>z</italic><sub><italic>i</italic></sub>) needs to be converted into the symmetry function form of Equation (3):</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x0007B;</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0007D;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mt
d></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mrow><mml:mrow><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the total contribution of the distances between atom <italic>i</italic> and all surrounding atoms of element <italic>E</italic><sub>1</sub>, and <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the angular relationship between atom <italic>i</italic> and any two surrounding atoms of elements <italic>E</italic><sub><italic>i</italic></sub> and <italic>E</italic><sub><italic>j</italic></sub>. All atoms are distinguished according to their element <italic>E</italic><sub><italic>i</italic></sub>, so two atoms of the same element are described by the same set of symmetry functions.</p>
<p>In this study, Equation (4) was used to describe the radial (distance) component for each atom, where <italic>R</italic><sub><italic>ij</italic></sub> is the distance between atoms <italic>i</italic> and <italic>j</italic>. The cutoff function <italic>f</italic><sub><italic>c</italic></sub>(<italic>R</italic><sub><italic>ij</italic></sub>) in Equation (5) was introduced because atoms may enter or leave the cutoff sphere during a molecular dynamics simulation, so that the number of neighbor atoms varies. Here, <italic>R</italic><sub><italic>c</italic></sub> was set to 99 &#x000C5; so that all atoms were included, and <italic>R</italic><sub><italic>s</italic></sub> and &#x003B7; were both set to 1.0.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>J</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo>-</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M9"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>0.5</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02265;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
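<p>As an illustration of Equations (4) and (5), the radial term can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in this study: the function names <monospace>cutoff</monospace> and <monospace>radial_symmetry</monospace> and the argument layout are assumptions made for illustration; only the parameter values (<italic>R</italic><sub><italic>c</italic></sub> &#x0003D; 99 &#x000C5;, <italic>R</italic><sub><italic>s</italic></sub> &#x0003D; &#x003B7; &#x0003D; 1.0) are taken from the text.</p>
<preformat preformat-type="code">
```python
import math


def cutoff(r_ij, r_c=99.0):
    """Cosine cutoff of Equation (5); it reaches zero at r_c and stays zero beyond it."""
    if r_ij > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r_ij / r_c) + 1.0)


def radial_symmetry(center, neighbors, eta=1.0, r_s=1.0, r_c=99.0):
    """Radial term of Equation (4) for atom i at `center`.

    `neighbors` holds the (x, y, z) positions of the surrounding atoms j
    that belong to one element J (hypothetical input layout).
    """
    total = 0.0
    for position in neighbors:
        r_ij = math.dist(center, position)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total
```
</preformat>
<p>Because the function depends only on interatomic distances and sums over all neighbors of one element, its value is unchanged when the whole structure is translated or rotated, or when neighbors of the same element are reordered.</p>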
<p>Equation (6) is the angular component, which defines the angular distribution centered on each reference atom; here, &#x003BB; &#x0003D; 1.0, &#x003B6; &#x0003D; 1.0.</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msup><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>J</mml:mi><mml:mi>&#x00026;</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003BB;</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003B6;</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x000D7;</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msubsup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
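<p>The angular term of Equation (6) can be sketched in the same way. The code below is an illustrative reading of the formula, not the code used in this study: the function name <monospace>angular_symmetry</monospace> and its input layout are hypothetical, the sum is taken literally over ordered pairs (<italic>j</italic>, <italic>k</italic>), and &#x003B8;<sub><italic>ijk</italic></sub> is assumed to be the angle at atom <italic>i</italic> between the directions to atoms <italic>j</italic> and <italic>k</italic>. Whether each pair was counted once or twice in the original calculation is not stated.</p>
<preformat preformat-type="code">
```python
import math
from itertools import product


def cutoff(r, r_c=99.0):
    """Cosine cutoff of Equation (5)."""
    if r > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def angular_symmetry(center, group_j, group_k,
                     eta=1.0, lam=1.0, zeta=1.0, r_c=99.0):
    """Angular term of Equation (6) for atom i at `center`.

    `group_j` and `group_k` hold the positions of the neighbors belonging
    to elements J and K (hypothetical input layout).
    """
    total = 0.0
    for pj, pk in product(group_j, group_k):
        if pj == pk:  # the same atom cannot act as both j and k
            continue
        r_ij = math.dist(center, pj)
        r_ik = math.dist(center, pk)
        r_jk = math.dist(pj, pk)
        # cos(theta_ijk) from the dot product of the vectors i->j and i->k
        dot = sum((a - c) * (b - c) for a, b, c in zip(pj, pk, center))
        cos_theta = max(-1.0, min(1.0, dot / (r_ij * r_ik)))
        total += ((1.0 + lam * cos_theta) ** zeta
                  * math.exp(-eta * (r_ij ** 2 + r_ik ** 2 + r_jk ** 2))
                  * cutoff(r_ij, r_c) * cutoff(r_ik, r_c) * cutoff(r_jk, r_c))
    return 2.0 ** (1.0 - zeta) * total
```
</preformat>
<p>With &#x003BB; &#x0003D; &#x003B6; &#x0003D; 1.0, as used here, the prefactor 2<sup>1&#x02212;&#x003B6;</sup> equals one and the angular factor reduces to 1 &#x0002B; cos &#x003B8;<sub><italic>ijk</italic></sub>.</p>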
<p>Through this coordinate transformation, the symmetry functions of each atom were obtained and combined with the ESP charges to form the training set used as the input of the model.</p>
<p>Meanwhile, to examine the effect of descriptor selection on prediction performance, 11 structural parameters were manually selected and used as descriptors to train a second model. Specifically, the 11 parameters comprised eight distances (Fe-N1, Fe-N2, Fe-N3, Fe-N4, Fe-N11, Fe-O12, Fe-O13, and O12-O13), one angle (Fe-O12-O13), and two dihedral angles (N2-Fe-N1-C10 and N1-Fe-N2-C5). Based on our chemical intuition, these 11 parameters capture the main features of the molecular structure and can therefore describe the different conformations well.</p>
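<p>For readers who wish to reproduce such descriptors, the following Python sketch shows one way to compute the 11 structural parameters from Cartesian coordinates. It is an illustration under stated assumptions: the helper names and the dictionary-based input are hypothetical, the atom labels follow the text, and the sign convention of the dihedral angle is one common choice that may differ from the one used in this study.</p>
<preformat preformat-type="code">
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    u, v = sub(p, q), sub(r, q)
    cos_t = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    b1_unit = tuple(x / math.hypot(*b1) for x in b1)
    m1 = cross(n1, b1_unit)
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))


def structural_descriptors(xyz):
    """Return the 11 parameters; `xyz` maps atom labels to (x, y, z) tuples."""
    bonds = [("Fe", "N1"), ("Fe", "N2"), ("Fe", "N3"), ("Fe", "N4"),
             ("Fe", "N11"), ("Fe", "O12"), ("Fe", "O13"), ("O12", "O13")]
    values = [math.dist(xyz[a], xyz[b]) for a, b in bonds]
    values.append(angle(xyz["Fe"], xyz["O12"], xyz["O13"]))
    values.append(dihedral(xyz["N2"], xyz["Fe"], xyz["N1"], xyz["C10"]))
    values.append(dihedral(xyz["N1"], xyz["Fe"], xyz["N2"], xyz["C5"]))
    return values
```
</preformat>
<p>The dihedral angles are four-body terms, which the two- and three-body symmetry functions of Equations (4) and (6) do not contain explicitly.</p>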
</sec>
</sec>
<sec id="s3">
<title>3. Results and Discussion</title>
<sec>
<title>3.1. Charge Distribution of Multi-Spin State in the Initial Data Set</title>
<p>As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the charges of most atoms vary over a range of 0.5 to 0.7e; the fluctuation of <italic>Fe</italic><sup>2&#x0002B;</sup> was the largest, at close to 2e. The variation of O12 was larger than that of O13. There was also a slight tendency for the mean charge of the N atoms to decrease and for that of the C atoms to increase. For Fe and the coordinating O12 and O13, the differences among the mean values in the different spin states were more pronounced. Specifically, the atomic charge of <italic>Fe</italic><sup>2&#x0002B;</sup> was distributed around 1.2e in the singlet state and around 1.5e in the quintuplet state. Further analysis of the charge distributions of the different spin states showed that the triplet charge of <italic>Fe</italic><sup>2&#x0002B;</sup> was greater than the singlet charge in most structures (&#x00394;31 &#x0003E; 0, see <xref ref-type="fig" rid="F4">Figure 4</xref>), with the most probable difference at about 0.1e, whereas the difference between the quintuplet and triplet states reached 0.2e. These results confirm that different spin states of the same structure have distinct charge distributions.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Charge distribution of the 14 atoms of the model in the data set for the different spin states. <bold>(A)</bold> Boxplot of the charge distribution of the 14 atoms of the heme system in the different spin states. <bold>(B)</bold> Probability density of the charge distribution of <italic>Fe</italic><sup>2&#x0002B;</sup>. <bold>(C)</bold> Probability density of the charge distribution of the O12 atom.</p></caption>
<graphic xlink:href="fchem-08-00162-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Histograms of the charge differences between two spin states for selected atoms: <bold>(A)</bold> O12 and <bold>(B)</bold> <italic>Fe</italic><sup>2&#x0002B;</sup>. Black denotes the charge difference between the singlet and triplet states, and gray denotes the charge difference between the quintuplet and triplet states.</p></caption>
<graphic xlink:href="fchem-08-00162-g0004.tif"/>
</fig>
<p>Additionally, the atomic charge of each atom fluctuated within a certain range, with <italic>Fe</italic><sup>2&#x0002B;</sup> fluctuating the most. Taking the singlet state as an example, its charge ranged from 0.3 to 2.2e, which implies that the charge of a given atom in a specific spin state is conformation dependent.</p>
</sec>
<sec>
<title>3.2. Charge Prediction of RFR Model With Symmetric Functions</title>
<p>To better distinguish between different molecular structures, atom-based symmetry functions were used to convert the atomic coordinates into a series of function values that embed each atom in its neighborhood according to element type (Schutt et al., <xref ref-type="bibr" rid="B49">2017</xref>). This is an efficient way to describe the chemical environment while guaranteeing invariance with respect to translation, rotation, and permutation. On this basis, the RFR model was constructed from the symmetry functions and the ESP charges.</p>
<p>As mentioned above, although the RFR model does not require complex parameter tuning, it is sensitive to the number of descriptors. We therefore predicted the charge of <italic>Fe</italic><sup>2&#x0002B;</sup> at different values of <italic>m</italic> (i.e., the number of features selected at random from the <italic>p</italic> descriptors; here <italic>p</italic> &#x0003D; 19) and calculated the correlation between the predictions and the ESP charges. The results are shown in <xref ref-type="table" rid="T1">Table 1</xref>. When <italic>m</italic> &#x0003D; 5, the correlation between the predicted charges and the ESP charges is the largest (0.9784), indicating the best prediction performance. The parameter <italic>m</italic> was consequently set to five in the subsequent analysis.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Tests on the number of features selected in the RFR model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold><italic>m</italic></bold></th>
<th valign="top" align="center"><bold>Pearson correlation coefficient</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">0.9784</td>
</tr>
<tr>
<td valign="top" align="left">0.2</td>
<td valign="top" align="center">0.9750</td>
</tr>
<tr>
<td valign="top" align="left">log<sub>2</sub> <italic>p</italic></td>
<td valign="top" align="center">0.9771</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M12"><mml:msqrt><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula></td>
<td valign="top" align="center">0.9771</td>
</tr>
<tr>
<td valign="top" align="left">19</td>
<td valign="top" align="center">0.8729</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>When the symmetry function method is used, each molecular structure is described by 19 features (p &#x0003D; 19), and m is the maximum number of features randomly selected from them to fit a tree</italic>.</p>
</table-wrap-foot>
</table-wrap>
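<p>To make the settings in Table 1 concrete, the sketch below converts each <italic>m</italic> setting into an actual number of features for <italic>p</italic> &#x0003D; 19. The conversion rule is an assumption: it follows the convention of common random forest libraries such as scikit-learn (a fraction is multiplied by <italic>p</italic> and truncated; the square root and base-2 logarithm of <italic>p</italic> are truncated to integers), and the text does not state which rule was actually applied. The function name <monospace>resolve_max_features</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def resolve_max_features(m, p):
    """Map an m setting to a feature count (assumed scikit-learn-like rule)."""
    if m == "sqrt":
        return max(1, int(math.sqrt(p)))
    if m == "log2":
        return max(1, int(math.log2(p)))
    if isinstance(m, float):  # a fraction of the p descriptors
        return max(1, int(m * p))
    return int(m)  # an explicit integer count


p = 19  # number of symmetry-function descriptors, from Table 1
settings = [5, 0.2, "log2", "sqrt", 19]
counts = {str(s): resolve_max_features(s, p) for s in settings}
# counts == {"5": 5, "0.2": 3, "log2": 4, "sqrt": 4, "19": 19}
```
</preformat>
<p>Under this assumed rule, the logarithmic and square-root settings both resolve to four features, which would be consistent with their identical correlation coefficients in Table 1, and <italic>m</italic> &#x0003D; 19 means that every descriptor is available to every tree.</p>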
<p>To assess the prediction performance of the charge models, the mean absolute error (MAE) and the standard deviation of the error were calculated for each atom in the three spin states (<xref ref-type="table" rid="T2">Table 2</xref>). According to <xref ref-type="table" rid="T2">Table 2</xref>, the MAE averaged over all atoms is 0.015e in each of the three spin states, with no obvious difference among the states. Within each state, the MAEs of the individual atoms were within 0.02e, except for <italic>Fe</italic><sup>2&#x0002B;</sup>, which reached a maximum of 0.048e. Moreover, the Pearson correlation coefficient (<italic>r</italic>) between the predicted charges and the ESP charges was above 0.96 for every atom in all three states. These data demonstrate that the model has high prediction accuracy, especially for N1 and N2. At the same time, the MAE and the error standard deviation were similar in the three states, indicating that the RFR model is stable.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Prediction performance of the RFR model with symmetry functions for the three spin states.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Atoms</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Predicted values (e)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>MAE (e)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Error std. (e)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Pearson coefficient</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Singlet</bold></th>
<th valign="top" align="center"><bold>Triplet</bold></th>
<th valign="top" align="center"><bold>Quintuplet</bold></th>
<th valign="top" align="center"><bold>Singlet</bold></th>
<th valign="top" align="center"><bold>Triplet</bold></th>
<th valign="top" align="center"><bold>Quintuplet</bold></th>
<th valign="top" align="center"><bold>Singlet</bold></th>
<th valign="top" align="center"><bold>Triplet</bold></th>
<th valign="top" align="center"><bold>Quintuplet</bold></th>
<th valign="top" align="center"><bold>Singlet</bold></th>
<th valign="top" align="center"><bold>Triplet</bold></th>
<th valign="top" align="center"><bold>Quintuplet</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>Fe</italic><sup>2&#x0002B;</sup></td>
<td valign="top" align="center">1.390</td>
<td valign="top" align="center">1.382</td>
<td valign="top" align="center">1.459</td>
<td valign="top" align="center">0.048</td>
<td valign="top" align="center">0.046</td>
<td valign="top" align="center">0.047</td>
<td valign="top" align="center">0.051</td>
<td valign="top" align="center">0.049</td>
<td valign="top" align="center">0.050</td>
<td valign="top" align="center">0.978</td>
<td valign="top" align="center">0.980</td>
<td valign="top" align="center">0.982</td>
</tr>
<tr>
<td valign="top" align="left">N1</td>
<td valign="top" align="center">&#x02212;0.390</td>
<td valign="top" align="center">&#x02212;0.382</td>
<td valign="top" align="center">&#x02212;0.390</td>
<td valign="top" align="center">0.013</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.013</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.991</td>
</tr>
<tr>
<td valign="top" align="left">N2</td>
<td valign="top" align="center">&#x02212;0.387</td>
<td valign="top" align="center">&#x02212;0.378</td>
<td valign="top" align="center">&#x02212;0.385</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.991</td>
<td valign="top" align="center">0.990</td>
<td valign="top" align="center">0.991</td>
</tr>
<tr>
<td valign="top" align="left">N3</td>
<td valign="top" align="center">&#x02212;0.457</td>
<td valign="top" align="center">&#x02212;0.449</td>
<td valign="top" align="center">&#x02212;0.458</td>
<td valign="top" align="center">0.013</td>
<td valign="top" align="center">0.013</td>
<td valign="top" align="center">0.012</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.988</td>
<td valign="top" align="center">0.988</td>
<td valign="top" align="center">0.989</td>
</tr>
<tr>
<td valign="top" align="left">N4</td>
<td valign="top" align="center">&#x02212;0.451</td>
<td valign="top" align="center">&#x02212;0.443</td>
<td valign="top" align="center">&#x02212;0.452</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.014</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.986</td>
<td valign="top" align="center">0.986</td>
<td valign="top" align="center">0.986</td>
</tr>
<tr>
<td valign="top" align="left">C5</td>
<td valign="top" align="center">0.350</td>
<td valign="top" align="center">0.357</td>
<td valign="top" align="center">0.374</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.985</td>
<td valign="top" align="center">0.984</td>
<td valign="top" align="center">0.984</td>
</tr>
<tr>
<td valign="top" align="left">C6</td>
<td valign="top" align="center">&#x02212;0.369</td>
<td valign="top" align="center">&#x02212;0.371</td>
<td valign="top" align="center">&#x02212;0.377</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.973</td>
<td valign="top" align="center">0.972</td>
<td valign="top" align="center">0.972</td>
</tr>
<tr>
<td valign="top" align="left">C7</td>
<td valign="top" align="center">0.363</td>
<td valign="top" align="center">0.368</td>
<td valign="top" align="center">0.384</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.978</td>
<td valign="top" align="center">0.977</td>
<td valign="top" align="center">0.977</td>
</tr>
<tr>
<td valign="top" align="left">C8</td>
<td valign="top" align="center">0.371</td>
<td valign="top" align="center">0.375</td>
<td valign="top" align="center">0.390</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.978</td>
<td valign="top" align="center">0.977</td>
<td valign="top" align="center">0.978</td>
</tr>
<tr>
<td valign="top" align="left">C9</td>
<td valign="top" align="center">&#x02212;0.372</td>
<td valign="top" align="center">&#x02212;0.373</td>
<td valign="top" align="center">&#x02212;0.378</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.018</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">0.020</td>
<td valign="top" align="center">0.971</td>
<td valign="top" align="center">0.970</td>
<td valign="top" align="center">0.967</td>
</tr>
<tr>
<td valign="top" align="left">C10</td>
<td valign="top" align="center">0.348</td>
<td valign="top" align="center">0.354</td>
<td valign="top" align="center">0.370</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.983</td>
<td valign="top" align="center">0.983</td>
<td valign="top" align="center">0.984</td>
</tr>
<tr>
<td valign="top" align="left">N11</td>
<td valign="top" align="center">0.052</td>
<td valign="top" align="center">0.051</td>
<td valign="top" align="center">0.007</td>
<td valign="top" align="center">0.004</td>
<td valign="top" align="center">0.005</td>
<td valign="top" align="center">0.006</td>
<td valign="top" align="center">0.005</td>
<td valign="top" align="center">0.005</td>
<td valign="top" align="center">0.008</td>
<td valign="top" align="center">0.988</td>
<td valign="top" align="center">0.985</td>
<td valign="top" align="center">0.971</td>
</tr>
<tr>
<td valign="top" align="left">O12</td>
<td valign="top" align="center">&#x02212;0.217</td>
<td valign="top" align="center">&#x02212;0.266</td>
<td valign="top" align="center">&#x02212;0.306</td>
<td valign="top" align="center">0.005</td>
<td valign="top" align="center">0.007</td>
<td valign="top" align="center">0.008</td>
<td valign="top" align="center">0.006</td>
<td valign="top" align="center">0.009</td>
<td valign="top" align="center">0.012</td>
<td valign="top" align="center">0.990</td>
<td valign="top" align="center">0.987</td>
<td valign="top" align="center">0.987</td>
</tr>
<tr>
<td valign="top" align="left">O13</td>
<td valign="top" align="center">&#x02212;0.229</td>
<td valign="top" align="center">&#x02212;0.224</td>
<td valign="top" align="center">&#x02212;0.238</td>
<td valign="top" align="center">0.002</td>
<td valign="top" align="center">0.003</td>
<td valign="top" align="center">0.004</td>
<td valign="top" align="center">0.004</td>
<td valign="top" align="center">0.006</td>
<td valign="top" align="center">0.006</td>
<td valign="top" align="center">0.987</td>
<td valign="top" align="center">0.970</td>
<td valign="top" align="center">0.978</td>
</tr>
<tr>
<td valign="top" align="left">Mean</td>
<td/>
<td/>
<td/>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.015</td>
<td valign="top" align="center">0.016</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.017</td>
<td valign="top" align="center">0.983</td>
<td valign="top" align="center">0.981</td>
<td valign="top" align="center">0.981</td>
</tr>
</tbody>
</table>
</table-wrap>
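<p>The metrics reported in Table 2 can be computed as in the following Python sketch. The function names are hypothetical, and taking the error standard deviation as the population standard deviation of the signed errors is an assumption, since the exact definition is not given in the text. The list <monospace>singlet_mae</monospace> reproduces the singlet MAE column of Table 2.</p>
<preformat preformat-type="code">
```python
import math
from statistics import mean, pstdev


def mae(pred, ref):
    """Mean absolute error between predicted and reference charges."""
    return mean([abs(p - r) for p, r in zip(pred, ref)])


def error_std(pred, ref):
    """Standard deviation of the signed errors (assumed definition)."""
    return pstdev([p - r for p, r in zip(pred, ref)])


def pearson_r(pred, ref):
    """Pearson correlation coefficient between predictions and references."""
    mp, mr = mean(pred), mean(ref)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


# Singlet MAE column of Table 2 (Fe, N1-N4, C5-C10, N11, O12, O13), in e
singlet_mae = [0.048, 0.013, 0.014, 0.013, 0.014, 0.015, 0.018,
               0.016, 0.015, 0.018, 0.016, 0.004, 0.005, 0.002]
mean_singlet_mae = round(mean(singlet_mae), 3)  # 0.015, as in the "Mean" row
```
</preformat>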
<p>For clarity, we further selected three atoms (<italic>Fe</italic><sup>2&#x0002B;</sup>, N11, and O12) and plotted their charge distributions for comparison (<xref ref-type="fig" rid="F5">Figure 5</xref>). As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, the charges predicted by the RFR model cluster around the straight line <italic>y</italic> &#x0003D; <italic>x</italic>; that is, they are very close to the high-precision charges calculated by DFT, indicating that the model achieves satisfactory accuracy.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Distribution of the predicted charges of atoms <italic>Fe</italic><sup>2&#x0002B;</sup>, N11, and O12 in the three spin states. <bold>(A&#x02013;C)</bold> <italic>Fe</italic><sup>2&#x0002B;</sup>; <bold>(D&#x02013;F)</bold> N11; <bold>(G&#x02013;I)</bold> O12. Black denotes the singlet state, blue the triplet state, and red the quintuplet state.</p></caption>
<graphic xlink:href="fchem-08-00162-g0005.tif"/>
</fig>
<p>A careful comparison of the distributions of each atom in the different spin states shows that the predicted values of <italic>Fe</italic><sup>2&#x0002B;</sup> cluster tightly, with few scattered points. However, the model tends to overestimate the charge when the corresponding ESP charge is below 1.5e and to underestimate it when the ESP charge is above 1.5e. The cluster centers differed among the three states, but the charges essentially fell within 0.5&#x02013;2.25e, which is consistent with the analysis in <xref ref-type="fig" rid="F5">Figure 5</xref>. For the N11 atom, the predictions were more concentrated in the singlet and triplet states, whereas a few scattered points appeared in the quintuplet state. For the O12 atom, the correlation coefficients in all three spin states exceeded 0.98, and although there were some scattered points, the distribution was uniform. Finally, a comparison of the distributions of different atoms in the same spin state shows that the predictions in the triplet state were more concentrated overall. In summary, the model can predict the atomic charges of most structures well.</p>
</sec>
<sec>
<title>3.3. Charge Prediction of RFR Model With Manually Selected Structural Parameters</title>
<p>To compare the performance of RFR models built on different descriptors, we manually selected 11 parameters to describe the molecular structure: eight bond lengths (Fe-N1, Fe-N2, Fe-N3, Fe-N4, Fe-N11, Fe-O12, Fe-O13, and O12-O13), one bond angle (Fe-O12-O13), and two dihedral angles (N2-Fe-N1-C10 and N1-Fe-N2-C5). The same procedure was used for RFR model training and prediction. A comparison of the prediction performance for selected atoms (<italic>Fe</italic><sup>2&#x0002B;</sup>, N11, and O12) is shown in <xref ref-type="fig" rid="F6">Figure 6</xref>, and the root mean square error (RMSE) of the prediction for each atom in the different spin states is shown in <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Comparison of the performance of the RFR predictions with the two descriptor sets in the three spin states. <bold>(A&#x02013;C)</bold> <italic>Fe</italic><sup>2&#x0002B;</sup>; <bold>(D&#x02013;F)</bold> N11; <bold>(G&#x02013;I)</bold> O12. Cyan corresponds to the RFR model with symmetry functions, and magenta to the RFR model with the 11 structural parameters.</p></caption>
<graphic xlink:href="fchem-08-00162-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Comparison of the root mean square error (RMSE) of the different RFR models. <bold>(A&#x02013;C)</bold> RMSE of each atom in the singlet, triplet, and quintuplet states, respectively. Black bars denote the RFR model using symmetry functions, and gray bars denote the RFR model with the 11 manually selected structural parameters.</p></caption>
<graphic xlink:href="fchem-08-00162-g0007.tif"/>
</fig>
<p>It can be seen clearly from <xref ref-type="fig" rid="F6">Figure 6</xref> that both models have good prediction performances, and the same model has a similar RMSE of predictions for different spin states. When 11 structural parameters were used as descriptors, however, the prediction values were more concentrated, and the model prediction performance was better than with in the case of symmetry functions. The average RMSE and the RMSE of each atom were reduced. Among these, the RMSE of <italic>Fe</italic><sup>2&#x0002B;</sup> reduced from 0.07 to 0.0035e, which is a maximum 0.06e improvement. We think that this is partially due to the use of a dihedral angle as the descriptor, which is a four-body term and is not included in the symmetry function.</p>
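<p>For reference, the RMSE used in this comparison, and the per-atom difference between two models, can be computed as in the short Python sketch below. The function names and the dictionary layout (atom label mapped to RMSE) are hypothetical and serve only as an illustration.</p>
<preformat preformat-type="code">
```python
import math


def rmse(pred, ref):
    """Root mean square error between predicted and reference charges."""
    squared = [(p - r) ** 2 for p, r in zip(pred, ref)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_change(rmse_symmetry, rmse_structural):
    """Per-atom RMSE difference between the two descriptor sets.

    Positive values mean that the structural-parameter model has the
    lower RMSE for that atom.
    """
    return {atom: rmse_symmetry[atom] - rmse_structural[atom]
            for atom in rmse_symmetry}
```
</preformat>
<p>Because the errors are squared before averaging, the RMSE weights large deviations more strongly than the MAE and is never smaller than the MAE for the same set of predictions.</p>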
<p>In conclusion, the choice of descriptor affects the prediction performance of the RFR model; the 11 manually selected parameters describe the molecular structure better and thus give better results. At the same time, however, it should be noted that the difference between the two cases is not significant. As shown in <xref ref-type="fig" rid="F7">Figure 7</xref>, the RMSE of <italic>Fe</italic><sup>2&#x0002B;</sup> is relatively large, but its fluctuations are still below 0.04e, and the variations in RMSE for the other atoms are all below 0.02e. This indicates that, even without empirical input, the RFR model with symmetry functions can achieve satisfactory predictions, with the advantage that it can be automated.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>4. Conclusions</title>
<p>This study explored the spin crossover phenomenon in a model heme system on the basis of how the atomic charge distribution in different spin states varies with conformation. The random forest method was introduced to construct a prediction model of multi-spin variable charge, which can provide a separate prediction for each atom.</p>
<p>In this model, symmetry functions were used as descriptors of the atomic chemical environment. The model was trained on ESP charges to predict the atomic charges in different spin states. To compare the prediction performance, 11 manually selected structural parameters were also used as the input of the RFR model. The results showed that the prediction was more accurate with the 11 selected parameters, but this approach is not suitable for automation because it relies on human experience. In contrast, the RFR model using symmetry functions achieves a good trade-off between accuracy and efficiency, allows automatic processing, and provides a separate prediction for each atom. It should be noted that, in this method, the transformation of coordinates is a time-consuming pre-processing step, but it avoids the problem of inconsistent calculation of energy or force in Cartesian coordinates. When the number of descriptors is large enough, the random forest algorithm is very effective. This study is only a preliminary exploration of the heme force field, and many deficiencies remain. In future work, we will further improve the method for calculating the multi-spin-state variable charge force field parameters.</p>
</sec>
<sec sec-type="data-availability-statement" id="s5">
<title>Data Availability Statement</title>
<p>The datasets generated for this study are available on request to the corresponding author.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>JG: conceptualization. QL, WZ, X-HH, and L-HB: methodology. WZ and L-HB: validation. WZ, L-HB, and JG: writing&#x02013;original draft, writing&#x02013;review, and editing. JG: project administration and funding acquisition.</p>
<sec>
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amit</surname> <given-names>Y.</given-names></name> <name><surname>Geman</surname> <given-names>D.</given-names></name></person-group> (<year>1997</year>). <article-title>Shape quantization and recognition with randomized trees</article-title>. <source>Neural Comput</source>. <volume>9</volume>, <fpage>1545</fpage>&#x02013;<lpage>1588</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.7.1545</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Kondor</surname> <given-names>R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Publisher&#x00027;s note: on representing chemical environments [phys. Rev. B 87, 184115 (2013)]</article-title>. <source>Phys. Rev. B</source> <volume>87</volume>:<fpage>219902</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevB.87.219902</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bart&#x000F3;k</surname> <given-names>A. P.</given-names></name> <name><surname>Payne</surname> <given-names>M. C.</given-names></name> <name><surname>Kondor</surname> <given-names>R.</given-names></name> <name><surname>Cs&#x000E1;nyi</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <article-title>Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons</article-title>. <source>Phys. Rev. Lett</source>. <volume>104</volume>:<fpage>136403</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevLett.104.136403</pub-id><pub-id pub-id-type="pmid">20481899</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behler</surname> <given-names>J.</given-names></name></person-group> (<year>2011a</year>). <article-title>Atom-centered symmetry functions for constructing high-dimensional neural network potentials</article-title>. <source>J. Chem. Phys</source>. <volume>134</volume>:<fpage>074106</fpage>. <pub-id pub-id-type="doi">10.1063/1.3553717</pub-id><pub-id pub-id-type="pmid">21341827</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behler</surname> <given-names>J.</given-names></name></person-group> (<year>2011b</year>). <article-title>Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations</article-title>. <source>Phys. Chem. Chem. Phys</source>. <volume>13</volume>, <fpage>17930</fpage>&#x02013;<lpage>17955</lpage>. <pub-id pub-id-type="doi">10.1039/c1cp21668f</pub-id><pub-id pub-id-type="pmid">21915403</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behler</surname> <given-names>J.</given-names></name> <name><surname>Lorenz</surname> <given-names>S.</given-names></name> <name><surname>Reuter</surname> <given-names>K.</given-names></name></person-group> (<year>2007</year>). <article-title>Representing molecule-surface interactions with symmetry-adapted neural networks</article-title>. <source>J. Chem. Phys</source>. <volume>127</volume>:<fpage>014705</fpage>. <pub-id pub-id-type="doi">10.1063/1.2746232</pub-id><pub-id pub-id-type="pmid">17627362</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bleiziffer</surname> <given-names>P.</given-names></name> <name><surname>Schaller</surname> <given-names>K.</given-names></name> <name><surname>Riniker</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Machine learning of partial charges derived from high-quality quantum-mechanical calculations</article-title>. <source>J. Chem. Inf. Model</source>. <volume>58</volume>, <fpage>579</fpage>&#x02013;<lpage>590</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.7b00663</pub-id><pub-id pub-id-type="pmid">29461814</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Botu</surname> <given-names>V.</given-names></name> <name><surname>Batra</surname> <given-names>R.</given-names></name> <name><surname>Chapman</surname> <given-names>J.</given-names></name> <name><surname>Ramprasad</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Machine learning force fields: construction, validation, and outlook</article-title>. <source>J. Phys. Chem. C</source> <volume>121</volume>, <fpage>511</fpage>&#x02013;<lpage>522</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcc.6b10908</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bousseksou</surname> <given-names>A.</given-names></name> <name><surname>Molnar</surname> <given-names>G.</given-names></name> <name><surname>Salmon</surname> <given-names>L.</given-names></name> <name><surname>Nicolazzi</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>Molecular spin crossover phenomenon: recent achievements and prospects</article-title>. <source>Chem. Soc. Rev</source>. <volume>40</volume>, <fpage>3313</fpage>&#x02013;<lpage>3335</lpage>. <pub-id pub-id-type="doi">10.1039/c1cs15042a</pub-id><pub-id pub-id-type="pmid">21544283</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn</source>. <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bristow</surname> <given-names>J. K.</given-names></name> <name><surname>Tiana</surname> <given-names>D.</given-names></name> <name><surname>Walsh</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Transferable force field for metal-organic frameworks from first-principles: BTW-FF</article-title>. <source>J. Chem. Theory Comput</source>. <volume>10</volume>, <fpage>4644</fpage>&#x02013;<lpage>4652</lpage>. <pub-id pub-id-type="doi">10.1021/ct500515h</pub-id><pub-id pub-id-type="pmid">25574157</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cambi</surname> <given-names>L.</given-names></name> <name><surname>Szeg&#x000F6;</surname> <given-names>L.</given-names></name></person-group> (<year>1931</year>). <article-title>&#x000DC;ber die magnetische susceptibilit&#x000E4;t der komplexen verbindungen</article-title>. <source>Ber. Deutsch. Chem. Gesellsch</source>. <volume>64</volume>, <fpage>2591</fpage>&#x02013;<lpage>2598</lpage>. <pub-id pub-id-type="doi">10.1002/cber.19310641002</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>W.-K.</given-names></name> <name><surname>Liu</surname> <given-names>X.-Y.</given-names></name> <name><surname>Fang</surname> <given-names>W.-H.</given-names></name> <name><surname>Dral</surname> <given-names>P. O.</given-names></name> <name><surname>Cui</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep learning for nonadiabatic excited-state dynamics</article-title>. <source>J. Phys. Chem. Lett</source>. <volume>9</volume>, <fpage>6702</fpage>&#x02013;<lpage>6708</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpclett.8b03026</pub-id><pub-id pub-id-type="pmid">30403870</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cong</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Jin</surname> <given-names>K.</given-names></name> <name><surname>Zhong</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>J. Z. H.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Exploring the reasons for decrease in binding affinity of hiv-2 against hiv-1 protease complex using interaction entropy under polarized force field</article-title>. <source>Front. Chem</source>. <volume>6</volume>:<fpage>380</fpage>. <pub-id pub-id-type="doi">10.3389/fchem.2018.00380</pub-id><pub-id pub-id-type="pmid">30197882</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cutler</surname> <given-names>A.</given-names></name> <name><surname>Cutler</surname> <given-names>D.</given-names></name> <name><surname>Stevens</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Random forests</article-title>. <source>Mach. Learn</source>. <volume>45</volume>, <fpage>157</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4419-9326-7_5</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cutler</surname> <given-names>D. R.</given-names></name> <name><surname>Edwards</surname> <given-names>T. C.</given-names> <suffix>Jr.</suffix></name> <name><surname>Beard</surname> <given-names>K. H.</given-names></name> <name><surname>Cutler</surname> <given-names>A.</given-names></name> <name><surname>Hess</surname> <given-names>K. T.</given-names></name> <name><surname>Gibson</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Random forests for classification in ecology</article-title>. <source>Ecology</source> <volume>88</volume>, <fpage>2783</fpage>&#x02013;<lpage>2792</lpage>. <pub-id pub-id-type="doi">10.1890/07-0539.1</pub-id><pub-id pub-id-type="pmid">18051647</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De</surname> <given-names>S.</given-names></name> <name><surname>Chamoreau</surname> <given-names>L. M.</given-names></name> <name><surname>El Said</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Y. L.</given-names></name> <name><surname>Flambard</surname> <given-names>A.</given-names></name> <name><surname>Boillot</surname> <given-names>M. L.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Thermally-induced spin crossover and liesst effect in the neutral [FeII(Me<sub>bik</sub>)<sub>2</sub>(NCX)<sub>2</sub>] complexes: variable-temperature structural, magnetic, and optical studies (X = S, Se; Me<sub>bik</sub> = bis(1-methylimidazol-2-yl)ketone)</article-title>. <source>Front. Chem</source>. <volume>6</volume>:<fpage>15</fpage>. <pub-id pub-id-type="doi">10.3389/fchem.2018.00326</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doukov</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Sharma</surname> <given-names>A.</given-names></name> <name><surname>Martell</surname> <given-names>J. D.</given-names></name> <name><surname>Soltis</surname> <given-names>S. M.</given-names></name> <name><surname>Silverman</surname> <given-names>R. B.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Temperature-dependent spin crossover in neuronal nitric oxide synthase bound with the heme-coordinating thioether inhibitors</article-title>. <source>J. Am. Chem. Soc</source>. <volume>133</volume>, <fpage>8326</fpage>&#x02013;<lpage>8334</lpage>. <pub-id pub-id-type="doi">10.1021/ja201466v</pub-id><pub-id pub-id-type="pmid">21534614</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Zhu</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Dioxygen activation by iron complexes: the catalytic role of intersystem crossing dynamics for a heme-related model</article-title>. <source>J. Phys. Chem. C</source> <volume>122</volume>, <fpage>2821</fpage>&#x02013;<lpage>2831</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcc.7b11462</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Engler</surname> <given-names>M. S.</given-names></name> <name><surname>Caron</surname> <given-names>B.</given-names></name> <name><surname>Veen</surname> <given-names>L.</given-names></name> <name><surname>Geerke</surname> <given-names>D. P.</given-names></name> <name><surname>Mark</surname> <given-names>A. E.</given-names></name> <name><surname>Klau</surname> <given-names>G. W.</given-names></name></person-group> (<year>2019</year>). <article-title>Automated partial atomic charge assignment for drug-like molecules: a fast knapsack approach</article-title>. <source>Algorithms Mol. Biol</source>. <volume>14</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1186/s13015-019-0138-7</pub-id><pub-id pub-id-type="pmid">30839948</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Frisch</surname> <given-names>M. J.</given-names></name> <name><surname>Trucks</surname> <given-names>G. W.</given-names></name> <name><surname>Schlegel</surname> <given-names>H. B.</given-names></name> <name><surname>Scuseria</surname> <given-names>G. E.</given-names></name> <name><surname>Robb</surname> <given-names>M. A.</given-names></name> <name><surname>Cheeseman</surname> <given-names>J. R.</given-names></name> <etal/></person-group>. (<year>2016</year>). <source>Gaussian 16 Revision A.03</source>. <publisher-loc>Wallingford, CT</publisher-loc>: <publisher-name>Gaussian Inc</publisher-name>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Genuer</surname> <given-names>R.</given-names></name> <name><surname>Poggi</surname> <given-names>J.-M.</given-names></name> <name><surname>Tuleau-Malot</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>Variable selection using random forests</article-title>. <source>Pattern Recogn. Lett</source>. <volume>31</volume>, <fpage>2225</fpage>&#x02013;<lpage>2236</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2010.03.014</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>G&#x000FC;tlich</surname> <given-names>P.</given-names></name> <name><surname>Gaspar</surname> <given-names>A. B.</given-names></name> <name><surname>Garcia</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Spin state switching in iron coordination compounds</article-title>. <source>Beilstein J. Org. Chem</source>. <volume>9</volume>, <fpage>342</fpage>&#x02013;<lpage>391</lpage>. <pub-id pub-id-type="doi">10.3762/bjoc.9.39</pub-id><pub-id pub-id-type="pmid">23504535</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>G&#x000FC;tlich</surname> <given-names>P.</given-names></name> <name><surname>Goodwin</surname> <given-names>H. A.</given-names></name></person-group> (<year>2004</year>). <source>Spin Crossover in Transition Metal Compounds III</source>. <publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>.</citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Habenicht</surname> <given-names>B. F.</given-names></name> <name><surname>Prezhdo</surname> <given-names>O. V.</given-names></name></person-group> (<year>2012</year>). <article-title><italic>Ab initio</italic> time-domain study of the triplet state in a semiconducting carbon nanotube: intersystem crossing, phosphorescence time, and line width</article-title>. <source>J. Am. Chem. Soc</source>. <volume>134</volume>, <fpage>15648</fpage>&#x02013;<lpage>15651</lpage>. <pub-id pub-id-type="doi">10.1021/ja305685v</pub-id><pub-id pub-id-type="pmid">22967091</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hagai</surname> <given-names>E.</given-names></name> <name><surname>Rustam</surname> <given-names>Z. K.</given-names></name> <name><surname>Thomas</surname> <given-names>D. K.</given-names></name> <name><surname>Jorg</surname> <given-names>B.</given-names></name> <name><surname>Michele</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title><italic>Ab initio</italic> quality neural-network potential for sodium</article-title>. <source>Phys. Rev. B</source> <volume>81</volume>:<fpage>184107</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevB.81.184107</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>K.</given-names></name> <name><surname>Biegler</surname> <given-names>F.</given-names></name> <name><surname>Ramakrishnan</surname> <given-names>R.</given-names></name> <name><surname>Pronobis</surname> <given-names>W.</given-names></name> <name><surname>von Lilienfeld</surname> <given-names>O. A.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space</article-title>. <source>J. Phys. Chem. Lett</source>. <volume>6</volume>, <fpage>2326</fpage>&#x02013;<lpage>2331</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpclett.5b00831</pub-id><pub-id pub-id-type="pmid">26113956</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hauser</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Spin-crossover materials. properties and applications</article-title>. <source>Angew. Chem. Int. Ed</source>. <volume>52</volume>:<fpage>10419</fpage>. <pub-id pub-id-type="doi">10.1002/anie.201306160</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heid</surname> <given-names>E.</given-names></name> <name><surname>Fleck</surname> <given-names>M.</given-names></name> <name><surname>Chatterjee</surname> <given-names>P.</given-names></name> <name><surname>Schroder</surname> <given-names>C.</given-names></name> <name><surname>MacKerell</surname> <given-names>A. D. J</given-names></name></person-group>. (<year>2019</year>). <article-title>Toward prediction of electrostatic parameters for force fields that explicitly treat electronic polarization</article-title>. <source>J. Chem. Theory Comput</source>. <volume>15</volume>, <fpage>2460</fpage>&#x02013;<lpage>2469</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jctc.8b01289</pub-id><pub-id pub-id-type="pmid">30811193</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>W.</given-names></name> <name><surname>Ye</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>G.</given-names></name> <name><surname>Luo</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Machine learning protocol for surface-enhanced raman spectroscopy</article-title>. <source>J. Phys. Chem. Lett</source>. <volume>10</volume>, <fpage>6026</fpage>&#x02013;<lpage>6031</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpclett.9b02517</pub-id><pub-id pub-id-type="pmid">31538788</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huan</surname> <given-names>T. D.</given-names></name> <name><surname>Batra</surname> <given-names>R.</given-names></name> <name><surname>Chapman</surname> <given-names>J.</given-names></name> <name><surname>Krishnan</surname> <given-names>S.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Ramprasad</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>A universal strategy for the creation of machine learning-based atomistic force fields</article-title>. <source>NPJ Comput. Mater</source>. <volume>3</volume>:<fpage>37</fpage>. <pub-id pub-id-type="doi">10.1038/s41524-017-0042-y</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Imbalzano</surname> <given-names>G.</given-names></name> <name><surname>Anelli</surname> <given-names>A.</given-names></name> <name><surname>Giofr&#x000E9;</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials</article-title>. <source>J. Chem. Phys</source>. <volume>148</volume>:<fpage>241730</fpage>. <pub-id pub-id-type="doi">10.1063/1.5024611</pub-id><pub-id pub-id-type="pmid">29960368</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Inokuchi</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>N.</given-names></name> <name><surname>Morohoshi</surname> <given-names>K.</given-names></name> <name><surname>Arai</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Multiscale prediction of functional self-assembled materials using machine learning: high-performance surfactant molecules</article-title>. <source>Nanoscale</source> <volume>10</volume>, <fpage>16013</fpage>&#x02013;<lpage>16021</lpage>. <pub-id pub-id-type="doi">10.1039/C8NR03332C</pub-id><pub-id pub-id-type="pmid">30105348</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ivanov</surname> <given-names>M. V.</given-names></name> <name><surname>Talipov</surname> <given-names>M. R.</given-names></name> <name><surname>Timerghazin</surname> <given-names>Q. K.</given-names></name></person-group> (<year>2015</year>). <article-title>Genetic algorithm optimization of point charges in force field development: challenges and insights</article-title>. <source>J. Phys. Chem. A</source> <volume>119</volume>, <fpage>1422</fpage>&#x02013;<lpage>1434</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpca.5b00218</pub-id><pub-id pub-id-type="pmid">25648549</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jureschi</surname> <given-names>C. M.</given-names></name> <name><surname>Rusu</surname> <given-names>I.</given-names></name> <name><surname>Codjovi</surname> <given-names>E.</given-names></name> <name><surname>Linares</surname> <given-names>J.</given-names></name> <name><surname>Garcia</surname> <given-names>Y.</given-names></name> <name><surname>Rotaru</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Thermo- and piezochromic properties of [fe(hyptrz)]a2&#x000B7;h2o spin crossover 1d coordination polymer: towards spin crossover based temperature and pressure sensors</article-title>. <source>Phys. B Phys. Condensed Matter</source> <volume>449</volume>, <fpage>47</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1016/j.physb.2014.04.081</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klusowski</surname> <given-names>J. M.</given-names></name></person-group> (<year>2018</year>). <article-title>Sharp analysis of a simple model for random forests</article-title>. <source>arXiv. [Preprint]</source>. arXiv:1805.02587.</citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lilienfeld</surname> <given-names>R. R. A. V.</given-names></name></person-group> (<year>2015</year>). <article-title>Machine learning, quantum mechanics, and chemical compound space</article-title>. <source>Phys. Chem. Chem. Phys</source>. <volume>15</volume>, <fpage>501</fpage>&#x02013;<lpage>509</lpage>. <pub-id pub-id-type="doi">10.1002/9781119356059.ch5</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Du</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Direct learning hidden excited state interaction patterns from <italic>ab initio</italic> dynamics and its implication as alternative molecular mechanism models</article-title>. <source>Sci. Rep</source>. <volume>7</volume>:<fpage>8737</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-017-09347-2</pub-id><pub-id pub-id-type="pmid">28821842</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meyer</surname> <given-names>R.</given-names></name> <name><surname>M&#x000FC;cksch</surname> <given-names>C.</given-names></name> <name><surname>Wolny</surname> <given-names>J. A.</given-names></name> <name><surname>Sch&#x000FC;nemann</surname> <given-names>V.</given-names></name> <name><surname>Urbassek</surname> <given-names>H. M.</given-names></name></person-group> (<year>2019</year>). <article-title>Atomistic simulations of spin-switch dynamics in multinuclear chain-like triazole spin-crossover molecules</article-title>. <source>Chem. Phys. Lett</source>. <volume>733</volume>:<fpage>136666</fpage>. <pub-id pub-id-type="doi">10.1016/j.cplett.2019.136666</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagl</surname> <given-names>J.</given-names></name> <name><surname>Gerald Aub&#x000F6;ck</surname> <given-names>G.</given-names></name> <name><surname>Hauser</surname> <given-names>A. W.</given-names></name> <name><surname>Allard</surname> <given-names>O.</given-names></name> <name><surname>Callegari</surname> <given-names>C.</given-names></name> <name><surname>Ernst</surname> <given-names>W. E.</given-names></name></person-group> (<year>2008</year>). <article-title>High-spin alkali trimers on helium nanodroplets: spectral separation and analysis</article-title>. <source>J. Chem. Phys</source>. <volume>128</volume>:<fpage>154320</fpage>. <pub-id pub-id-type="doi">10.1063/1.2906120</pub-id><pub-id pub-id-type="pmid">18433222</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rai</surname> <given-names>B. K.</given-names></name> <name><surname>Bakken</surname> <given-names>G. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Fast and accurate generation of <italic>ab initio</italic> quality atomic charges using nonparametric statistical regression</article-title>. <source>J. Comput. Chem</source>. <volume>34</volume>, <fpage>1661</fpage>&#x02013;<lpage>1671</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.23308</pub-id><pub-id pub-id-type="pmid">23653432</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reiher</surname> <given-names>M.</given-names></name> <name><surname>Salomon</surname> <given-names>O.</given-names></name> <name><surname>Hess</surname> <given-names>B. A.</given-names></name></person-group> (<year>2001</year>). <article-title>Reparameterization of hybrid functionals based on energy differences of states of different multiplicity</article-title>. <source>Theor. Chem. Acc</source>. <volume>107</volume>, <fpage>48</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1007/s00214-001-0300-3</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riniker</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Fixed-charge atomistic force fields for molecular dynamics simulations in the condensed phase: an overview</article-title>. <source>J. Chem. Inform. Model</source>. <volume>58</volume>, <fpage>565</fpage>&#x02013;<lpage>578</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.8b00042</pub-id><pub-id pub-id-type="pmid">29510041</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zubatyuk</surname> <given-names>R.</given-names></name> <name><surname>Smith</surname> <given-names>J. S.</given-names></name> <name><surname>Leszczynski</surname> <given-names>J.</given-names></name> <name><surname>Isayev</surname> <given-names>O.</given-names></name></person-group> (<year>2019</year>). <article-title>Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network</article-title>. <source>Sci. Adv</source>. <volume>5</volume>:<fpage>eaav6490</fpage>. <pub-id pub-id-type="doi">10.1126/sciadv.aav6490</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rupp</surname> <given-names>M.</given-names></name> <name><surname>Tkatchenko</surname> <given-names>A.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K.-R.</given-names></name> <name><surname>von Lilienfeld</surname> <given-names>O. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Fast and accurate modeling of molecular atomization energies with machine learning</article-title>. <source>Phys. Rev. Lett</source>. <volume>108</volume>:<fpage>058301</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevLett.108.058301</pub-id><pub-id pub-id-type="pmid">22400967</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sahoo</surname> <given-names>S. K.</given-names></name> <name><surname>Nair</surname> <given-names>N. N.</given-names></name></person-group> (<year>2018</year>). <article-title>Interfacing the core-shell or the Drude polarizable force field with Car-Parrinello molecular dynamics for QM/MM simulations</article-title>. <source>Front. Chem</source>. <volume>6</volume>:<fpage>275</fpage>. <pub-id pub-id-type="doi">10.3389/fchem.2018.00275</pub-id><pub-id pub-id-type="pmid">30042939</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salomon</surname> <given-names>O.</given-names></name> <name><surname>Reiher</surname> <given-names>M.</given-names></name> <name><surname>Hess</surname> <given-names>B. A.</given-names></name></person-group> (<year>2002</year>). <article-title>Assertion and validation of the performance of the B3LYP* functional for the first transition metal row and the G2 test set</article-title>. <source>J. Chem. Phys</source>. <volume>117</volume>:<fpage>4729</fpage>. <pub-id pub-id-type="doi">10.1063/1.1493179</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lunghi</surname> <given-names>A.</given-names></name> <name><surname>Sanvito</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>A unified picture of the covalent bond within quantum-accurate force fields: from organic molecules to metallic complexes&#x00027; reactivity</article-title>. <source>Sci. Adv</source>. <volume>5</volume>:<fpage>eaaw2210</fpage>. <pub-id pub-id-type="doi">10.1126/sciadv.aaw2210</pub-id><pub-id pub-id-type="pmid">31172029</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sch&#x000FC;tt</surname> <given-names>K. T.</given-names></name> <name><surname>Arbabzadah</surname> <given-names>F.</given-names></name> <name><surname>Chmiela</surname> <given-names>S.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>K. R.</given-names></name> <name><surname>Tkatchenko</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <article-title>Quantum-chemical insights from deep tensor neural networks</article-title>. <source>Nat. Commun</source>. <volume>8</volume>:<fpage>13890</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms13890</pub-id><pub-id pub-id-type="pmid">28067221</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shao</surname> <given-names>X. D.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Yao</surname> <given-names>Y.-F.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>Switching dielectric constant near room temperature in a molecular crystal</article-title>. <source>Adv. Sci</source>. <volume>2</volume>:<fpage>1500029</fpage>. <pub-id pub-id-type="doi">10.1002/advs.201500029</pub-id><pub-id pub-id-type="pmid">27980939</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Statnikov</surname> <given-names>A.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Aliferis</surname> <given-names>C. F.</given-names></name></person-group> (<year>2008</year>). <article-title>A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>:<fpage>319</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-9-319</pub-id><pub-id pub-id-type="pmid">18647401</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Svetnik</surname> <given-names>V.</given-names></name></person-group> (<year>2003</year>). <article-title>Random forest: a classification and regression tool for compound classification and QSAR modeling</article-title>. <source>J. Chem. Inf. Comput. Sci</source>. <volume>43</volume>, <fpage>1947</fpage>&#x02013;<lpage>1958</lpage>. <pub-id pub-id-type="doi">10.1021/ci034160g</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Unke</surname> <given-names>O. T.</given-names></name> <name><surname>Meuwly</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges</article-title>. <source>J. Chem. Theory Comput</source>. <volume>15</volume>, <fpage>3678</fpage>&#x02013;<lpage>3693</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jctc.9b00181</pub-id><pub-id pub-id-type="pmid">31042390</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Atomic partial charge predictions for furanoses by random forest regression with atom type symmetry function</article-title>. <source>RSC Adv</source>. <volume>10</volume>, <fpage>666</fpage>&#x02013;<lpage>673</lpage>. <pub-id pub-id-type="doi">10.1039/C9RA09337K</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Yin</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Electrostatic polarization energies of charge carriers in organic molecular crystals: a comparative study with explicit state-specific atomic polarizability based AMOEBA force field and implicit solvent method</article-title>. <source>J. Chem. Theory Comput</source>. <volume>14</volume>, <fpage>3728</fpage>&#x02013;<lpage>3739</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jctc.8b00132</pub-id><pub-id pub-id-type="pmid">29870663</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ye</surname> <given-names>S.</given-names></name> <name><surname>Hu</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Zhong</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>A neural network protocol for electronic excitations of N-methylacetamide</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>116</volume>, <fpage>11612</fpage>&#x02013;<lpage>11617</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1821044116</pub-id><pub-id pub-id-type="pmid">31147467</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>S.</given-names></name> <name><surname>Feng</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Pang</surname> <given-names>J.</given-names></name> <name><surname>Bosch</surname> <given-names>M.</given-names></name> <name><surname>Lollar</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Stable metal-organic frameworks: design, synthesis, and applications</article-title>. <source>Adv. Mater</source>. <volume>30</volume>:<fpage>e1704303</fpage>. <pub-id pub-id-type="doi">10.1002/adma.201704303</pub-id><pub-id pub-id-type="pmid">29430732</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was supported by the National Key R&#x00026;D Program of China (Grant No. 2017YFB0203405), the National Natural Science Foundation of China (No. 21873034), and the Fundamental Research Funds for the Central Universities (Project 2662018JC027).</p>
</fn>
</fn-group>
</back>
</article>