
Edited by: Jianfeng Pei, Peking University, China

Reviewed by: Youjun Xu, Peking University, China; Chittaranjan Tripathy, Walmart Labs, United States

*Correspondence: Daniil Polykovskiy,

This article was submitted to Translational Pharmacology, a section of the journal Frontiers in Pharmacology

†These authors have contributed equally to this work

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug molecules that could induce a desired change in gene expression. Our model—the Bidirectional Adversarial Autoencoder—explicitly separates cellular processes captured in gene expression changes into two feature sets: those shared with the molecular structure of the drug and those exclusive to the expression change.

Following recent advances in machine learning, deep generative models have found many applications in biomedicine, including drug discovery, biomarker development, and drug repurposing.

In this paper, we studied how conditional models scale to a more complex biological property; specifically, we studied how drug incubation influences gene expression profiles. We used the LINCS L1000 dataset, which contains gene expression profiles of cell lines treated with small molecules.

In many conditional generation tasks, the condition is itself a complex structured object that shares only part of its information with the generated object.

We propose a new model—the Bidirectional Adversarial Autoencoder—that learns the joint distribution p(x, y) of molecules x and expression changes y. The model decomposes the latent representation into an exclusive part z_{x} of the molecule, an exclusive part z_{y} of the expression change, and a shared part s that captures the information common to x and y.

The paper is organized as follows:

Conditional generative models generate objects x from a distribution p(x | y) conditioned on y.

CausalGAN learns an implicit generative model over a given causal graph, enabling sampling from interventional distributions.

Multimodal learning models learn a joint distribution over paired modalities and can generate one modality given the other.

Information decoupling ideas have been previously applied in other contexts:

InfoGAN maximizes the mutual information between a subset of latent codes and the generated samples, learning interpretable, disentangled representations.

Machine learning has numerous applications in biomedicine and drug discovery.

Alongside large-scale studies that measure cellular processes, deep learning applications explore transcriptomics data.

In drug discovery, apart from predicting pharmacological properties and learning useful representations of small molecules, deep generative models are used for de novo design of molecular structures.

In this section, we introduce the Unidirectional and Bidirectional Adversarial Autoencoders and discuss their applications to conditional modeling. While we focus on molecular generation for transcriptome changes, our model is not limited to these data types and can be used for generation in other domains.

Our model for conditional generation is based on the Supervised Adversarial Autoencoder (Supervised AAE, SAAE). In this model, an encoder E_{x} maps an object x to a latent code z_{x}; a decoder G_{x} reconstructs x from z_{x} concatenated with the condition y; and a discriminator forces the aggregated distribution of z_{x} to match a prior p(z).

The Supervised Adversarial Autoencoder model (SAAE).

where D_{d} is the discriminator and the hyperparameter λ_{1} balances the reconstruction and adversarial losses. We trained the model by alternately maximizing the loss in Equation 1 with respect to the discriminator parameters and minimizing it with respect to the parameters of the encoder E_{x} and the decoder G_{x}.

Besides passing gene expression changes y directly to the decoder, we can first encode them into a latent representation z_{y} and condition the generation of x on z_{y} instead of the raw condition. We call this modification the Latent SAAE.

The Latent Supervised Adversarial Autoencoder model (Latent SAAE).

Hyperparameters λ_{1} and λ_{2} balance object and condition reconstruction losses as well as the adversarial loss.

Both SAAE and Latent SAAE models learn the conditional distribution p(x | y) of molecules given expression changes: they can generate x for a given y, but not y for a given x, and they do not explicitly separate the information shared between x and y from the information exclusive to each object. The Bidirectional AAE addresses both limitations.

The underlying graphical model of the data: molecules x and expression changes y are generated from exclusive latent codes (z_{x}, z_{y}) and a shared latent code s.

To train the model, we used inference networks (encoders) E_{x} and E_{y} that predict the values of the latent codes from the observed data: E_{x} extracts the exclusive code z_{x} and the shared code s_{x} from a molecule x, while E_{y} extracts z_{y} and s_{y} from an expression change y. The training objective encourages the shared codes s_{x} and s_{y} to coincide.

For the molecule and the expression change, the encoders produce exclusive and shared codes:

(z_{x}, s_{x}) = E_{x}(x), (z_{y}, s_{y}) = E_{y}(y).

The Bidirectional Adversarial Autoencoders model. The discriminators ensure that three latent code components are independent and indistinguishable from the prior distribution.

Two deterministic decoders (generators) G_{x} and G_{y} reconstruct the molecule and the expression change from the concatenation of the corresponding exclusive code and the shared code.

The objective function consists of three parts, each capturing restrictions from the graphical model—the structure of the shared representation, reconstruction quality, and independence of shared and exclusive representations.

The first part constrains the structure of the shared representation: the shared codes extracted from the molecule and from the expression change should coincide,

ℒ_{shared} = ‖s_{x} − s_{y}‖^{2},

where (z_{x}, s_{x}) = E_{x}(x) and (z_{y}, s_{y}) = E_{y}(y).

The second part is the reconstruction loss. To force the shared code to be recoverable from either object, each object is reconstructed from its own exclusive code and the shared code extracted from the paired object:

ℒ_{rec} = ℓ(x, G_{x}(z_{x}, s_{y})) + ℓ(y, G_{y}(z_{y}, s_{x})),

where G_{x} and G_{y} are the decoders, and ℓ is a reconstruction loss (cross-entropy for SMILES sequences, squared error for expression changes).

The third part is the adversarial loss: the discriminators are trained to distinguish the latent codes (z_{x}, s, z_{y}) produced by the encoders from samples of the factorized prior p(z_{x})p(s)p(z_{y}), while the encoders are trained to fool them.

Note that since the target distribution for adversarial training is factorized, we expected that the trained model would learn independence of z_{x}, s, and z_{y}.
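The shared-code swap used in the reconstruction term can be sketched in a few lines of NumPy. The dimensions and the linear "encoders" below are hypothetical stand-ins for the trained networks, chosen only to make the code self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: object features and an 8-dim latent code split into
# 5 exclusive + 3 shared units (an assumed split, for illustration).
DIM_X, DIM_Y, DIM_Z, DIM_S = 16, 12, 5, 3

# Hypothetical linear "encoders" standing in for trained networks.
W_x = rng.normal(size=(DIM_X, DIM_Z + DIM_S))
W_y = rng.normal(size=(DIM_Y, DIM_Z + DIM_S))

def encode(v, W):
    """Split the encoder output into (exclusive, shared) parts."""
    h = v @ W
    return h[:DIM_Z], h[DIM_Z:]

x = rng.normal(size=DIM_X)   # toy molecule features
y = rng.normal(size=DIM_Y)   # toy expression-change features

z_x, s_x = encode(x, W_x)
z_y, s_y = encode(y, W_y)

# Cross-reconstruction inputs: each object is decoded from its own
# exclusive code and the shared code taken from the *paired* object,
# which pushes the shared slots of both encoders to agree.
dec_in_x = np.concatenate([z_x, s_y])
dec_in_y = np.concatenate([z_y, s_x])

# The shared-consistency term of the objective.
loss_shared = float(np.sum((s_x - s_y) ** 2))
print(dec_in_x.shape, dec_in_y.shape, loss_shared)
```

With trained encoders, loss_shared is driven toward zero, so decoding from either shared code becomes equivalent.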

Combining these objectives, the final optimization problem becomes a minimax problem that can be solved by alternating gradient descent with respect to encoder and decoder parameters, and gradient ascent with respect to the discriminator parameters:

The hyperparameters λ_{1}, λ_{2}, and λ_{3} balance different objectives. In general, we optimize lambdas based on the performance of BiAAE on the holdout set in terms of the target metrics, such as estimated negative conditional log-likelihood. In practice, we found that optimal values of lambdas yielded the gradients of loss components on a similar scale.
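The gradient-scale heuristic mentioned above can be sketched as follows; the per-term gradient norms and the choice of the reconstruction term as the reference are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def balance_lambdas(grad_norms, ref="rec"):
    """Pick loss weights so every weighted term contributes gradients of
    roughly the magnitude of a reference term (a heuristic sketch)."""
    ref_norm = grad_norms[ref]
    return {name: ref_norm / max(n, 1e-12) for name, n in grad_norms.items()}

# Hypothetical per-term gradient norms measured on a training batch.
norms = {"rec": 4.0, "shared": 0.5, "adv": 8.0}
lambdas = balance_lambdas(norms)
print(lambdas)  # rec stays at 1.0; smaller-gradient terms get up-weighted
```

In practice such weights would be estimated once on a few batches and then fixed or refined by grid search.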

The Bidirectional AAE can generate molecules that cause given transcriptome changes and transcriptome changes caused by a given molecule. However, if we only need conditional generation of molecules given expression changes, the model can be simplified: we keep the decomposition of the expression change y into shared and exclusive parts and generate a molecule x from the shared code s and an exclusive code z_{x} sampled from the prior. We call this simplified model the Unidirectional Adversarial Autoencoder.

The Unidirectional Adversarial Autoencoder: a simplified version of the Bidirectional Adversarial Autoencoder for generating molecules x from conditions y.

In this section, we describe the experimental setup and present numerical results on the toy Noisy MNIST dataset and the LINCS L1000 dataset.

We start by validating our models on the Noisy MNIST dataset.

The train-validation-test splits contain 50,000, 10,000, and 10,000 samples, respectively. We set the batch size to 128 and the learning rate to 0.0003, and we used the Adam optimizer with β_{1} = 0.5 and β_{2} = 0.9 for models with adversarial training and β_{1} = 0.99 and β_{2} = 0.999 for the others, with a single update of the autoencoder per update of the discriminator. Encoder and decoder architectures were the same for all models, with 12-dimensional latent codes z_{x} and z_{y}. We set the weight of ℒ_{rec} to 10 and the weight of ℒ_{shared} to 0.1. Other λ were set to 1. For the Unidirectional AAE, we increased the weight of ℒ_{info} to 100. For the baseline models, we used similar architectures. Please refer to the Supplementary Material for details.

A conditional generative model p(y | x) should generate an image y containing the same digit as the input x: the digit class is the information shared between the two views, while writing style and noise are exclusive.

Quantitative results for the Noisy MNIST experiment. The Conditional Generation section evaluates how often the model produces the correct digit. The Latent Codes section estimates the mutual information between the latent codes and the paired objects.

Model | Accuracy, % | MI(z_{y}, x) | MI(z_{x}, y)
---|---|---|---
SAAE | 43.68 | — | 1.665
Latent SAAE | 34.76 | — |
CVAE | 0.4583 | — | 0.3074
JMVAE | 5.38 | 0.9515 | —
VIB | 43.6 | — | 1.121
VCCA | 23.35 | 1.239 | —
BiAAE (ours) | | 1.432 | —
UniAAE (ours) | 47.61 | — |

Qualitative results on the Noisy MNIST dataset. The figure shows images generated for given input digits.

In this section, we validate the Bidirectional AAE on a gene expression profile dataset with 978 genes. We use a dataset of transcriptomes from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 project.

For each cell line, the training set contains experiments characterized by the control (ge_{b} ∈ ℝ^{978}) and perturbation-induced (ge_{a} ∈ ℝ^{978}) gene expression profiles. We represented molecular structures in the SMILES format.

We preprocessed the training dataset by removing molecules with a molecular weight below 250 Da or above 550 Da. We then removed molecules that did not contain any oxygen or nitrogen atoms or that contained atoms other than C, N, S, O, F, Cl, Br, and H. Finally, we removed molecules that contained rings with more than eight atoms, as well as tetracyclines. The resulting dataset contained 5,216 unique SMILES strings. Since the dataset is small, we pretrained an autoencoder on the MOSES dataset.
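The element and oxygen/nitrogen filters can be sketched with a simplified SMILES tokenizer. This is an approximation only: the molecular-weight and ring-size filters would require a cheminformatics toolkit such as RDKit, and the tokenizer ignores bond, charge, and stereochemistry tokens:

```python
import re

ALLOWED = {"C", "N", "S", "O", "F", "Cl", "Br", "H"}

# Simplified SMILES atom tokenizer: bracket atoms first, then two-letter
# halogens, then single-letter organic-subset atoms (aromatic in lowercase).
ATOM_RE = re.compile(r"\[([A-Z][a-z]?)[^\]]*\]|Cl|Br|[BCNOSPFI]|[bcnops]")

def passes_element_filter(smiles):
    """Keep molecules made only of C/N/S/O/F/Cl/Br/H that contain at
    least one oxygen or nitrogen atom (two of the filters above)."""
    atoms = []
    for m in ATOM_RE.finditer(smiles):
        sym = m.group(1) or m.group(0)
        atoms.append(sym.capitalize())
    if not atoms:
        return False
    if any(a not in ALLOWED for a in atoms):
        return False
    return any(a in {"N", "O"} for a in atoms)

print(passes_element_filter("CC(=O)Nc1ccc(O)cc1"))  # paracetamol -> True
print(passes_element_filter("c1ccccc1"))            # benzene, no N/O -> False
print(passes_element_filter("CCP(C)C"))             # phosphorus -> False
```

A production pipeline would parse each SMILES with a real parser and compute exact molecular weights and ring systems.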

For all baseline models on differential gene expression data, we used similar hyperparameters, shown in the table below.

Hyperparameters for neural network training on gene expression data. All neural networks are fully connected, and decoders have architectures symmetric to the encoders.

Hyperparameter | Value
---|---
Molecular Encoder | GRU; hidden size 128; 2 layers
Expression Encoder | IN(978)→256→OUT(128)
Difference Encoder | IN(129)→128→OUT(10 + 10)
Discriminator | IN→1024→512→OUT(1)
Batch Normalization | After each linear layer in encoders
Activation Function | LeakyReLU
Learning Rate | 0.0003

We used a two-step encoder for the expression change Δge = ge_{a} − ge_{b}. We first embedded Δge with a fully connected neural network and then concatenated the obtained representation with the logarithm of the drug concentration before the final layers of the encoder.

The architecture of the condition encoder for changes in the transcriptome. The input to the expression encoder is the difference between the control and perturbed expressions. We passed the dose to the last layers of the encoder.
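The condition encoder's forward pass, with the layer sizes taken from the hyperparameter table, can be sketched as follows. The random weights, the LeakyReLU slope, and the 10 + 10 split into shared and exclusive codes are placeholders for the trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

def leaky_relu(v, slope=0.01):
    return np.where(v > 0, v, slope * v)

# Layer sizes follow the hyperparameter table: expression encoder
# 978 -> 256 -> 128, then the 128-dim embedding is concatenated with
# log10(dose) and mapped 129 -> 128 -> 20 (10 shared + 10 exclusive).
W1 = rng.normal(0, 0.05, size=(978, 256))
W2 = rng.normal(0, 0.05, size=(256, 128))
W3 = rng.normal(0, 0.05, size=(129, 128))
W4 = rng.normal(0, 0.05, size=(128, 20))

def encode_condition(delta_ge, dose):
    """Two-step condition encoder: embed the expression change,
    append the log-dose, and produce shared + exclusive codes."""
    h = leaky_relu(delta_ge @ W1)
    h = leaky_relu(h @ W2)
    h = np.concatenate([h, [np.log10(dose)]])
    h = leaky_relu(h @ W3)
    out = h @ W4
    return out[:10], out[10:]          # (shared, exclusive)

delta_ge = rng.normal(size=978)        # toy differential expression
shared, exclusive = encode_condition(delta_ge, dose=10.0)
print(shared.shape, exclusive.shape)   # (10,) (10,)
```

Feeding the dose only to the last layers keeps the expression embedding dose-agnostic until the final mapping.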

The proposed BiAAE model can generate molecules for given gene expression changes and vice versa. We started by experimenting with molecular generation conditioned on the expression change and dose, p(x | Δge, dose).

Validation results for conditional generation of molecules given expression changes.

Model | NLL | MI(z_{y}; Δge, dose) | MI(z_{x}; Δge, dose) | Internal Diversity | Validity
---|---|---|---|---|---
SAAE | 0.55 | — | 0.64 | |
Latent SAAE | 0.55 | — | 0.00 | 0.62 |
CVAE | 1.22 | — | 0.00 | 0.84 | 0.58
JMVAE | 1.42 | 0.00 | — | 0.61 |
VIB | 1.46 | — | 0.00 | 0.17 | 0.29
VCCA | 1.36 | 0.00 | — | 0.53 | 0.71
BiAAE | 0.77 | — | 0.76 | |
UniAAE | 0.00 | — | 0.61 | |


The proposed BiAAE and UniAAE architectures capture the dependencies in the training set and generalize to new objects from the validation set. The BiAAE model achieves higher mutual information while generating valid and diverse molecules.

In this experiment, we show that the proposed generative model (BiAAE) can produce biologically meaningful results. We used a manually curated database of bioactive molecules, ChEMBL 24.1.

The first experiment evaluates molecular generation given the transcriptome change of a small-molecule inhibitor of a specific protein. The ChEMBL dataset contains experimental data on molecules that inhibit specific human proteins. We chose template molecules that are present in both the LINCS perturbation dataset and the ChEMBL dataset. We used molecules that had an inhibition concentration of less than 10

The condition for molecular generation is the transcriptome change and the dose of a template molecule. Specifically, the condition is the shared part s_{y} extracted from the expression change.

Examples of generated molecules conditioned on gene expression changes from a protein inhibitor; the most similar real inhibitors from ChEMBL are shown for comparison.

The second experiment evaluates molecular generation given the transcriptome change of a specific gene knockdown. The LINCS dataset contains gene knockdown transcriptomes that the model was not trained on. For each gene knockdown, we found the corresponding human protein in the ChEMBL dataset. We chose template molecules that had a proven IC50 of less than 10

The condition differs from the previous experiment in that the gene knockdown expression profile is not induced by a small molecule but instead represents the desired behavior of a potential drug.

Examples of generated molecules conditioned on gene expression changes from a gene knockdown; the most similar real inhibitors of the knocked-down gene are shown for comparison.

We experimented with predicting gene expression changes after drug incubation, i.e., conditional generation in the opposite direction, p(Δge | x, dose). We evaluated the models with the R^{2} metric, which measures the coefficient of determination between the real and predicted Δge.

Validation results for conditional generation of expression changes given molecules.

Model | MI(Δge, z_{y}) | MI(Δge, z_{x}) | Top-1 precision | R^{2} score
---|---|---|---|---
SAAE | — | 0.00 | 0.58 | 0.26
Latent SAAE | — | 0.74 | 0.28 |
CVAE | — | 0.01 | 0.29 |
JMVAE | 0.00 | — | 0.0 | 0.03
VIB | — | 0.00 | — | —
VCCA | 0.00 | — | 0.0 | 0.03
BiAAE | 0.20 | — | 0.74 | 0.32
UniAAE | — | 0.27 | |

To compute R^{2} and top-1 precision, we only used drugs that were administered at a single dose (doses were represented on a log_{10} scale). Note that VIB was not able to generate any gene expression changes near 10
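The R^{2} score used here can be computed as follows; pooling the residuals over all genes and samples is our assumption, and per-gene averaging is an alternative convention:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination between real and predicted
    expression changes, pooled over all genes and samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check: a perfect prediction scores 1, a constant prediction 0.
truth = np.array([[1.0, -2.0, 0.5], [0.0, 1.5, -1.0]])
print(r2_score(truth, truth))                              # 1.0
print(r2_score(truth, np.full_like(truth, truth.mean())))  # 0.0
```

Negative values are possible when predictions are worse than predicting the mean expression change.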

This experiment demonstrates that the proposed UniAAE, BiAAE, and Latent SAAE models generalize well to the symmetric task and show good performance in predicting gene expression changes.

The key advantage of the proposed model compared to the previous works is the joint adversarial learning of latent representations of paired objects. This representation improves conditional generation metrics and shows promising results in molecular generation for desired transcriptome changes.

Three discriminator neural networks ensure that the latent representations divided into shared and exclusive parts are more meaningful and useful for the conditional generation. Two additional discriminator losses help the model learn a more expressive shared part and make sure that all three parts are mutually independent.

However, adversarial training slightly complicates the training procedure for the BiAAE model. In comparison with the baseline models, the training loss contains more terms, each with a coefficient to tune. In general, we tune these coefficients using grid search, and we select the best coefficients according to the generative metrics on the validation set. In practice, we simplify the grid search and use the same coefficient for the adversarial terms (λ_{1} = λ_{4} = λ_{5}), since the corresponding losses have values on the same scale. We choose the search space for the coefficients λ_{2} and λ_{3} so that the second and third terms provide gradients on the same scale as the other terms.

Another problem that arises with the adversarial approach is training instability, a consequence of the minimax nature of adversarial training. To mitigate it, we used the Adam optimizer with β_{1} = 0.5 and β_{2} = 0.9.

In this work, we proposed a Bidirectional Adversarial Autoencoder model for the generation of molecular structures for given gene expression changes. Our AAE-based architecture extracts shared information between molecule and gene expression changes and separates it from the remaining exclusive information. We showed that our model outperforms baseline conditional generative models on the Noisy MNIST dataset and the generation of molecular structures for the desired transcriptome changes.

The code and datasets for this study are available at

RS and MK implemented the BiAAE and baseline models and conducted the experiments. RS, AK, and AA prepared the datasets. RS, MK, AK, and DP derived the BiAAE and UniAAE models. RS, AZ, AK, SN, and DP wrote the manuscript. AK and DP supervised the project.

RS, MK, AZ, AK, AA, and DP work for Insilico Medicine, a commercial artificial intelligence company. SN works for Neuromation OU, a company engaged in AI development through synthetic data and generative models.

The original idea for molecular generation for a specific transcriptional or proteomic profile, a technology used broadly at Insilico Medicine, was proposed in 2016 by Dr. Alex Zhavoronkov, who is the co-author of the patent covering this technology.

The Supplementary Material for this article can be found online at: