Week 7: Analyzing Autoencoder Models For Coronary Artery Disease
April 21, 2023
Welcome to Week 7 of my Senior Project Blog! This week, I'll cover my analysis of the structure of the autoencoder (AE) models that were built in our lab for coronary artery disease (CAD).
Autoencoder Models for CAD
Our lab at Scripps Research has recently developed autoencoder (AE) models for genotype imputation spanning 184 fragments, or tiles, that contain CAD loci across all the human chromosomes. For each tile, up to 10 models were trained and found to perform well (model_450491, model_202535, model_412291, model_84, model_288921, model_358105, model_346665, model_367798, model_43048, model_234945). Each of these models has a specific AE architecture. An example is shown below:
In the example, fragment #1, which spans genomic positions 238208698-238317017 on Chromosome 2, has a model (model_202535) whose encoder consists of 3 linear layers (1 input layer followed by 2 hidden layers), each with 3566 inputs, 3566 outputs, and a sigmoid activation function to capture non-linearities. The encoder's output is the compressed, or latent, representation, which is then fed to a symmetric decoder block that expands it back to the original dimensionality using 3 layers of its own.
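To make that layout concrete, here is a minimal PyTorch sketch of the architecture described above. The layer count and the 3566-unit width come from the example; the exact module structure inside the lab's saved models may differ.

```python
import torch.nn as nn

INPUT_DIM = 3566  # inputs/outputs per layer in the example fragment

# Encoder: 3 linear layers (input + 2 hidden), each followed by a sigmoid.
encoder = nn.Sequential(
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),  # output = latent representation
)

# Symmetric decoder: expands the latent representation back to the input size.
decoder = nn.Sequential(
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),
    nn.Linear(INPUT_DIM, INPUT_DIM), nn.Sigmoid(),
)

autoencoder = nn.Sequential(encoder, decoder)
print(autoencoder)
```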
All the information about the models for each tile is stored in subfolders named after each fragment. Inside each fragment's folder, there are 3 files per model, just as we observed with the Chromosome-22 models: a params.py file (which holds the model parameters), a position file with a .pos extension (listing all the CAD loci within that fragment), and a model file with a .pth extension (which holds the corresponding PyTorch model weights).
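As a rough sketch of walking this layout (the root directory name and the per-model file naming here are assumptions, not necessarily the lab's actual convention):

```python
from pathlib import Path

root = Path("cad_models")  # assumed top-level directory of fragment subfolders

for fragment_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for model_file in fragment_dir.glob("*.pth"):
        stem = model_file.stem  # e.g. "model_202535"
        params_file = fragment_dir / f"{stem}_params.py"  # hypothetical naming
        pos_file = fragment_dir / f"{stem}.pos"           # hypothetical naming
        print(fragment_dir.name, stem,
              params_file.exists(), pos_file.exists())
```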
As we saw with the Chromosome-22 models in Week 4, each CAD model's parameter file contains a number of informative lines specifying parameters such as the number of layers, the activation functions used, and the number of training steps.
Similar to Week 4, I wrote a Python function to load each tile model and determine its structure. It takes as input the fragment name and the 3 files (position, PyTorch model, and parameter files) and uses PyTorch's built-in functions. It first reads the parameter file to identify how many CAD loci are present in the position file corresponding to that tile. It then loads the state dictionary from the PyTorch file so that the final trained neural network can be loaded into memory. Using this function, I analyzed the 184 tile fragments, each with up to 10 models, to determine their network architectures. Since some of these PyTorch files can be very large, loading over 1,700 models for the 184 tiles and extracting their structure was very resource-intensive, often taking several hours even on the GPU-enabled cluster at Scripps Research.
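A minimal sketch of that loading step is below. It assumes the .pth file stores a PyTorch state dictionary and that each non-empty line of the .pos file describes one CAD locus; both are assumptions about the file contents rather than guaranteed details of our lab's format.

```python
import torch

def load_tile_model(pos_path, pth_path):
    # Count the CAD loci: assume one locus per non-empty line of the .pos file.
    with open(pos_path) as f:
        n_loci = sum(1 for line in f if line.strip())

    # Load the saved weights; map_location="cpu" lets us inspect the model
    # without needing a GPU.
    state_dict = torch.load(pth_path, map_location="cpu")

    # Each nn.Linear layer contributes one ".weight" tensor whose shape is
    # (out_features, in_features), so the shapes reveal the architecture.
    layer_shapes = [tuple(v.shape) for k, v in state_dict.items()
                    if k.endswith(".weight")]
    return n_loci, layer_shapes
```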
From the charts and tables shown above, we can see that most of the models (over 60%) have 8 layers, and most (again over 60%) do not reduce the dimensionality of their inputs (i.e., the size ratio was mostly 1.0), which means they are not sparse models.
When I examined the activation functions used across all the CAD models, Tanh and LeakyReLU were the most common, accounting for ~59% of all models, while ReLU and Sigmoid accounted for the remaining ~41%.
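Tallies like these are straightforward to compute once the per-model metadata has been collected. Here is a sketch, assuming each model's parsed parameters were stored in a dict with hypothetical field names like "n_layers", "size_ratio", and "activation":

```python
from collections import Counter

def summarize(model_info):
    # model_info: list of dicts, one per model, e.g.
    # {"n_layers": 8, "size_ratio": 1.0, "activation": "tanh"}
    total = len(model_info)
    for field in ("n_layers", "size_ratio", "activation"):
        counts = Counter(m[field] for m in model_info)
        for value, n in counts.most_common():
            print(f"{field} = {value}: {n} models ({100 * n / total:.1f}%)")
```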
Now that I have learned how to load and analyze the CAD model structures, next week I will explore how to extract the latent representations and embeddings from each model using the input VCF data.
Thank you for reading, and see you next week!
Sources:
- Dias, R., Evans, D., Chen, S.-F., Chen, K.-Y., Loguercio, S., Chan, L., & Torkamani, A. (2022). Rapid, reference-free human genotype imputation with denoising autoencoders. eLife, 11. https://doi.org/10.7554/elife.75600
- "PyTorch Tutorials." Welcome to PyTorch Tutorials – PyTorch Tutorials 2.0.0+cu117 Documentation, 2023, https://pytorch.org/tutorials/.
- TorkamaniLab. "Imputation_Autoencoder/test/example.vcf." GitHub, 25 May 2022, https://github.com/TorkamaniLab/Imputation_Autoencoder/blob/master/test/example.vcf.