

Genetic basis of cecum metabolome composition in heterogeneous stock rats identifies ABC and SLC transporters and enzymes, including LSS, CYP, and UGT, as associated with metabolome abundance
Objectives
The objectives of the present analysis were to 1) perform a cecum metabolome characterization in heterogeneous stock (HS) rats; 2) describe the source for identified metabolites; 3) recognize microbiome-metabolome relationships; 4) achieve metabolomics-based genome-wide association (mGWAS) for identified compounds and quantitative trait loci (QTL) documentation; 5) identify individual cecum metabolites whose variability is partially due to genomic; 6) recognize genomic hot spots able to explain the abundance of multiple metabolites, here described as “association transbands” for metabolome abundance; and 7) perform a conditional analysis for association transbands and metabolites with medium-high SNP-based heritability. Figure 1 shows the methodological layout of this analysis.
Figure 1. Methodological layout for the present analysis.
Cecum metabolome abundance and composition in heterogeneous stock rats
When comparing the twenty most extreme rats (lowest ten versus highest ten, Figure 2), the average terpenoid proportion was 24.75±4.23% and 79.03±0.92%. The comparison for alkaloids (16.80±4.55% to 2.64±0.31%), amino acids and peptides (24.57±5.67% to 7.01±0.50%), and fatty acids (20.49±3.85% to 2.26±0.28%), showed the more drastic reduction when terpenoids proportion increased.
Well-annotated individual compounds by Sirius v5.7 driving these differences for terpenoids based on abundance included: 3alpha,7alpha,12beta-Trihydroxy-5beta-cholanoic_acid_cp16 (increased by 108.24%), 3b,12a-Dihydroxy-5a-cholanoic_acid_cp10 (increased by 105.93%), 12-Ketodeoxycholic_acid_cp9 (increased by 49.84%), 3alpha,7alpha,12beta-Trihydroxy-5beta-cholanoic_acid_cp14 (increased by 47.65%), 3b,12a-Dihydroxy-5a-cholanoic_acid_cp12 (increased by 28.48%), 3alpha,7alpha,12beta-Trihydroxy-5beta-cholanoic_acid_cp22 (increased by 24.83%), 3alpha,7alpha,12beta-Trihydroxy-5beta-cholanoic_acid_cp21 (increased by 16.38%), 12-Ketodeoxycholic_acid_cp11 (increased by 16.15%), Dihydroxy bile acid_cp3 (increased by 13.08%) and 3b,12a-Dihydroxy-5a-cholanoic_acid_cp4 (increased by 7.17%). Metabolites such as Hydroxyprolyl-Isoleucine_cp6 (amino acids and peptides) and 4-Amino-3-hydroxybutanoylcarnitine_cp3 (fatty acids) were only detected in individuals with high terpenoid proportion.
Figure 2. Relative metabolome abundance for extremes based on divergent terpenoid proportion. Classification using pathway by sample is presented. Metabolite and pathway identification were performed using Sirius v5.7. Ceca was analyzed using untargeted metabolome assessment by liquid chromatography with tandem mass spectrometry (LC-MS/MS).
Cecum microbiome-metabolome correlation in heterogeneous stock rats
Figure 3 shows the metabolite-microbiome correlation plot for retained relationships across centers. Medium correlations were identified between 40 compounds and 19 bacterial classifiers. The highest correlation value was recognized for (22alpha)-Hydroxy-campest-4-en-3-one_cp4 and for (±)-3-Hydroxynonanoic_acid_cp1, with Cyanobacteria (phylum), 4C0d.2 (class) and YS2 (order). The lowest correlation was identified for (20S,24E)-20,26-Dihydroxy-24-dammaren-3-one_cp16 and (11R,16S)-misoprostol_cp3, with Erysipelotrichi (class), Erysipelotrichales (order) and Erysipelotrichaceae (family).
An analysis by the center (Supplementary Figure 5A for NY and Supplementary Figure 5B for MI) provided stronger correlation values between microbiome and metabolome data. The analysis for NY showed the highest rxy values and significance. Erysipelotrichi (class), Erysipelotrichales (order), Bifidobacteriales (order), Erysipelotrichaceae (family), Bifidobacteriaceae (family), Obaculum (genus), and Bifidobacterium (genus) were negatively correlated with Dehydrocarpaine_I_cp1, Ercalcitriol_cp3, 2-Hexyl-3-phenyl-2-propenal_cp4, and three compounds of (3beta,17alpha,23S,24S)-17,23-Epoxy-3,24,29-trihydroxy-27-norlanost-8-en-15-one. Cyanobacteria (phylum), 4C0d.2 (class) and YS2 (order) were positively correlated with Pantothenic_acid_cp1, Voglibose_cp1, 1-Deoxy-1-morpholino-D-fructose_cp1, Niazirin_cp2, Ecgonine_methyl_ester_cp1, Deoxyinosine_cp1, Adenosine_cp1, Thymidine_cp1, trans-Piceid_cp1, Pyroglutamic_acid_cp1, and two compounds of Galactosyl_4-hydroxyproline. These bacterial classifications were positively correlated with LysoPE(0:0/18:1(11Z))_cp6, MG(0:0/16:0/0:0)_cp3, LysoPE(0:0/15:0)_cp3, Spermine_dialdehyde_cp2, Methylisopelletierine_cp2, 3alpha,7alpha-Dihydroxycoprostanic_acid_cp8 and 2-Methylbutyroylcarnitine_cp1. For MI, Allobaculum (genus) showed the highest correlation value with several metabolites, including Dehydrocarpaine_I_cp1.
Figure 3. Correlation analysis between compound and bacterial classifiers showing medium correlation. Ceca was collected from 1,087 individuals and used for untargeted metabolome assessment by liquid chromatography with tandem mass spectrometry (LC-MS/MS). Microbiome classifications showing at least one significant correlation with a rho value >= 0.2 were retained. Metabolites with at least one significant correlation and a rho value >= 0.3 were kept for plotting.
Whole genome association analysis for cecum metabolome and SNP-based heritability calculation
A total of 4,510 compounds were assessed, and the mGWAS results are presented in Figure 4. After FDR correction, 613 compounds remained significant. SNP-based heritability values ranged from 6.15±2.74 (3b,12a-Dihydroxy-5a-cholanoic acid_cp12) to 53.76±3.99 (C13H23NO5S_cp2) (Figure 4A). From the significant estimates, 25 were identified as having medium SNP-based h2 (between 0.2 and 0.5) and only two as high (higher or equal to 0.5). Two compounds of C13H23NO5S were highly heritable and compounds from metabolites such as (20S,24E)-20,26-Dihydroxy-24-dammaren-3-one, 13'-Carboxy-alpha-tocopherol, 13'-Hydroxy-alpha-tocotrienol, 19,20-DiHDPA, 24-Hydroxycalcitriol, 6-Deoxohomodolichosterone, 7-Dehydrocholesterol, Bis norcholic acid, Panaxatriol and Tetracosatetraenoic acid (24:4n-6) exhibited medium SNP-based heritability.
Figure 4B represents the mGWAS results for 4,510 compounds. A total of 4.4 million associations above 5.8 (-log10) were identified; 69.03% of them involved a compound classified as terpenoid, and 20.68% associations included an unknown metabolite; 10.28% of the associations comprised compounds classified as alkaloids, amino acids and peptides, carbohydrates, fatty acids, or polyketides. Shikimates and phenylpropanoids accounted for only 0.0038% of all the associations. Chromosomes 1, 7, and 15 showed the highest number of QTLs, including 34, 29, and 31 genomic regions, respectively. QTLs 14_1 (17.18-22.78 Mb), 1_1 (234.37-239.58 Mb), 14_2 (21.65-22.09 Mb), 20_1 (10,64-14,86 Mb), 6_1 (131,24-132,69 Mb) and 7_1 (111,07-117,22 Mb) showed the highest number of unique associated traits including a total of 226, 80, 72, 62 and 60 compounds. No QTLs on chromosome 11 were identified. QTL 14_2 is fully contained and overlapping with locus 14_1; therefore, further information concerning 14_2 is referred to QTL 14_1.
Figure 4A. Estimation of SNP-based heritability for 4,510 compounds. The X and Y-axes represent compound serial numbering and SNP-based h2 estimates, respectively. Blue bars denote significant estimates after FDR correction (n=613 compounds; Supplementary Table 2), including average (golden dot) and estimation interval; 4B. Metabolomics-based whole genome-wide association (mGWAS) analysis for 4,510 compounds. The X-axis represents the genomic location of each polymorphism; the Y-axis shows compounds being assessed. Hierarchical clustering for compound abundance is described on the left side of the plot. The strength of the association (-log10) is denoted by dot size. Color coding for pathway clustering for each association is shown. The association threshold was equal to 5.8 (-log10). Ceca was collected from 1,087 individuals and used for untargeted metabolome assessment by liquid chromatography with tandem mass spectrometry (LC-MS/MS).
Genomic structure for the most important association transbands is presented, including details for 14_1 (Figure 5A) and 20_1 (Figure 5B). Based on the number of unique associated compounds, the most important QTL was 14_1 (14:17.18-22.78 Mb). This QTL was associated with a total of 226 compounds, including terpenoids such as [6]-Gingerdiol_3,5-diacetate, 13'-Carboxy-alpha-tocopherol, 13'-Hydroxy-alpha-tocotrienol, 16-a-Hydroxypregnenolone, 19,20-DiHDPA, 24-Hydroxycalcitriol, 24-Hydroxycholesterol, 3a,20b-Pregnanediol, 6-Deoxodolichosterone, 7-Dehydrocholesterol, 7-Ketodeoxycholic_acid, alpha-Tocotrienol, Hydroxycholesterol, Hydroxypregnenolone family, Hydroxypregnenolone, Tetracosatetraenoic_acid_(24:4n-6), and Tetrahydrocorticosterone. Amino acids and peptides such as Galactosylhydroxylysine and Asn-Arg-Ala-Ile were also identified. A cluster of UGT and SULT genes, including UGT2A1 and SULT1B1, are located inside this QTL. SLC transporters and G-coupled proteins such as SLC4A4 and COX18 are inside this locus. QTL 1_1 (1:234.37-239.58 Mb) was identified as the second most important loci since it showed association with 80 unique compounds, including terpenoids such as [6]-Gingerdiol_3,5-diacetate, 16-a-Hydroxypregnenolone, 19,20-DiHDPA, 24-Hydroxycalcitriol, Cortexolone, Cortisol, Cortol, Hydroxypregnenolone family, Pregnenolone, Tetrahydrocorticosterone, Tetrahydrocortisone, and Tetrahydrodeoxycorticosterone. A cluster of CYP2 genes (CYP2C6V1, CYP2C7, CYP2C11, CYP2C12, CYP2C13, CYP2C22, and CYP2C24) are inside this locus. Several CYP2 pseudogenes are located within this locus (CYP2C6_V1-ps1, CYP2C6_V1-ps2, CYP2C6_V1-ps2, CYP2C6_V1-ps3, CYP2C6_V1-ps4, and CYP2C6_V1-ps5).
Inside 20_1, a cluster of SLCs and GSTs are found, including SLC19A1, SLC5A4, SLC37A1, GSTT1, GSTT2, GSTT3, and GSTT4. An ABCG1 transporter is harbored by 20_1. This locus was associated with several terpenoids, including 7-a,25-Dihydroxycholesterol, Alisol_A_24-acetate, Barringtogenol_C, beta-Elemonic_acid, Erythrodiol, Oleanolic_acid, Panaxatriol, and Soyasapogenol_A. Terpenoids such as 13'-Carboxy-gamma-tocopherol, 24-Hydroxycalcitriol, Coprocholic_acid, gamma-Tocotrienol, TetraHCA, and Trihydroxycoprostanoic_acid were associated with QTL 6_1. The latter also harbors genes encoding several G-coupled proteins (GPR132, AKT1, PLD4 and NUDT14). The 7_1 locus was associated with terpenoids such as 12-Ketodeoxycholic_acid, 13'-Carboxy-gamma-tocotrienol, 19,20-DiHDPA, 24-Hydroxycalcitriol, Bisnorcholic_acid, Trihydroxy-bile acid and Ubiquinone-4. QTL 7_1 harbors a cluster of CYP2 genes (CYP2D2, CYP2D3, CYP2D4 and CYP2D5) and a CYB5R3 gene. Additionally, SLC25A17 and SULT4A1 genes are also harbored by this locus.
Figure 5. Regional Manhattan plot for 5A. MZ385.3462964_6.300724 (7-dehydrocholesterol_cp1) and QTL:14_1; and 5B. MZ423.3617659_8.007081 ((20S,24E)-20,26-dihydroxy-24-dammaren-3-one_cp14) and QTL:20_1; violin plot for residual (5C) and phenotypic (5D) distribution for the chr14:20,872,972 marker and MZ385.3462964_6.300724, and; violin plot for residual (5E) and phenotypic (5F) distribution for the chr20:12,105,763 polymorphism and MZ423.3617659_8.007081.
A Heterogeneous stock rat population was assessed. X-axes represent the polymorphism genomic location of each phenotype. The Y-axes show -Log10 values reached by each marker. Gene annotations for these genomic regions are presented; r2 estimates between each leading associated polymorphism and remaining markers inside the associated regions are presented. The association threshold was equal to 5.8 (-log10). Ceca was collected from 1,087 individuals and used for untargeted metabolome assessment by liquid chromatography with tandem mass spectrometry (LC-MS/MS).
Other well-characterized genes encoding proteins involved in metabolome composition and abundance identified across all 294 QTLs include Glutathione S-Transferases (GSTs), Hydroxy-Delta-5-Steroid Dehydrogenase (HSD), and Aryl Hydrocarbon Receptor (AHR). Genes encoding GST proteins were identified inside 1_13 (GSTP1 and GSTP3), 20_1 (GSTT2, GSTT4, GSTT1 and GSTT3), 20_2 (GSTT2, GSTT4, GSTT1 and GSTT3), 20_15 (GSTT2, GSTT4, GSTT1 and GSTT3). Metabolites associated with 1_13, 20_1, 20_2 and 20_15 include terpenoids such as 7-a,25-Dihydroxycholesterol, Alisol_A_24-acetate, beta-Elemonic_acid, Erythrodiol, Oleanolic_acid, Panaxatriol and Soyasapogenol_A. Genes and pseudogenes of HSD were identified inside 2_1 (HSD3B5, HSD3B5-ps1, HSD3B2, HSD3B3 and HSD3B1), 5_27 (HSDL2), 7_14 (HSD17B6), 10_3 (HSD17B1), 13_1 (HSD11B1), 13_2 (HSD11B1), 20_7 (HSD17B8). Compounds such as 2-aminobenzylstatine (Alkaloid), L-beta-aspartyl-L-leucine and mesobilirubinogen (amino acids and peptides), and terpenoids such as [12]-shogaol, 12-ketodeoxycholic_acid, 19,20-DiHDPA, 1beta-Hydroxycholic_acid, 3b-hydroxy-5-cholenoic_acid, 6-deoxohomodolichosterone, 7-ketodeoxycholic_acid, cortol, and tetrahydrocortisone were identified as associated with these loci. An AHR gene was identified inside 6_8. Alkaloids, amino acids, and peptides such as 3-methoxytyramine, dimethylidenebutanedioylcarnitine, phenethylamine_glucuronide, physostigmine, and prunasin were associated with QTL 6_8. FPR genes and pseudogenes were harbored inside 1_9 (FPR1, FPR2, FPR2L1, FPR2L2, FPR2L3, FPR3, FPR2-ps4 and FPR2-ps3) and 1_19 (FPR2-ps3), metabolites such as alpha-CEHC (terpenoids) and avocadyne (fatty acids) were associated metabolites with these loci. Multiple G-coupled proteins were also identified across the associated loci.
Conditional analysis for compound association transbands
From all 34 selected QTLs, five loci (1_4, 3_6, 7_14, 14_21, and 19_1) did not show an overrepresentation of any associated conditional polymorphism across compounds; therefore, these QTLs were excluded from further analysis. The remaining 29 association transbands showed overrepresented conditional polymorphisms for associated compounds.
QTL:1_1 presented an overrepresentation of conditionally associated polymorphisms mapping CYP2C22 and SLC35G1. QTL:14_1 showed an overrepresentation of several conditional associated polymorphisms involving a cluster of UGT genes (UGT2B1, UGT2B1, UGT2B10, UGT2B10, UGT2B34L1, UGT2B34L1 and UGT2B35) and SULT1B1. Lanosterol Synthase (LSS) was identified using conditional associated markers for 20_1. Out of 33 conditional associated markers, 25 are located inside the transcriptional unit of LSS. A total of 43 conditionally associated markers for QTL:6_1 were identified, out of which 32 are located inside Nudix Hydrolase 14 (NUDT14). For QTL:7_1, 33 conditionally associated polymorphisms were found and harbored inside genes such as PH Domain Containing Endocytic Trafficking Adaptor 2 (PHETA2), WBP2 N-Terminal Like (WBP2NL), WBP2 N-Terminal Like (WBP2NL), Alpha-N-Acetylgalactosaminidase (NAGA) and Transcription Factor 20 (TCF20). A CYP cluster inside this locus was also recognized.
Conditional association analysis for highly heritable metabolites based on SNP-based h2
Well-characterized genes related to metabolome were again identified, including enzymes and transporters such as CYP, SLC, and UGT families. Other enzymes potentially involved in the regulation of metabolome composition and abundance include 1-Acylglycerol-3-Phosphate O-Acyltransferase 3 (AGPAT3), Methylenetetrahydrofolate Dehydrogenase (NADP+ Dependent) 2 Like (MTHFD2L), N-Alpha-Acetyltransferase 16, NatA Auxiliary Subunit (NAA16), Alpha-N-Acetylgalactosaminidase (NAGA), Phosphodiesterase 6C (PDE6C) and Protein Arginine Methyltransferase 2 (PRMT2) (Table 3). Transporters were also identified, including putative ATPase Phospholipid Transporting 8B4 (ATP8B4), Potassium Voltage-Gated Channel Subfamily H Member 1 (KCNH1), and Potassium Channel Regulator (KCNRG).
Metabolome source
Figure 6A presents the metabolite source involving the microbiome. No metabolite from the host was identified. Metabolites such as Oleic acid (C18H34O2), Cavipetin C (C24H36O4), and N-Docosahexaenoyl GABA (C26H39NO3) were identified as exclusively produced by the bacterial genus Bacteroides.
Figure 6A. Metabolite source involving microbiome results generated by MASST. C24H32O: MZ337.2529192_3.7722428; C11H16O2: MZ163.1117343_2.9526181, 2,2,7,7-Tetramethyl-1,6-dioxaspiro[4.4]nona-3,8-diene: C12H18: MZ163.1479805_4.87998, 1,3-Diisopropylbenzene; C16H24O: MZ233.1898805_5.8556914, 1-(2,6,6-Trimethyl-2-cyclohexen-1-yl)-1,6-heptadien-3-one; C18H28O: MZ261.2209693_4.350094, 3-Gonal; C18H34O2: MZ265.2522674_6.8596663, Oleic acid; C24H36O4: MZ353.2480573_3.6385643, Cavipetin C: C26H39NO3: MZ414.2998006_4.754634, N-Docosahexaenoyl GABA; C28H42O3: MZ409.3099783_4.335406, 13'-Hydroxy-gamma-tocotrienol; C24H41NO3: MZ414.2986345_5.1940455; C16H22: MZ215.1792619_5.115534; C9H14O: MZ156.1383535_3.5557761, 2-Pentylfuran; C24H38O5: MZ371.2590287_3.3729243, 7-Ketodeoxycholic acid; C21H34O4: MZ333.2424734_2.8351197, Tetrahydrocorticosterone; C21H34O3: MZ299.2367075_4.0256233, Tetrahydrodeoxycorticosterone; C24H36O2: MZ357.2774555_4.596212, Tetracosahexaenoic acid; C24H36O3: MZ373.2731367_3.9029639, Cervonoyl ethanolamide; C24H38O3: MZ397.2739865_4.7947216, 3b-Hydroxy-5-cholenoic acid; C24H38O4: MZ408.3107759_3.7668102, 12-Ketodeoxycholic acid; C26H43NO6: MZ430.2947576_4.4947157, Glycocholic acid; C26H43NO6S: MZ480.2773311_3.4665387. 6B. Relationship summary between microbiome (circular nodes) and metabolome (rectangular nodes) based on correlation results. Bacterial phylum and predicted metabolite pathway were used. Red and blue edges represent a positive and a negative relationship, respectively. No causality is assumed, meaning these relationships might have any directionality.
eQTL and sQTL annotation for conditionally associated polymorphisms
From this analysis, chr3:113798157, chr20:12105763 and chr14:20872972 were selected as cis eQTLs for Col6a1, Kcnip3, Lss, Mal, Rufy3, Spatc1l, Sppl2a, Sult1d1 across multiple tissues. chr3:113798157, chr20:12105763 and chr14:20872972 were classified as cis sQTLs for Col6a1, Cops2, Lss, Rufy3, and Trpm7.
Conclusions
Cecum metabolome in HS rats was highly diverse, being terpenoids the most abundant and variable across samples. A high number of metabolites were also identified as having a relatively high genetic control; SNP-based heritability ranged from 6.15±2.74 (3b,12a-Dihydroxy-5a-cholanoic acid_cp12) to 53.76±3.99% (C13H23NO5S_cp2). QTLs 14_1 (17.18-22.78 Mb), 1_1 (234.37-239.58 Mb) and 20_1 (10,64-14,86 Mb) were identified as the most important transbands or genomic regions showing association with a high number of metabolites, including 226, 80 and 62 metabolites, respectively. Cecum metabolome in rats was identified as being controlled by enzymes such as CYP, UGT, and SULT, and transporters such as ABC and SLC, which have been previously identified as able to drive phenotypic differences in intestinal metabolome across species.





