The chronODE framework for modelling multi-omic time series with ordinary differential equations and machine learning

The chronODE framework for modelling multi-omic time series with ordinary differential equations and machine learning

chronODE: modeling time-resolved genomic signals with the logistic function

Given the cooperative and saturating nature of genomic signals (Fig. 1), we propose modeling transcriptomic and epigenomic kinetics over time using the logistic function, previously applied in fields like bacterial growth31,32. In this context, the signal of a genomic locus (e.g., gene expression, chromatin accessibility, or histone modification) is defined as a time-dependent positive variable z, constrained within the interval [ab]. The rate of change of z over time t can be studied using the following formula, which corresponds to the generalized form of the logistic ODE (see Supplementary Note 1, Propositions 1 and 5, and Supplementary Fig. 2):

$$\frac{dz}{dt}=k(z-a)\left(1-\frac{z-a}{b-a}\right),$$

(1)

where a and b correspond to the lower and upper asymptotes, respectively, of the logistic curve in the real range of the data.

Fig. 1: Diagram illustrating how the logistic curve can be used to model time-series genomic signals.
figure 1

Left panel: The curve represents a genomic signal (y axis) undergoing a logistic increase over time (x axis), modeled using the simplified form of the logistic ODE (Eq. 2) and with a positive growth rate constant (i.e., k* > 0). Note that the genomic signal y* represents a translated and normalized version of the original genomic signal z. The curve is constrained between a lower asymptote of 0 and an upper asymptote defined by the parameter b* (dashed blue line). tstart and tend represent the initial and final time points, respectively, monitored by the experimental time course. The curve is composed of an acceleration phase followed by a deceleration phase, with the inflection point marking the transition between these phases at tswitch (solid red line). This curve can be used to model changes in genomic signals over time, such as gene expression (e.g., mRNA copy number measured via RNA-seq) or chromatin accessibility (e.g., number of nucleosome-free positions measured via ATAC-seq or DNase-seq). The arrows indicate how the number of mRNA copies and nucleosome-free positions increases during the acceleration and deceleration phases. Right panel: Analogous representation for a signal undergoing a logistic decrease (k* < 0). Elements of this figure were created in BioRender. Gerstein, M. (2025) https://BioRender.com/9ok1440.

However, fitting this general form directly can lead to instability, particularly when the number of experimental time points available is limited33. To mitigate this and provide a simpler and easier-to-understand perspective, we fit the data using a simplified form of the ODE (after appropriate translation and normalization of the data; see Supplementary Note 1, Propositions 6–9, and Supplementary Fig. 2):

$$\frac{d{y}^{ * }}{dt}={k}^{ * }{y}^{ * }\left(1-\frac{{y}^{ * }}{{b}^{ * }}\right),\,{{\mbox{with}}}\,{y}^{ * }({t}_{{{\rm{start}}}})={y}_{{{\rm{start}}}}^{ * }$$

(2)

where tstart represents the initial time point of the experimental time course, and \({y}_{{{\rm{start}}}}^{ * }\) represents the normalized gene expression level or chromatin signal of the genomic locus at tstart.

We first solve Eq. 2 analytically to obtain an explicit formulation for y* (see Supplementary Note 1, Proposition 1, for the derivation of the analytical formula):

$${y}^{ * }(t)= \frac{{b}^{ * }{C}^{ * }{e}^{{k}^{ * }t}}{{b}^{ * }+{C}^{ * }{e}^{{k}^{ * }t}},\quad \,{\mbox{where}}\,\\ {C}^{ * }= \frac{{y}_{{{\rm{start}}}}^{ * }}{\left(1-\frac{{y}_{{{\rm{start}}}}^{ * }}{{b}^{ * }}\right){e}^{{k}^{ * }{t}_{{{\rm{start}}}}}}.$$

(3)

Then, we numerically estimate the two kinetic parameters k* and b* that best fit the data. Specifically, our proposed chronODE package optimizes this solution by repurposing a generic initial value problem solver and constraining the numerical solution to the logistic space, so that parameters k* and b* (and their corresponding k and b values back-transformed to the original range) can be easily interpreted. In this context, b* is a positive number and corresponds to the saturation level of the normalized logistic curve, i.e., the horizontal asymptote that approximates the maximum predicted level of y*, while k* represents the growth/decay rate constant, which dictates how fast the signal ramps up or slows down over time. In the case of increasing (i.e., activated) profiles, k* > 0, whereas decreasing (i.e., repressed) profiles have k* < 0 (Fig. 1).

We describe the logistic curve as consisting of two biologically meaningful phases. First, the acceleration phase (where \(| {y}^{ *\prime }| \) progressively increases over time) models the cooperative aspect of epigenetic and transcriptional processes, which tend to occur more frequently and easily after the initial state10,11,12,13,14,16. Second, the deceleration phase (where \(| {y}^{ *\prime }|\) progressively decreases) models the saturating aspect of these processes (Fig. 1). We define tswitch as the time point that marks the transition between these two phases as follows:

$${t}_{{{\rm{switch}}}}=\frac{1}{{k}^{ * }}ln\left(\frac{{b}^{ * }}{{y}_{{{\rm{start}}}}^{ * }}-1\right)+{t}_{{{\rm{start}}}}.$$

(4)

Equation (1) not only models the kinetics of genomic signals while accounting for the underlying biophysical and biochemical constraints, but it also provides the flexibility to capture diverse patterns beyond a standard logistic curve. In particular, it allows for variability in the behavior of genes or regulatory elements, accommodating patterns such as exponential, linear, or logarithmic-like trajectories, which correspond to the accelerating, switching (inflection point), and decelerating phases of the logistic curve, respectively. In essence, it does not assume that all signals strictly follow a full logistic profile.

We note that this approach is well-suited for modeling genomic signals that exhibit monotonic increases or decreases. While these monotonic fits describe most of the observed transcriptional programs, a small fraction of genes may display peak-like expression profiles over time, such as those involved in circadian clocks, cell cycle regulation, or stress responses34,35,36. Thus, we further extended our methodology by introducing a piecewise fitting approach that can accommodate such profiles by combining two logistic curves (or two portions of them). Although alternative functions (e.g., Gaussian curves) could be employed for these cases, our choice to use logistic functions maintains consistency within the framework and allows for comparable kinetic parameters across both monotonic and piecewise fits. Both approaches are implemented as standalone pipelines, providing a scalable solution for genome-wide applications at both bulk and sc levels.

Kinetic profiling of the developing mouse brain transcriptome reveals a balance between rate and saturation level

We used chronODE to investigate the kinetics of gene expression regulation during mouse brain development. We first analyzed time-series RNA-seq maps generated for three regions of the mouse brain (forebrain, midbrain, and hindbrain) across eight developmental time points, from post-conception (PC) day 10.5 to the first postnatal day (PN) (Supplementary Table 1)27. On average across the three regions, we identified  >12,000 genes differentially expressed (DE) over time (Supplementary Fig. 3A). We applied chronODE on the set of DE genes (defining, for a given gene, tstart = 10.5 and \({y}_{{{\rm{start}}}}^{ * }\) = normalized expression level of the gene at time point 10.5; see “Methods”), and identified an average of 87% monotonic fits, suggesting that the logistic ODE is appropriate for modeling the majority of time-series patterns in our dataset (Supplementary Fig. 3B). Conversely, a small fraction of genes (average of 8%) displaying peak-like patterns were captured using piecewise fits (Supplementary Fig. 3C–F). The remaining  ~5% of genes exhibited complex or variable patterns that were not well-suited for fitting with the current models.

When analyzing the distribution of k and b values across genes and brain regions, we found that the monotonic genes are clustered into three major quadrants based on their k & b combinations: high k & low b (Q1), low k & low b (Q2), and low k & high b (Q3) (Fig. 2A). This distribution exhibits an L-shaped pattern (Pearson’s r correlation = −0.44, p value < 2.2e-16), where high values of both k and b are not observed together (i.e., Q4 is absent). A closer examination revealed that Q1 genes (27%) change expression faster, with expression profiles following the full logistic curve (Fig. 2B). In contrast, Q2 genes (59%) and Q3 genes (14%) align with only portions of the logistic curve and are characterized by slower kinetics. Specifically, Q2 genes approach saturation, while Q3 genes remain far from saturation throughout the observed time frame.

Fig. 2: Kinetic characterization of the developing mouse brain transcriptome across three brain regions.
figure 2

A Distribution of the kinetic parameters k (magnitude expressed in absolute value, y axis) and b (x axis) across all monotonic genes in the three brain regions. Genes are grouped into three quadrants (Q1–Q3) based on k-means clustering of their k & b combinations. No genes are found in Q4. B Lineplot showing, for each set of activated and repressed genes in Q1 through Q3 groups, the average expression (y axis) over time (x axis; PC post-conception). Note that the mean expression levels (expressed in \({\log }_{2}\)(TPMs)) were rescaled to the range 0–100% to allow for comparison across Q1–Q3 groups. The error band corresponds to the standard deviation of the mean rescaled expression level (i.e., mean  ± SD). Q1 activated and repressed genes have an average expression profile that resembles the full logistic curve, whereas Q2 and Q3 genes align with the decelerator and accelerator parts of the curve, respectively. The arrow in Q3 genes indicates that these profiles remain far from reaching the saturation level b. C Examples of brain-related genes whose expression profile follows switcher, decelerator, and accelerator profiles, either increasing (upper panels) or decreasing (lower panels). Colored lines correspond to chronODE’s fitted curve reconstructed using the kinetic parameters a, b, and k (see also Supplementary Fig. 2); gray dots correspond to raw data points. The red vertical line indicates the switching point (tswitch). D For each brain region (x axis), proportion (%) of genes (y axis) characterized by switcher (purple), decelerator (green), and accelerator (orange) profiles. E Number of genes (y axis) that show concordant (dashed black line) and discordant (continuous colored line) kinetic patterns across the three regions (x axis) (s switcher, d decelerator, a accelerator). In this case, we focused on the set of 4204 and 2489 genes that have consistently increasing (upper panel) or decreasing (lower panel) monotonic fits in all three regions, respectively.

Within these slowly saturating Q3 genes, those repressed during brain development are associated with nucleotide biosynthesis and amino acid metabolism—metabolic processes essential for early cell proliferation but downregulated later on during lineage commitment37,38 (Supplementary Fig. 4A). This result is consistent with these genes being already far from saturation by day 10.5 (Fig. 2B). In contrast, Q3 activated genes are enriched for functions related to oxidative phosphorylation and mitochondrial ATP synthesis (Supplementary Fig. 4B). These key components of aerobic respiration become increasingly prominent at later developmental stages, particularly during the peri- and post-natal periods39, aligning with the slower activation and delayed saturation of Q3 activated genes (Fig. 2B).

We performed a similar analysis for genes with peak-like expression profiles, and found that, also in this case, only a small fraction (~8%) corresponds to Q4 genes (Supplementary Fig. 4C–F). Altogether, this suggests that genes cannot undergo rapid expression changes while simultaneously achieving high saturation. Consequently, the expression level of rapidly saturating genes is fundamentally constrained (Q1 genes).

Although most genes follow monotonic expression patterns, our results indicate that not all of them transition from full inactivation to full activation (or vice versa) at the same pace (Fig. 2B), and that these differences may correlate with the timing of when their functions are required. To systematically capture the temporal progression of gene expression kinetics, we developed a classification scheme that directly ties the gene’s kinetic profile, as dictated by the k and b parameters, to the specific time window defined by tstart and tend (in our case, days 10.5 and PN, respectively). This classification identifies the switching time (tswitch) of the gene—the point at which its expression shifts from accelerating to decelerating (see Eq. 4)—and determines whether tswitch occurs before (decelerator), within (switcher), or after (accelerator) the time window monitored by the time-series study (Fig. 2C, Supplementary Fig. 5A, and Supplementary Data 1). This allows us to describe the full signal curve in the context of the time window [tstart, tend] (Supplementary Fig. 5B, C). Switcher profiles were the most abundant among both activated and repressed genes, while only a small fraction of genes were classified as accelerators (Fig. 2D). Moreover, although gene activation and repression were largely conserved, with nearly 100% of the genes showing either always an increasing or always a decreasing trend across the three brain regions, their kinetic patterns varied significantly. In fact, 47% and 40% of activated and repressed genes, respectively, were assigned to a different kinetic category in at least two regions (Fig. 2E).

Overall, chronODE allowed us to decipher the kinetics of gene expression during brain development at the resolution of individual genes, highlighting how changes in gene expression result from a balance between the rate of expression and the extent of saturation.

Cell type emergence and gene essentiality shape gene expression kinetics

After mathematically quantifying the kinetics of gene expression during brain development, we next sought to uncover their biological meaning. Our classification indicates that the kinetic patterns of certain genes differ substantially between the forebrain, midbrain, and hindbrain (Fig. 2E). We hypothesized that these variations could be influenced by the cellular composition of each brain region and the changes in this composition over time. In fact, each region is composed of specific cell types, with specific characteristics, that emerge at different developmental stages, potentially shaping the kinetics of gene expression40. To test this hypothesis, we extended our analysis from tissue-level data to sc resolution. Specifically, we applied our chronODE framework to a recently published time-series scRNA-seq atlas of the developing mouse embryo29. This dataset includes 41 brain-related cell types that emerge at various time points between embryonic days E8.5 and E14.25. For our subsequent analyses we focused specifically on activated genes, since they were detected across all cell types, in contrast to repressed genes which were mostly restricted to cell types appearing between E8.5 and E9 (Supplementary Fig. 6A).

Genes with distinct kinetic behaviors, as defined by the three quadrants in Fig. 2A, were observed also across most individual cell types, consistent with the diversity of gene expression kinetics identified in the bulk tissue analysis (Supplementary Fig. 6B). However, the L-shaped distribution of these three kinetic subpopulations changed markedly over time as new cell types emerged. Stratifying cell types into five temporal groups revealed that genes become progressively constrained to a smaller range of kb combinations at later developmental time points (Fig. 3A). In other words, late cell types, such as amacrine precursors and cholinergic amacrine cells, show more constrained kinetic patterns of gene activation, compared to early cell types (e.g., neural progenitors and motor neurons) (Supplementary Fig. 6B). Moreover, when taking a closer look at their temporal progression, we found that genes activated in early-appearing cell types predominantly followed decelerating profiles and were already close to their saturation level b by day E8.5 (Fig. 3B). In contrast, genes activated in late-emerging cell types displayed mostly switching profiles and, to a lesser extent, accelerating profiles. These findings confirmed our initial hypothesis, suggesting that the timing of cell type appearance seems to play a critical role in shaping gene expression kinetics.

Fig. 3: Cell type specificity of gene expression kinetics.
figure 3

A Distribution of the kinetic parameters k (y axis) and b (x axis) for activated genes (k > 0) across 41 brain cell types. Cell types have been grouped based on their day of appearance (between 8 and 9, 9 and 10, 10 and 11, 12 and 13, and 14and 15 embryonic [E] days). B For each cell type (x axis) within these five groups, proportion of genes (%; y axis) that are characterized by switcher (purple), decelerator (green), and accelerator (orange) profiles. C Distribution of the kinetic parameter k (y axis) for essential (red) and non-essential (gray) genes (x axis). Box plots present the median as the center, 25% and 75% percentiles as box limits, and whiskers extending to the largest and smallest values within the 1.5 × interquartile range (IQR) of the box limits. We applied the Wilcoxon Rank-Sum test (two-sided, unpaired) to compute significant differences (***: p value  < 0.001 [p value = 4.2E-203]; n = 54,212; we note these are unreplicated single-cell data from Qiu et al.29). D Proportion (%; y axis) of essential (Ess.) and non-essential (Non-Ess.) genes (x axis) belonging to the three kinetic classes (color-coded as in B).

Still, among genes activated in the early cell types, we observed a fraction following slower, accelerating profiles (average of 11% in E8-E9 cell types; Fig. 3B). This suggests that even though these cells are present from early stages of development, they selectively express certain genes at slower rates. This observation prompted us to investigate additional factors that might contribute to the regulation of gene expression kinetics. In particular, we hypothesized that a cell may tend to prioritize the expression of genes that are essential for its survival. To this end, we integrated in our sc analyses a recently published catalog of essential genes41. We observed that, especially in the earliest cell types (E8-E9 appearance days), essential genes tend to ramp up faster (higher k values) and are closer to their saturation levels (i.e., higher proportion of decelerator and switcher profiles) compared to non-essential genes (Fig. 3C, D). Thus, our kinetic analysis provides an alternative perspective: the essentiality of a gene for cell viability likely influences its expression kinetics, determining how rapidly it is expressed.

Genes linked to poly-pattern cCREs are more dynamic during brain development and are enriched in brain-specific functions

Besides estimating gene expression kinetics, chronODE is also suitable for studying the kinetics of chromatin activity at regulatory elements in the genome, providing insights into gene regulatory mechanisms. To this end, we also analyzed the kinetics of chromatin accessibility at candidate cis-regulatory elements (cCREs) from the ENCODE project42. Specifically, we applied chronODE to time-series DNase-seq and ATAC-seq maps28 available for the same time points and brain regions as the bulk RNA-seq data (Supplementary Tables 2 and 3). We identified cCREs with both increasing (k > 0) and decreasing (k < 0) trends, and classified their kinetics in each of the three regions (Supplementary Data 1). We then investigated how activated and repressed cCREs relate to the expression of target genes over time. To achieve this, we employed a simple proximity criterion to pair mouse cCREs with target genes by linking each cCRE to its nearest gene based on linear distance43,44. Although this is a simplified approach, we note that our framework could also accommodate more complex linking methods, such as those recently applied by ENCODE in the context of the human genome45. Our analysis revealed that genes are not only linked to multiple cCREs, as previously reported44, but that these cCREs also exhibit distinct regulatory roles. Specifically, on average 50% of genes are regulated by both activated (k > 0) and repressed (k < 0) cCREs (Supplementary Fig. 7A). We refer to these genes, which display mixed cCRE trends, as poly-pattern (poly-p.) genes (Fig. 4A). In contrast, mono-pattern (mono-p.) genes are exclusively associated with a single type of cCRE trend—either all activated or all repressed.

Fig. 4: Genes linked to poly-pattern cCREs are more dynamic during brain development and are enriched in brain-specific functions.
figure 4

A Schematic illustrating the distinction between mono-pattern and poly-pattern genes. Mono-pattern genes (left panel) can be linked to one or more cCREs, but if associated with multiple cCREs, all of these cCREs exhibit the same regulatory profile—either all activated (k > 0, red, upper panel) or all repressed (k < 0, blue, lower panel). Genes linked to a single cCRE represent the simplest case of mono-pattern regulation. In contrast, poly-pattern genes (right panel) are associated with multiple cCREs, which may be predominantly activated (upper panel) or repressed (lower panel). B Distributions of gene expression fold-change (\({\log }_{2}\)(TPM); y axis) for mono-pattern (mono-p.) and poly-pattern (poly-p.) genes (x axis) across the three brain regions. Box plots present the median as the center, 25% and 75% percentiles as box limits, and whiskers extending to the largest and smallest values within the 1.5  × IQR of the box limits. We applied the Wilcoxon Rank-Sum test (two-sided, unpaired) to compute significant differences (***: p value  < 0.001). Forebrain: p value = 7.6E-33 (n = 7992; 2 biological replicates); Midbrain: p value = 1.5E-5 (n = 5999; 2 biological replicates); Hindbrain: p value = 9.5E-16 (n = 6772; 2 biological replicates). C Proportion of distal cCREs (y axis) linked to mono- and poly-pattern genes (x axis). Distal cCREs are defined as those located  >2 Kb from the nearest annotated transcription start site of the gene. We report mean proportion and standard deviation across the three brain regions (n = 3). Overlayed black dots indicate the proportion of distal cCREs in each of the three regions. D Gene Ontology biological process terms (y axis) enriched among mono- and poly-pattern genes. Significance of GO terms’ enrichment was determined with a hypergeometric test (two-sided). Only GO terms reporting a Benjamini-Hochberg adjusted p value < 0.01 were considered. Among these, we show in the panel the 15 most significant GO terms with the corresponding −log10 p value (x axis).

Across all three brain regions, genes with poly-pattern regulation were linked to more distal cCREs and exhibited larger expression changes over time compared to mono-pattern genes (Fig. 4B, C). This suggests that poly-pattern genes play a more significant role in brain development, as they are enriched in neural and brain-specific functions, including neurogenesis, trans-synaptic signaling, and sensory system development (Fig. 4D). In contrast, mono-pattern genes were predominantly associated with housekeeping functions such as gene expression and compound biosynthesis. A central question is what drives the highly dynamic behavior and specialized functions of poly-pattern genes: is it the number of regulatory elements or the diversity of their regulatory trends? To address this, we disentangled the contributions of cCRE count and trend diversity (mono- vs. poly-pattern regulation). Our analysis demonstrated that both the number of cCREs and the diversity of their trends significantly correlate with a gene’s dynamic expression profile and its involvement in brain-specific functions (Supplementary Fig. 7B, C). This finding expands on our conclusion that gene expression is fundamentally a constrained process by having a balanced regulation involving both activation and repression mechanisms.

Overall, these results suggest that the precise expression of genes important for brain development may be governed by a more complex network involving regulatory elements with mixed trends. In contrast, those genes associated with non-brain-specific functions may rely on a simpler regulome, potentially reflecting a differential degree of control based on the biological significance and impact of these genes.

Predicting time-series gene expression from chromatin signals of associated cCREs

The insights gained from our previous analyses improve our understanding of how gene expression evolves over time, particularly in terms of expression kinetics and the impact of epigenetic changes at regulatory elements. This knowledge can be applied to develop temporal models that predict gene expression at future time points, accounting for the additive effect of multiple cCREs associated with a given gene44. Beyond prediction, incorporating chromatin regulation into these models can also provide insights into the complex interplay between transcriptome and epigenome.

Towards this goal, the first step in our analysis was to examine the correlation, over time, between gene expression and chromatin accessibility at associated cCREs. We observed a bimodal distribution, with 66% of the gene-cCRE pairs showing positive correlation and 34% showing negative correlation (Supplementary Fig. 8A). In this context, we make the simple assumption that when a cCRE is open, TFs can access the gene, facilitating expression. In contrast, a closed cCRE restricts TF access and potentially reduces gene activity. Based on this assumption, the positively correlated cCRE-gene pairs were interpreted as cases where the cCRE acts as an enhancer of gene expression, since the accessibility of the cCRE aligns with the direction of gene expression. In contrast, the negatively correlated cCRE-gene pairs were interpreted as cases where the cCRE exerts a silencer effect on gene expression. This information is crucial for determining the direction of gene expression changes (activation or repression).

On the other hand, we found that the kinetics of gene expression were correlated with the diversity of cCRE regulation associated with the gene (mono-p. vs. poly-p.; Supplementary Fig. 8B). Specifically, activated genes with decelerator profiles were more likely to undergo poly-pattern regulation compared to switchers and accelerators, while the opposite was observed for repressed genes. This suggests that the type of epigenetic regulation a gene undergoes plays a key role in shaping the kinetics of its expression profile, making it important to incorporate this information into the predictive model.

Building on these findings and utilizing our many-to-one linking schema—where multiple cCREs are linked to a given gene over time—we designed a model architecture (Fig. 5A) capturing four types of gene regulatory mechanisms that differ in the direction (enhancer/silencer) and diversity (mono-p./poly-p.) of cCRE regulation as follows: (1) mono-pattern genes subject to enhancer regulation (enhancer mono-p.), where all cCRE signals linked to a gene are positively correlated with its expression; (2) mono-pattern genes subject to silencer regulation (silencer mono-p.), where all cCRE signals linked to the gene are negatively correlated with its expression; (3) poly-pattern genes subject to enhancer regulation, where the majority of cCRE signals are positively correlated with gene expression, while a minor fraction exhibit silencer-like effects (enhancer poly-p.); and (4) poly-pattern genes subject to silencer regulation, where most cCRE signals are negatively correlated with gene expression, with a smaller proportion displaying enhancer-like effects (silencer poly-p.). This approach enabled us to train our model towards predicting both the direction and kinetic patterns of gene expression over time. Each subset of genes was used to train the model independently, with an 80:20 split for training and testing, respectively.

Fig. 5: Predicting time-series gene expression from chromatin signals of associated cCREs.
figure 5

A Architecture of the biRNN-based model. At each time point, the input for a specific multicCRE-gene pair i is a vector c of dimension 1 × m, where m represents the number of cCRE signals associated with the gene. The model generates an output vector, representing the gene expression levels g across all time points. B Density plot showing cross-gene expression correlation for each of the four regulatory mechanisms. Each scatterplot shows true (x axis) versus predicted (y axis) expression values (\({\log }_{2}\)-transformed TPMs) across all genes in the test set, alongside their degree of correlation as measured by Pearson’s correlation coefficient (r) and the corresponding Benjamini–Hochberg adjusted p value. C Representative examples of gene expression predictions over time demonstrating our model’s capability of capturing the three kinetic patterns (switchers, decelerators, and accelerators). True and predicted expression values are color-coded. Expression values correspond to \({\log }_{2}\)-transformed TPMs. D Barplot showing, for each of the four regulatory mechanisms, the relative feature importance (computed using the SHAP method; y axis), for different cCREs associated with genes in the test set (x axis). The x axis represents the proximity rank of each cCRE to its associated gene, with rank 1 indicating the closest cCRE. SHAP importance values are shown for each of the eight time points (color-coded). The analysis highlights that cCREs closer to the gene have higher predictive importance at the initial and final time points, while more distant cCREs show greater importance at intermediate time points.

Our modeling approach predicts gene expression at each time point through regression, using a bidirectional RNN-based architecture, which is well-suited for time-series data. Given a multicCRE-gene pair i, we define the complete cCRE time-series input signal associated with a given gene as the matrix Ci, such that:

$${{{\bf{C}}}}_{i}=[{{{\bf{c}}}}_{i,0},{{{\bf{c}}}}_{i,1},\ldots,{{{\bf{c}}}}_{i,t}].$$

(5)

At each time point t (t {1, 2, …, 8}) we define ci,t as the vector of m cCRE signals:

$${{{\bf{c}}}}_{i,t}=[{c}_{i,t,1},{c}_{i,t,2},\ldots,{c}_{i,t,m}].$$

(6)

Ci is the input used to predict gi, i.e., the gene expression profile over time:

$${{{\bf{g}}}}_{i}=[{g}_{i,0},{g}_{i,1},\ldots,{g}_{i,t}].$$

(7)

We use a biRNN to model gene expression gi,t at a particular time point t, as a function of the cCRE chromatin accessibility signal vector ci,t. In fact, capturing the interplay between chromatin and gene expression is not a trivial task, since multiple scenarios are possible. For instance, changes in chromatin accessibility at cCREs upstream of the promoter can potentially influence gene expression, for example, by enabling TF binding or facilitating Pol II release. Conversely, transcriptional activity—through Pol II elongation—can also induce changes in chromatin accessibility at both upstream and downstream cCREs. The biRNN allows us to cover this wide range of scenarios by leveraging chromatin signals at both past and future time points to make predictions of gene expression. Specifically, we define the following relation:

$${g}_{i,t}={{\mathcal{D}}}(\,{\mbox{biRNN}}\,({{{\bf{c}}}}_{i,t})),$$

(8)

where \({{\mathcal{D}}}\) represents a dense layer that takes as input the output of the biRNN.

This modeling approach effectively handles the highly dimensional nature of the omics data and enables accurate regression of gene expression levels exhibiting diverse kinetic patterns over time. We note that although here we focused exclusively on chromatin accessibility as input data, our modeling architecture is versatile and can potentially incorporate additional cCRE chromatin features (such as histone post-translational modifications) at each time point t in the form of a tensor.

To evaluate the performance of the model, we first computed cross-gene correlations between true and predicted expression values. Our predictions on the test set showed a strong positive correlation (r > 0.85) in all four types of regulatory mechanisms (Fig. 5B), higher compared to previous reports of chromatin-gene expression predictions46. We also evaluated the model performance by computing the distribution of mean squared error (MSE) and cross-timepoint correlation between true and predicted gene expression values (Supplementary Fig. 8C). In particular, we achieved mean cross-timepoint correlations ranging between 0.57 and 0.90, higher than equivalent cross-cell type correlations reported previously46. To further showcase the predictive capability of our model, we also included some examples of true vs. predicted gene expression time-series profiles (Fig. 5C). We note that the usage of the biRNN is especially effective for predicting the expression of poly-pattern genes, which are particularly challenging to model due to the mixed contributions of enhancer and silencer cCRE signals (Supplementary Fig. 8D). The biRNN leverages the more comprehensive learning context provided by both past and future time points to address this complexity. Additionally, incorporating a dense layer enhances the model’s capacity to learn complex kinetic relationships between cCREs and their associated genes.

Finally, we investigated how the distance between cCREs and their target genes influences gene expression predictions using the SHapley Additive exPlanations (SHAP) method47. Specifically, we computed the average importance of each cCRE across all the samples in the test set. Our analysis revealed that closer cCREs have a greater impact on predicting gene expression, underscoring their biological importance in gene regulation, consistent with previous findings43. Notably, these proximal cCREs play a dominant role in predictions at the first and last time points, whereas middle-to-distant cCREs contribute more to the predictions at intermediate time points, thus directly influencing the kinetic patterns of the gene expression signals (Fig. 5D). This finding indicates that multiple cCREs provide the flexibility needed to capture complex kinetic patterns of gene expression. Altogether, this suggests that, during brain development, gene expression kinetics are shaped by the cumulative effects of numerous cCREs, especially the more distal ones.

link

Leave a Reply

Your email address will not be published. Required fields are marked *