Research
Geosciences and Topological Data Analysis
Topological Data Analysis for Earth Systems Modeling

Description: Reactive fluid flow in Earth systems and subsurface energy and carbon management technologies is governed by complex, multiscale nonlinear dynamics. Predictive modeling of these systems remains challenging, however, due to significant discrepancies between observational, experimental, numerical, and in-silico data. One potential source of this mismatch stems from the critical influence of rock texture – the spatial distribution of minerals dictating fluid-mineral interactions along flow paths. Statistically characterizing texture in multi-mineralic rocks is especially difficult, typically relying on manual methods even for 2D images. This study introduces a novel, computationally efficient framework combining Topological Data Analysis (TDA) and Bayesian statistics for multi-scale spatial texture characterization at the pixel level. Utilizing TDA, we quantify the scale and topology of mineral connectivity across all spatial resolutions within an image, generating rotation- and translation-invariant persistence diagrams that capture the global texture pattern. We then apply Bayesian models to statistically characterize the geometric properties (e.g., area, perimeter, orientation, color) of individual connected mineral components identified during this multi-scale TDA filtration (binary dilation) process. Crucially, our framework tracks how these properties evolve across scales, pinpointing the specific resolutions/scales where homogenization occurs. Overall, our integrated TDA-Bayesian approach has three key features : (1) Statistically robust, automated texture characterization on images utilizing images ranging from outcrop (m) scale to micro-analytical methods like SEM, EPMA (mm scale) ; (2) Inherent multi-scale analysis revealing the relevant averaging scales and mineral distribution relationships critical for reactive flow; and (3) Optimized numerical speed using optimized C code for enabling rapid analysis of large, pixel-scale gridded images. We demonstrate the framework’s utility on geologically realistic synthetic microstructures and natural rock thin sections, extracting a parsimonious set of multi-scale statistical features most relevant for predicting reactive flow behavior.
Work with Benjamin Roycraft, Tushar Mittal, and Padma Tanikella
DOI: To come!
Climate and Environmental Science, Spatial Statistics, and Causality
Spatio-Temporal Stochastic Interventions for Causal Inference in Climate Science

Description: While physical understanding predicts a causal relationship between greenhouse gas emissions and warming in the global climate, estimation of the exact magnitude of this causal effect is notoriously difficult to constrain. One of the reasons for this high degree of uncertainty is that the climate system’s overall sensitivity depends on how the spatial pattern of temperature changes causally affects outgoing temperature. While climate model simulations provide dynamically informed estimates, performing inference on the observations is challenging due to the lack of suitable counterfactuals and the high-dimensional nature of the global climate system. We propose to address these difficulties through the causal inference framework of stochastic interventions, where the interventions are modeled as continuous spatial Gaussian processes on the domain. Representing the interventions a spatial stochastic process allows for the causal effects to be consistently estimated from the limited observational record. Using a Bayesian framework, prior information in the form of climate model simulations is incorporated into the form of the stochastic interventions in order to relax the underlying assumptions with physical information. The robustness of the results are assessed through sensitivity analyses and validation studies using climate models.
DOI: To come!
Causal Discovery of Meteorological Dynamics in Mobile Tower Data

Description: Recent advances in causal inference use Structural Causal Modeling (SCM) to infer the direction of cause-effect relationships. Our research uses SCM to identify causal relationships in meteorological processes at the surface level, and develop visualizations for a broader audience. We studied interactions among rain, temperature, specific humidity, aerosols, and CO\(_2\). Data was obtained from the CoURAGE project, consisting of three (3) mobile stations located in the urban, rural, and bay area of Baltimore. We use 1 minute interval data for analysis of the relationship among the variables of rain events and only data that contains precipitation. For the relation with CO\(_2\), we use 30-minute intervals, separating data by daytime and nighttime. We perform a time series analysis using the Tigramite python package. With these causal tools, we perform Causal Discovery to estimate the structure of the SCM. We then use Wright’s path estimation to find the causal effects for a given SCM using the ‘do calculus’. Our findings suggest that specific humidity has a stronger causal relation with temperature and aerosols than with precipitation. Additionally, CO\(_2\) tends to have a negative relation with temperature.
Work with Chris Forest, Alondra Alvarez-Castro, and MeiLi Haan
DOI: To come!
Climate Change Detection and Attribution

Description: Regression-based optimal fingerprinting techniques for climate change detection and attribution require the estimation of the forced signal as well as the internal variability covariance matrix in order to distinguish between their influences in the observational record. While previously developed approaches have taken into account the uncertainty linked to the estimation of the forced signal, there has been less focus on uncertainty in the covariance matrix describing natural variability, despite the fact that the specification of this covariance matrix is known to meaningfully impact the results. Here we propose a Bayesian optimal fingerprinting framework using a Laplacian basis function parameterization of the covariance matrix. This parameterization, unlike traditional methods based on principal components, does not require the basis vectors themselves to be estimated from climate model data, which allows for the uncertainty in estimating the covariance structure to be propagated to the optimal fingerprinting regression parameter. We show through a CMIP6 validation study that this proposed approach achieves better-calibrated coverage rates of the true regression parameter than principal component-based approaches. When applied to HadCRUT observational data, the proposed approach detects anthropogenic warming with higher confidence levels, and with lower variability over the choice of climate models, than principal-component-based approaches.
Work with Karen McKinnon
DOI: https://doi.org/10.48550/arXiv.2208.02919
Bayesian Hierarchical Models for Ocean Heat Content Modeling

Description: The accurate quantification of changes in the heat content of the world’s oceans is crucial for our understanding of the effects of increasing greenhouse gas concentrations. The Argo program, consisting of Lagrangian floats that measure vertical temperature profiles throughout the global ocean, has provided a wealth of data from which to estimate ocean heat content. However, creating a globally consistent statistical model for ocean heat content remains challenging due to the need for a globally valid covariance model that can capture complex nonstationarity. In this paper, we develop a hierarchical Bayesian Gaussian process model that uses kernel convolutions with cylindrical distances to allow for spatial nonstationarity in all model parameters while using a Vecchia process to remain computationally feasible for large spatial datasets. Our approach can produce valid credible intervals for globally integrated quantities that would not be possible using previous approaches. These advantages are demonstrated through the application of the model to Argo data, yielding credible intervals for the spatially varying trend in ocean heat content that accounts for both the uncertainty induced from interpolation and from estimating the mean field and other parameters. Through cross-validation, we show that our model outperforms an out-of-the-box approach as well as other simpler models. The code for performing this analysis is provided as the R package BayesianOHC.

