Data Science Group members are actively involved in the development of Bayesian and frequentist models for spatio-temporal data, including models with spatially varying coefficients. Conditional autoregressive priors, Moran’s eigenvector filtering, dimension reduction algorithms, penalized LASSO-type estimators and bootstrap-based uncertainty quantification are among the methodological tools used to estimate parameters and conduct inference in such models. Typical applications of space-time models in the geosciences include a) the evaluation of outputs from Regional Climate Models and b) the analysis of time series of remotely sensed imagery. Similar models have been developed to analyze the dynamics of regional economic variables.
Mathematics of information: Information is a core notion in many engineering and scientific disciplines. For example, much of modern statistics may be characterised as the process of extracting information from data. Over the past 60 years, information, along with its mathematical description via the Boltzmann-Shannon entropy, has played a crucial role in science and technology, both as a central metaphor providing intellectual guidelines and as a mathematically specific, technical, and precisely measurable quantity. Members of the Data Science Group have been exploring the development of rigorous tools for the mathematical description of information, as well as the analysis of practical questions arising in a variety of applications, ranging from modern digital communications networks to finance, neuroscience and bioinformatics.
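As a concrete illustration of information as a precisely measurable quantity, the Shannon entropy of a discrete distribution p is H(p) = -Σ p_i log p_i. The following is a minimal sketch of that textbook definition, not code from the group's work; the function names `shannon_entropy` and `empirical_entropy` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def shannon_entropy(probabilities, base=2.0):
    """Return H(p) = -sum_i p_i * log(p_i) for a discrete distribution.

    With base=2 the result is in bits. Zero-probability outcomes
    contribute nothing, following the convention 0 * log(0) = 0.
    """
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def empirical_entropy(samples, base=2.0):
    """Plug-in entropy estimate computed from observed symbol frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return shannon_entropy([c / n for c in counts.values()], base)


if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))  # fair coin
    print(empirical_entropy("aabb"))    # two equally frequent symbols
```

A fair coin carries one bit of entropy under this definition, while a deterministic outcome carries none. The plug-in estimate is known to be biased for small samples, so it serves here only as an illustration.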
Incident/Anomaly Detection: Group members develop real-time algorithms for incident and anomaly detection, focusing mostly on network activity (e.g. vehicular networks). Decision trees and (nonparametric) quantile regressions are among the methodological tools that have been employed as key components of such algorithms. Typical applications include incident detection on vehicular networks based on loop detector data and fraud detection in bank transactions.
Functional Data Analysis: Functional data analysis (FDA) focuses on data that can be curves, surfaces or anything else varying over a continuum. For instance, plasma thermograms are curves associated with a person’s health status; a group of such curves can be analyzed using FDA analogues of conventional statistical procedures. The Data Science Group develops segment-wise supervised classification schemes for multivariate functional data, which may reject noisy parts of the domain of the functional data and assign larger weights to segments that contain useful information for discriminating between the classes in each study. A typical application of the proposed algorithms is disease identification/diagnosis.
Forecasting Multivariate Time Series: The group’s activities include research on linear (ARIMA) and nonlinear (smooth-transition or threshold autoregressions) parametric time series models that are estimated using a) fast penalized LASSO-type estimators or b) Bayesian posteriors based on shrinkage priors (e.g. horseshoe). Such models are used in (real-time) forecasting applications, including, for example, short-term forecasts of a) vehicular counts in a transportation network, b) energy outputs from wind or solar farms and c) emissions from heavy-duty diesel engines.
Deep Generative Learning: Generative models based on deep neural networks have shown unprecedented capabilities in sampling data from complex but unknown distributions. Researchers in the Data Science Group develop novel algorithms for training generative models, focusing on Generative Adversarial Networks (GANs). GANs have been used in data augmentation schemes to generate synthetic data that follow the same distributional characteristics as the original dataset (which may contain sensitive information or a limited number of cases). The proposed methodology has been applied to identify dyslexia in children, using measurements from specialized eye trackers.
Uncertainty Quantification in Stochastic Systems and Multilevel Models: Uncertainty quantification (UQ) is essential for the reliable and robust modelling of complex stochastic dynamics. The Data Science Group develops novel information-theoretic tools to address the challenges concerning the feasibility and efficiency of UQ in stochastic systems; these tools have been used to analyze biological reaction networks. Furthermore, group members conduct research on computationally intensive methods (e.g. bootstrap-based confidence intervals) for uncertainty quantification in models that capture nested data structures (multilevel models for categorical responses).
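To make the idea of a bootstrap-based confidence interval concrete, the following is a minimal sketch of the basic nonparametric percentile bootstrap for a statistic of independent observations. It is an illustration under simplifying assumptions, not the group's method: it resamples individual observations and therefore ignores the nested structure that multilevel models require (where whole clusters would typically be resampled). The function name `bootstrap_percentile_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Assumes i.i.d. observations: each resample draws len(data) values
    from `data` with replacement. `data` must be a non-empty sequence.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    # Recompute the statistic on each resample and sort the replicates.
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 empirical quantiles.
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1,
                    int((1 - alpha / 2) * n_resamples) - 1)
    return replicates[lower_idx], replicates[upper_idx]


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
    low, high = bootstrap_percentile_ci(sample)
    print(f"95% percentile interval for the mean: ({low:.2f}, {high:.2f})")
```

Any statistic that accepts a list can be passed in place of the mean, for example `statistics.median`. The percentile interval is the simplest bootstrap interval; bias-corrected variants exist but are outside the scope of this sketch.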
A. ONGOING PROJECTS