Data Science


The Data Science Group was established in 2019 and focuses on a) methodological research questions related to data science and algorithmic artificial intelligence, b) foundational questions regarding the fundamental performance limits achievable in data modeling, analysis and inference tasks, and c) interdisciplinary research questions from a wide variety of domains, including the Social Sciences, Geosciences, Bioinformatics and Biomedical Engineering, and Environmental and Transportation Engineering. Current activity areas include frequentist and Bayesian spatio-temporal modeling, the development of information-theoretic tools, high-dimensional and functional data analysis, time-series forecasting, and online prediction and anomaly detection algorithms. Researchers from the Data Science Group participate in the Statistical Learning Lab, which fosters multidisciplinary collaborations, produces software, research publications and patents, and contributes to the education and training of students and young researchers.


  • Deep Learning and Generative AI

    Deep learning: Deep learning has recently been at the forefront of artificial intelligence (AI) research and has transformed several of its fields. Data Science Group members develop methods that both clarify the limitations of existing deep learning approaches and improve various aspects of their performance, addressing fundamental questions as well as practical applications. Topics include novel deep network architectures; new ways of effectively transferring knowledge between networks, including the use of attention for knowledge distillation; revised weight parameterizations that improve the generalization of deep networks; novel self-supervised and few-shot learning methods; learning approaches that advance the state of the art on fundamental problems in computer vision and image analysis; hybrid (scattering-based) convolutional architectures that yield better representation learning and more interpretable features; and adaptations of deep neural networks that apply directly to arbitrary graph-structured data and handle structured-prediction tasks.

    Neural-based Speech Synthesis - Deep Learning in Speech Processing: Deep neural networks have taken the engineering community by storm, and data-rich areas such as image and speech processing have been transformed in recent years. The Data Science Group combines its expertise in speech processing with deep learning techniques for applications such as voice conversion, speech synthesis and speech enhancement.

    Generative Adversarial Nets - Deep Generative Learning: Generative models based on deep neural networks have shown unprecedented capabilities in sampling data from complex, unknown distributions. Researchers in the Data Science Group develop novel algorithms for training generative models, focusing on Generative Adversarial Networks (GANs). GANs have been used in data augmentation schemes to generate synthetic data that follow the same distributional characteristics as the original dataset, which may contain sensitive information or only a limited number of cases. The proposed methodology has been applied to identifying dyslexia in children from measurements taken with specialized eye trackers.

  • Spatial, Temporal & Spatio-temporal Statistics
  • High-Dimensional and Sparse Statistics
  • Uncertainty Quantification
  • Additional Topics
  • Education and Training




  • Title: NEMO-Tools: Next-generation monitoring and mapping tools to assess marine ecosystems and biodiversity
    Funding Source: Hellenic Foundation for Research and Innovation
    Duration: 2024-2026
  • Title: Disentangled representation learning via Mutual Information optimization with applications in speech representation learning
    Funding Source: Private sector
    Duration: 2024
  • Title: STOMA: Towards real-time, enhanced text-to-speech synthesis on the device
    Funding Source: Hellenic Foundation for Research and Innovation
    Duration: 2022-2024
  • Title: FUSING: Biophysical tools FUSed via integrative computational approaches to decode protein foldING
    Funding Source: FORTH Synergy
    Duration: 2022-2024


  • Title: Data Landscaping: Traffic and Mobility Data Sources of Official Statistics
    Funding Source: Eurostat
    Duration: 2022
  • Title: SOLAR-P: Evaluation of alternative solar panel technologies, computation of irradiance daily profiles
    Funding Source: Saudi Aramco and KAUST
    Duration: 2021-2022
  • Title: Characterising population dynamics with applications in biological data
    Funding Source: ESPA - Department of Development
    Duration: 2020-2021
  • Title: WNRG: Forecasting hourly wind-farm outputs based on wind-speed predictions from alternative providers
    Funding Source: EREN-Hellas
    Duration: 2020
  • Title: ENRICH: Enriched communication across the lifespan
    Funding Source: Horizon 2020, MSCA-ETN-2020
    Duration: 2017-2020


  • 2024

    • K Biza, A Ntroumpogiannis, S Triantafillou, I Tsamardinos (2024) Towards Automated Causal Discovery: a case study on 5G telecommunication data, arXiv preprint arXiv:2402.14481.
    • A Chatzimentor, A Doxa, M Butenschön, T Kristiansen, MA Peck, S Katsanevakis, AD Mazaris (2024) Diving into warming oceans: Assessing 3D climatically suitable foraging areas of loggerhead sea turtles under climate change, Journal for Nature Conservation, 126620.
    • K Ellrott, CK Wong, C Yau, MA Castro, J Lee, B Karlberg, JK Grewal, V Lagani, B Tercan, V Friedl, T Hinoue, V Uzunangelov, L Westlake, X Loinaz, I Felau, P Wang, A Kemal, SJ Caesar-Johnson, I Shmulevich, AJ Lazar, I Tsamardinos, KA Hoadley, Cancer Genome Atlas Analysis Network, GA Robertson, TA Knijnenburg, CC Benz, JM Stuart, JC Zenklusen, AD Cherniack, PW Laird (2024) Leveraging compact feature sets for TCGA-based molecular subtype classification on new samples, Cancer Research 84 (6_Supplement), 6548-6548.
    • J Fagin, G Vernardos, G Tsagkatakis, Y Pantazis, AJ Shajib, M O'Dowd (2024) Measuring the Substructure Mass Power Spectrum of 23 SLACS Strong Galaxy-Galaxy Lenses with Convolutional Neural Networks, arXiv preprint arXiv:2403.13881.
    • A Fayomi, Y Pantazis, M Tsagris, ATA Wood (2024) Cauchy robust principal component analysis with applications to high-dimensional data sets, Statistics and Computing 34 (1), 1-14.
    • F Goudarzi, A Doxa, MR Hemami, AD Mazaris (2024) Thermal vulnerability of sea turtle foraging grounds around the globe, Communications Biology 7 (1), 347.
    • Y Kamarianakis, Y Pantazis, E Kalligiannaki, TD Katsaounis, K Kotsovos, I Gereige, M Abdullah, A Jamal, A Tzavaras (2024) Robust day-ahead solar forecasting with endogenous data and sliding windows, J. Renewable Sustainable Energy 16 (026103), 1-13.
    • OTD Nguyen, I Fotopoulos, M Markaki, I Tsamardinos, V Lagani, OD Røe (2024) Improving Lung Cancer Screening Selection: The HUNT Lung Cancer Risk Model for Ever-Smokers Versus the NELSON and 2021 United States Preventive Services Task Force Criteria in the Cohort of Norway: A Population-Based Prospective Study, JTO Clinical and Research Reports 5 (4), 100660.
    • G Paterakis, S Fafalios, P Charonyktakis, V Christophides, I Tsamardinos (2024) Do we really need imputation in AutoML predictive modeling? ACM Transactions on Knowledge Discovery from Data.
    • MA Zervou, E Doutsi, Y Pantazis, P Tsakalides (2024) Multitask Classification of Antimicrobial Peptides for Simultaneous Assessment of Antimicrobial Property and Structural Fold, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1836-1840.

  • 2023
  • 2022
  • 2021
  • 2020
  • 2019


  • Biza Konstantina (PhD candidate)
  • Georgoulis Elias (MSc candidate)
  • Kofidis Andreas (MSc candidate)
  • Litsas Anastasios (MSc candidate)
  • Papadaki Maria-Eleni (MSc candidate)
  • Raptakis Michail (PhD candidate)


For any information regarding the group please contact:

Data Science Group,
Institute of Applied and Computational Mathematics,
Foundation for Research and Technology - Hellas
Nikolaou Plastira 100, Vassilika Vouton,
GR 700 13 Heraklion, Crete

Mrs. Maria Papadaki
Tel.: +30 2810 391800

Mrs. Yiota Rigopoulou
Tel.: +30 2810 391805