Data Science

ABOUT

The Data Science Group was established in 2019 and focuses on a) methodological research questions related to data science and algorithmic artificial intelligence, b) foundational questions regarding the limits of achievable performance in various tasks of data modelling, analysis and inference, and c) interdisciplinary research questions from a wide variety of research domains, including Social Sciences, Geo-Sciences, Bioinformatics and Biomedical Engineering, and Environmental and Transportation Engineering, among others. Current activity areas include frequentist and Bayesian spatio-temporal modelling, the development of information-theoretic tools, high-dimensional and functional data analysis, time-series forecasting, and online prediction and anomaly detection algorithms. Researchers from the Data Science Group participate in the Statistical Learning Lab, which fosters multidisciplinary collaborations, produces software products, research publications and patents, and contributes to the education and training of students and young researchers.

RESEARCH AND DEVELOPMENT ACTIVITIES

  • Deep Learning and Generative AI

    Deep learning: Deep learning has recently been at the forefront of artificial intelligence (AI) research and has transformed several fields of AI. Data Science Group members develop methods that help both to better understand the limitations of existing deep learning approaches and to improve various aspects of their performance. They have worked on a wide variety of topics in this area, exploring both fundamental questions and practical applications. These topics include: exploring and proposing novel deep network architectures; developing new ways of effectively transferring knowledge between networks; revisiting weight parameterizations for deep networks in order to improve their generalization capabilities; proposing novel self-supervised and few-shot learning methods; devising and applying learning approaches that advance the state of the art for fundamental problems in computer vision and image analysis; making use of attention in the context of knowledge distillation; proposing hybrid (scattering-based) convolutional network architectures that allow for better representation learning and more interpretable features; and adapting deep neural networks so that they can be applied directly to arbitrary graph-structured data and can also handle structured-prediction tasks.


    Neural-based Speech Synthesis - Deep Learning in Speech Processing: Deep neural networks have taken the engineering community by storm. Data-rich areas such as image processing and speech processing have been transformed in recent years. The Data Science Group combines its expertise in speech processing with deep learning techniques, targeting applications such as voice conversion, speech synthesis and speech enhancement.


    Generative Adversarial Nets - Deep Generative Learning: Generative models based on deep neural networks have shown unprecedented capabilities in sampling data from complex but unknown distributions. Researchers in the Data Science Group develop novel algorithms for training generative models, focusing on Generative Adversarial Networks (GANs). GANs have been used in data augmentation schemes to generate synthetic data that follow the same distributional characteristics as the original dataset (which may contain sensitive information or a limited number of cases). The proposed methodology has been applied to identify dyslexia in children, using measurements from specialized eye trackers.


  • Spatial, Temporal & Spatio-temporal Statistics
  • High-Dimensional and Sparse Statistics
  • Uncertainty Quantification
  • Additional Topics
  • Education and Training

RESEARCH AND DEVELOPMENT PROGRAMS

A. ONGOING PROJECTS

  • Title: NEMO-Tools: Next-generation monitoring and mapping tools to assess marine ecosystems and biodiversity
    Funding Source: Hellenic Foundation for Research and Innovation
    Duration: 2024-2026
  • Title: smartHEALTH: European Digital Innovation Hub on Precision Medicine and Innovative E-health Services 
    Funding Source: EU
    Duration: 2024-2026
  • Title: Disentangled representation learning via Mutual Information optimization with applications in speech representation learning
    Funding Source: Private sector
    Duration: 2024
  • Title: STOMA: Towards real-time, enhanced text-to-speech synthesis on the device
    Funding Source: Hellenic Foundation for Research and Innovation
    Duration: 2022-2024
  • Title: FUSING: Biophysical tools FUSed via integrative computational approaches to decode protein foldING
    Funding Source: FORTH Synergy
    Duration: 2022-2024

B. COMPLETED PROJECTS

  • Title: SCALINCS: Scaling stochastic dynamics: from microscopic interactions to macroscopic phenomena
    Funding Agency and funding scheme: Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support faculty members and researchers and the procurement of high-cost research equipment”
    Duration: 2020-2024
  • Title: Data Landscaping: Traffic and Mobility Data Sources of Official Statistics
    Funding Source: Eurostat
    Duration: 2022
  • Title: SOLAR-P: Evaluation of alternative solar panel technologies, computation of irradiance daily profiles
    Funding Source: Saudi Aramco and KAUST
    Duration: 2021-2022
  • Title: Characterising population dynamics with applications in biological data
    Funding Source: ESPA - Department of Development
    Duration: 2020-2021
  • Title: WNRG: Forecasting hourly wind-farm outputs based on wind-speed predictions from alternative providers
    Funding Source: EREN-Hellas
    Duration: 2020
  • Title: ENRICH: Enriched communication across the lifespan
    Funding Source: EU Horizon 2020, MSCA-ETN-2020
    Duration: 2017-2020

PUBLICATIONS

  • 2024

    • C Bayer, CB Hammouda, A Papapantoleon, M Samet, R Tempone (2024) Quasi-Monte Carlo for Efficient Fourier Pricing of Multi-Asset Options, arXiv preprint arXiv:2403.02832
    • K Biza, A Ntroumpogiannis, S Triantafillou, I Tsamardinos (2024) Towards Automated Causal Discovery: a case study on 5G telecommunication data, arXiv preprint arXiv:2402.14481
    • IK Buhl, J Nart, Z Papadovasilakis, S Sørensen, PB Jensen, B Ejlertsen, UH Buhl, I Tsamardinos (2024) Predicting epirubicin response in Danish patients with breast cancer, Journal of Clinical Oncology 42 (16_suppl), 539-539.
    • A Chatzimentor, A Doxa, M Butenschön, T Kristiansen, MA Peck, S Katsanevakis, AD Mazaris (2024) Diving into warming oceans: Assessing 3D climatically suitable foraging areas of loggerhead sea turtles under climate change, Journal for Nature Conservation, 126620.
    • K Ellrott, CK Wong, C Yau, MA Castro, J Lee, B Karlberg, JK Grewal, V Lagani, B Tercan, V Friedl, T Hinoue, V Uzunangelov, L Westlake, X Loinaz, I Felau, P Wang, A Kemal, SJ Caesar-Johnson, I Shmulevich, AJ Lazar, I Tsamardinos, KA Hoadley, Cancer Genome Atlas Analysis Network, GA Robertson, TA Knijnenburg, CC Benz, JM Stuart, JC Zenklusen, AD Cherniack, PW Laird (2024) Leveraging compact feature sets for TCGA-based molecular subtype classification on new samples, Cancer Research 84 (6_Supplement), 6548-6548.
    • J Fagin, G Vernardos, G Tsagkatakis, Y Pantazis, AJ Shajib, M O'Dowd (2024) Measuring the Substructure Mass Power Spectrum of 23 SLACS Strong Galaxy-Galaxy Lenses with Convolutional Neural Networks, Monthly Notices of the Royal Astronomical Society
    • A Fayomi, Y Pantazis, M Tsagris, ATA Wood (2024) Cauchy robust principal component analysis with applications to high-dimensional data sets, Statistics and Computing 34 (1), 1-14.
    • L Gavalakis, I Kontoyiannis (2024) Entropy and the discrete central limit theorem, Stochastic Processes and their Applications 170, 104294.
    • L Gavalakis, I Kontoyiannis, M Madiman (2024) The entropic doubling constant and robustness of Gaussian codebooks for additive-noise channels, arXiv preprint arXiv:2403.07209
    • EH Georgoulis, A Papapantoleon, C Smaragdakis (2024) A deep implicit-explicit minimizing movement method for option pricing in jump-diffusion models, arXiv preprint arXiv:2401.06740
    • F Goudarzi, A Doxa, MR Hemami, AD Mazaris (2024) Thermal vulnerability of sea turtle foraging grounds around the globe, Communications Biology 7 (1), 347.
    • O Johnson, L Gavalakis, I Kontoyiannis (2024) Relative entropy bounds for sampling with and without replacement, arXiv preprint arXiv:2404.06632
    • I Kakogeorgiou, S Gidaris, K Karantzalos, N Komodakis (2024) SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22776-22786.
    • Y Kamarianakis, Y Pantazis, E Kalligiannaki, TD Katsaounis, K Kotsovos, I Gereige, M Abdullah, A Jamal, A Tzavaras (2024) Robust day-ahead solar forecasting with endogenous data and sliding windows, J. Renewable Sustainable Energy 16 (026103), 1-13.
    • C Liu, A Papapantoleon, A Saplaouras (2024) Convergence rates for Backward SDEs driven by Lévy processes, arXiv preprint arXiv:2402.01337
    • V Lungu, I Kontoyiannis (2024) Finite-sample expansions for the optimal error probability in asymmetric binary hypothesis testing, arXiv preprint arXiv:2404.0960
    • N Myrtakis, I Tsamardinos, V Christophides (2024) AutoML for Explainable Anomaly Detection (XAD), The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen
    • OTD Nguyen, I Fotopoulos, M Markaki, I Tsamardinos, V Lagani, OD Røe (2024) Improving Lung Cancer Screening Selection: The HUNT Lung Cancer Risk Model for Ever-Smokers Versus the NELSON and 2021 United States Preventive Services Task Force Criteria in the Cohort of Norway: A Population-Based Prospective Study, JTO Clinical and Research Reports 5 (4), 100660.
    • OTD Nguyen, Y Fotopoulos, TH Nøst, M Markaki, V Lagani, P Sætrom, I Tsamardinos, OD Røe (2024) Differentially expressed microRNAs in prediagnostic serum linked to lung cancer up to eight years before diagnosis in prospective, population-based cohorts: A HUNT study, Journal of Clinical Oncology 42 (16_suppl), 8041-8041.
    • OTD Nguyen, Y Fotopoulos, TH Nøst, M Markaki, V Lagani, I Tsamardinos, OD Røe (2024) Effect of novel polygenetic variant of HUNT lung cancer model on lung cancer risk assessment over clinical model among light smokers (average < 9 pack-years), Journal of Clinical Oncology 42 (16_suppl), e20066-e20066.
    • I Papageorgiou, I Kontoyiannis (2024) Posterior representations for Bayesian Context Trees: Sampling, estimation and convergence, Bayesian Analysis 19 (2), 501-529.
    • A Papapantoleon, J Rou (2024) A time-stepping deep gradient flow method for option pricing in (rough) diffusion models, arXiv preprint arXiv:2403.00746
    • K Paraschakis, A Castellani, G Borboudakis, I Tsamardinos (2024) Confidence Interval Estimation of Predictive Performance in the Context of AutoML, arXiv preprint arXiv:2406.08099
    • G Paterakis, S Fafalios, P Charonyktakis, V Christophides, I Tsamardinos (2024) Do we really need imputation in AutoML predictive modeling? ACM Transactions on Knowledge Discovery from Data.
    • V Perifanis, E Karypidis, N Komodakis, P Efraimidis (2024) SFTC: Machine Unlearning via Selective Fine-tuning and Targeted Confusion, European Interdisciplinary Cybersecurity Conference, 29-36.
    • M Schmidt-Mengin, A Benichoux, S Belachew, N Komodakis, N Paragios (2024) ToNNO: Tomographic Reconstruction of a Neural Network's Output for Weakly Supervised Segmentation of 3D Medical Images, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11428-11438.
    • A Theocharous, GG Gregoriou, P Sapountzis, I Kontoyiannis (2024) Temporally Causal Discovery Tests for Discrete Time Series and Neural Spike Trains, IEEE Transactions on Signal Processing
    • MA Zervou, E Doutsi, Y Pantazis, P Tsakalides (2024) Multitask Classification of Antimicrobial Peptides for Simultaneous Assessment of Antimicrobial Property and Structural Fold, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1836-1840.
    • MA Zervou, E Doutsi, Y Pantazis, P Tsakalides (2024) De Novo Antimicrobial Peptide Design with Feedback Generative Adversarial Networks, International Journal of Molecular Sciences 25 (10), 5506.

  • 2023
  • 2022
  • 2021
  • 2020
  • 2019

PEOPLE

RESEARCHERS
STUDENTS
  • Biza Konstantina (PhD candidate)
  • Georgoulis Elias (MSc candidate)
  • Kofidis Andreas (MSc candidate)
  • Litsas Anastasios (MSc candidate)
  • Papadaki Maria-Eleni (MSc candidate)
  • Raptakis Michail (PhD candidate)
ALUMNI

CONTACT US

For any information regarding the group please contact:

Data Science Group,
Institute of Applied and Computational Mathematics,
Foundation for Research and Technology - Hellas
Nikolaou Plastira 100, Vassilika Vouton,
GR 700 13 Heraklion, Crete
GREECE

Tel: +30 2810 391800
Contact person: Mrs. Maria Papadaki

Tel: +30 2810 391805
Contact person: Mrs. Yiota Rigopoulou