Jan Blunk, M.Sc.

Address:
Computer Vision Group
Department of Mathematics and Computer Science
Friedrich Schiller University Jena
Ernst-Abbe-Platz 2
07743 Jena
Germany
Phone: +49 (0) 3641 9 46335
E-mail: jan (dot) blunk (at) uni-jena (dot) de
Room: 1224
Curriculum Vitae
since 2023: Research Associate, Computer Vision Group, Friedrich Schiller University Jena
2021 – 2023: M.Sc. Computer Science, Friedrich Schiller University Jena
Master Thesis: “Steering Feature Usage During Neural Network Model Training”
2019 – 2021: B.Sc. Computer Science, Friedrich Schiller University Jena
Bachelor Thesis: “Object Tracking in Wildlife Identification”
2018 – 2019: B.Sc. Studies in Computer Science, Christian-Albrechts University of Kiel
Research Interests
- Trustworthy AI
- Explainable AI (XAI)
- Knowledge Integration
Supervised Theses
- Christian Ickler: “Feature Steering via Multi-Task Learning”. Master thesis, 2024 (joint supervision with Laines Schmalwasser)
- Mattis Dietrich: “Monocular Facial Capture and Reconstruction using 3D-Morphable-Models for Facial Palsy”. Bachelor thesis, 2024 (joint supervision with Tim Büchner)
- Konstantin Roppel: “Model Feature Attribution for Single Images using Conditional Independence Tests”. Master thesis, 2024 (joint supervision with Niklas Penzel)
Publications
2025
Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, Joachim Denzler:
CausalRivers - Scaling Up Benchmarking of Causal Discovery for Real-world Time-series.
International Conference on Learning Representations (ICLR). 2025. (accepted)
[bibtex] [web] [abstract]
Causal discovery, or identifying causal relationships from observational data, is a notoriously challenging task, with numerous methods proposed to tackle it. Despite this, in-the-wild evaluation of these methods is still lacking, as works frequently rely on synthetic data evaluation and sparse real-world examples under critical theoretical assumptions. Real-world causal structures, however, are often complex, evolving over time, non-linear, and influenced by unobserved factors, making it hard to decide on a proper causal discovery strategy. To bridge this gap, we introduce CausalRivers, the largest in-the-wild causal discovery benchmarking kit for time-series data to date. CausalRivers features an extensive dataset on river discharge that covers the eastern German territory (666 measurement stations) and the state of Bavaria (494 measurement stations). It spans the years 2019 to 2023 with a 15-minute temporal resolution. Further, we provide additional data from a flood around the Elbe River, as an event with a pronounced distributional shift. Leveraging multiple sources of information and time-series meta-data, we constructed two distinct causal ground truth graphs (Bavaria and eastern Germany). These graphs can be sampled to generate thousands of subgraphs to benchmark causal discovery across diverse and challenging settings. To demonstrate the utility of CausalRivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement. CausalRivers has the potential to facilitate robust evaluations and comparisons of causal discovery methods. Besides this primary purpose, we also expect that this dataset will be relevant for connected areas of research, such as time-series forecasting and anomaly detection. Based on this, we hope to push benchmark-driven method development that fosters advanced techniques for causal discovery, as is the case for many other areas of machine learning.
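The subgraph-sampling idea described in the abstract can be pictured with a small, hypothetical sketch: connected ground-truth graphs are drawn around randomly chosen stations of a larger river network, and each sample can serve as one benchmark task. The toy graph, the sampling routine, and all names below are illustrative assumptions and do not reflect the actual CausalRivers data format or API.

```python
# Illustrative sketch only: sampling small ground-truth subgraphs from a larger
# directed river network, roughly in the spirit of the benchmark described above.
# The graph construction and function names are made up for this example.
import random
import networkx as nx

def sample_subgraphs(ground_truth: nx.DiGraph, n_samples: int, radius: int = 2):
    """Sample ego subgraphs around random stations of the ground-truth graph."""
    subgraphs = []
    nodes = list(ground_truth.nodes)
    for _ in range(n_samples):
        center = random.choice(nodes)
        # Take all stations within `radius` hops (ignoring edge direction),
        # so each sample is a small, connected causal ground truth.
        sub = nx.ego_graph(ground_truth, center, radius=radius, undirected=True)
        subgraphs.append(sub)
    return subgraphs

# Toy ground truth: station A flows into B, B into C, and D into C.
g = nx.DiGraph([("A", "B"), ("B", "C"), ("D", "C")])
tasks = sample_subgraphs(g, n_samples=3, radius=1)
for t in tasks:
    print(sorted(t.edges))
```

In such a setup, each sampled subgraph, paired with the discharge time series of its stations, would form one benchmark instance for a causal discovery method.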
Markus Reichstein, Vitus Benson, Jan Blunk, Gustau Camps-Valls, Felix Creutzig, Carina J. Fearnley, Boran Han, Kai Kornhuber, Nasim Rahaman, Bernhard Schölkopf, José María Tárraga, Ricardo Vinuesa, Karen Dall, Joachim Denzler, Dorothea Frank, Giulia Martini, Naomi Nganga, Danielle C. Maddix, Kommy Weldemariam:
Early Warning of Complex Climate Risk with Integrated Artificial Intelligence.
Nature Communications. 16 (1). 2025.
[bibtex] [web] [doi] [abstract]
As climate change accelerates, human societies face growing exposure to disasters and stress, highlighting the urgent need for effective early warning systems (EWS). These systems monitor, assess, and communicate risks to support resilience and sustainable development, but challenges remain in hazard forecasting, risk communication, and decision-making. This perspective explores the transformative potential of integrated Artificial Intelligence (AI) modeling. We highlight the role of AI in developing multi-hazard EWSs that integrate Meteorological and Geospatial foundation models (FMs) for impact prediction. A user-centric approach with intuitive interfaces and community feedback is emphasized to improve crisis management. To address climate risk complexity, we advocate for causal AI models to avoid spurious predictions and stress the need for responsible AI practices. We highlight the FATES (Fairness, Accountability, Transparency, Ethics, and Sustainability) principles as essential for equitable and trustworthy AI-based Early Warning Systems for all. We further advocate for decadal EWSs, leveraging climate ensembles and generative methods to enable long-term, spatially resolved forecasts for proactive climate adaptation.
2023
Jan Blunk, Niklas Penzel, Paul Bodesheim, Joachim Denzler:
Beyond Debiasing: Actively Steering Feature Selection via Loss Regularization.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 394-408. 2023.
[bibtex] [pdf] [doi] [abstract]
It is common for domain experts like physicians in medical studies to examine features for their reliability with respect to a specific domain task. When introducing machine learning, a common expectation is that machine learning models use the same features as human experts to solve a task but that is not always the case. Moreover, datasets often contain features that are known from domain knowledge to generalize badly to the real world, referred to as biases. Current debiasing methods only remove such influences. To additionally integrate the domain knowledge about well-established features into the training of a model, their relevance should be increased. We present a method that permits the manipulation of the relevance of features by actively steering the model's feature selection during the training process. That is, it allows both the discouragement of biases and encouragement of well-established features to incorporate domain knowledge about the feature reliability. We model our objectives for actively steering the feature selection process as a constrained optimization problem, which we implement via a loss regularization that is based on batch-wise feature attributions. We evaluate our approach on a novel synthetic regression dataset and a computer vision dataset. We observe that it successfully steers the features a model selects during the training process. This is a strong indicator that our method can be used to integrate domain knowledge about well-established features into a model.
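The loss-regularization idea from the abstract can be illustrated with a small, hypothetical sketch: batch-wise feature attributions (approximated here by input gradients) enter the loss so that attribution mass on discouraged features is penalized and attribution mass on encouraged features is rewarded. The attribution method, weighting, and constraint handling in the actual paper may differ; everything below is an assumption for illustration.

```python
# Simplified illustration of feature steering via loss regularization,
# in the spirit of the approach described above. Input gradients stand in
# for the batch-wise feature attributions used in the paper.
import torch
import torch.nn as nn

def steered_loss(model, x, y, encourage_idx, discourage_idx, lam=0.1):
    """Task loss plus a regularizer on per-feature attributions.

    encourage_idx / discourage_idx: indices of input features whose
    relevance should be increased / decreased.
    """
    x = x.clone().requires_grad_(True)
    pred = model(x)
    task_loss = nn.functional.mse_loss(pred, y)

    # Batch-wise attributions: mean absolute input gradient per feature.
    grads = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    attributions = grads.abs().mean(dim=0)

    # Penalize attribution mass on discouraged (bias) features and
    # reward it on well-established (encouraged) features.
    reg = attributions[discourage_idx].sum() - attributions[encourage_idx].sum()
    return task_loss + lam * reg

# Usage on a toy regression model with 10 input features:
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = steered_loss(model, x, y, encourage_idx=[0, 1], discourage_idx=[9])
loss.backward()
```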
2022
Paul Bodesheim, Jan Blunk, Matthias Körschens, Clemens-Alexander Brust, Christoph Käding, Joachim Denzler:
Pre-trained models are not enough: active and lifelong learning is important for long-term visual monitoring of mammals in biodiversity research. Individual identification and attribute prediction with image features from deep neural networks and decoupled decision models applied to elephants and great apes.
Mammalian Biology. 102 : pp. 875-897. 2022.
[bibtex] [pdf] [web] [doi] [abstract]
Animal re-identification based on image data, either recorded manually by photographers or automatically with camera traps, is an important task for ecological studies about biodiversity and conservation that can be highly automatized with algorithms from computer vision and machine learning. However, fixed identification models only trained with standard datasets before their application will quickly reach their limits, especially for long-term monitoring with changing environmental conditions, varying visual appearances of individuals over time that differ a lot from those in the training data, and new occurring individuals that have not been observed before. Hence, we believe that active learning with human-in-the-loop and continuous lifelong learning is important to tackle these challenges and to obtain high-performance recognition systems when dealing with huge amounts of additional data that become available during the application. Our general approach with image features from deep neural networks and decoupled decision models can be applied to many different mammalian species and is perfectly suited for continuous improvements of the recognition systems via lifelong learning. In our identification experiments, we consider four different taxa, namely two elephant species: African forest elephants and Asian elephants, as well as two species of great apes: gorillas and chimpanzees. Going beyond classical re-identification, our decoupled approach can also be used for predicting attributes of individuals such as gender or age using classification or regression methods. Although applicable for small datasets of individuals as well, we argue that even better recognition performance will be achieved by improving decision models gradually via lifelong learning to exploit huge datasets and continuous recordings from long-term applications. We highlight that algorithms for deploying lifelong learning in real observational studies exist and are ready for use. Hence, lifelong learning might become a valuable concept that supports practitioners when analyzing large-scale image data during long-term monitoring of mammals.
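A minimal, hypothetical sketch of the decoupled setup described in the abstract: a fixed pre-trained network supplies image features, while a separate lightweight decision model performs the identification and can be updated incrementally as new labeled recordings arrive. The specific models and the partial_fit-based update below are assumptions for illustration, not the exact pipeline used in the paper.

```python
# Minimal sketch of the decoupled idea described above: a fixed deep network
# provides image features, while a separate, lightweight decision model makes
# the identification decision and can be updated as new labeled images arrive.
import torch
import torchvision.models as models
from sklearn.linear_model import SGDClassifier

# Fixed feature extractor: a pre-trained CNN with its classification head removed.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def extract_features(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return backbone(images)

# Decoupled decision model for individual identification (3 known individuals).
clf = SGDClassifier(loss="log_loss")
known_ids = [0, 1, 2]

# Initial training on a first batch of labeled images (random tensors here).
images, labels = torch.randn(12, 3, 224, 224), torch.randint(0, 3, (12,))
clf.partial_fit(extract_features(images).numpy(), labels.numpy(), classes=known_ids)

# Lifelong learning: later recordings update the decision model without
# retraining the feature extractor.
new_images, new_labels = torch.randn(4, 3, 224, 224), torch.randint(0, 3, (4,))
clf.partial_fit(extract_features(new_images).numpy(), new_labels.numpy())
```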