Dr. rer. nat. Violeta Teodora Trifunov
Curriculum Vitae
2017 – 2023 | PhD Student
Computer Vision Group, Friedrich Schiller University Jena
Climate Informatics Group, German Aerospace Center (DLR), Institute for Data Science, Jena
Research topic: “Deep graphical models and domain knowledge integration”
2015 – 2017 | M.Sc. in Mathematics
Rheinische-Friedrich-Wilhelms University Bonn
Master Thesis: “Endomorphism Algebras of Generators-Cogenerators Associated with the Cartan Matrix”
2012 – 2015 | B.Sc. in Mathematics
University of Novi Sad
Research Interests
- Deep Learning
- Causal Graphical Models
- Knowledge Integration
- Causality
- Anomaly Detection
- Climate Informatics
Projects
Deep graphical models and domain knowledge integration
Vast amounts of climate data have accumulated over the past several years, making climate science one of the most data-rich domains. Despite this abundance, data science has so far had limited impact on climate research, partly because ample expert knowledge is rarely exploited. The main goal of this project is to bridge the gap between deep learning and causal graphical models while using domain knowledge, which could prove important for understanding the Earth system. We aim to develop a sequential version of the Causal Effect Variational Auto-Encoder (CEVAE) and apply it to time series of ecological or climate variables that have a suitable underlying causal graph structure. Once this is accomplished, we intend to apply our method to time series anomaly detection, as well as to variables with more general causal structures.
Publications
2022
Violeta Teodora Trifunov, Maha Shadaydeh, Joachim Denzler:
Sequential Causal Effect Variational Autoencoder: Time Series Causal Link Estimation under Hidden Confounding.
arXiv preprint arXiv:2209.11497. 2022.
[bibtex] [web] [doi] [abstract]
Estimating causal effects from observational data in the presence of latent variables sometimes leads to spurious relationships which can be misconceived as causal. This is an important issue in many fields such as finance and climate science. We propose Sequential Causal Effect Variational Autoencoder (SCEVAE), a novel method for time series causality analysis under hidden confounding. It is based on the CEVAE framework and recurrent neural networks. The causal link's intensity of the confounded variables is calculated by using direct causal criteria based on Pearl's do-calculus. We show the efficacy of SCEVAE by applying it to synthetic datasets with both linear and nonlinear causal links. Furthermore, we apply our method to real aerosol-cloud-climate observation data. We compare our approach to a time series deconfounding method with and without substitute confounders on the synthetic data. We demonstrate that our method performs better by comparing both methods to the ground truth. In the case of real data, we use the expert knowledge of causal links and show how the use of correct proxy variables aids data reconstruction.
Violeta Teodora Trifunov, Maha Shadaydeh, Joachim Denzler:
Time Series Causal Link Estimation under Hidden Confounding using Knockoff Interventions.
NeurIPS Workshop on A Causal View on Dynamical Systems (NeurIPS-WS). 2022.
[bibtex] [pdf] [web] [abstract]
Latent variables often mask cause-effect relationships in observational data which provokes spurious links that may be misinterpreted as causal. This problem sparks great interest in the fields such as climate science and economics. We propose to estimate confounded causal links of time series using Sequential Causal Effect Variational Autoencoder (SCEVAE) while applying knockoff interventions. We show the advantage of knockoff interventions by applying SCEVAE to synthetic datasets with both linear and nonlinear causal links. Moreover, we apply SCEVAE with knockoffs to real aerosol-cloud-climate observational time series data. We compare our results on synthetic data to those of a time series deconfounding method both with and without estimated confounders. We show that our method outperforms this benchmark by comparing both methods to the ground truth. For the real data analysis, we rely on expert knowledge of causal links and demonstrate how using suitable proxy variables improves the causal link estimation in the presence of hidden confounders.
2021
Violeta Teodora Trifunov, Maha Shadaydeh, Björn Barz, Joachim Denzler:
Anomaly Attribution of Multivariate Time Series using Counterfactual Reasoning.
IEEE International Conference on Machine Learning and Applications (ICMLA). Pages 166-172. 2021.
[bibtex] [pdf] [web] [doi] [abstract]
There are numerous methods for detecting anomalies in time series, but that is only the first step to understanding them. We strive to exceed this by explaining those anomalies. Thus we develop a novel attribution scheme for multivariate time series relying on counterfactual reasoning. We aim to answer the counterfactual question of would the anomalous event have occurred if the subset of the involved variables had been more similarly distributed to the data outside of the anomalous interval. Specifically, we detect anomalous intervals using the Maximally Divergent Interval (MDI) algorithm, replace a subset of variables with their in-distribution values within the detected interval and observe if the interval has become less anomalous, by re-scoring it with MDI. We evaluate our method on multivariate temporal and spatio-temporal data and confirm the accuracy of our anomaly attribution of multiple well-understood extreme climate events such as heatwaves and hurricanes.
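The detect, replace, and re-score loop described in this abstract can be illustrated with a minimal sketch. It is not the published implementation: `interval_score` is a hypothetical stand-in for the MDI score (a simple Mahalanobis-style distance), the anomalous interval is assumed to be already known, and the "in-distribution values" are simplified to the per-variable mean outside the interval. All function and variable names are assumptions made for illustration.

```python
import numpy as np


def interval_score(series, start, end):
    """Stand-in anomaly score (NOT the MDI algorithm): squared Mahalanobis
    distance between the interval mean and the mean of the remaining data."""
    inside = series[start:end]
    outside = np.concatenate([series[:start], series[end:]])
    diff = inside.mean(axis=0) - outside.mean(axis=0)
    cov = np.atleast_2d(np.cov(outside, rowvar=False))
    cov = cov + 1e-6 * np.eye(series.shape[1])  # small ridge for stability
    return float(diff @ np.linalg.solve(cov, diff))


def attribute_anomaly(series, start, end):
    """For each variable, replace its values inside the interval with its
    mean outside the interval and report how much the score drops."""
    base = interval_score(series, start, end)
    outside = np.concatenate([series[:start], series[end:]])
    drops = {}
    for var in range(series.shape[1]):
        counterfactual = series.copy()
        counterfactual[start:end, var] = outside[:, var].mean()
        drops[var] = base - interval_score(counterfactual, start, end)
    return drops


# Synthetic example: three variables, variable 1 is shifted in [200, 250).
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
data[200:250, 1] += 5.0
drops = attribute_anomaly(data, 200, 250)
print(drops)
```

In this toy setting, the variable whose replacement produces the largest score drop is the one attributed with the anomaly.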
Violeta Teodora Trifunov, Maha Shadaydeh, Jakob Runge, Markus Reichstein, Joachim Denzler:
A Data-Driven Approach to Partitioning Net Ecosystem Exchange Using a Deep State Space Model.
IEEE Access. 9 : pp. 107873-107883. 2021.
[bibtex] [web] [doi] [abstract]
Describing ecosystem carbon fluxes is essential for deepening the understanding of the Earth system. However, partitioning net ecosystem exchange (NEE), i.e. the sum of ecosystem respiration (Reco) and gross primary production (GPP), into these summands is ill-posed since there can be infinitely many mathematically-valid solutions. We propose a novel data-driven approach to NEE partitioning using a deep state space model which combines the interpretability and uncertainty analysis of state space models with the ability of recurrent neural networks to learn the complex functions governing the data. We validate our proposed approach on the FLUXNET dataset. We suggest using both the past and the future of Reco’s predictors for training along with the nighttime NEE (NEEnight) to learn a dynamical model of Reco. We evaluate our nighttime Reco forecasts by comparing them to the ground truth NEEnight and obtain the best accuracy with respect to other partitioning methods. The learned nighttime Reco model is then used to forecast the daytime Reco conditioning on the future observations of different predictors, i.e., global radiation, air temperature, precipitation, vapor pressure deficit, and daytime NEE (NEEday). Subtracted from the NEEday, these estimates yield the GPP, finalizing the partitioning. Our purely data-driven daytime Reco forecasts are in line with the recent empirical partitioning studies reporting lower daytime Reco than the Reichstein method, which can be attributed to the Kok effect, i.e., the plant respiration being higher at night. We conclude that our approach is a good alternative for data-driven NEE partitioning and complements other partitioning methods.
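The final partitioning step stated in this abstract can be written out as follows. The notation is an assumption made here for illustration (the hat marks the model's daytime Reco forecast); the sign convention follows the abstract, where NEE is the sum of Reco and GPP.

```latex
\begin{align}
\mathrm{NEE} &= R_{\mathrm{eco}} + \mathrm{GPP}, \\
\mathrm{GPP}_{\mathrm{day}} &= \mathrm{NEE}_{\mathrm{day}} - \hat{R}_{\mathrm{eco,day}}.
\end{align}
```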
2019
Violeta Teodora Trifunov, Maha Shadaydeh, Jakob Runge, Veronika Eyring, Markus Reichstein, Joachim Denzler:
Causal Link Estimation under Hidden Confounding in Ecological Time Series.
International Workshop on Climate Informatics (CI). 2019.
[bibtex] [pdf] [abstract]
Understanding the causes of natural phenomena is a subject of continuous interest in many research fields such as climate and environmental science. We address the problem of recovering nonlinear causal relationships between time series of ecological variables in the presence of a hidden confounder. We suggest a deep learning approach with domain knowledge integration based on the Causal Effect Variational Autoencoder (CEVAE) which we extend and apply to ecological time series. We compare our method’s performance to that of vector autoregressive Granger Causality (VAR-GC) to emphasize its benefits.
Violeta Teodora Trifunov, Maha Shadaydeh, Jakob Runge, Veronika Eyring, Markus Reichstein, Joachim Denzler:
Nonlinear Causal Link Estimation under Hidden Confounding with an Application to Time-Series Anomaly Detection.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 261-273. 2019.
[bibtex] [pdf] [doi] [abstract]
Causality analysis represents one of the most important tasks when examining dynamical systems such as ecological time series. We propose to mitigate the problem of inferring nonlinear cause-effect dependencies in the presence of a hidden confounder by using deep learning with domain knowledge integration. Moreover, we suggest a time series anomaly detection approach using causal link intensity increase as an indicator of the anomaly. Our proposed method is based on the Causal Effect Variational Autoencoder (CEVAE) which we extend and apply to anomaly detection in time series. We evaluate our method on synthetic data having properties of ecological time series and compare to the vector autoregressive Granger causality (VAR-GC) baseline.
2018
Violeta Teodora Trifunov, Maha Shadaydeh, Jakob Runge, Veronika Eyring, Markus Reichstein, Joachim Denzler:
Domain knowledge integration for causality analysis of carbon-cycle variables.
American Geophysical Union Fall Meeting (AGU): Abstract + Poster Presentation. 2018.
[bibtex] [web] [abstract]
Climate data has been vastly accumulated over the past several years, making climate science one of the most data-rich domains. Despite the abundance of data to process, data science has not had a lot of impact on climate research so far, due to the fact that ample expert knowledge is rarely exploited. Furthermore, the complex nature and the continuously changing climate system both contribute to the slow data science advances in the field. This issue was shown to be amendable through the development of data-driven methodologies that are guided by theory to constrain search, discover more meaningful patterns, and produce more accurate models [1]. Causality analysis represents one of the most important tasks in climate research, its principal difficulties being the often found non-linearities in the data, in addition to hidden causes of the observed phenomena. We propose to ameliorate the problem of determining causal-effect dependencies to a certain extent by using deep learning methods together with domain knowledge integration. The suggested method is to be based on the causal effect variational auto-encoders (CEVAE) [2] and applied to half-hourly meteorological observations and land flux eddy covariance data. This will allow for exploration of the causal-effect relationships between air temperature (Tair), global radiation (Rg) and the CO2 fluxes gross primary productivity (GPP), net ecosystem exchange (NEE) and ecosystem respiration (Reco). The aim of this study is to show whether prior domain knowledge could aid discovery of new causal relationships between certain carbon-cycle variables. In addition, the proposed method is presumed to find its application to similar problems, such as those related to CO2 concentration estimation and facilitate efforts towards better understanding of the Earth system.