Dr. rer. nat. Sven Sickert
Address: | Computer Vision Group
| Department of Mathematics and Computer Science
| Friedrich Schiller University Jena
| Ernst-Abbe-Platz 2
| 07743 Jena
| Germany
Phone: | +49 (0) 3641 9 46424
E-mail: | sven (dot) sickert (at) uni-jena (dot) de
Room: | 1211
Links: | Google Scholar, Mastodon, ORCID
Curriculum Vitae
since 2019 | Research Associate
| Computer Vision Group, Friedrich Schiller University Jena
since 2019 | Lecturer & Study Program Coordinator for Computer Science studies
| Institute for Computer Science Jena, Friedrich Schiller University Jena
2019 – 2022 | Team Leader: “Learning from 3D and Unstructured Data”
| Computer Vision Group, Friedrich Schiller University Jena
2013 – 2018 | PhD Student
| Computer Vision Group, Friedrich Schiller University Jena
| PhD Thesis: “Semantic Segmentation of 3D Data using Contextual Cues”
2007 – 2012 | Studies in Computer Science (Diploma)
| Friedrich Schiller University Jena
| Focus: Image Processing / Computer Vision
| Diploma Thesis: “Strategien des Aktiven Lernens” (“Strategies in Active Learning”)
Research Interests
- 3D Point Cloud Analysis
- Representation Learning
- Semantic Segmentation
- Medical Image Understanding
Projects
Ongoing
Former
- Semantic 3D Point Cloud Analysis of Outdoor Scenes
- 4D Presentation Attack Detection
- Semantic Volume Segmentation of Bio-Medical Image Data
- Urban Scene Understanding
Reviewer Activities
Journals
- IEEE Transactions on Image Processing
- Expert Systems With Applications
- Remote Sensing
Conferences & Workshops
- Asian Conference on Computer Vision (ACCV, 2022)
- UK Conference on Medical Image Understanding and Analysis (MIUA, 2020)
- Special Session on Machine Learning in Advanced Machine Vision (AMV) at ICMLA 2019
Teaching
Winter term 2024/2025
- Grundlagen informatischer Problemlösung (Grundlagen der Programmierung) with Prof. Clemens Grelck, David Schöne and Sven Thiel
- Fortgeschrittenes Programmierpraktikum with Prof. Wolfram Amme
- Spezielle Probleme im Rechnersehen with Prof. Joachim Denzler
- Informatik für Werkstoffwissenschaftler with Paul Bodesheim
- Grundlagen der Programmierung mit Python (Teil 2) with Prof. Matthias Hagen
Summer term 2024
- Objektorientierte Programmierung mit C++ (ASQ)
- Programmieren in C++
- Spezielle Probleme im Rechnersehen with Prof. Joachim Denzler
- Objektorientierte Programmierung with Prof. Wolfram Amme, Pepe Eulzer and Maik Fröbe
- Grundlagen der Programmierung mit Python (Teil 1) with Prof. Matthias Hagen
Supervised Theses
- Maria Gogolev: “Comparing and Modifying Distributions of Latent Diffusion Models to Impose Image Properties”. Master thesis, 2024. (joint supervision with Niklas Penzel and Tim Büchner)
- Chris Gerlach: “Erkennung und Evaluierung von Kopfgesten und Mimiken in Videoaufnahmen im Kontext der Verhaltensanalyse”. Master thesis, 2023. (joint supervision with Tim Büchner)
- Fateme Shafiei: “Classification of Facial Expression using Multivariate Surface Electromyography Timeseries”. Master thesis, 2022. (joint supervision with Tim Büchner)
- Chima Nmerenu: “Learning Networks for 3D Point Cloud Data based on Synthetically Generated Samples from Mathematical Formulas”. Master thesis, 2021.
- Xu Yang: “Spherical Convolutions at Anatomical Landmarks for Facial Analysis Tasks”. Master thesis, 2021. (joint supervision with Jhonatan Contreras)
- Felix Fleisch: “3D-Momente Invarianten auf Basis gelernter Triangulierungen zur semantischen Analyse von 3D-Punktwolken”. Bachelor thesis, 2021.
- Selina Müller: “Integrating Hierarchical Knowledge for Medical Image Analysis Tasks”. Master thesis, 2019. (joint supervision with Clemens-Alexander Brust)
- David Pertzborn: “Application and Analysis of Generative Adversarial Network for Non-Supervised Anomaly Detection in Colonoscopy Imaging”. Master thesis, 2019. (joint supervision with Clemens-Alexander Brust and Christoph Theiß)
- Marie Arlt: “Improving Semantic Segmentation of Polyps in Coloscopic Data using Augmentation Techniques”. Master thesis, 2019. (joint supervision with Clemens-Alexander Brust and Christoph Theiß)
- Martin Nußbaum: “Semantische Segmentierung von sequenziellen Bilddaten mittels Rekurrenter Neuronaler Netze”. Bachelor thesis, 2017.
- Christopher Manthey: “Fahrbahnsegmentierung mittels echtzeitfähiger Convolutional Networks in eingebetteten Systemen”. Master thesis, 2016.
- Matthias Reuse: “Analyse verschiedener Demosaicing-Verfahren für Aufgaben der Fahrbahnerkennung”. Bachelor thesis, 2016. (joint supervision with Manuel Amthor)
- Christoph Runge: “Semantische Segmentierung mittels Iterative Context Forests für große Mehrklassenprobleme”. Bachelor thesis, 2015.
- Clemens-Alexander Brust: “Semantische Segmentierung mittels Convolutional Networks für die automatische Fahrbahnsegmentierung”. Bachelor thesis, 2014. (joint supervision with Marcel Simon and Erik Rodner)
If you are interested in doing a Bachelor’s or Master’s thesis in the area of computer vision, check out our page on Final Theses!
Publications
2025
Tim Büchner, Sven Sickert, Gerd F. Volk, Orlando Guntinas-Lichius, Joachim Denzler:
Assessing 3D Volumetric Asymmetry in Facial Palsy Patients via Advanced Multi-view Landmarks and Radial Curves.
Machine Vision and Applications. 36 (1) : 2025.
[bibtex] [doi] [abstract]
The research on facial palsy, a unilateral palsy of the facial nerve, is a complex field with many different causes and symptoms. Even modern approaches to evaluate the facial palsy state rely mainly on stills and 2D videos of the face and rarely on dynamic 3D information. Many of these analysis and visualization methods require manual intervention, which is time-consuming and error-prone. Moreover, they often depend on alignment algorithms or Euclidean measurements and consider only static facial expressions. Volumetric changes by muscle movement are essential for facial palsy analysis but require manual extraction. We propose to extract an estimated unilateral volumetric description for dynamic expressions from 3D scans. Accurate landmark positioning is required for processing the unstructured facial scans. In our case, it is attained via a multi-view method compatible with any existing 2D predictors. We analyze prediction stability and robustness against head rotation during video sequences. Further, we investigate volume changes in static and dynamic facial expressions for 34 patients with unilateral facial palsy and visualize volumetric disparities on the face surface. In a case study, we observe a decrease in the volumetric difference between the face sides during happy expressions at the beginning (13.8 ± 10.0 mm³) and end (12.8 ± 10.3 mm³) of a ten-day biofeedback therapy. The neutral face kept a consistent volume range of 11.8-12.1 mm³. The reduced volumetric difference after therapy indicates less facial asymmetry during movement, which can be used to monitor and guide treatment decisions. Our approach minimizes human intervention, simplifying the clinical routine and interaction with 3D scans to provide a more comprehensive analysis of facial palsy.
2024
Tim Büchner, Sven Sickert, Gerd F. Volk, Christoph Anders, Joachim Denzler, Orlando Guntinas-Lichius:
Reducing the Gap Between Mimics and Muscles by Enabling Facial Feature Analysis during sEMG Recordings [Abstract].
Congress of the Confederation of European ORL-HNS. 2024.
[bibtex] [pdf] [web] [abstract]
Introduction: Surface electromyography (sEMG) is an effective technique for studying facial muscles. However, although it would be valuable, the simultaneous acquisition of 2D facial movement videos creates incompatibilities with analysis methodologies because the sEMG electrodes and wires obstruct part of the face. The present study overcame these limitations using machine learning mechanisms to make the sEMG electrodes disappear artificially (artificial videos with removed electrodes). Material & Methods: We recorded 36 probands (18-67 years, 17 male, 19 female) and measured their muscular activity using two sEMG schematics [1], [2], totaling 60 electrodes attached to the face [3]. Each proband mimicked the six basic emotions four times randomly, guided by an instructional video. Minimal Change CycleGANs were used to make reconstruction videos without sEMG electrodes [4], [5]. Finally, the emotions expressed by the probands were classified with ResMaskNet [6]. Results: We quantitatively compared the sEMG data and reconstructed videos with reference recordings. The artificial videos achieved a Fréchet Inception Distance [10] score of 0.50 ± 0.74, while sEMG videos scored 10.46 ± 2.10, indicating high visual quality. With electrodes attached, we obtained an emotion classification accuracy of 34 ± 10% (equivalent to two-category random guessing). Our approach obtained up to 83% accuracy for the removed electrodes. Conclusions: Our techniques and studies enable simultaneous analysis of muscle activity and facial movements. We reconstruct facial regions obstructed by electrodes and wires, preserving the underlying expression. Our data-driven and label-free approach enables established methods without further modifications. Supported by DFG DE-735/15-1 and DFG GU-463/12-1.
Tim Büchner, Sven Sickert, Gerd F. Volk, Joachim Denzler, Orlando Guntinas-Lichius:
An Automatic, Objective Method to Measure and Visualize Volumetric Changes in Patients with Facial Palsy during 3D Video Recordings [Abstract].
95th Annual Meeting German Society of Oto-Rhino-Laryngology, Head and Neck Surgery e. V., Bonn. 2024.
[bibtex] [web] [doi] [abstract]
Introduction: Using grading systems, the severity of facial palsy is typically classified through static 2D images. These approaches fail to capture crucial facial attributes, such as the depth of the nasolabial fold. We present a novel technique that uses 3D video recordings to overcome this limitation. Our method automatically characterizes the facial structure, calculates volumetric disparities between the affected and contralateral side, and includes an intuitive visualization. Material: 35 patients (mean age 51 years, min. 25, max. 72; 7 ♂, 28 ♀) with unilateral chronic synkinetic facial palsy were enrolled. We utilized the 3dMD face system (3dMD LCC, Georgia, USA) to record their facial movements while they mimicked happy facial expressions four times. Each recording lasted 6.5 seconds, with a total of 140 videos. Results: We found a difference in volume between the neutral and happy expressions: 11.7 ± 9.1 mm³ and 13.73 ± 10.0 mm³, respectively. This suggests that there is a higher level of asymmetry during movements. Our process is fully automatic without human intervention, highlights the impacted areas, and emphasizes the differences between the affected and contralateral side. Discussion: Our data-driven method allows healthcare professionals to track and visualize patients' volumetric changes automatically, facilitating personalized treatments. It mitigates the risk of human biases in therapeutic evaluations and effectively transitions from static 2D images to dynamic 4D assessments of facial palsy state. Supported by DFG DE-735/15-1 and DFG GU-463/12-1.
Tim Büchner, Sven Sickert, Gerd F. Volk, Martin Heinrich, Joachim Denzler, Orlando Guntinas-Lichius:
Measuring and Visualizing Volumetric Changes Before and After 10-Day Biofeedback Therapy in Patients with Synkinetic Facial Palsy Using 3D Video Recordings [Abstract].
Congress of the Confederation of European ORL-HNS. 2024.
[bibtex] [pdf] [web] [abstract]
Introduction: The severity of facial palsy is typically assessed using grading systems based on 2D image analysis [1]. Thus, the full range of facial features, especially depth information, is neglected. The present study employed 3D video-based methods to measure volume disparities during dynamic facial movements [2], [3], overcoming prior limitations. In addition, impacted areas were highlighted on the scan for an intuitive visualization. Material & Methods: 35 patients (25-72 years; 28 female) with unilateral chronic synkinetic facial palsy were recorded with the 3dMD face system (3dMD LCC, Georgia, USA) at the beginning and end of 10-day biofeedback therapy focused on more symmetric facial expressions. The patients mimicked a happy facial expression four times, each recording lasting 6.5 seconds, totaling 280 videos. We used the Curvature of Radial Curves (CORC) [2] as a dense face descriptor and followed our previous method [3] to estimate the volume changes. Results: We found a reduced volume difference between contralateral and paretic side during the happy expression at therapy beginning (13.73 ± 10.0 mm³) and end (12.79 ± 10.3 mm³). The neutral face remained unchanged in the range of 11.77-12.07 mm³. This indicated a lower asymmetry during movements after therapy and could be used as an objective measurement during training for a successful therapy. Conclusions: Our data-driven method enables tracking and visualizing volume disparities between the paretic and contralateral sides. We reduce human bias during evaluation, personalize treatment, and shift 2D image assessments of facial palsy to dynamic 4D evaluations. Supported by DFG DE-735/15-1 and DFG GU-463/12-1.
2023
Felix Schneider, Sven Sickert, Phillip Brandes, Sophie Marshall, Joachim Denzler:
Hard is the Task, the Samples are Few: A German Chiasmus Dataset.
Language Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC). Pages 255-260. 2023.
[bibtex] [doi] [code] [abstract]
In this work, we present a novel German language dataset for the detection of the stylistic device called chiasmus, collected from German dramas. The dataset includes phrases labeled as chiasmi, antimetaboles, semantically unrelated inversions, and various edge cases. The dataset was created by collecting examples from the GerDraCor dataset. We test different approaches for chiasmus detection on the samples and report an average precision of 0.74 for the best method. Additionally, we give an overview of related approaches and the current state of the research on chiasmus detection.
Tim Büchner, Sven Sickert, Gerd F. Volk, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler:
Let’s Get the FACS Straight - Reconstructing Obstructed Facial Features.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 727-736. 2023.
[bibtex] [pdf] [web] [doi] [abstract]
The human face is one of the most crucial parts in interhuman communication. Even when parts of the face are hidden or obstructed, the underlying facial movements can be understood. Machine learning approaches often fail in that regard due to the complexity of the facial structures. To alleviate this problem, a common approach is to fine-tune a model for such a specific application. However, this is computationally intensive and might have to be repeated for each desired analysis task. In this paper, we propose to reconstruct obstructed facial parts to avoid the task of repeated fine-tuning. As a result, existing facial analysis methods can be used without further changes with respect to the data. In our approach, the restoration of facial features is interpreted as a style transfer task between different recording setups. By using the CycleGAN architecture, the requirement of matched pairs, which is often hard to fulfill, can be eliminated. To prove the viability of our approach, we compare our reconstructions with real unobstructed recordings. We created a novel data set in which 36 test subjects were recorded both with and without 62 surface electromyography sensors attached to their faces. In our evaluation, we feature typical facial analysis tasks, like the computation of Facial Action Units and the detection of emotions. To further assess the quality of the restoration, we also compare perceptional distances. We can show that scores similar to the videos without obstructing sensors can be achieved.
Tim Büchner, Sven Sickert, Gerd F. Volk, Orlando Guntinas-Lichius, Joachim Denzler:
From Faces To Volumes - Measuring Volumetric Asymmetry in 3D Facial Palsy Scans.
International Symposium on Visual Computing (ISVC). Pages 121-132. 2023. Best Paper Award
[bibtex] [web] [doi] [abstract]
The research of facial palsy, a unilateral palsy of the facial nerve, is a complex field of study with many different causes and symptoms. Even modern approaches to evaluate the facial palsy state rely mainly on stills and 2D videos of the face and rarely on 3D information. Many of these analysis and visualization methods require manual intervention, which is time-consuming and error-prone. Moreover, existing approaches depend on alignment algorithms or Euclidean measurements and consider only static facial expressions. Volumetric changes by muscle movement are essential for facial palsy analysis but require manual extraction. Our proposed method extracts a heuristic unilateral volumetric description for dynamic expressions from 3D scans. Accurate positioning of 3D landmarks, problematic for facial palsy, is automated by adapting existing methods. Additionally, we visualize the primary areas of volumetric disparity by projecting them onto the face. Our approach substantially minimizes human intervention simplifying the clinical routine and interaction with 3D scans. The proposed pipeline can potentially more effectively analyze and monitor patient treatment progress.
Tim Büchner, Sven Sickert, Roland Graßme, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler:
Using 2D and 3D Face Representations to Generate Comprehensive Facial Electromyography Intensity Maps.
International Symposium on Visual Computing (ISVC). Pages 136-147. 2023.
[bibtex] [web] [doi] [code] [abstract]
Electromyography (EMG) is a method to measure muscle activity. Physicians also use EMG to study the function of facial muscles through intensity maps (IMs) to support diagnostics and research. However, many existing visualizations neglect proper anatomical structures and disregard the physical properties of EMG signals. Especially the variance of facial structures between people complicates the generalization of IMs, which is crucial for their correct interpretation. In our work, we overcome these issues by introducing a pipeline to generate anatomically correct IMs for facial muscles. An IM generation algorithm is proposed based on a template model incorporating custom surface EMG schemes and combining them with a projection method to highlight the IMs on the patient's face in 2D and 3D. We evaluate the generated and projected IMs based on their correct projection quality for six base emotions on several subjects. These visualizations deepen the understanding of muscle activity areas and indicate that a holistic view of the face could be necessary to understand facial muscle activity. Medical experts can use our approach to study the function of facial muscles and to support diagnostics and therapy.
2022
Felix Schneider, Sven Sickert, Phillip Brandes, Sophie Marshall, Joachim Denzler:
Metaphor Detection for Low Resource Languages: From Zero-Shot to Few-Shot Learning in Middle High German.
LREC Workshop on Multiword Expression (LREC-WS). Pages 75-80. 2022.
[bibtex] [web] [code] [abstract]
In this work, we present a novel unsupervised method for adjective-noun metaphor detection on low resource languages. We propose two new approaches: First, a way of artificially generating metaphor training examples and second, a novel way to find metaphors relying only on word embeddings. The latter enables application for low resource languages. Our method is based on a transformation of word embedding vectors into another vector space, in which the distance between the adjective word vector and the noun word vector represents the metaphoricity of the word pair. We train this method in a zero-shot pseudo-supervised manner by generating artificial metaphor examples and show that our approach can be used to generate a metaphor dataset with low annotation cost. It can then be used to finetune the system in a few-shot manner. In our experiments we show the capabilities of the method in its unsupervised and in its supervised version. Additionally, we test it against a comparable unsupervised baseline method and a supervised variation of it.
Tim Büchner, Sven Sickert, Gerd F. Volk, Orlando Guntinas-Lichius, Joachim Denzler:
Automatic Objective Severity Grading of Peripheral Facial Palsy Using 3D Radial Curves Extracted from Point Clouds.
Challenges of Trustable AI and Added-Value on Health. Pages 179-183. 2022.
[bibtex] [web] [doi] [code] [abstract]
Peripheral facial palsy is an illness in which a one-sided ipsilateral paralysis of the facial muscles occurs due to nerve damage. Medical experts utilize visual severity grading methods to estimate this damage. Our algorithm-based method provides an objective grading using 3D point clouds. We extract from static 3D recordings facial radial curves to measure volumetric differences between both sides of the face. We analyze five patients with chronic complete peripheral facial palsy to evaluate our method by comparing changes over several recording sessions. We show that our proposed method allows an objective assessment of facial palsy.
2021
Martin Thümmel, Sven Sickert, Joachim Denzler:
Facial Behavior Analysis using 4D Curvature Statistics for Presentation Attack Detection.
IEEE International Workshop on Biometrics and Forensics (IWBF). Pages 1-6. 2021.
[bibtex] [web] [doi] [code] [abstract]
The human face has a high potential for biometric identification due to its many individual traits. At the same time, such identification is vulnerable to biometric copies. These presentation attacks pose a great challenge in unsupervised authentication settings. As a countermeasure, we propose a method that automatically analyzes the plausibility of facial behavior based on a sequence of 3D face scans. A compact feature representation measures facial behavior using the temporal curvature change. Finally, we train our method only on genuine faces in an anomaly detection scenario. Our method can detect presentation attacks using elastic 3D masks, bent photographs with eye holes, and monitor replay-attacks. For evaluation, we recorded a challenging database containing such cases using a high-quality 3D sensor. It features 109 4D face scans including eleven different types of presentation attacks. We achieve error rates of 11% and 6% for APCER and BPCER, respectively.
2020
Anish Raj, Oliver Mothes, Sven Sickert, Gerd F. Volk, Orlando Guntinas-Lichius, Joachim Denzler:
Automatic and Objective Facial Palsy Grading Index Prediction using Deep Feature Regression.
Annual Conference on Medical Image Understanding and Analysis (MIUA). Pages 253-266. 2020.
[bibtex] [pdf] [web] [doi] [abstract]
One of the main causes of half-sided facial paralysis is a dysfunction of the facial nerve. Physicians have to assess such a unilateral facial palsy with the help of standardized grading scales to evaluate the treatment. However, such assessments are usually very subjective, and they are prone to variance and inconsistency between physicians depending on their experience. We propose an automatic non-biased method using deep features combined with a linear regression method for facial palsy grading index prediction. With an extension of the free software tool Auto-eFace, we annotated images of facial palsy patients and healthy subjects according to a common facial palsy grading scale. In our experiments, we obtained an average grading error of 11%.
Jhonatan Contreras, Sven Sickert, Joachim Denzler:
Region-based Edge Convolutions with Geometric Attributes for the Semantic Segmentation of Large-scale 3D Point Clouds.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 13 (1) : pp. 2598-2609. 2020.
[bibtex] [pdf] [web] [doi] [abstract]
In this paper, we present a semantic segmentation framework for large-scale 3D point clouds with high spatial resolution. For such data with huge amounts of points, the classification of each individual 3D point is an intractable task. Instead, we propose to segment the scene into meaningful regions as a first step. Afterward, we classify these segments using a combination of PointNet and geometric deep learning. This two-step approach resembles object-based image analysis. As an additional novelty, we apply surface normalization techniques and enrich features with geometric attributes. Our experiments show the potential of this approach for a variety of outdoor scene analysis tasks. In particular, we are able to reach 89.6% overall accuracy and 64.4% average intersection over union (IoU) in the Semantic3D benchmark. Furthermore, we achieve 66.7% average IoU on Paris-Lille-3D. We also successfully apply our approach to the automatic semantic analysis of forestry data.
2019
Andreas Dittberner, Sven Sickert, Joachim Denzler, Orlando Guntinas-Lichius:
Intraoperative Online Image-guided Biopsie on the Basis of a Deep Learning Algorithm to the Automatic Detection of Head and Neck Carcinoma by Means of Real Time Near-Infrared ICG Fluorescence Endoscopy.
Laryngo-Rhino-Otologie. 98 (S02) : pp. 115. 2019.
[bibtex] [web] [doi]
Jhonatan Contreras, Sven Sickert, Joachim Denzler:
Automatically Estimating Forestal Characteristics in 3D Point Clouds using Deep Learning.
iDiv Annual Conference. 2019. Poster
[bibtex] [web] [abstract]
Biodiversity changes can be monitored using georeferenced and multitemporal data. Those changes refer to the process of automatically identifying differences in the measurements computed over time. The height and the Diameter at Breast Height (DBH) of the trees can be measured at different times. The measurements of individual trees can be tracked over time, resulting in growth rates, tree survival, among other possible applications. We propose a deep learning-based framework for semantic segmentation, which can manage large point clouds of forest areas with high spatial resolution. Our method divides a point cloud into geometrically homogeneous segments. Then, a global feature is obtained from each segment, applying a deep learning network called PointNet. Finally, the local information of the adjacent segments is included through an additional sub-network which applies edge convolutions. We successfully train and test in a data set which covers an area with multiple trees. Two additional forest areas were also tested. The semantic segmentation accuracy was tested using F1-score for four semantic classes: leaves (F1 = 0.908), terrain (F1 = 0.921), trunk (F1 = 0.848) and dead wood (F1 = 0.835). Furthermore, we show how our framework can be extended to deal with forest measurements such as measuring the height of the trees and the DBH.
Marie Arlt, Jack Peter, Sven Sickert, Clemens-Alexander Brust, Joachim Denzler, Andreas Stallmach:
Automated Polyp Differentiation on Coloscopic Data using Semantic Segmentation with CNNs.
Endoscopy. 51 (04) : pp. 4. 2019.
[bibtex] [web] [doi] [abstract]
Interval carcinomas are a commonly known problem in endoscopic adenoma detection, especially when they follow a negative index colonoscopy. To protect patients from these carcinomas and support the endoscopist, we aim for a live assistance system in the future, which helps to mark polyps and increase the adenoma detection rate. We present our first results of polyp recognition using a machine learning approach.
2018
Andreas Dittberner, Sven Sickert, Joachim Denzler, Orlando Guntinas-Lichius, Thomas Bitter, Sven Koscielny:
Development of an Automatic Image Analysis Method by Deep Learning Methods for the Detection of Head and Neck Cancer Based on Standard Real-Time Near-Infrared ICG Fluorescence Endoscopy Images (NIR-ICG-FE).
Laryngo-Rhino-Otologie. 97 (S02) : pp. 97. 2018.
[bibtex] [web] [doi] [abstract]
To improve the gold standard in the diagnosis of head and neck cancer, which relies on white light and invasive biopsy, with digital image recognition procedures, there is a need for the development of new technologies. In the sense of an "optical biopsy", they should provide additional objective information in vivo and online to support the decision making of the head and neck surgeon. Artificial neural networks in combination with machine learning might be a helpful and fast approach.
Niclas Schmitt, Sven Sickert, Orlando Guntinas-Lichius, Thomas Bitter, Joachim Denzler:
Automated MRI Volumetry of the Olfactory Bulb.
Laryngo-Rhino-Otologie. 97 (S02) : pp. 36. 2018.
[bibtex] [web] [doi] [abstract]
The olfactory bulb (OB) as part of the olfactory pathway plays a central role in odor perception. Several studies have already established a connection between an olfactory impairment and the occurrence of neurodegenerative diseases (Parkinson's disease, Alzheimer's disease, etc.). This impairment is often detectable years before further symptoms. Moreover, it is connected to a volume loss of the OB. Therefore, in the future the volume of the OB could serve as a marker for the detection and diagnosis of such diseases. Despite this great importance, there is currently no standard procedure for the volumetric analysis of the OB and, above all, no objective investigator-independent measurement method.
2017
Gianluca Tramontana, Martin Jung, Christopher R. Schwalm, Kazuhito Ichii, Gustau Camps-Valls, Botond Raduly, Markus Reichstein, M. Altaf Arain, Alessandro Cescatti, Gerard Kiely, Lutz Merbold, Penelope Serrano-Ortiz, Sven Sickert, Sebastian Wolf, Dario Papale:
Predicting Carbon Dioxide and Energy Fluxes with Empirical Approaches in FLUXNET.
European Geosciences Union General Assembly (EGU): Abstract + Poster Presentation. 2017.
[bibtex] [pdf] [web]
Sven Sickert, Joachim Denzler:
Semantic Segmentation of Outdoor Areas using 3D Moment Invariants and Contextual Cues.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 165-176. 2017.
[bibtex] [pdf] [doi] [abstract]
In this paper, we propose an approach for the semantic segmentation of a 3D point cloud using local 3D moment invariants and the integration of contextual information. Specifically, we focus on the task of analyzing forestal and urban areas which were recorded by terrestrial LiDAR scanners. We demonstrate how 3D moment invariants can be leveraged as local features and that they are on a par with established descriptors. Furthermore, we show how an iterative learning scheme can increase the overall quality by taking neighborhood relationships between classes into account. Our experiments show that the approach achieves very good results for a variety of tasks including both binary and multi-class settings.
2016
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks.
2016. Technical Report TR-FSU-INF-CV-2016-01
[bibtex] [pdf] [abstract]
Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past, which are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training of deep neural networks. In particular, we apply QuickProp during learning and testing of fully convolutional networks for the task of semantic segmentation. We compare QuickProp empirically with gradient descent, which is the current standard method. Experiments suggest that QuickProp cannot compete with standard gradient descent techniques for complex computer vision tasks like semantic segmentation.
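As background to the abstract above, the QuickProp update as commonly described (Fahlman, 1988) can be sketched for a single weight. This is a minimal illustration under stated assumptions, not the implementation evaluated in the report: the function name `quickprop_minimize`, the learning rate, the growth limit of 1.75, and the toy objective are all chosen here for demonstration.

```python
def quickprop_minimize(grad, w0, lr=0.1, steps=10, max_growth=1.75):
    """Minimise a 1D function, given its gradient, with a QuickProp-style update."""
    w = w0
    g_prev = grad(w)
    dw = -lr * g_prev  # bootstrap with one plain gradient descent step
    w += dw
    for _ in range(steps):
        g = grad(w)
        denom = g_prev - g
        if dw != 0.0 and denom != 0.0:
            # Secant step: jump to the minimum of the parabola fitted through
            # the last two gradient evaluations.
            step = dw * g / denom
            # Limit the step relative to the previous one (assumed growth factor).
            limit = max_growth * abs(dw)
            step = max(-limit, min(limit, step))
        else:
            step = -lr * g  # fall back to gradient descent
        w += step
        g_prev, dw = g, step
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w_min = quickprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

On a quadratic such as this toy objective the parabola fit is exact, which is the best case for the method; the report's conclusion concerns deep networks, where that assumption does not hold.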
Gianluca Tramontana, Martin Jung, Christopher R. Schwalm, Kazuhito Ichii, Gustau Camps-Valls, Botond Raduly, Markus Reichstein, M. Altaf Arain, Alessandro Cescatti, Gerard Kiely, Lutz Merbold, Penelope Serrano-Ortiz, Sven Sickert, Sebastian Wolf, Dario Papale:
Predicting Carbon Dioxide and Energy Fluxes Across Global FLUXNET Sites with Regression Algorithms.
Biogeosciences. 13 (14) : pp. 4291-4313. 2016.
[bibtex] [web] [doi] [abstract]
Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. While a number of strategies for empirical models with eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data and (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from different ML and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange (R² < 0.5), ecosystem respiration (R² > 0.6), gross primary production (R² > 0.7), latent heat (R² > 0.7), sensible heat (R² > 0.7), and net radiation (R² > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well (R² > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted (R² < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or less represented by training data (e.g., the tropics). The evaluated large ensemble of ML-based models will be the basis of new global flux products.
Sven Sickert, Erik Rodner, Joachim Denzler:
Semantic Volume Segmentation with Iterative Context Integration for Bio-medical Image Stacks.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 26 (1) : pp. 197-204. 2016.
[bibtex] [pdf] [abstract]
Automatic recognition of biological structures like membranes or synapses is important to analyze organic processes and to understand their functional behavior. To achieve this, volumetric images taken by electron microscopy or computed tomography have to be segmented into meaningful semantic regions. We are extending iterative context forests, which were developed for 2D image data, to image stack segmentation. In particular, our method is able to learn high-order dependencies and import contextual information, which often cannot be learned by the conventional Markov random field approaches usually used for this task. Our method is tested on very different and challenging medical and biological segmentation tasks.
2015
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 510-517. 2015.
[bibtex] [pdf] [doi] [abstract]
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.
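The idea of giving the network the patch position as an additional input can be sketched as below. This is only an illustration of the general principle: the abstract does not state how the position is encoded or where it enters the network, so the normalised centre coordinates and the helper name `patch_with_position` are assumptions made here.

```python
def patch_with_position(image, cx, cy, half):
    """Return a square patch centred at (cx, cy) and its normalised position.

    `image` is a list of rows; the patch has side length 2 * half + 1 and is
    assumed to lie fully inside the image.
    """
    height, width = len(image), len(image[0])
    patch = [row[cx - half:cx + half + 1]
             for row in image[cy - half:cy + half + 1]]
    # Position in [0, 1] x [0, 1]; a classifier could receive it alongside the
    # patch so that it can learn, e.g., that road pixels tend to be low in the image.
    position = (cx / (width - 1), cy / (height - 1))
    return patch, position


# Hypothetical 5x5 single-channel image with pixel values 0..24.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
patch, position = patch_with_position(image, cx=2, cy=2, half=1)
```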
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Efficient Convolutional Patch Networks for Scene Understanding.
CVPR Workshop on Scene Understanding (CVPR-WS). 2015. Poster presentation and extended abstract
[bibtex] [pdf] [abstract]
In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade dataset. Our approach is implemented as a new CNN framework especially designed for semantic segmentation with fully-convolutional architectures.
2014
Martin Jung, Kazuhito Ichii, Gustau Camps-Valls, Dario Papale, Gianluca Tramontana, Sven Sickert, Christopher Schwalm, Markus Reichstein:
An ensemble of global high-resolution products of energy fluxes over land.
International Scientific Conference on the Global Water and Energy Cycle (GEWEX). 2014. Poster
[bibtex] [abstract]
We present an ensemble of global high-resolution energy flux products over land derived from upscaling FLUXNET eddy covariance observations with multiple machine learning methods using an array of remote sensing data. The products cover latent heat, sensible heat, ground heat fluxes, as well as net radiation over the period 2001-2012 at 8-daily temporal and 0.0833 degree spatial resolution. To account for the energy balance closure gap of eddy covariance measurements, five different energy balance correction techniques were used that correspond to different hypotheses about the causes of the energy balance closure gap. Hence, in total six different variants of sensible and latent heat flux products are available that allow quantifying uncertainties. We evaluate our products using cross-validation at site level and against various independent observation-based data streams including latent and sensible heat fluxes derived from runoff, precipitation, and net radiation data from large river basins. Our products are a valuable source to evaluate or calibrate global land surface models. In conjunction with a complementary set of products for global carbon fluxes, our products are suitable to better understand the global variability of the land water, energy, and carbon cycles, especially with regard to their co-variations and interactions.
Sven Sickert, Erik Rodner, Joachim Denzler:
Semantic Volume Segmentation with Iterative Context Integration.
Open German-Russian Workshop on Pattern Recognition and Image Understanding (OGRW). Pages 220-225. 2014.
[bibtex] [pdf] [web] [abstract]
Automatic recognition of biological structures like membranes or synapses is important to analyze organic processes and to understand their functional behavior. To achieve this, volumetric images taken by electron microscopy or computed tomography have to be segmented into meaningful regions. We are extending iterative context forests, which were developed for 2D image data, to image stack segmentation. In particular, our method is able to learn high-order dependencies and import contextual information, which often cannot be learned by the conventional Markov random field approaches usually used for this task. Our method is tested on very different and challenging medical and biological segmentation tasks.