Dr. rer. nat. Clemens-Alexander Brust
Curriculum Vitae
- 2017-2021: Research Associate with the Computer Vision Group at Friedrich Schiller University Jena
- 2017: Master Thesis “Incremental Learning of YOLO Object Detection”
- 2015-2017: Studies of Computational and Data Science at Friedrich Schiller University Jena
- 2014: Bachelor Thesis “Convolutional Networks for Automatic Road Segmentation”
- 2010-2014: Studies of Computer Science at Friedrich Schiller University Jena
Research Interests
- Hierarchical Classification
- Lifelong Learning
- Learning with Few Examples
Publications
2022
Paul Bodesheim, Jan Blunk, Matthias Körschens, Clemens-Alexander Brust, Christoph Käding, Joachim Denzler:
Pre-trained models are not enough: active and lifelong learning is important for long-term visual monitoring of mammals in biodiversity research. Individual identification and attribute prediction with image features from deep neural networks and decoupled decision models applied to elephants and great apes.
Mammalian Biology. 102 : pp. 875-897. 2022.
Animal re-identification based on image data, either recorded manually by photographers or automatically with camera traps, is an important task for ecological studies about biodiversity and conservation that can be highly automated with algorithms from computer vision and machine learning. However, fixed identification models only trained with standard datasets before their application will quickly reach their limits, especially for long-term monitoring with changing environmental conditions, varying visual appearances of individuals over time that differ a lot from those in the training data, and newly occurring individuals that have not been observed before. Hence, we believe that active learning with a human in the loop and continuous lifelong learning is important to tackle these challenges and to obtain high-performance recognition systems when dealing with huge amounts of additional data that become available during the application. Our general approach with image features from deep neural networks and decoupled decision models can be applied to many different mammalian species and is perfectly suited for continuous improvements of the recognition systems via lifelong learning. In our identification experiments, we consider four different taxa, namely two elephant species: African forest elephants and Asian elephants, as well as two species of great apes: gorillas and chimpanzees. Going beyond classical re-identification, our decoupled approach can also be used for predicting attributes of individuals such as gender or age using classification or regression methods. Although applicable for small datasets of individuals as well, we argue that even better recognition performance will be achieved by improving decision models gradually via lifelong learning to exploit huge datasets and continuous recordings from long-term applications. We highlight that algorithms for deploying lifelong learning in real observational studies exist and are ready for use. Hence, lifelong learning might become a valuable concept that supports practitioners when analyzing large-scale image data during long-term monitoring of mammals.
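The decoupled design lends itself to a simple illustration: a frozen deep network supplies features, and a lightweight decision model absorbs new individuals over time. Below is a minimal sketch using a nearest-class-mean classifier as the decision model; the class name, feature dimensionality, and individual IDs are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the decoupled design: a frozen CNN supplies features,
# and a nearest-class-mean decision model is updated one sighting at a
# time. Names and sizes are illustrative assumptions.
import numpy as np

class NearestClassMean:
    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, features, individual_id):
        # lifelong learning: absorb a single new example without retraining
        self.sums[individual_id] = self.sums.get(individual_id, 0) + features
        self.counts[individual_id] = self.counts.get(individual_id, 0) + 1

    def predict(self, features):
        means = {k: s / self.counts[k] for k, s in self.sums.items()}
        return min(means, key=lambda k: np.linalg.norm(features - means[k]))

clf = NearestClassMean()
clf.update(np.ones(128), "elephant_07")    # features from a frozen backbone
clf.update(np.zeros(128), "elephant_12")
print(clf.predict(np.full(128, 0.9)))      # -> "elephant_07"
```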
2021
Clemens-Alexander Brust, Björn Barz, Joachim Denzler:
Self-Supervised Learning from Semantically Imprecise Data.
arXiv preprint arXiv:2104.10901. 2021.
Learning from imprecise labels such as “animal” or “bird”, but making precise predictions like “snow bunting” at test time is an important capability when expertly labeled training data is scarce. Contributions by volunteers or results of web crawling lack precision in this manner, but are still valuable. And crucially, these weakly labeled examples are available in larger quantities for lower cost than high-quality bespoke training data. CHILLAX, a recently proposed method to tackle this task, leverages a hierarchical classifier to learn from imprecise labels. However, it has two major limitations. First, it is not capable of learning from effectively unlabeled examples at the root of the hierarchy, e.g. “object”. Second, an extrapolation of annotations to precise labels is only performed at test time, where confident extrapolations could already be used as training data. In this work, we extend CHILLAX with a self-supervised scheme using constrained extrapolation to generate pseudo-labels. This addresses the second concern, which in turn solves the first problem, enabling an even weaker supervision requirement than CHILLAX. We evaluate our approach empirically and show that our method allows for a consistent accuracy improvement of 0.84 to 1.19 percentage points over CHILLAX and is suitable as a drop-in replacement without any negative consequences such as longer training times.
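A minimal sketch of the constrained-extrapolation idea, assuming a networkx hierarchy and per-class probabilities: a weak label is replaced by its most confident descendant only when that prediction clears a threshold. Function names, the threshold, and the tie-breaking are illustrative simplifications, not the authors' code.

```python
# Sketch of constrained extrapolation for pseudo-labeling (illustrative):
# only descendants of the weak label are eligible, and the extrapolation
# is accepted only above a confidence threshold.
import networkx as nx

def constrained_pseudo_label(probs, given_label, hierarchy, threshold=0.9):
    candidates = nx.descendants(hierarchy, given_label)  # strictly more precise
    if not candidates:
        return given_label
    best = max(candidates, key=lambda c: probs.get(c, 0.0))
    return best if probs.get(best, 0.0) >= threshold else given_label

h = nx.DiGraph([("object", "animal"), ("animal", "bird"),
                ("bird", "snow bunting"), ("bird", "sparrow")])
probs = {"snow bunting": 0.93, "sparrow": 0.04, "bird": 0.97}
print(constrained_pseudo_label(probs, "bird", h))   # -> "snow bunting"
```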
Niklas Penzel, Christian Reimers, Clemens-Alexander Brust, Joachim Denzler:
Investigating the Consistency of Uncertainty Sampling in Deep Active Learning.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 159-173. 2021.
Uncertainty sampling is a widely used active learning strategy to select unlabeled examples for annotation. However, previous work hints at weaknesses of uncertainty sampling when combined with deep learning, where the amount of data is even more significant. To investigate these problems, we analyze the properties of the latent statistical estimators of uncertainty sampling in simple scenarios. We prove that uncertainty sampling converges towards some decision boundary. Additionally, we show that it can be inconsistent, leading to incorrect estimates of the optimal latent boundary. The inconsistency depends on the latent class distribution, more specifically on the class overlap. Further, we empirically analyze the variance of the decision boundary and find that the performance of uncertainty sampling is also connected to the class regions overlap. We argue that our findings could be the first step towards explaining the poor performance of uncertainty sampling combined with deep models.
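To make the analyzed estimator concrete, here is a toy uncertainty-sampling loop on a 1-D problem with overlapping classes, the setting where the paper locates the inconsistency. The logistic model, pool size, and number of rounds are illustrative choices, not the paper's experimental setup.

```python
# Toy uncertainty-sampling loop on overlapping 1-D classes (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool_x = rng.normal(0.0, 2.0, size=(1000, 1))                         # unlabeled pool
pool_y = (pool_x[:, 0] + rng.normal(0.0, 1.0, 1000) > 0).astype(int)  # class overlap

labeled = list(rng.choice(len(pool_x), size=10, replace=False))
for _ in range(20):                                      # acquisition rounds
    clf = LogisticRegression().fit(pool_x[labeled], pool_y[labeled])
    probs = clf.predict_proba(pool_x)[:, 1]
    uncertainty = 1.0 - np.abs(2.0 * probs - 1.0)        # peaks at p = 0.5
    uncertainty[labeled] = -np.inf                       # never re-query
    labeled.append(int(np.argmax(uncertainty)))          # query most uncertain point

# the optimal latent boundary is 0; the estimate may be biased (inconsistent)
print("estimated boundary:", -clf.intercept_[0] / clf.coef_[0, 0])
```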
2020
Clemens-Alexander Brust, Björn Barz, Joachim Denzler:
Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge.
International Conference on Pattern Recognition (ICPR). 2020.
Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.
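One way to see why a flat softmax fails here, sketched below under simple assumptions: an imprecise label like "bird" can be treated as the set of all compatible fine-grained leaves, and the loss maximizes their total probability mass. This is a generic weak-label construction for illustration; CHILLAX itself builds on a hierarchical classifier.

```python
# Sketch: treat an imprecise label as the set of compatible leaf classes
# and minimize the negative log of their summed probability. Generic
# weak-label trick for illustration, not CHILLAX itself.
import torch

def weak_label_nll(logits, admissible):
    # logits: (num_leaves,) raw scores; admissible: indices of all leaves
    # that are descendants of the annotated label, e.g. every bird species
    log_probs = torch.log_softmax(logits, dim=-1)
    return -torch.logsumexp(log_probs[admissible], dim=-1)

logits = torch.randn(1000, requires_grad=True)   # e.g. 1000 fine-grained classes
bird_leaves = torch.arange(100, 160)             # hypothetical "bird" subtree
loss = weak_label_nll(logits, bird_leaves)
loss.backward()                                  # trainable at any label precision
```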
Clemens-Alexander Brust, Christoph Käding, Joachim Denzler:
Active and Incremental Learning with Weak Supervision.
Künstliche Intelligenz (KI). 2020.
Large amounts of labeled training data are one of the main contributors to the great success that deep models have achieved in the past. Label acquisition for tasks other than benchmarks can pose a challenge due to requirements of both funding and expertise. By selecting unlabeled examples that are promising in terms of model improvement and only asking for respective labels, active learning can increase the efficiency of the labeling process in terms of time and cost. In this work, we describe combinations of an incremental learning scheme and methods of active learning. These allow for continuous exploration of newly observed unlabeled data. We describe selection criteria based on model uncertainty as well as expected model output change (EMOC). An object detection task is evaluated in a continuous exploration context on the PASCAL VOC dataset. We also validate a weakly supervised system based on active and incremental learning in a real-world biodiversity application where images from camera traps are analyzed. Labeling only 32 images by accepting or rejecting proposals generated by our method yields an increase in accuracy from 25.4% to 42.6%.
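The interplay of selection and incremental updates can be sketched in a few lines; here scikit-learn's partial_fit stands in for the incremental deep learning scheme and a margin criterion for the uncertainty measures (EMOC is omitted). Data, batch size, and model are toy assumptions.

```python
# Toy active + incremental loop: partial_fit stands in for incremental
# learning, a margin criterion for the uncertainty-based selection.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(int)

model = SGDClassifier(loss="log_loss")
seed = rng.choice(2000, size=20, replace=False)
model.partial_fit(X[seed], y[seed], classes=[0, 1])      # initial model
unlabeled = np.setdiff1d(np.arange(2000), seed)

for _ in range(15):                                      # exploration rounds
    margins = np.abs(model.decision_function(X[unlabeled]))
    query = unlabeled[np.argsort(margins)[:32]]          # 32 most uncertain
    model.partial_fit(X[query], y[query])                # incremental update only
    unlabeled = np.setdiff1d(unlabeled, query)
print("pool accuracy:", model.score(X[unlabeled], y[unlabeled]))
```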
2019
Clemens-Alexander Brust, Christoph Käding, Joachim Denzler:
Active Learning for Deep Object Detection.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 181-190. 2019.
The great success that deep models have achieved in the past is mainly owed to large amounts of labeled training data. However, the acquisition of labeled data for new tasks aside from existing benchmarks is both challenging and costly. Active learning can make the process of labeling new data more efficient by selecting unlabeled samples which, when labeled, are expected to improve the model the most. In this paper, we combine a novel method of active learning for object detection with an incremental learning scheme to enable continuous exploration of new unlabeled datasets. We propose a set of uncertainty-based active learning metrics suitable for most object detectors. Furthermore, we present an approach to leverage class imbalances during sample selection. All methods are evaluated systematically in a continuous exploration context on the PASCAL VOC 2012 dataset.
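A common instantiation of such image-level metrics, shown as a sketch: compute a 1-vs-2 margin per detection and aggregate over the image by sum, average, or maximum. The exact metrics in the paper may differ; the arrays below are dummy class probabilities.

```python
# Sketch: per-box 1-vs-2 margin aggregated to an image-level score for
# ranking whole images; dummy inputs, paper's metrics may differ.
import numpy as np

def box_uncertainty(class_probs):
    top2 = np.sort(class_probs)[-2:]
    return 1.0 - (top2[1] - top2[0])        # small margin -> high uncertainty

def image_uncertainty(detections, aggregate="sum"):
    if not detections:
        return 0.0                          # images without detections rank last
    scores = np.array([box_uncertainty(d) for d in detections])
    return {"sum": scores.sum(), "avg": scores.mean(), "max": scores.max()}[aggregate]

detections = [np.array([0.10, 0.45, 0.45]),   # ambiguous box
              np.array([0.80, 0.10, 0.10])]   # confident box
print(image_uncertainty(detections, aggregate="max"))   # -> 1.0
```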
Clemens-Alexander Brust, Joachim Denzler:
Integrating Domain Knowledge: Using Hierarchies to Improve Deep Classifiers.
Asian Conference on Pattern Recognition (ACPR). 2019.
One of the most prominent problems in machine learning in the age of deep learning is the availability of sufficiently large annotated datasets. For specific domains, e.g., animal species, a long-tail distribution means that some classes are observed and annotated insufficiently. Additional labels can be prohibitively expensive, e.g. because domain experts need to be involved. However, there is more information available that, to the best of our knowledge, is not exploited accordingly. In this paper, we propose to make use of preexisting class hierarchies like WordNet to integrate additional domain knowledge into classification. We encode the properties of such a class hierarchy into a probabilistic model. From there, we derive a novel label encoding and a corresponding loss function. On the ImageNet and NABirds datasets our method offers a relative improvement of 10.4% and 9.6% in accuracy over the baseline, respectively. After less than a third of training time, it is already able to match the baseline's fine-grained recognition performance. Both results show that our suggested method is efficient and effective.
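A sketch of one hierarchy-aware encoding in this spirit: each label switches on itself and all of its ancestors, and a per-node binary cross-entropy replaces the flat softmax. The paper's probabilistic model and loss differ in detail; the toy hierarchy below is an assumption.

```python
# Sketch of a hierarchy-aware label encoding: a label switches on itself
# and all ancestors; per-node BCE replaces the flat softmax. Illustrative,
# the paper's exact model differs in detail.
import torch
import networkx as nx

def encode_with_ancestors(label, hierarchy, classes):
    on = nx.ancestors(hierarchy, label) | {label}
    return torch.tensor([1.0 if c in on else 0.0 for c in classes])

h = nx.DiGraph([("animal", "bird"), ("bird", "snow bunting")])
classes = ["animal", "bird", "snow bunting"]
target = encode_with_ancestors("snow bunting", h, classes)   # -> [1., 1., 1.]

logits = torch.zeros(3, requires_grad=True)                  # network outputs
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, target)
```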
Clemens-Alexander Brust, Joachim Denzler:
Not just a Matter of Semantics: The Relationship between Visual Similarity and Semantic Similarity.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 414-427. 2019.
Knowledge transfer, zero-shot learning and semantic image retrieval are methods that aim at improving accuracy by utilizing semantic information, e.g., from WordNet. It is assumed that this information can augment or replace missing visual data in the form of labeled training images because semantic similarity correlates with visual similarity. This assumption may seem trivial, but is crucial for the application of such semantic methods. Any violation can cause mispredictions. Thus, it is important to examine the visual-semantic relationship for a certain target problem. In this paper, we use five different semantic and visual similarity measures each to thoroughly analyze the relationship without relying too much on any single definition. We postulate and verify three highly consequential hypotheses on the relationship. Our results show that it indeed exists and that WordNet semantic similarity carries more information about visual similarity than just the knowledge of "different classes look different". They suggest that classification is not the ideal application for semantic methods and that wrong semantic information is much worse than none.
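Such an analysis can be approximated on one's own classes in a few lines: correlate a WordNet similarity with a feature-space similarity across class pairs. The two measures below (path similarity, cosine of class-mean features) and the random stand-in features are assumptions; the paper compares five measures of each kind.

```python
# Sketch: rank-correlate a WordNet similarity with a feature similarity
# across class pairs. Random features stand in for real class-mean
# features; requires nltk.download("wordnet") once.
from itertools import combinations
import numpy as np
from nltk.corpus import wordnet as wn
from scipy.stats import spearmanr

synsets = {"dog": wn.synset("dog.n.01"), "cat": wn.synset("cat.n.01"),
           "car": wn.synset("car.n.01"), "bus": wn.synset("bus.n.01")}
features = {name: np.random.default_rng(i).normal(size=128)   # stand-in features
            for i, name in enumerate(synsets)}

semantic, visual = [], []
for a, b in combinations(synsets, 2):
    semantic.append(synsets[a].path_similarity(synsets[b]))
    fa, fb = features[a], features[b]
    visual.append(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))
print(spearmanr(semantic, visual))   # rank correlation of the two views
```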
Marie Arlt, Jack Peter, Sven Sickert, Clemens-Alexander Brust, Joachim Denzler, Andreas Stallmach:
Automated Polyp Differentiation on Coloscopic Data using Semantic Segmentation with CNNs.
Endoscopy. 51 (04) : pp. 4. 2019.
Interval carcinomas are a commonly known problem in endoscopic adenoma detection, especially when they follow a negative index colonoscopy. To protect patients from these carcinomas and support the endoscopist, we aim for a live assistance system in the future, which helps to mark polyps and increase the adenoma detection rate. We present our first results of polyp recognition using a machine learning approach.
Stefan Hoffmann, Clemens-Alexander Brust, Maha Shadaydeh, Joachim Denzler:
Registration of High Resolution Sar and Optical Satellite Imagery Using Fully Convolutional Networks.
International Geoscience and Remote Sensing Symposium (IGARSS). Pages 5152-5155. 2019.
Multi-modal image registration is a crucial step when fusing images which show different physical/chemical properties of an object. Depending on the compared modalities and the used registration metric, this process exhibits varying reliability. We propose a deep metric based on a fully convolutional neural network (FCN). It is trained from scratch on SAR-optical image pairs to predict whether certain image areas are aligned or not. Tests on the affine registration of SAR and optical images showing suburban areas verify an enormous improvement of the registration accuracy in comparison to registration metrics that are based on mutual information (MI).
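Schematically, the deep metric is a small fully convolutional network that consumes stacked SAR and optical patches and emits an alignment score; the architecture, channel counts, and pooling below are illustrative, not the paper's.

```python
# Schematic deep registration metric: a small FCN scores whether stacked
# SAR and optical patches are aligned. Architecture is illustrative.
import torch
import torch.nn as nn

class AlignmentNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),        # SAR + optical
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),                              # per-location score
        )

    def forward(self, sar, optical):
        x = torch.cat([sar, optical], dim=1)                  # stack modalities
        return self.net(x).mean(dim=(2, 3))                   # image-level score

net = AlignmentNet()
sar, opt = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
score = net(sar, opt)   # train with BCE on aligned vs. shifted pairs
```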
2018
Christoph Theiß, Clemens-Alexander Brust, Joachim Denzler:
Dataless Black-Box Model Comparison.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 28 (4) : pp. 676-683. 2018. (also published at ICPRAI 2018)
At a time when the training of new machine learning models is extremely time-consuming and resource-intensive, and the sale of these models or access to them is more popular than ever, it is important to think about ways to ensure the protection of these models against theft. In this paper, we present a method for estimating the similarity or distance between two black-box models. Our approach does not depend on the knowledge about specific training data and therefore may be used to identify copies of or stolen machine learning models. It can also be applied to detect instances of license violations regarding the use of datasets. We validate our proposed method empirically on the CIFAR-10 and MNIST datasets using convolutional neural networks, generative adversarial networks and support vector machines. We show that it can clearly distinguish between models trained on different datasets. Theoretical foundations of our work are also given.
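The core idea admits a compact sketch: probe both black-box models with the same inputs and measure their disagreement, without any training data. The uniform probe distribution and plain disagreement rate below are placeholders for the distance measures studied in the paper.

```python
# Sketch: estimate a distance between two black-box classifiers from
# their disagreement on random probes; probe distribution and distance
# are placeholders for the measures studied in the paper.
import numpy as np

def blackbox_distance(model_a, model_b, input_shape, n_probes=10_000, seed=0):
    probes = np.random.default_rng(seed).uniform(size=(n_probes, *input_shape))
    return float(np.mean(model_a(probes) != model_b(probes)))  # in [0, 1]

f = lambda x: (x.sum(axis=1) > x.shape[1] / 2).astype(int)     # stand-in models
g = lambda x: (x[:, 0] > 0.5).astype(int)
print(blackbox_distance(f, g, input_shape=(16,)))
```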
Joachim Denzler, Christoph Käding, Clemens-Alexander Brust:
Keeping the Human in the Loop: Towards Automatic Visual Monitoring in Biodiversity Research.
International Conference on Ecological Informatics (ICEI). Pages 16. 2018.
More and more methods in the area of biodiversity research are grounded in new opportunities arising from modern sensing devices that in principle make it possible to continuously record sensor data from the environment. However, these opportunities allow easy recording of huge amounts of data, while their evaluation is difficult, if not impossible, due to the enormous effort of manual inspection by the researchers. At the same time, we observe impressive results in computer vision and machine learning that are based on two major developments: firstly, the increased performance of hardware together with the advent of powerful graphical processing units applied in scientific computing; secondly, the huge amount of, in part, annotated image data provided by today's generation of Facebook and Twitter users that is easily available via databases (e.g., Flickr) and/or search engines. However, for biodiversity applications appropriate databases of annotated images are still missing. In this presentation we discuss already available methods from computer vision and machine learning together with upcoming challenges in automatic monitoring in biodiversity research. We argue that the key element for the success of any automatic method is the possibility to keep the human in the loop - either for correcting errors and improving the system's quality over time, for providing annotation data at moderate effort, or for acceptance and validation reasons. Thus, we summarize already existing techniques from active and life-long learning together with the enormous developments in automatic visual recognition during the past years. In addition, to allow detection of the unexpected, such an automatic system must be capable of finding anomalies or novel events in the data. We discuss a generic framework for automatic monitoring in biodiversity research which is the result of collaboration between computer scientists and ecologists over the past years. The key ingredients of such a framework are an initial, generic classifier, for example powerful deep learning architectures, active learning to reduce costly annotation effort by experts, fine-grained recognition to differentiate between visually very similar species, and efficient incremental updates of the classifier's model over time. For most of these challenges, we present initial solutions in sample applications. The results comprise the automatic evaluation of images from camera traps, attribute estimation for species, as well as monitoring of in-situ data in environmental science. Overall, we would like to demonstrate the potential and open issues in bringing together computer scientists and ecologists to open new research directions for both areas.
2017
Clemens-Alexander Brust, Christoph Käding, Joachim Denzler:
You Have To Look More Than Once: Active and Continuous Exploration using YOLO.
CVPR Workshop on Continuous and Open-Set Learning (CVPR-WS). 2017. Poster presentation and extended abstract
Traditionally, most research in the area of object detection builds on models trained once on reliable labeled data for a predefined application. However, in many application scenarios, new data becomes available over time or the distribution underlying the problem changes itself. In this case, models are usually retrained from scratch or refined via fine-tuning or incremental learning. For most applications, acquiring new labels is the limiting factor in terms of effort or costs. Active learning aims to minimize the labeling effort by selecting only valuable samples for annotation. It is widely studied in classification tasks, where different measures of uncertainty are the most common choice for selection. We combine the deep object detector YOLO with active learning and an incremental learning scheme to build an object detection system suitable for active and continuous exploration and open-set problems by querying whole images for annotation rather than single proposals.
Clemens-Alexander Brust, Tilo Burghardt, Milou Groenenberg, Christoph Käding, Hjalmar Kühl, Marie Manguette, Joachim Denzler:
Towards Automated Visual Monitoring of Individual Gorillas in the Wild.
ICCV Workshop on Visual Wildlife Monitoring (ICCV-WS). Pages 2820-2830. 2017.
In this paper we report on the context and evaluation of a system for an automatic interpretation of sightings of individual western lowland gorillas (Gorilla gorilla gorilla) as captured in facial field photography in the wild. This effort aligns with a growing need for effective and integrated monitoring approaches for assessing the status of biodiversity at high spatio-temporal scales. Manual field photography and the utilisation of autonomous camera traps have already transformed the way ecological surveys are conducted. In principle, many environments can now be monitored continuously, and with a higher spatio-temporal resolution than ever before. Yet, the manual effort required to process photographic data to derive relevant information delimits any large scale application of this methodology. The described system applies existing computer vision techniques including deep convolutional neural networks to cover the tasks of detection and localisation, as well as individual identification of gorillas in a practically relevant setup. We evaluate the approach on a relatively large and challenging data corpus of 12,765 field images of 147 individual gorillas with image-level labels (i.e. missing bounding boxes) photographed at Mbeli Bai at the Nouabalé-Ndoki National Park, Republic of Congo. Results indicate a facial detection rate of 90.8% AP and an individual identification accuracy for ranking within the Top 5 set of 80.3%. We conclude that, whilst keeping the human in the loop is critical, this result is practically relevant as it exemplifies model transferability and has the potential to assist manual identification efforts. We argue further that there is significant need towards integrating computer vision deeper into ecological sampling methodologies and field practice to move the discipline forward and open up new research horizons.
2016
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks.
2016. Technical Report TR-FSU-INF-CV-2016-01
Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past, which are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training of deep neural networks. In particular, we apply QuickProp during learning and testing of fully convolutional networks for the task of semantic segmentation. We compare QuickProp empirically with gradient descent, which is the current standard method. Experiments suggest that QuickProp cannot compete with standard gradient descent techniques for complex computer vision tasks like semantic segmentation.
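For reference, Fahlman's QuickProp update that the report evaluates can be written in a few lines; the growth factor, learning rate, and the toy quadratic objective below are illustrative choices.

```python
# Fahlman's QuickProp step on a toy quadratic; learning rate, growth
# factor mu, and objective are illustrative choices.
import numpy as np

def quickprop_step(w, grad, prev_grad, prev_dw, lr=0.01, mu=1.75):
    dw = -lr * grad                                   # gradient-descent fallback
    denom = prev_grad - grad
    safe = np.where(np.abs(denom) > 1e-12, denom, 1.0)
    secant = np.where(np.abs(denom) > 1e-12, grad / safe * prev_dw, dw)
    bound = np.abs(mu * prev_dw)                      # limit step growth
    dw = np.where(prev_dw != 0, np.clip(secant, -bound, bound), dw)
    return w + dw, dw

w, prev_g, prev_dw = np.array([5.0, -3.0]), np.zeros(2), np.zeros(2)
for _ in range(50):
    g = 2.0 * w                                       # gradient of ||w||^2
    w, prev_dw = quickprop_step(w, g, prev_g, prev_dw)
    prev_g = g
print(w)                                              # approaches the minimum at 0
```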
2015
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 510-517. 2015.
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.
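The spatial prior amounts to feeding each patch's normalized image position into the network alongside its appearance. Below is one simple realization, appending two constant coordinate channels to the input; the paper integrates the position as a separate network input, so this sketch differs in detail.

```python
# One simple realization of the spatial prior: append each patch's
# normalized (x, y) position as two constant input channels; the paper
# integrates the position as a separate network input.
import torch

def add_position_channels(patches, centers, image_size):
    """patches: (N, C, H, W); centers: (N, 2) pixel coords; image_size: (W, H)."""
    n, _, h, w = patches.shape
    norm = centers / torch.tensor(image_size, dtype=torch.float32)  # -> [0, 1]
    pos = norm.view(n, 2, 1, 1).expand(n, 2, h, w)                  # constant maps
    return torch.cat([patches, pos], dim=1)                         # (N, C+2, H, W)

patches = torch.randn(8, 3, 28, 28)
centers = torch.randint(0, 256, (8, 2)).float()
augmented = add_position_channels(patches, centers, image_size=(256, 256))
print(augmented.shape)   # torch.Size([8, 5, 28, 28]), fed to the patch CNN
```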
Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler:
Efficient Convolutional Patch Networks for Scene Understanding.
CVPR Workshop on Scene Understanding (CVPR-WS). 2015. Poster presentation and extended abstract
In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade dataset. Our approach is implemented as a new CNN framework especially designed for semantic segmentation with fully-convolutional architectures.