Fine-grained Recognition
Team
Dimitri Korsch, Paul Bodesheim
Motivation
In this research area, we develop methods that automatically distinguish between very similar object categories. From given images and their annotations, the algorithms learn both the locations and the characteristics of relevant features. Applications of this research include automatic biodiversity monitoring.
Additional datasets can be found here: Link
Publications
2022
Dimitri Korsch, Paul Bodesheim, Gunnar Brehm, Joachim Denzler:
Automated Visual Monitoring of Nocturnal Insects with Light-based Camera Traps.
CVPR Workshop on Fine-grained Visual Classification (CVPR-WS). 2022.
[bibtex] [pdf] [web] [code] [abstract]
Automatic camera-assisted monitoring of insects for abundance estimations is crucial to understand and counteract ongoing insect decline. In this paper, we present two datasets of nocturnal insects, especially moths as a subset of Lepidoptera, photographed in Central Europe. One of the datasets, the EU-Moths dataset, was captured manually by citizen scientists and contains species annotations for 200 different species and bounding box annotations for those. We used this dataset to develop and evaluate a two-stage pipeline for insect detection and moth species classification in previous work. We further introduce a prototype for an automated visual monitoring system. This prototype produced the second dataset consisting of more than 27000 images captured on 95 nights. For evaluation and bootstrapping purposes, we annotated a subset of the images with bounding boxes enframing nocturnal insects. Finally, we present first detection and classification baselines for these datasets and encourage other scientists to use this publicly available data.
2021
Dimitri Korsch, Paul Bodesheim, Joachim Denzler:
Deep Learning Pipeline for Automated Visual Moth Monitoring: Insect Localization and Species Classification.
INFORMATIK 2021, Computer Science for Biodiversity Workshop (CS4Biodiversity). Pages 443-460. 2021.
[bibtex] [pdf] [web] [doi] [code] [abstract]
Biodiversity monitoring is crucial for tracking and counteracting adverse trends in population fluctuations. However, automatic recognition systems are rarely applied so far, and experts evaluate the generated data masses manually. Especially the support of deep learning methods for visual monitoring is not yet established in biodiversity research, compared to other areas like advertising or entertainment. In this paper, we present a deep learning pipeline for analyzing images captured by a moth scanner, an automated visual monitoring system of moth species developed within the AMMOD project. We first localize individuals with a moth detector and afterward determine the species of detected insects with a classifier. Our detector achieves up to 99.01% mean average precision and our classifier distinguishes 200 moth species with an accuracy of 93.13% on image cutouts depicting single insects. Combining both in our pipeline improves the accuracy for species identification in images of the moth scanner from 79.62% to 88.05%.
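To illustrate the two-stage design described in the abstract, here is a minimal Python sketch that chains a generic pre-trained detector with a generic classifier (both from torchvision) as stand-ins for the moth detector and the 200-class species classifier. Model choices, the score threshold, and the omitted input normalisation are assumptions for illustration, not the pipeline from the paper.

```python
# Sketch of a detect-then-classify pipeline (illustrative models, not the AMMOD ones).
import torch
import torchvision
from torchvision.transforms import functional as TF
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = torchvision.models.resnet50(weights="DEFAULT").eval()  # stand-in for a 200-class moth classifier

def classify_insects(image_path, score_threshold=0.5):
    image = TF.to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        detections = detector([image])[0]                     # boxes, labels, scores
    results = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < score_threshold:
            continue
        x0, y0, x1, y1 = box.int().tolist()
        crop = TF.resize(image[:, y0:y1, x0:x1], [224, 224])  # ImageNet normalisation omitted
        with torch.no_grad():
            logits = classifier(crop.unsqueeze(0))
        results.append(((x0, y0, x1, y1), logits.argmax(dim=1).item()))
    return results                                            # (bounding box, species index) pairs
```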
Dimitri Korsch, Paul Bodesheim, Joachim Denzler:
End-to-end Learning of Fisher Vector Encodings for Part Features in Fine-grained Recognition.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 142-158. 2021.
[bibtex] [pdf] [web] [doi] [code] [abstract]
Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although explicitly focusing on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from a missing representation of local features, which is invariant to the order of parts and can handle a varying number of visible parts appropriately. The order of parts is artificial and often only given by ground-truth annotations, whereas viewpoint variations and occlusions result in not observable parts. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters for this encoding are estimated by an online EM algorithm jointly with those of the neural network and are more precise than the estimates of previous works. Our approach improves state-of-the-art accuracies for three bird species classification datasets.
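The Fisher vector encoding of a variable-size set of part features can be sketched as follows in NumPy, assuming a diagonal-covariance Gaussian mixture model whose parameters are given; the paper's contribution, estimating these parameters jointly with the CNN via an online EM algorithm, is not reproduced here.

```python
import numpy as np

def fisher_vector(parts, weights, means, variances):
    """Encode a variable-size set of part features (T x D) with a K-component
    diagonal-covariance GMM into a fixed-length Fisher vector (2*K*D)."""
    T, _ = parts.shape
    diff = parts[:, None, :] - means[None, :, :]                    # T x K x D
    # Soft assignments gamma[t, k], computed in log-space for stability.
    log_p = (-0.5 * np.sum(diff ** 2 / variances[None], axis=2)
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
             + np.log(weights))
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                       # T x K
    # Gradients w.r.t. means and variances (standard Fisher vector formulation).
    u = diff / np.sqrt(variances)[None]
    g_mu = (gamma[:, :, None] * u).sum(axis=0) / (T * np.sqrt(weights)[:, None])
    g_var = (gamma[:, :, None] * (u ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mu.ravel(), g_var.ravel()])

# Example: 5 detected parts with 64-d features, 8 mixture components.
rng = np.random.default_rng(0)
K, D = 8, 64
fv = fisher_vector(rng.normal(size=(5, D)), np.full(K, 1 / K),
                   rng.normal(size=(K, D)), np.ones((K, D)))
print(fv.shape)  # (1024,)
```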
Julia Böhlke, Dimitri Korsch, Paul Bodesheim, Joachim Denzler:
Exploiting Web Images for Moth Species Classification.
Computer Science for Biodiversity Workshop (CS4Biodiversity), INFORMATIK 2021. Pages 481-498. 2021.
[bibtex] [pdf] [web] [doi] [abstract]
Due to shrinking habitats, moth populations are declining rapidly. An automated moth population monitoring tool is needed to support conservationists in making informed decisions for counteracting this trend. A non-invasive tool would involve the automatic classification of images of moths, a fine-grained recognition problem. Currently, the lack of images annotated by experts is the main hindrance to such a classification model. To understand how to achieve acceptable predictive accuracies, we investigate the effect of differently sized datasets and data acquired from the Internet. We find the use of web data immensely beneficial and observe that few images from the evaluation domain are enough to mitigate the domain shift in web data. Our experiments show that counteracting the domain shift may yield a relative reduction of the error rate of over 60%. Lastly, the effect of label noise in web data and proposed filtering techniques are analyzed and evaluated.
Julia Böhlke, Dimitri Korsch, Paul Bodesheim, Joachim Denzler:
Lightweight Filtering of Noisy Web Data: Augmenting Fine-grained Datasets with Selected Internet Images.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 466-477. 2021.
[bibtex] [pdf] [web] [doi] [abstract]
Despite the availability of huge annotated benchmark datasets and the potential of transfer learning, i.e., fine-tuning a pre-trained neural network to a specific task, deep learning struggles in applications where no labeled datasets of sufficient size exist. This issue affects fine-grained recognition tasks the most since correct image data annotations are expensive and require expert knowledge. Nevertheless, the Internet offers a lot of weakly annotated images. In contrast to existing work, we suggest a new lightweight filtering strategy to exploit this source of information without supervision and with minimal additional costs. Our main contributions are specific filter operations that allow the selection of downloaded images to augment a training set. We filter test duplicates to avoid a biased evaluation of the methods, and two types of label noise: cross-domain noise, i.e., images outside any class in the dataset, and cross-class noise, a form of label-swapping noise. We evaluate our suggested filter operations in a controlled environment and demonstrate our methods' effectiveness with two small annotated seed datasets for moth species recognition. While noisy web images consistently improve classification accuracies, our filtering methods retain a fraction of the data such that high accuracies are achieved with a significantly smaller training dataset.
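A rough sketch of two such filter operations on pre-computed, L2-normalised CNN features: removing web images that are near-duplicates of test images, and removing likely cross-domain noise far from all class centroids of the seed dataset. Thresholds and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def filter_web_images(web_feats, test_feats, seed_feats, seed_labels,
                      dup_thresh=0.95, domain_thresh=0.5):
    """Boolean keep-mask over web images; all feature matrices are L2-normalised CNN features."""
    keep = np.ones(len(web_feats), dtype=bool)
    # Filter test duplicates: drop web images too similar to any test image,
    # which would otherwise bias the evaluation.
    keep &= (web_feats @ test_feats.T).max(axis=1) < dup_thresh
    # Filter cross-domain noise: drop images far from every class centroid
    # of the annotated seed dataset.
    centroids = np.stack([seed_feats[seed_labels == c].mean(axis=0)
                          for c in np.unique(seed_labels)])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    keep &= (web_feats @ centroids.T).max(axis=1) > domain_thresh
    return keep
```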
2020
Marcel Simon, Erik Rodner, Trevor Darrell, Joachim Denzler:
The Whole Is More Than Its Parts? From Explicit to Implicit Pose Normalization.
IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (3): pp. 749-763. 2020. (Pre-print published in 2019.)
[bibtex] [pdf] [web] [doi] [abstract]
Fine-grained classification describes the automated recognition of visually similar object categories like bird species. Previous works were usually based on explicit pose normalization, i.e., the detection and description of object parts. However, recent models based on a final global average or bilinear pooling have achieved a comparable accuracy without this concept. In this paper, we analyze the advantages of these approaches over generic CNNs and explicit pose normalization approaches. We also show how they can achieve an implicit normalization of the object pose. A novel visualization technique called activation flow is introduced to investigate limitations in pose handling in traditional CNNs like AlexNet and VGG. Afterward, we present and compare the explicit pose normalization approach neural activation constellations and a generalized framework for the final global average and bilinear pooling called α-pooling. We observe that the latter often achieves a higher accuracy, improving common CNN models by up to 22.9%, but lacks the interpretability of the explicit approaches. We present a visualization approach for understanding and analyzing predictions of the model to address this issue. Furthermore, we show that our approaches for fine-grained recognition are beneficial for other fields like action recognition.
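A hedged sketch of α-pooling as a generalisation of average and bilinear pooling: one factor of the outer product is raised to a signed element-wise power α−1, so α=2 yields bilinear pooling, while for non-negative ReLU features α=1 essentially reduces to average pooling. In the paper α is learned; here it is fixed for illustration.

```python
import torch

def alpha_pooling(features, alpha=2.0, eps=1e-6):
    """features: N x C x H x W activations -> N x (C*C) descriptor.
    alpha=2 corresponds to bilinear pooling; for non-negative ReLU features,
    alpha=1 essentially degenerates to (replicated) global average pooling."""
    n, c, h, w = features.shape
    x = features.reshape(n, c, h * w)
    y = torch.sign(x) * (x.abs() + eps) ** (alpha - 1)       # signed element-wise power
    pooled = torch.bmm(y, x.transpose(1, 2)) / (h * w)       # N x C x C
    pooled = pooled.reshape(n, c * c)
    pooled = torch.sign(pooled) * torch.sqrt(pooled.abs())   # signed sqrt normalisation
    return torch.nn.functional.normalize(pooled, dim=1)

desc = alpha_pooling(torch.relu(torch.randn(2, 256, 14, 14)), alpha=2.0)
print(desc.shape)  # torch.Size([2, 65536])
```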
2019
Dimitri Korsch, Paul Bodesheim, Joachim Denzler:
Classification-Specific Parts for Improving Fine-Grained Visual Categorization.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 62-75. 2019.
[bibtex] [pdf] [web] [doi] [code] [abstract]
Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and small inter-class variance. While global approaches aim at using the whole image for performing the classification, part-based solutions gather additional local information in terms of attentions or parts. We propose a novel classification-specific part estimation that uses an initial prediction as well as back-propagation of feature importance via gradient computations in order to estimate relevant image regions. The subsequently detected parts are then not only selected by a-posteriori classification knowledge, but also have an intrinsic spatial extent that is determined automatically. This is in contrast to most part-based approaches and even to available ground-truth part annotations, which only provide point coordinates and no additional scale information. We show in our experiments on various widely-used fine-grained datasets the effectiveness of the mentioned part selection method in conjunction with the extracted part features.
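The gradient-driven part estimation can be approximated as follows (a Grad-CAM-style sketch, not the paper's exact algorithm): back-propagate the score of the initially predicted class, threshold the resulting saliency map, and take the bounding boxes of connected salient regions as parts with an intrinsic spatial extent.

```python
import numpy as np
import torch
import torchvision
from scipy import ndimage

model = torchvision.models.resnet50(weights="DEFAULT").eval()

def classification_specific_regions(image, saliency_quantile=0.9):
    """image: 1 x 3 x H x W tensor. Returns (x0, y0, x1, y1) boxes of image
    regions that were most important for the initially predicted class."""
    image = image.clone().requires_grad_(True)
    logits = model(image)
    predicted = logits.argmax(dim=1).item()      # initial prediction
    logits[0, predicted].backward()              # feature importance via back-propagation
    saliency = image.grad.abs().max(dim=1).values[0].numpy()
    mask = saliency > np.quantile(saliency, saliency_quantile)
    labelled, _ = ndimage.label(mask)            # connected salient regions
    boxes = ndimage.find_objects(labelled)       # each region keeps its own spatial extent
    return [(s[1].start, s[0].start, s[1].stop, s[0].stop) for s in boxes if s is not None]
```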
Matthias Körschens, Joachim Denzler:
ELPephants: A Fine-Grained Dataset for Elephant Re-Identification.
ICCV Workshop on Computer Vision for Wildlife Conservation (ICCV-WS). 2019.
[bibtex] [pdf] [abstract]
Despite many possible applications, machine learning and computer vision approaches are very rarely utilized in biodiversity monitoring. One reason for this might be that automatic image analysis in biodiversity research often poses a unique set of challenges, some of which are not commonly found in many popular datasets. Thus, suitable image datasets are necessary for the development of appropriate algorithms tackling these challenges. In this paper we introduce the ELPephants dataset, a re-identification dataset, which contains 276 elephant individuals in 2078 images following a long-tailed distribution. It offers many different challenges, like fine-grained differences between the individuals, inferring a new view on the elephant from only one training side, aging effects on the animals and large differences in skin color. We also present a baseline approach, which is a system using a YOLO object detector, feature extraction of ImageNet features and discrimination using a support vector machine. This system achieves a top-1 accuracy of 56% and top-10 accuracy of 80% on the ELPephants dataset.
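A minimal sketch of the described baseline's identification stage, assuming elephant crops have already been detected (the YOLO step is omitted): ImageNet features from a pre-trained CNN, L2-normalised, fed to a linear SVM whose decision values yield the top-1/top-10 ranking. The backbone and SVM settings are generic stand-ins.

```python
import numpy as np
import torch
import torchvision
from sklearn.svm import LinearSVC

backbone = torchvision.models.resnet50(weights="DEFAULT")
backbone.fc = torch.nn.Identity()                # 2048-d pooled ImageNet features
backbone.eval()

@torch.no_grad()
def extract_features(crops):
    """crops: N x 3 x 224 x 224 tensor of detected elephant regions."""
    return torch.nn.functional.normalize(backbone(crops), dim=1).numpy()

def fit_identifier(train_crops, train_ids):
    return LinearSVC(C=1.0).fit(extract_features(train_crops), train_ids)

def rank_identities(svm, query_crops, k=10):
    """Rank known individuals per query crop for top-1 / top-10 evaluation."""
    scores = svm.decision_function(extract_features(query_crops))   # N x n_individuals
    return svm.classes_[np.argsort(-scores, axis=1)[:, :k]]
```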
2018
Dimitri Korsch, Joachim Denzler:
In Defense of Active Part Selection for Fine-Grained Classification.
Pattern Recognition and Image Analysis. Advances in Mathematical Theory and Applications (PRIA). 28 (4): pp. 658-663. 2018.
[bibtex] [pdf] [web] [doi] [abstract]
Fine-grained classification is a recognition task where subtle differences distinguish between different classes. To tackle this classification problem, part-based classification methods are mostly used. Part-based methods learn an algorithm to detect parts of the observed object and extract local part features for the detected part regions. In this paper we show that not all extracted part features are always useful for the classification. Furthermore, given a part selection algorithm that actively selects parts for the classification, we estimate the upper bound for the fine-grained recognition performance. This upper bound lies way above the current state-of-the-art recognition performances, which shows the need for such an active part selection method. Though we do not present such an active part selection algorithm in this work, we propose a novel method that is required by active part selection and enables sequential part-based classification. This method uses a support vector machine (SVM) ensemble and allows classifying an image based on an arbitrary number of part features. Additionally, the training time of our method does not increase with the amount of possible part features. This fact allows extending the SVM ensemble with an active part selection component that operates on a large amount of part feature proposals without suffering from increasing training time.
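One way to realise classification from an arbitrary number of part features is sketched below; it is a simplified stand-in for the paper's SVM ensemble, not necessarily its exact construction: a single linear SVM is trained on individual part features, and an image is classified by averaging the decision values over however many part features are available at test time.

```python
import numpy as np
from sklearn.svm import LinearSVC

class PartFeatureClassifier:
    """Simplified stand-in for the SVM ensemble: a single linear SVM trained on
    individual part features; at test time, decision values of the available
    parts are averaged, so any number of parts (>= 1) can be used."""

    def fit(self, part_features, image_labels):
        # part_features: M x D descriptors of individual parts (any number per image),
        # image_labels: class label of the image each part belongs to.
        self.svm = LinearSVC(C=1.0).fit(part_features, image_labels)
        return self

    def predict_image(self, parts):
        # parts: P x D part features of one image, P arbitrary (multi-class setting assumed).
        scores = self.svm.decision_function(parts)          # P x n_classes
        return self.svm.classes_[int(np.argmax(scores.mean(axis=0)))]
```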
2017
Clemens-Alexander Brust, Tilo Burghardt, Milou Groenenberg, Christoph Käding, Hjalmar Kühl, Marie Manguette, Joachim Denzler:
Towards Automated Visual Monitoring of Individual Gorillas in the Wild.
ICCV Workshop on Visual Wildlife Monitoring (ICCV-WS). Pages 2820-2830. 2017.
[bibtex] [pdf] [doi] [abstract]
In this paper we report on the context and evaluation of a system for an automatic interpretation of sightings of individual western lowland gorillas (Gorilla gorilla gorilla) as captured in facial field photography in the wild. This effort aligns with a growing need for effective and integrated monitoring approaches for assessing the status of biodiversity at high spatio-temporal scales. Manual field photography and the utilisation of autonomous camera traps have already transformed the way ecological surveys are conducted. In principle, many environments can now be monitored continuously, and with a higher spatio-temporal resolution than ever before. Yet, the manual effort required to process photographic data to derive relevant information delimits any large scale application of this methodology. The described system applies existing computer vision techniques including deep convolutional neural networks to cover the tasks of detection and localisation, as well as individual identification of gorillas in a practically relevant setup. We evaluate the approach on a relatively large and challenging data corpus of 12,765 field images of 147 individual gorillas with image-level labels (i.e. missing bounding boxes) photographed at Mbeli Bai at the Nouabalé-Ndoki National Park, Republic of Congo. Results indicate a facial detection rate of 90.8% AP and an individual identification accuracy for ranking within the Top 5 set of 80.3%. We conclude that, whilst keeping the human in the loop is critical, this result is practically relevant as it exemplifies model transferability and has the potential to assist manual identification efforts. We argue further that there is significant need towards integrating computer vision deeper into ecological sampling methodologies and field practice to move the discipline forward and open up new research horizons.
2016
Alexander Freytag, Erik Rodner, Marcel Simon, Alexander Loos, Hjalmar Kühl, Joachim Denzler:
Chimpanzee Faces in the Wild: Log-Euclidean CNNs for Predicting Identities and Attributes of Primates.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 51-63. 2016.
[bibtex] [pdf] [web] [doi] [supplementary] [abstract]
In this paper, we investigate how to predict attributes of chimpanzees such as identity, age, age group, and gender. We build on convolutional neural networks, which lead to significantly superior results compared with previous state-of-the-art on hand-crafted recognition pipelines. In addition, we show how to further increase discrimination abilities of CNN activations by the Log-Euclidean framework on top of bilinear pooling. We finally introduce two curated datasets consisting of chimpanzee faces with detailed meta-information to stimulate further research. Our results can serve as the foundation for automated large-scale animal monitoring and analysis.
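The Log-Euclidean step on top of bilinear pooling can be sketched as follows: pool second-order statistics of the convolutional activations and apply a matrix logarithm via an eigendecomposition of the (regularised) symmetric pooled matrix. Feature dimensions and the regularisation constant are illustrative, not the paper's settings.

```python
import torch

def log_euclidean_bilinear(features, eps=1e-5):
    """features: N x C x H x W activations -> N x (C*C) Log-Euclidean bilinear descriptor."""
    n, c, h, w = features.shape
    x = features.reshape(n, c, h * w)
    gram = torch.bmm(x, x.transpose(1, 2)) / (h * w)     # bilinear pooling, symmetric PSD
    gram = gram + eps * torch.eye(c)                     # regularise before the matrix log
    eigval, eigvec = torch.linalg.eigh(gram)             # eigendecomposition per sample
    log_gram = eigvec @ torch.diag_embed(torch.log(eigval)) @ eigvec.transpose(1, 2)
    return log_gram.reshape(n, c * c)

desc = log_euclidean_bilinear(torch.relu(torch.randn(2, 128, 14, 14)))
print(desc.shape)  # torch.Size([2, 16384])
```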
Erik Rodner, Marcel Simon, Bob Fisher, Joachim Denzler:
Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches.
British Machine Vision Conference (BMVC). 2016.
[bibtex] [pdf] [supplementary]
2015
Alexander Freytag, Alena Schadt, Joachim Denzler:
Interactive Image Retrieval for Biodiversity Research.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 129-141. 2015.
[bibtex] [pdf] [abstract]
On a daily basis, experts in biodiversity research are confronted with the challenging task of classifying individuals to build statistics over their distributions, their habitats, or the overall biodiversity. While the number of species is vast, experts with affordable time-budgets are rare. Image retrieval approaches could greatly assist experts: when new images are captured, a list of visually similar and previously collected individuals could be returned for further comparison. Following this observation, we start by transferring the latest image retrieval techniques to biodiversity scenarios. We then propose to additionally incorporate an expert's knowledge into this process by allowing him to select must-have-regions. The obtained annotations are used to train exemplar-models for region detection. Detection scores efficiently computed with convolutions are finally fused with an initial ranking to reflect both sources of information, global and local aspects. The resulting approach received highly positive feedback from several application experts. On datasets for butterfly and bird identification, we quantitatively prove the benefit of including expert-feedback, resulting in gains of accuracy up to 25%, and we extensively discuss current limitations and further research directions.
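A deliberately simplified sketch of the fusion step: the expert-selected must-have region is used as a convolution template (a stand-in for the paper's trained exemplar models), its maximum response per candidate image provides a local score, and z-scored local and global scores are combined into the final ranking. The weights, the z-score normalisation, and the template-matching shortcut are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def fuse_ranking(global_similarities, candidate_images, region_template, weight=0.5):
    """global_similarities: initial retrieval scores (one per candidate).
    candidate_images: list of 2-D grayscale arrays, each at least as large as
    region_template (the expert-selected must-have region). Returns candidate
    indices sorted by the fused score."""
    template = region_template - region_template.mean()
    local_scores = np.array([
        fftconvolve(img, template[::-1, ::-1], mode="valid").max()   # correlation via convolution
        for img in candidate_images
    ])

    def zscore(x):
        return (x - x.mean()) / (x.std() + 1e-8)

    fused = (1 - weight) * zscore(np.asarray(global_similarities)) + weight * zscore(local_scores)
    return np.argsort(-fused)
```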
Marcel Simon, Erik Rodner:
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks.
International Conference on Computer Vision (ICCV). Pages 1143-1151. 2015.
[bibtex] [pdf] [web] [abstract]
Part models of object categories are essential for challenging recognition tasks, where differences in categories are subtle and only reflected in appearances of small parts of the object. We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellations of neural activation patterns computed using convolutional neural networks. In our experiments, we outperform existing approaches for fine-grained recognition on the CUB200-2011, Oxford PETS, and Oxford Flowers dataset in case no part or bounding box annotations are available and achieve state-of-the-art performance for the Stanford Dog dataset. We also show the benefits of neural constellation models as a data augmentation technique for fine-tuning. Furthermore, our paper unites the areas of generic and fine-grained classification, since our approach is suitable for both scenarios.
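The starting point of the approach, part proposals from neural activations, can be sketched like this: spatial maxima of individual channels of an intermediate convolutional layer serve as candidate part detections; fitting the constellation model over these candidates is omitted. The layer choice and number of channels are assumptions.

```python
import torch
import torchvision

# Channels of an intermediate conv layer act as implicit, unsupervised part detectors.
backbone = torchvision.models.vgg16(weights="DEFAULT").features.eval()

@torch.no_grad()
def channel_peak_proposals(image, top_channels=10):
    """image: 1 x 3 x H x W. Returns (row, col) peaks, in feature-map coordinates,
    of the most strongly responding channels -- candidate part locations."""
    fmap = backbone(image)[0]                          # C x h x w activations
    c, h, w = fmap.shape
    strength, flat_idx = fmap.reshape(c, -1).max(dim=1)
    strongest = strength.topk(top_channels).indices
    return [(int(flat_idx[ch]) // w, int(flat_idx[ch]) % w) for ch in strongest]
```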
Marcel Simon, Erik Rodner, Joachim Denzler:
Fine-grained Classification of Identity Document Types with Only One Example.
Machine Vision Applications (MVA). Pages 126-129. 2015.
[bibtex] [pdf] [web] [abstract]
This paper shows how to recognize types of identity documents, such as passports, using state-of-the-art visual recognition approaches. Whereas recognizing individual parts on identity documents with a standardized layout is one of the old classics in computer vision, recognizing the type of the document and therefore also the layout is a challenging problem due to the large variation of the documents. In our paper, we evaluate different techniques for this application including feature representations based on recent achievements with convolutional neural networks.
2014
Alexander Freytag, Erik Rodner, Joachim Denzler:
Birds of a Feather Flock Together - Local Learning of Mid-level Representations for Fine-grained Recognition.
ECCV Workshop on Parts and Attributes (ECCV-WS). 2014.
[bibtex] [pdf] [web] [code] [presentation]
Alexander Freytag, Erik Rodner, Trevor Darrell, Joachim Denzler:
Exemplar-specific Patch Features for Fine-grained Recognition.
DAGM German Conference on Pattern Recognition (DAGM-GCPR). Pages 144-156. 2014.
[bibtex] [pdf] [code] [supplementary] [abstract]
In this paper, we present a new approach for fine-grained recognition or subordinate categorization, tasks where an algorithm needs to reliably differentiate between visually similar categories, e.g. different bird species. While previous approaches aim at learning a single generic representation and models with increasing complexity, we propose an orthogonal approach that learns patch representations specifically tailored to every single test exemplar. Since we query a constant number of images similar to a given test image, we obtain very compact features and avoid large-scale training with all classes and examples. Our learned mid-level features are built on shape and color detectors estimated from discovered patches reflecting small highly discriminative structures in the queried images. We evaluate our approach for fine-grained recognition on the CUB-2011 birds dataset and show that high recognition rates can be obtained by model combination.
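A compact sketch of the exemplar-specific idea under simplifying assumptions (nearest neighbours on generic global features, raw patch descriptors as detectors instead of the paper's learned shape and color detectors): retrieve a constant number of similar training images for the test exemplar and describe the test image by the maximum responses of their patches.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def exemplar_specific_features(test_feat, test_patches, train_feats, train_patches, k=20):
    """test_feat: D-dim global feature of the test exemplar; test_patches: P x d patch
    descriptors of the test image; train_feats: N x D; train_patches: list of
    (P_i x d) arrays, one per training image."""
    # Query a constant number of globally similar training images for this exemplar.
    nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
    _, idx = nn.kneighbors(test_feat[None])
    # Use their patches as exemplar-specific detectors (stand-in for learned detectors).
    detectors = np.concatenate([train_patches[i] for i in idx[0]])
    detectors /= np.linalg.norm(detectors, axis=1, keepdims=True) + 1e-8
    # Mid-level feature: maximum response of every detector over the test patches.
    return (detectors @ test_patches.T).max(axis=1)
```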
Christoph Göring, Erik Rodner, Alexander Freytag, Joachim Denzler:
Nonparametric Part Transfer for Fine-grained Recognition.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Pages 2489-2496. 2014.
[bibtex] [pdf] [web] [code] [presentation] [abstract]
In the following paper, we present an approach for fine-grained recognition based on a new part detection method. In particular, we propose a nonparametric label transfer technique which transfers part constellations from objects with similar global shapes. The possibility for transferring part annotations to unseen images allows for coping with a high degree of pose and view variations in scenarios where traditional detection models (such as deformable part models) fail. Our approach is especially valuable for fine-grained recognition scenarios where intraclass variations are extremely high, and precisely localized features need to be extracted. Furthermore, we show the importance of carefully designed visual extraction strategies, such as combination of complementary feature types and iterative image segmentation, and the resulting impact on the recognition performance. In experiments, our simple yet powerful approach achieves 35.9% and 57.8% accuracy on the CUB-2010 and 2011 bird datasets, which is the current best performance for these benchmarks.
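The label-transfer step can be sketched as follows, assuming global descriptors and normalised ground-truth part coordinates for the training images; the specific global shape descriptors and the subsequent feature extraction of the paper are not reproduced.

```python
import numpy as np

def transfer_parts(test_global_feat, train_global_feats, train_part_coords, k=5):
    """train_part_coords: N x P x 2 normalised (x, y) part locations per training image.
    Returns k transferred part constellations (k x P x 2) for the test image."""
    sims = train_global_feats @ test_global_feat / (
        np.linalg.norm(train_global_feats, axis=1) * np.linalg.norm(test_global_feat) + 1e-8)
    neighbours = np.argsort(-sims)[:k]              # training images with similar global shape
    return train_part_coords[neighbours]            # their annotated parts become candidates
```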
Marcel Simon, Erik Rodner, Joachim Denzler:
Part Detector Discovery in Deep Convolutional Neural Networks.
Asian Conference on Computer Vision (ACCV). Pages 162-177. 2014.
[bibtex] [pdf] [code] [abstract]
Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without the necessity to actually train the network on the current dataset. Our approach called part detector discovery (PDD) is based on analyzing the gradient maps of the network outputs and finding activation centers spatially related to annotated semantic parts or bounding boxes. This allows us not just to obtain excellent performance on the CUB200-2011 dataset, but in contrast to previous approaches also to perform detection and bird classification jointly without requiring a given bounding box annotation during testing and ground-truth parts during training.
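A minimal sketch of the gradient-map idea: compute the gradient of one network output unit with respect to the input, pool the gradient magnitude locally, and take the position of the strongest response as a part-location candidate. The backbone, the smoothing, and the peak selection are illustrative simplifications of PDD, not the paper's exact procedure.

```python
import numpy as np
import torch
import torchvision
from scipy import ndimage

model = torchvision.models.resnet50(weights="DEFAULT").eval()

def gradient_map_part_candidate(image, output_unit):
    """image: 1 x 3 x H x W. Returns an (x, y) part-location candidate derived
    from the gradient map of one network output unit w.r.t. the input."""
    image = image.clone().requires_grad_(True)
    model(image)[0, output_unit].backward()
    grad_map = image.grad.abs().sum(dim=1)[0].numpy()      # H x W gradient magnitude
    smoothed = ndimage.gaussian_filter(grad_map, sigma=5)  # pool locally before peak search
    y, x = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(x), int(y)
```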
2013