Björn Barz, Christoph Käding, Joachim Denzler:
Information-Theoretic Active Learning for Content-Based Image Retrieval.
DAGM German Conference on Pattern Recognition (DAGM-GCPR).
Pages 650-666.
2019.
[bibtex]
[pdf]
[doi]
[code]
[supplementary]
[abstract]
We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it to acquire meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach takes not only the structure of the data but also the expected model output change caused by the user feedback into account. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet.
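The mutual-information criterion at the heart of the abstract above can be illustrated with a deliberately simplified sketch: relevance is modeled as a Bernoulli variable, the user as a noisy oracle that flips labels with probability `eps`, and the batch is chosen greedily per sample. The paper scores whole batches jointly and uses further approximations; the names, the label-noise model, and the per-sample simplification below are ours, not the authors' implementation.

```python
import numpy as np

def mutual_information(p_rel, eps=0.1):
    """I(R; F) between a Bernoulli relevance R with P(R=1)=p_rel and the
    user feedback F, which flips the true label with probability eps."""
    p = np.asarray(p_rel, dtype=float)
    # joint distribution over (R, F) under the noisy-oracle channel
    p_r1_f1 = p * (1 - eps)
    p_r1_f0 = p * eps
    p_r0_f1 = (1 - p) * eps
    p_r0_f0 = (1 - p) * (1 - eps)
    p_f1 = p_r1_f1 + p_r0_f1
    p_f0 = 1 - p_f1

    def h(x):  # entropy term -x*log2(x), with 0*log(0) := 0
        x = np.clip(x, 1e-12, 1.0)
        return -x * np.log2(x)

    h_r = h(p) + h(1 - p)  # H(R)
    h_r_given_f = (p_f1 * (h(p_r1_f1 / p_f1) + h(p_r0_f1 / p_f1))
                   + p_f0 * (h(p_r1_f0 / p_f0) + h(p_r0_f0 / p_f0)))
    return h_r - h_r_given_f  # I(R; F) = H(R) - H(R | F)

def select_batch(p_rel, k, eps=0.1):
    """Greedily pick the k samples whose feedback is most informative."""
    scores = mutual_information(p_rel, eps)
    return np.argsort(scores)[::-1][:k]
```

In the full method, redundancy among batch members lowers the joint score; the per-sample version above ignores that interaction.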
Erik Rodner, Alexander Freytag, Paul Bodesheim, Björn Fröhlich, Joachim Denzler:
Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks.
International Journal of Computer Vision (IJCV).
121 (2) :
pp. 253-280.
2017.
[bibtex]
[pdf]
[web]
[doi]
[abstract]
We present new methods for fast Gaussian process (GP) inference in large-scale scenarios including exact multi-class classification with label regression, hyperparameter optimization, and uncertainty prediction. In contrast to previous approaches, we use a full Gaussian process model without sparse approximation techniques. Our methods are based on exploiting generalized histogram intersection kernels and their fast kernel multiplications. We empirically validate the suitability of our techniques in a wide range of scenarios with tens of thousands of examples. Whereas plain GP models are intractable due to both memory consumption and computation time in these settings, our results show that exact inference can indeed be done efficiently. In consequence, we enable every important piece of the Gaussian process framework - learning, inference, hyperparameter optimization, variance estimation, and online learning - to be used in realistic scenarios with more than a handful of data.
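The generalized histogram intersection kernel underlying these speedups is simple to state. The sketch below shows a naive histogram intersection kernel and exact GP label regression built on it; the paper's contribution is computing such kernel products quickly without forming K explicitly, which this naive O(n²d) version does not attempt.

```python
import numpy as np

def hik(X, Z):
    """Histogram intersection kernel: K[i, j] = sum_d min(X[i, d], Z[j, d])."""
    # broadcast to (n, m, d) and take the element-wise minimum
    return np.minimum(X[:, None, :], Z[None, :, :]).sum(axis=2)

def gp_label_regression(X_train, y, X_test, noise=0.1):
    """Exact GP regression with the HIK; multi-class classification via
    label regression would stack several binary +1/-1 label vectors y."""
    K = hik(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y)   # K^{-1} y
    K_star = hik(X_test, X_train)
    return K_star @ alpha           # predictive mean
```

The fast methods in the paper exploit the structure of min-based kernels so that the solve and the variance estimates scale far better than this direct implementation.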
Christoph Käding, Alexander Freytag, Erik Rodner, Andrea Perino, Joachim Denzler:
Large-scale Active Learning with Approximated Expected Model Output Changes.
DAGM German Conference on Pattern Recognition (DAGM-GCPR).
Pages 179-191.
2016.
[bibtex]
[pdf]
[web]
[doi]
[code]
[supplementary]
[abstract]
Incremental learning of visual concepts is one step towards reaching human capabilities beyond closed-world assumptions. Despite recent progress, it remains one of the fundamental challenges in computer vision and machine learning. Along that path, techniques are needed which allow for actively selecting informative examples from a huge pool of unlabeled images to be annotated by application experts. While a multitude of active learning techniques exist, they commonly suffer from one of two drawbacks: (i) they do not work reliably on challenging real-world data, or (ii) they are kernel-based and do not scale to the amounts of data current vision applications need to deal with. Therefore, we present an active learning and discovery approach which can deal with huge collections of unlabeled real-world data. Our approach is based on the expected model output change principle and overcomes previous scalability issues. We present experiments on the large-scale MS-COCO dataset and on a dataset provided by biodiversity researchers. The obtained results reveal that our technique clearly improves accuracy after just a few annotations. At the same time, it outperforms previous active learning approaches in academic and real-world scenarios.
Christoph Käding, Erik Rodner, Alexander Freytag, Joachim Denzler:
Active and Continuous Exploration with Deep Neural Networks and Expected Model Output Changes.
NIPS Workshop on Continual Learning and Deep Networks (NIPS-WS).
2016.
[bibtex]
[pdf]
[web]
[abstract]
The demands on visual recognition systems do not end with the complexity offered by current large-scale image datasets, such as ImageNet. In consequence, we need curious and continuously learning algorithms that actively acquire knowledge about semantic concepts which are present in available unlabeled data. As a step towards this goal, we show how to perform continuous active learning and exploration, where an algorithm actively selects relevant batches of unlabeled examples for annotation. These examples could either belong to already known or to yet undiscovered classes. Our algorithm is based on a new generalization of the Expected Model Output Change principle for deep architectures and is especially tailored to deep neural networks. Furthermore, we show easy-to-implement approximations that yield efficient techniques for active selection. Empirical experiments show that our method outperforms currently used heuristics.
Christoph Käding, Erik Rodner, Alexander Freytag, Joachim Denzler:
Fine-tuning Deep Neural Networks in Continuous Learning Scenarios.
ACCV Workshop on Interpretation and Visualization of Deep Neural Nets (ACCV-WS).
2016.
[bibtex]
[pdf]
[web]
[supplementary]
[abstract]
The revival of deep neural networks and the availability of ImageNet laid the foundation for recent success in highly complex recognition tasks. However, ImageNet does not cover all visual concepts of all possible application scenarios. Hence, application experts still record new data constantly and expect the data to be used upon its availability. In this paper, we follow this observation and apply the classical concept of fine-tuning deep neural networks to scenarios where data from known or completely new classes is continuously added. Besides a straightforward realization of continuous fine-tuning, we empirically analyze how the computational burden of training can be further reduced. Finally, we visualize how the network's attention maps evolve over time, which allows for visually investigating what the network learned during continuous fine-tuning.
Christoph Käding, Erik Rodner, Alexander Freytag, Joachim Denzler:
Watch, Ask, Learn, and Improve: A Lifelong Learning Cycle for Visual Recognition.
European Symposium on Artificial Neural Networks (ESANN).
Pages 381-386.
2016.
[bibtex]
[pdf]
[code]
[presentation]
[abstract]
We present WALI, a prototypical system that learns object categories over time by continuously watching online videos. WALI actively asks questions to a human annotator about the visual content of observed video frames. Thereby, WALI is able to receive information about new categories and to simultaneously improve its generalization abilities. The functionality of WALI is driven by scalable active learning, efficient incremental learning, as well as state-of-the-art visual descriptors. In our experiments, we show qualitative and quantitative statistics about WALI's learning process. WALI runs continuously and regularly asks questions.
Christoph Käding, Alexander Freytag, Erik Rodner, Paul Bodesheim, Joachim Denzler:
Active Learning and Discovery of Object Categories in the Presence of Unnameable Instances.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Pages 4343-4352.
2015.
[bibtex]
[pdf]
[web]
[doi]
[code]
[presentation]
[supplementary]
[abstract]
Current visual recognition algorithms are "hungry" for data but massive annotation is extremely costly. Therefore, active learning algorithms are required that reduce labeling efforts to a minimum by selecting examples that are most valuable for labeling. In active learning, all categories occurring in collected data are usually assumed to be known in advance and experts should be able to label every requested instance. But do these assumptions really hold in practice? Could you name all categories in every image? Existing algorithms completely ignore the fact that there are certain examples where an oracle cannot provide an answer or that do not even belong to the current problem domain. Ideally, active learning techniques should be able to discover new classes and at the same time cope with queries an expert is not able or willing to label. Motivated by these observations, we present a variant of the expected model output change principle for active learning and discovery in the presence of unnameable instances. Our experiments show that in these realistic scenarios, our approach substantially outperforms previous active learning methods, which are often not even able to improve with respect to the baseline of random query selection.
Paul Bodesheim, Alexander Freytag, Erik Rodner, Joachim Denzler:
Local Novelty Detection in Multi-class Recognition Problems.
IEEE Winter Conference on Applications of Computer Vision (WACV).
Pages 813-820.
2015.
[bibtex]
[pdf]
[web]
[doi]
[supplementary]
[abstract]
In this paper, we propose using local learning for multi-class novelty detection, a framework that we call local novelty detection. Estimating the novelty of a new sample is an extremely challenging task due to the large variability of known object categories. The features used to judge novelty are often very specific to the object in the image, and therefore we argue that individual novelty models for each test sample are important. Similar to human experts, it seems intuitive to first look for the most related images, thus filtering out unrelated data. Afterwards, the system focuses on discovering similarities and differences to those images only. Therefore, we claim that it is beneficial to solely consider training images most similar to a test sample when deciding about its novelty. Following the principle of local learning, a local novelty detection model is learned and evaluated for each test sample. Our local novelty score turns out to be a valuable indicator for deciding whether the sample belongs to a known category from the training set or to a new, unseen one. With our local novelty detection approach, we achieve state-of-the-art performance in multi-class novelty detection on two popular visual object recognition datasets, Caltech-256 and ImageNet. We further show that our framework: (i) can be successfully applied to unknown face detection using the Labeled-Faces-in-the-Wild dataset and (ii) outperforms recent work on attribute-based unfamiliar class detection in fine-grained recognition of bird species on the challenging CUB-200-2011 dataset.
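The local-learning idea can be sketched in a few lines. Note that this is a simplified stand-in only: the paper learns a local kernel null space (KNFST) novelty model per test sample, whereas the sketch below fits a diagonal Gaussian to the k most similar training samples and scores novelty as a normalized distance to that local model.

```python
import numpy as np

def local_novelty(x, X_train, k=5):
    """Local novelty: evaluate a test sample only against its k most
    similar training samples. A diagonal Gaussian over the neighborhood
    stands in for the per-sample novelty model learned in the paper."""
    d = np.linalg.norm(X_train - x, axis=1)
    neigh = X_train[np.argsort(d)[:k]]            # local training set
    mu, sigma = neigh.mean(axis=0), neigh.std(axis=0) + 1e-6
    # normalized distance of x to the local model
    return float(np.linalg.norm((x - mu) / sigma) / np.sqrt(x.size))
```

The key property illustrated here is that unrelated training data never enters the model: only the filtered neighborhood decides about novelty.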
Alexander Freytag, Erik Rodner, Joachim Denzler:
Selecting Influential Examples: Active Learning with Expected Model Output Changes.
European Conference on Computer Vision (ECCV).
Pages 562-577.
2014.
[bibtex]
[pdf]
[presentation]
[supplementary]
[abstract]
In this paper, we introduce a new general strategy for active learning. The key idea of our approach is to measure the expected change of model outputs, a concept that generalizes previous methods based on expected model change and incorporates the underlying data distribution. For each example of an unlabeled set, the expected change of model predictions is calculated and marginalized over the unknown label. This results in a score for each unlabeled example that can be used for active learning with a broad range of models and learning algorithms. In particular, we show how to derive very efficient active learning methods for Gaussian process regression, which implement this general strategy, and link them to previous methods. We analyze our algorithms and compare them to a broad range of previous active learning strategies in experiments showing that they outperform state-of-the-art on well-established benchmark datasets in the area of visual object recognition.
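For GP regression the expected model output change admits closed forms. The sketch below is a simplified rendering of the idea: for each candidate, compute how much the predictions on the whole pool would change, marginalized over the unknown label in {+1, -1}. The RBF kernel, the naive label model `p_pos`, and the averaging choices are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Squared-exponential kernel matrix between row sets X and Z."""
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d)

def emoc_scores(X_lab, y_lab, X_unl, noise=0.01, gamma=1.0):
    """Expected model output change for GP regression with labels in {+1, -1}."""
    K = rbf(X_lab, X_lab, gamma) + noise * np.eye(len(X_lab))
    K_inv = np.linalg.inv(K)
    K_ul = rbf(X_unl, X_lab, gamma)            # (pool, labeled)
    mu = K_ul @ (K_inv @ y_lab)                # predictive means on the pool
    cov = rbf(X_unl, X_unl, gamma) - K_ul @ K_inv @ K_ul.T
    var = np.diag(cov)                         # predictive variances
    scores = np.empty(len(X_unl))
    for i in range(len(X_unl)):
        # per unit of label surprise, how much every pool prediction moves
        change = (np.abs(cov[:, i]) / (var[i] + noise)).mean()
        # marginalize the label surprise |y - mu_i| over y in {+1, -1}
        p_pos = 0.5 * (1 + np.clip(mu[i], -1, 1))   # simple label model
        expected_surprise = p_pos * abs(1 - mu[i]) + (1 - p_pos) * abs(-1 - mu[i])
        scores[i] = expected_surprise * change
    return scores
```

A duplicate of an already-labeled example scores near zero (its label is predictable and changes almost nothing), while an uncertain example between classes scores high, which is exactly the behavior the principle is designed to capture.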
Alexander Freytag, Erik Rodner, Trevor Darrell, Joachim Denzler:
Exemplar-specific Patch Features for Fine-grained Recognition.
DAGM German Conference on Pattern Recognition (DAGM-GCPR).
Pages 144-156.
2014.
[bibtex]
[pdf]
[code]
[supplementary]
[abstract]
In this paper, we present a new approach for fine-grained recognition or subordinate categorization, tasks where an algorithm needs to reliably differentiate between visually similar categories, e.g. different bird species. While previous approaches aim at learning a single generic representation and models with increasing complexity, we propose an orthogonal approach that learns patch representations specifically tailored to every single test exemplar. Since we query a constant number of images similar to a given test image, we obtain very compact features and avoid large-scale training with all classes and examples. Our learned mid-level features are built on shape and color detectors estimated from discovered patches reflecting small, highly discriminative structures in the queried images. We evaluate our approach for fine-grained recognition on the CUB-200-2011 birds dataset and show that high recognition rates can be obtained by model combination.
Björn Barz, Erik Rodner, Joachim Denzler:
ARTOS -- Adaptive Real-Time Object Detection System.
arXiv preprint arXiv:1407.2721.
2014.
[bibtex]
[pdf]
[web]
[code]
[abstract]
ARTOS is all about creating, tuning, and applying object detection models with just a few clicks. In particular, ARTOS facilitates learning of models for visual object detection by eliminating the burden of having to collect and annotate a large set of positive and negative samples manually and in addition it implements a fast learning technique to reduce the time needed for the learning step. A clean and friendly GUI guides the user through the process of model creation, adaptation of learned models to different domains using in-situ images, and object detection on both offline images and images from a video stream. A library written in C++ provides the main functionality of ARTOS with a C-style procedural interface, so that it can be easily integrated with any other project.
Daniel Göhring, Judy Hoffman, Erik Rodner, Kate Saenko, Trevor Darrell:
Interactive Adaptation of Real-Time Object Detectors.
International Conference on Robotics and Automation (ICRA).
Pages 1282-1289.
2014.
[bibtex]
[pdf]
[web]
[abstract]
In this paper, we present a framework for quickly training 2D object detectors for robotic perception. Our method can be used by robotics practitioners to quickly (under 30 seconds per object) build a large-scale real-time perception system. In particular, we show how to create new detectors on the fly using large-scale internet image databases, thus allowing a user to choose among thousands of available categories to build a detection system suitable for the particular robotic application. Furthermore, we show how to adapt these models to the current environment with just a few in-situ images. Experiments on existing 2D benchmarks evaluate the speed, accuracy, and flexibility of our system.
Judy Hoffman, Erik Rodner, Jeff Donahue, Brian Kulis, Kate Saenko:
Asymmetric and Category Invariant Feature Transformations for Domain Adaptation.
International Journal of Computer Vision (IJCV).
109 (1-2) :
pp. 28-41.
2014.
[bibtex]
[pdf]
[web]
[doi]
[abstract]
We address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce a unified flexible model for both supervised and semi-supervised learning that allows us to learn transformations between domains. Additionally, we present two instantiations of the model, one for general feature adaptation/alignment, and one specifically designed for classification. First, we show how to extend metric learning methods for domain adaptation, allowing for learning metrics independent of the domain shift and the final classifier used. Furthermore, we go beyond classical metric learning by extending the method to asymmetric, category independent transformations. Our framework can adapt features even when the target domain does not have any labeled examples for some categories, and when the target and source features have different dimensions. Finally, we develop a joint learning framework for adaptive classifiers, which outperforms competing methods in terms of multi-class accuracy and scalability. We demonstrate the ability of our approach to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types, and codebooks. The experiments show its strong performance compared to previous approaches and its applicability to large-scale scenarios.
Sergio Guadarrama, Erik Rodner, Kate Saenko, Ning Zhang, Ryan Farrell, Jeff Donahue, Trevor Darrell:
Open-vocabulary Object Retrieval.
Robotics Science and Systems (RSS).
Page 41. ISBN 978-0-9923747-0-9.
2014.
Awarded with an AAAI invited talk
[bibtex]
[pdf]
[web]
[abstract]
In this paper, we address the problem of retrieving objects based on open-vocabulary natural language queries: Given a phrase describing a specific object, e.g., the corn flakes box, the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g. ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. Given a candidate object image, we first map it to a set of words that are likely to describe it, using several learned image-to-text projections. We also propose a method for handling open-vocabularies, i.e., words not contained in the training data. We then compare the natural language query to the sets of words predicted for each candidate and select the best match. Our method can combine category- and instance-level semantics in a common representation. We present extensive experimental results on several datasets using both instance-level and category-level matching and show that our approach can accurately retrieve objects based on extremely varied open-vocabulary queries. The source code of our approach will be publicly available together with pre-trained models and could be directly used for robotics applications.
Alexander Freytag, Erik Rodner, Paul Bodesheim, Joachim Denzler:
Labeling examples that matter: Relevance-Based Active Learning with Gaussian Processes.
DAGM German Conference on Pattern Recognition (DAGM-GCPR).
Pages 282-291.
2013.
[bibtex]
[pdf]
[web]
[doi]
[code]
[supplementary]
[abstract]
Active learning is an essential tool to reduce manual annotation costs in the presence of large amounts of unlabeled data. In this paper, we introduce new active learning methods based on measuring the impact of a new example on the current model. This is done by deriving model changes of Gaussian process models in closed form. Furthermore, we study typical pitfalls in active learning and show that our methods automatically balance the exploitation-exploration trade-off. Experiments are performed with established benchmark datasets for visual object recognition and show that our new active learning techniques are able to outperform state-of-the-art methods.
Paul Bodesheim, Alexander Freytag, Erik Rodner, Joachim Denzler:
Approximations of Gaussian Process Uncertainties for Visual Recognition Problems.
Scandinavian Conference on Image Analysis (SCIA).
Pages 182-194.
2013.
[bibtex]
[pdf]
[web]
[doi]
[abstract]
Gaussian processes offer the advantage of calculating the classification uncertainty in terms of the predictive variance associated with the classification result. This is especially useful for selecting informative samples in active learning and for spotting samples of previously unseen classes, known as novelty detection. However, the Gaussian process framework suffers from high computational complexity, leading to computation times too large for practical applications. Hence, we propose an approximation of the Gaussian process predictive variance leading to substantial speedups. The complexity of both learning and testing the classification model, regarding computational time and memory demand, decreases by one order of magnitude with respect to the number of training samples involved. The benefits of our approximations are verified in experimental evaluations for novelty detection and active learning of visual object categories on the datasets C-Pascal of Pascal VOC 2008, Caltech-256, and ImageNet.
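The quantity being approximated is the GP predictive variance sigma^2(x) = k(x,x) - k_*^T K^{-1} k_*. As an illustration of how cheap surrogates arise, the sketch below contrasts the exact value with a simple upper bound obtained from K^{-1} >= (1/lambda_max) I in the Loewner order; this particular bound is ours for illustration, the paper derives its own (different) approximations.

```python
import numpy as np

def predictive_variance_exact(K, k_star, k_ss):
    """Exact GP predictive variance: k(x,x) - k_*^T K^{-1} k_*."""
    return k_ss - k_star @ np.linalg.solve(K, k_star)

def predictive_variance_bound(K, k_star, k_ss):
    """Cheap upper bound: since K^{-1} >= (1/lambda_max) I, we have
    k_*^T K^{-1} k_* >= ||k_*||^2 / lambda_max, hence a larger variance."""
    lam_max = np.linalg.eigvalsh(K)[-1]   # eigvalsh is ascending
    return k_ss - (k_star @ k_star) / lam_max
```

The exact version costs a full linear solve per test sample; the bound needs only one eigenvalue of K (computable once) and a dot product per sample, which is the kind of trade-off the paper's approximations exploit.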
Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler, Joachim Denzler:
Kernel Null Space Methods for Novelty Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Pages 3374-3381.
2013.
[bibtex]
[pdf]
[web]
[doi]
[code]
[presentation]
[abstract]
Detecting samples from previously unknown classes is a crucial task in object recognition, especially when dealing with real-world applications where the closed-world assumption does not hold. We show how to apply a null space method for novelty detection, which maps all training samples of one class to a single point. Besides the possibility of modeling a single class, we are able to treat multiple known classes jointly and to detect novelties for a set of classes with a single model. In contrast to modeling the support of each known class individually, our approach makes use of a projection in a joint subspace where training samples of all known classes have zero intra-class variance. This subspace is called the null space of the training data. To decide about novelty of a test sample, our null space approach allows for solely relying on a distance measure instead of performing density estimation directly. Therefore, we derive a simple yet powerful method for multi-class novelty detection, an important problem not studied sufficiently so far. Our novelty detection approach is assessed in comprehensive multi-class experiments using the publicly available datasets Caltech-256 and ImageNet. The analysis reveals that our null space approach is perfectly suited for multi-class novelty detection since it outperforms all other methods.
Paul Bodesheim, Erik Rodner, Alexander Freytag, Joachim Denzler:
Divergence-Based One-Class Classification Using Gaussian Processes.
British Machine Vision Conference (BMVC).
Pages 50.1-50.11.
2012.
[bibtex]
[pdf]
[web]
[doi]
[presentation]
[abstract]
We present an information theoretic framework for one-class classification, which allows for deriving several new novelty scores. With these scores, we are able to rank samples according to their novelty and to detect outliers not belonging to a learnt data distribution. The key idea of our approach is to measure the impact of a test sample on the previously learnt model. This is carried out in a probabilistic manner using Jensen-Shannon divergence and reclassification results derived from the Gaussian process regression framework. Our method is evaluated using well-known machine learning datasets as well as large-scale image categorisation experiments showing its ability to achieve state-of-the-art performance.
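The reclassification idea can be illustrated for a binary predictive distribution: the novelty of a test sample is the Jensen-Shannon divergence between the model's prediction before and after the sample is included. How `p_before` and `p_after` are obtained from GP regression is the paper's contribution and is not reproduced in this sketch.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two Bernoulli distributions with parameters p, q."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def js_novelty(p_before, p_after):
    """Novelty score: Jensen-Shannon divergence between the predictive
    distribution before and after reclassification with the test sample."""
    m = 0.5 * (p_before + p_after)
    return 0.5 * kl(p_before, m) + 0.5 * kl(p_after, m)
```

Unlike raw KL, the JS divergence is symmetric and bounded, so samples can be ranked by how strongly they disturb the learnt model.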
Alexander Lütz, Erik Rodner, Joachim Denzler:
Efficient Multi-Class Incremental Learning Using Gaussian Processes.
Open German-Russian Workshop on Pattern Recognition and Image Understanding (OGRW).
Pages 182-185.
2011.
[bibtex]
[pdf]
[abstract]
One of the main assumptions in machine learning is that sufficient training data is available in advance and batch learning can be applied. However, because of the dynamics in many applications, this assumption will break down over time in almost all cases. Therefore, classifiers have to be able to adapt themselves when new training data from existing or new classes becomes available, when training data is changed, or when data should even be removed. In this paper, we present a method allowing efficient incremental learning of a Gaussian process classifier. Experimental results show the benefits in terms of needed computation times compared to building the classifier from scratch.
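The kind of update that avoids retraining from scratch can be sketched with the block-inverse (Schur complement) formula for growing K^{-1} by one sample. This is an illustrative toy for GP regression with a generic kernel function; the paper's method, its multi-class handling, and removal of samples are not reproduced here.

```python
import numpy as np

def rbf(X, Z):
    """Simple RBF kernel matrix between row sets X and Z."""
    return np.exp(-((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))

class IncrementalGP:
    """GP regression where each new sample updates K^{-1} via the block
    inverse formula instead of refactorizing the kernel matrix."""

    def __init__(self, kernel, noise=0.1):
        self.kernel, self.noise = kernel, noise
        self.X, self.y, self.K_inv = None, None, None

    def add(self, x, y):
        x = np.atleast_2d(x)
        if self.X is None:
            self.X = x
            self.y = np.array([y], dtype=float)
            self.K_inv = np.array([[1.0 / (self.kernel(x, x)[0, 0] + self.noise)]])
            return
        b = self.kernel(self.X, x)[:, 0]          # cross-kernel values
        c = self.kernel(x, x)[0, 0] + self.noise
        Ainv_b = self.K_inv @ b
        s = c - b @ Ainv_b                        # Schur complement
        top_left = self.K_inv + np.outer(Ainv_b, Ainv_b) / s
        self.K_inv = np.block([[top_left, -Ainv_b[:, None] / s],
                               [-Ainv_b[None, :] / s, np.array([[1.0 / s]])]])
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, y)

    def predict(self, x):
        k = self.kernel(self.X, np.atleast_2d(x))[:, 0]
        return k @ (self.K_inv @ self.y)          # predictive mean
```

Each update costs O(n^2) instead of the O(n^3) needed to rebuild the classifier, while producing exactly the same predictions as batch training.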