Constellation Models Revisited

Team

Marcel Simon and Erik Rodner

Motivation

The algorithm we developed automatically distinguishes between different types of birds, flowers, dog breeds, and, more generally, very similar object categories. It builds on computer vision and machine learning techniques that learn the appearance of object categories from a given set of images and their annotations. Recent ideas from deep learning make it possible to estimate very complex visual models and boost recognition performance to up to 82% on a dataset with 200 different bird categories. Would you be able to distinguish them?

Method

Part models of object categories are essential for challenging recognition tasks in which differences between categories are subtle and only reflected in the appearance of small parts of the object. We present an approach that learns part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellations of neural activation patterns computed using convolutional neural networks. In our experiments, we outperform existing approaches for fine-grained recognition on the CUB200-2011, Oxford PETS, and Oxford Flowers datasets when no part or bounding box annotations are available, and we achieve state-of-the-art performance on the Stanford Dogs dataset. We also show the benefits of neural constellation models as a data augmentation technique for fine-tuning. Furthermore, our paper unites the areas of generic and fine-grained classification, since our approach is suitable for both scenarios.
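To make the idea of activation-based part proposals concrete, here is a minimal Python sketch. It is not the released Matlab code, and the page does not spell out how activation patterns are turned into part locations. The sketch assumes a simplification: each channel of one convolutional layer is treated as a candidate part detector, and the position of its strongest activation is taken as that channel's part proposal. The function name `part_proposals` and its parameters are hypothetical, and the later step of selecting a constellation of useful channels is not shown.

```python
import numpy as np


def part_proposals(feature_maps, image_size):
    """Return one (x, y) part proposal per channel, in image pixel coordinates.

    Assumption for illustration: the strongest activation of a channel marks
    the location of the part that channel responds to.

    feature_maps: array of shape (C, H, W) with the activations of one
                  convolutional layer for a single image.
    image_size:   (width, height) of the input image in pixels.
    """
    channels, height, width = feature_maps.shape

    # Index of the strongest activation in each flattened channel.
    flat_idx = feature_maps.reshape(channels, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (height, width))

    # Map the center of each feature-map cell back to image coordinates.
    img_w, img_h = image_size
    xs = (cols + 0.5) * img_w / width
    ys = (rows + 0.5) * img_h / height
    return np.stack([xs, ys], axis=1)


if __name__ == "__main__":
    # Toy input: two channels of a 4x4 feature map for an 8x8 image.
    maps = np.zeros((2, 4, 4))
    maps[0, 1, 2] = 1.0  # channel 0 fires at row 1, column 2
    maps[1, 3, 0] = 5.0  # channel 1 fires at row 3, column 0
    print(part_proposals(maps, image_size=(8, 8)))
```

In practice the feature maps would come from a pre-trained network, and only a subset of channels would be kept as parts. Both steps are outside the scope of this sketch.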

Selected Results

Results on CUB200-2011:

Training annotation	Test annotation	Approach	Accuracy
None	None	Xiao et al. [1]	77.9%
None	None	No parts	71.9%
None	None	Ours (rand.)	79.4%
None	None	Ours (const.)	81.0%

Results on NABirds:

Training annotation	Test annotation	Approach	Accuracy
Parts	Parts	Van Horn et al. [2]	75.0%
None	None	No parts	63.9%
None	None	Ours (const.)	76.3%

Code

Easy-to-use MATLAB code for the approach: GitHub

Fine-tuned models that were used in the paper: GoogleDrive

References

[1] T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng, and Z. Zhang. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In CVPR, 2015.

[2] G. Van Horn, S. Branson, R. Farrell, S. Haber, J. Barry, P. Ipeirotis, P. Perona, and S. Belongie. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In CVPR, pages 595-604, 2015.

Publications

2020

Marcel Simon, Erik Rodner, Trevor Darrell, Joachim Denzler:
The Whole Is More Than Its Parts? From Explicit to Implicit Pose Normalization.
IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (3): pp. 749-763. 2020. (Pre-print published in 2019.)

2015

Marcel Simon, Erik Rodner:
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks.
International Conference on Computer Vision (ICCV). Pages 1143-1151. 2015.

2014

Marcel Simon, Erik Rodner, Joachim Denzler:
Part Detector Discovery in Deep Convolutional Neural Networks.
Asian Conference on Computer Vision (ACCV). Pages 162-177. 2014.

Marcel Simon, Erik Rodner, Joachim Denzler:
Part Localization by Exploiting Deep Convolutional Networks.
ECCV Workshop on Parts and Attributes (ECCV-WS). 2014.