Fine-grained Recognition Datasets for Biodiversity Analysis

This webpage contains datasets and supplementary information for the following paper:

E. Rodner, M. Simon, G. Brehm, S. Pietsch, J. W. Wägele, J. Denzler. Fine-grained Recognition Datasets for Biodiversity Analysis. CVPR Workshop on Fine-grained Visual Classification (CVPR-W). 2015.

Ecuador dataset of Brehm et al.

The dataset of Brehm et al. covers a single family of moths (Geometridae), quantitatively collected in montane tropical rainforests in southern Ecuador, the global diversity hotspot of this taxon, with 675 observed and genetically verified species in the area. It includes many closely related and look-alike species, most of them unknown to science, and is therefore particularly challenging. Since expert knowledge on these moths is very scarce, automated image analysis could substantially contribute to species sorting by untrained persons or to monitoring schemes in endangered habitats. The images were taken in a controlled environment with uniform background and canonical poses, which makes it easy to focus feature extraction on the relevant parts of the image.

Dataset package together with labels: GitHub

Original publication:
G. Brehm, P. Strutzenberger, and K. Fiedler. Phylogenetic diversity of geometrid moths decreases with elevation in the tropical Andes. Ecography, 36(11):1247-1253. 2013.

Costa Rica dataset of Janzen and Hallwachs

The dataset of Janzen and Hallwachs, derived from long-term sampling and caterpillar rearing, includes a broad range of moth and butterfly taxa sampled in northwestern Costa Rica. We reduced the dataset to female individuals only and to species with at least 5 images. The dataset is already publicly available; image URLs and converted metadata can be downloaded below.
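
The reduction described above (female individuals only, species with at least 5 images) could be reproduced roughly as in the following sketch. The record layout, the field names `species` and `sex`, and the function name are assumptions made for illustration and are not taken from the dataset package. The sketch also assumes that the image count is taken after the restriction to females, which the description does not state explicitly.

```python
from collections import Counter


def filter_records(records, sex="female", min_images=5):
    """Keep records of the given sex whose species has enough images.

    `records` is assumed to be a list of dicts with hypothetical keys
    "species" and "sex"; the real metadata may use different names.
    """
    # Step 1: restrict to individuals of the requested sex.
    kept = [r for r in records if r["sex"] == sex]
    # Step 2: count the remaining images per species.
    counts = Counter(r["species"] for r in kept)
    # Step 3: drop species with fewer than `min_images` remaining images.
    return [r for r in kept if counts[r["species"]] >= min_images]
```

With such a helper, a species that has 4 female and 3 male images would be dropped entirely, while a species with 5 female images would be kept.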

Dataset package together with links and labels: GitHub

Original publication:
D. H. Janzen and W. Hallwachs. Philosophy, navigation and use of a dynamic database (ACG Caterpillars SRNP) for an inventory of the caterpillar fauna, and its food plants and parasitoids, of Area de Conservacion Guanacaste (ACG), northwestern Costa Rica. 2010.

Use of datasets

If you use the datasets, please cite the original papers and the workshop paper (see the README files of the corresponding datasets).

Dataset bias

Every dataset is subject to some bias, and the datasets above are no exception. The following points should be kept in mind when performing tests:

  • As can be seen from the overview figure of the Costa Rica dataset, some of the categories have a very characteristic background.
  • The images have different sizes and should be scaled to a fixed size for experiments.
  • Some of the images of the Ecuador dataset contain textual labels with the category specification. These labels are mostly hidden by the moths, but parts of them remain visible.
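
As an illustration of the scaling step mentioned in the list above, the following dependency-free sketch resizes an image, given as a nested list of pixel values, to a fixed size using nearest-neighbour sampling. The function name and the nested-list representation are assumptions made for this example. In practice an image library would normally be used instead, and the interpolation method is a free choice that the datasets do not prescribe.

```python
def resize_nearest(pixels, out_width, out_height):
    """Resize a 2D grid of pixel values to a fixed size.

    `pixels` is a non-empty list of equally long rows. Each output pixel
    copies the value of the nearest source pixel (no interpolation).
    The aspect ratio is not preserved.
    """
    in_height = len(pixels)
    in_width = len(pixels[0])
    resized = []
    for y in range(out_height):
        # Map the output row back to a source row.
        src_y = min(in_height - 1, y * in_height // out_height)
        row = pixels[src_y]
        resized.append([
            # Map the output column back to a source column.
            row[min(in_width - 1, x * in_width // out_width)]
            for x in range(out_width)
        ])
    return resized
```

Because the aspect ratio is not preserved here, images with very different shapes will be distorted differently; whether that matters depends on the features used in the experiment.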