Datasets

Here you can find an overview of datasets provided by or used in the research of the Computer Vision Group Jena.

Overview

Chimpanzee Faces in the Wild

We provide two datasets of cropped chimpanzee faces: C-Zoo and C-Tai. For details of both datasets, please refer to the corresponding paper. In addition to the images themselves, the datasets come with annotations of identity, gender, age, and age group. All annotations have been provided by experts. More details about the dataset statistics are given in the supplementary material. We provide five splits into train and test sets, which were used to produce the results in Table 1 of the GCPR’16 paper.

Croatian Fish Dataset

This dataset contains 794 images of 12 different fish species collected in the Adriatic Sea in Croatia. All images show fish in real-life situations: they were captured swimming near a bait bag within a seagrass environment. The fish move in and out of the structurally complex seagrass and are very difficult to detect, even for humans. A selection of frames was annotated by an expert in fish identification (CK). Each fish was marked with a bounding box and the corresponding species name. Each image patch described by a bounding box was extracted and saved as a single dataset image.
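The patch extraction described above can be sketched roughly as follows. This is only an illustration and not the tooling used to build the dataset: the function name `crop_patch`, the `(x, y, width, height)` box format, the nested-list image representation, and the example annotation are all assumptions chosen to keep the example self-contained.

```python
def crop_patch(image, box):
    """Return the sub-image covered by a bounding box.

    image: list of rows, each row a list of pixel values (assumed layout).
    box:   (x, y, width, height) in pixels, origin at top-left (assumed format).
    """
    x, y, width, height = box
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Clamp the box to the image borders so partially visible boxes
    # still yield a (smaller) patch instead of wrapping around.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + width, cols), min(y + height, rows)
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4x5 "frame" whose pixel values encode their (row, column) position.
frame = [[10 * r + c for c in range(5)] for r in range(4)]

# One hypothetical annotation: a species label plus a bounding box.
annotation = {"species": "example_species", "box": (1, 1, 3, 2)}

# The resulting patch would be stored as one dataset image for that species.
patch = crop_patch(frame, annotation["box"])
```

With a real image library, the same idea applies: one cropped patch per annotated bounding box, labeled with the annotated species name.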

ELPephants

This elephant dataset was provided by researchers from the Elephant Listening Project (ELP) at Cornell University in Ithaca, who are conducting research on forest elephants visiting the Dzanga Bai clearing in the Dzanga-Ndoki National Park in the Central African Republic. It was devised for the re-identification of elephants that have been documented before. The images were taken over a period of about 15 years. The dataset comprises 2078 images of 276 individual elephants.

EU-Moths Dataset

This dataset covers 200 moth species common in Central Europe. Each species is represented by roughly 11 images, for a total of 2205 images. The dataset was acquired by the Zoologisches Forschungsmuseum, and in the context of the AMMOD project we have permission to provide it publicly. The photographer is Dr. Josef Bücker.

NID Dataset

In the context of the AMMOD project, we gathered over 27,000 images with a light-based camera trap prototype. In order to develop and evaluate insect detection methods for such a setup, we annotated 818 of the images with bounding boxes. In total, the annotations define bounding boxes for 9095 insects.

European Flood 2013 Dataset

This dataset comprises images of major flood events and is suitable for evaluating interactive image retrieval systems, where a user searches for images that are useful to derive a certain type of information. All images have hence been annotated regarding their relevance with respect to a certain set of tasks. For some images, bounding boxes around important regions are provided as well.

Fine-grained Recognition Datasets for Biodiversity Analysis

Ecuador Dataset: The dataset of Brehm et al. includes a single family of moths (Geometridae) quantitatively collected in montane tropical rainforests in southern Ecuador, the global diversity hotspot of this taxon, with 675 observed and genetically verified species in the area. It includes many closely related and look-alike species, most of them unknown to science, and is therefore particularly challenging.

Costa Rica Dataset: The dataset of Janzen and Hallwachs, derived from long-term sampling and caterpillar rearing, includes a broad range of moth and butterfly taxa sampled in northwestern Costa Rica. We restricted the dataset to female individuals and to species with at least 5 images. The dataset is already publicly available, and you can download convenient image URLs and converted metadata below.
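The reduction step for the Costa Rica data could look roughly like the sketch below. The record keys `"species"` and `"sex"` and the function name `filter_records` are hypothetical and do not reflect the actual metadata format. It is also an assumption that the per-species image count is taken after the female-only filter; the text does not specify the order.

```python
from collections import Counter

MIN_IMAGES_PER_SPECIES = 5  # threshold stated in the dataset description


def filter_records(records, min_images=MIN_IMAGES_PER_SPECIES):
    """Keep female individuals whose species has at least `min_images` images.

    Each record is a dict with the hypothetical keys "species" and "sex".
    Assumption: images are counted per species after the female-only filter.
    """
    females = [r for r in records if r["sex"] == "female"]
    counts = Counter(r["species"] for r in females)
    return [r for r in females if counts[r["species"]] >= min_images]


# Toy metadata: species A has 5 female images, species B only 2.
records = (
    [{"species": "A", "sex": "female"}] * 5
    + [{"species": "A", "sex": "male"}] * 3
    + [{"species": "B", "sex": "female"}] * 2
)
kept = filter_records(records)
```

In this toy example only the five female records of species A remain, since species B falls below the threshold and male records are dropped.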

LabelMeFacade Database

Due to the small number of images available in the eTRIMS database, we generated a similar database using LabelMe, which contains a huge number of images with labeled polygons. Since this is a subset of LabelMe images, the images were originally collected by Russell et al. All images should only be used for non-commercial and research purposes. Please check with the authors of the LabelMe dataset if you are unsure about the respective copyrights and how they apply.