|Start||End||(all times are in CEST)|
|Thursday, 07.09.||08:00||09:00||Registration and Arrival|
Reception and Introduction
Head of Innovation and IP @Carl Zeiss AG
Dr. Dennis Thom
Machine Learning Researcher @Carl Zeiss AG
Paper Session #1 (5 talks, 12 min + 3 min Q&A)
Paper Session #2 (3 talks, 12 min + 3 min Q&A)
|15:00||15:45||Prof. Dr. Daniel I. Rubenstein|
Behavioral Ecology and Conservation
Paper Session #3 (4 talks, 12 min + 3 min Q&A)
|17:30||18:00||Open Discussion + Summary Day 1|
|18:00||Open End||Social Gathering|
|Friday, 08.09.||09:00||09:45||FH-Prof. Dr. David Schedl|
@University of Applied Sciences Upper Austria
Paper Session #4 (5 talks, 12 min + 3 min Q&A)
|11:15||11:45||Open Discussion + Summary Workshop|
|11:45||12:00||Closing words + Farewell|
SWIFT – an Efficient and Effective Application of Instance Segmentation and Tracking in Wildlife Monitoring
Instance segmentation and tracking have been little explored in the context of wildlife monitoring, yet they provide an essential basis for further tasks such as population estimation or behavioral analysis. In this paper, we highlight the importance of these topics and show how they can be efficiently and effectively addressed using our own multi-object tracking and segmentation (MOTS) approach, SWIFT. For this purpose, we provide an overview of our three past publications on these topics. Moreover, we evaluate SWIFT on two datasets: our self-created wildlife camera trap video dataset Wildpark Daylight, containing videos of red deer and fallow deer, and the Wildlife Crossings dataset, containing four different animal classes. Our own dataset is one of the very few datasets in wildlife monitoring that is annotated with instance masks and tracking IDs. SWIFT significantly improves the quality of the instance masks as well as multi-object tracking accuracy scores on both datasets compared to state-of-the-art instance segmentation and tracking approaches.
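The tracking side of MOTS builds on linking instance masks across consecutive frames. As a rough illustration of that general idea (a minimal sketch, not SWIFT's actual algorithm), a greedy mask-IoU linker over boolean NumPy masks might look like this:

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean instance masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def link_tracks(prev_masks, curr_masks, min_iou=0.5):
    """Greedily assign each current-frame mask the previous-frame track
    whose mask overlaps it most (IoU above min_iou), else no match (None)."""
    assignments = []
    for cm in curr_masks:
        ious = [mask_iou(pm, cm) for pm in prev_masks]
        best = int(np.argmax(ious)) if ious else -1
        assignments.append(best if best >= 0 and ious[best] >= min_iou else None)
    return assignments

# Two 4x4 toy frames: one animal mask shifts one pixel to the right.
prev = np.zeros((4, 4), bool); prev[1:3, 0:3] = True
curr = np.zeros((4, 4), bool); curr[1:3, 1:4] = True
links = link_tracks([prev], [curr])  # current mask inherits track 0
```

Real MOTS systems add appearance features and motion models on top of such geometric overlap, but the overlap test above is the common building block.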
|Frank Schindler (virtual)|
Leverage FAIR Machine-actionable and Crowd-sourced Analysis of Camera-trap Data
Camera traps and passive acoustic devices provide a non-invasive method to document wildlife diversity, ecology, behavior, and occurrence, and are constantly opening up new application possibilities for biodiversity research and conservation management. The use of these sensors is increasing globally, and at the same time, the amount of recorded digital photos, videos, and audio files is growing at a rapid pace, highlighting the need to implement efficient high-throughput data pipelines for fast processing and analysis of critical ecological data. We present the WildLIVE portal (https://wildlive.senckenberg.de) for audiovisual data stemming from biodiversity monitoring programs. The key objective of WildLIVE is to enable the curation of digital image, audio, and video content from camera traps, based both on crowd-sourcing and on self-contained processing by machines, and the subsequent mobilization of these data packaged with rich metadata, including the context of the data capture such as the layout of monitoring stations. We achieve this by leveraging the FAIR Digital Object (FDO) approach, which builds on self-describing, persistently interlinked, and consistently accessible knowledge units that act as a processable digital twin of a complex observation event. WildLIVE provides a deep learning component in the form of an image annotation pipeline that locates and classifies species in camera trap data, operating efficiently together with crowd-sourced curation and enrichment of data in a FAIR-compliant manner.
|Claus Weiland (virtual)|
CNN Based Flank Predictor for Quadruped Animal Species
The bilateral asymmetry of the flanks of animals whose visual body marks uniquely identify an individual complicates tasks like population estimation. Automatically generated additional information on the visible side of the animal would improve the accuracy of individual identification. In this study, we used transfer learning on popular CNN image classification architectures to train a flank predictor that predicts the visible flank of quadruped mammalian species in images. We automatically derived the data labels from existing datasets originally labeled for animal pose estimation. We trained the models in two phases with different degrees of retraining. The developed models were evaluated in different scenarios involving unknown quadruped species in known and unknown environments. As a real-world scenario, we used a dataset of manually labeled Eurasian lynx (Lynx lynx) from camera traps in the Bavarian Forest National Park to evaluate the model. The best model, trained on a ResNet-50 backbone, achieved an accuracy of over 90% for the unknown species lynx in a complex habitat.
Towards an Efficient Smart Camera Trap for Wildlife Monitoring
This paper presents an AI-based smart camera-trap hardware system designed for wildlife monitoring. Our camera incorporates classification convolutional neural networks optimized for running on embedded platforms at the edge. We primarily focus on blank-image filtering, which lightens subsequent manual or automatic analysis. The system specifications in the proposed design enable real-time image processing and autonomous operation in the wild. Field tests conducted in Sierra de Aracena Natural Park (Spain) revealed challenges arising from environmental scene variations. To overcome these challenges, we employed transfer learning using diverse datasets and location-specific data. Overall, this study demonstrates the feasibility of building smart camera traps and emphasizes the importance of dataset diversity and adaptation to specific locations.
|Delia Velasco-Montero (virtual)|
DIOPSIS: Digital Identification of Photographically Sampled Insect Species
Insects represent the largest percentage of all organisms in the world, but their populations are in rapid decline. To improve our understanding of trends in insect species occurrence and abundance, automated monitoring systems can provide a non-invasive, cost-effective, and standardised method. Here we present the DIOPSIS v2 system, which was developed and tested at a wide scale in the Netherlands. The system includes a digital camera with a yellow screen that attracts insects, which are then photographed. Powered by solar energy and connected via 4G, it can run autonomously in the field for extended periods. Specialised deep learning software has been developed to analyse the images for classification and biomass estimates. The system was tested during the summer seasons of 2019-2023 at more than 70 locations in the Netherlands, showcasing the ability to establish a network of automated insect monitoring stations.
|Chantal Huijbers (virtual)|
Beyond Accuracy: Confidence Score Calibration in Deep-Learning Classification Models for Camera Trap Images and Sequences
In this paper, we investigate whether deep learning models for species classification in camera trap images are well calibrated, i.e. whether predicted confidence scores can be reliably interpreted as probabilities that the predictions are true. Additionally, as camera traps are often configured to take multiple photos of the same event, we also explore the calibration of predictions at the sequence level, with different approaches for the aggregation of individual predictions. We make the following observations: firstly, calibration and accuracy are closely intertwined and vary greatly across model architectures. Secondly, calibration is not monotonic during training, and two epochs with similar accuracies can present very different calibration values. Finally, we observe that averaging the logits over the sequence before applying softmax normalization emerges as the most effective method for achieving both good calibration and accuracy at the sequence level.
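The sequence-level aggregation strategies the abstract contrasts can be sketched in a few lines; this is a minimal illustration of averaging logits before softmax versus averaging per-image probabilities, not the authors' code:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_sequence(logits, method="mean_logits"):
    """Turn per-image logits (shape [n_images, n_classes]) into one
    sequence-level probability distribution."""
    if method == "mean_logits":
        # Average logits first, normalize once (the strategy the
        # abstract reports as most effective).
        return softmax(logits.mean(axis=0))
    if method == "mean_probs":
        # Alternative: normalize each image, then average probabilities.
        return softmax(logits).mean(axis=0)
    raise ValueError(method)

# Three images of the same camera-trap event, two candidate species.
seq = np.array([[2.0, 0.5], [1.5, 1.0], [3.0, 0.2]])
p = aggregate_sequence(seq)  # one calibrated-style distribution per event
```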
Automated Wildlife Image Classification: An Active Learning Tool for Ecological Applications
We propose a label-efficient learning strategy for wildlife camera trap images. This approach addresses the challenge of limited resources by combining fine-tuning of object detection and image classification models with an active learning system. By leveraging these techniques, researchers with small or medium-sized image databases can effectively harness the power of modern machine learning. Our method enhances predictive performance by optimizing hyperparameters for each dataset, and we demonstrate the value of this approach through experiments. Furthermore, the active learning pipeline significantly reduces the number of labeled training images required, especially for out-of-sample predictions. To ensure broad applicability, we provide a user-friendly software package that simplifies the implementation of our methods, even for researchers without specific programming skills. This package facilitates the adoption of our framework in ecological practice. In conclusion, our combined approach of tuning and active learning substantially improves the performance of automated image classifiers. Moreover, the ready-to-use software package enhances the community’s ability to apply our methods. Lastly, our models tailored to European wildlife data contribute to existing model bases, which are primarily trained on African and North American data.
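Active learning pipelines like the one described typically ask annotators to label the images the classifier is least sure about. A minimal uncertainty-sampling sketch, assuming per-image class probabilities from any classifier (this is an illustration of the general technique, not the authors' implementation):

```python
import numpy as np

def least_confident(probs, k):
    """Return indices of the k unlabeled images whose top predicted
    class probability is lowest (uncertainty sampling)."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]

# Hypothetical predicted class probabilities for 5 unlabeled images.
probs = np.array([
    [0.98, 0.02],   # confident
    [0.55, 0.45],   # uncertain
    [0.90, 0.10],
    [0.51, 0.49],   # most uncertain
    [0.80, 0.20],
])
query = least_confident(probs, k=2)  # images to send to the annotator
```

Labeling only the queried images, retraining, and repeating is what drives the reduction in required training labels.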
Recognizing European Mammals and Birds in Camera Trap Images Using Convolutional Neural Networks
A common way to study animal populations in the wild in an unobtrusive manner is using heat- or motion-activated cameras placed in natural habitats to automatically record images and/or videos. Manual analysis of the potentially large amounts of visual data obtained in this way is a time-consuming process, so automation through machine learning models trained on images and/or videos is desirable. Most visual animal recognition models are limited to mammal identification and group birds into a single class. Machine learning models for visually discriminating birds, in turn, cannot discriminate mammals and are also usually not designed for camera trap images. In this paper, we present convolutional neural network models based on the EfficientNetV2 and ConvNext architectures to recognize both mammals and bird species in camera trap images. Our ConvNextBase model achieves a mean average precision of 96.89% on our validation data set and a mean average precision of 93.88% on a test camera trap data set recorded in a forest in Hesse, Germany. This opens up a new way of automated bird monitoring besides the widely used method of bird call identification through audio recordings, which is limited to vocal bird species.
The Importance and Utility of Machine Learning for Individually Identifying Elephants
Analyzing animal traits at the individual level is at the core of ecology but challenging when studying wildlife populations. Our lab studies wild elephant behavior and cognition using specially designed apparatuses, from which we can measure different aspects of behavior by observing how individuals interact with them. These apparatuses remain in the wild for long periods of time, allowing individuals to interact with them multiple times, and all behaviors are recorded using camera traps. Individual identification is necessary to study change or consistency in behavior within the same individual over time, and to match behavior with other factors observed in the same individual at a different time point or location. However, our current methods for individual identification are limited to characterizing traits (e.g., markings, ear folds, depigmentation) in photos, which takes a great deal of time and is prone to human error and bias. To solve this problem, we have partnered with the University of Jena to develop a machine-learning-based program that automatically identifies individual elephants and has shown good performance on an excerpt of our collected data. We are currently in the process of configuring and testing the program for its utility in our identification process. Furthermore, we are improving the identification abilities of the model by adding annotated camera trap videos of recently identified individuals to the training dataset (N > 250 elephants in our study population). Although an initial labeling effort for creating a training dataset is required, we think that the program is promising. It has the potential to save time and reduce human error in elephant identification, which is critical for elephant behavior, cognition, and conservation research.
|Sydney Hope (virtual)|
Adapting the Re-ID Challenge for Static Sensors
The Grévy’s zebra, an endangered species native to Kenya and southern Ethiopia, has been the target of sustained conservation efforts in recent years. Accurately monitoring Grévy’s zebra populations is essential for ecologists to evaluate the ongoing conservation initiative. In both 2016 and 2018, a full census of the Grévy’s zebra population was enabled by the Great Grévy’s Rally (GGR), a community science event that combines teams of volunteers capturing data with computer vision algorithms that help experts match images to known individuals in the population. In this work, we explore complementary, scalable, cost-effective, and long-term Grévy’s population monitoring using a deployed network of camera traps at the Mpala Research Centre in Laikipia County, Kenya. Unlike the human-captured images collected by large teams of volunteers at GGR events, camera trap images are characterized by poorer quality, high rates of occlusion, and high spatio-temporal similarity within image bursts. We propose an image filtering pipeline incorporating animal detection, species identification, viewpoint estimation, quality evaluation, and temporal subsampling to compensate for these factors and obtain individual crops from camera trap images of suitable quality for re-ID. We then employ the Local Clusterings and their Alternatives (LCA) algorithm, a hybrid computer vision and graph clustering method for animal re-ID, on the resulting high-quality crops. Our method efficiently processed 8.9M unlabeled camera trap images from 70 camera traps over two years into 685 encounters of 173 unique individuals, requiring only 331 contrastive same-vs-different-individual decisions from a human reviewer.
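One step of such a filtering pipeline, temporal subsampling within image bursts, can be illustrated with a simple timestamp-gap rule (the 60-second gap is a hypothetical parameter for illustration, not a value from the paper):

```python
from datetime import datetime, timedelta

def subsample_bursts(timestamps, min_gap_s=60):
    """Keep the first image of each burst: an image is retained only if
    at least min_gap_s seconds have passed since the last retained one."""
    kept, last = [], None
    for i, t in enumerate(sorted(timestamps)):
        if last is None or (t - last) >= timedelta(seconds=min_gap_s):
            kept.append(i)
            last = t
    return kept

# A three-shot burst at 08:00, then a separate event an hour later.
stamps = [datetime(2023, 6, 1, 8, 0, s) for s in (0, 2, 4)] + \
         [datetime(2023, 6, 1, 9, 0, 0)]
kept = subsample_bursts(stamps)  # one representative per event
```

Dropping near-duplicates this way is what keeps the spatio-temporal similarity within bursts from overwhelming the downstream re-ID clustering.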
Combining Feature Aggregation and Geometric Similarity for Re-identification of Patterned Animals
Image-based re-identification of animal individuals allows gathering information such as the migration patterns of animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, re-identification can be done by analyzing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, we address re-identification by combining two types of pattern similarity metrics: 1) pattern appearance similarity, obtained by pattern feature aggregation, and 2) geometric pattern similarity, obtained by analyzing the geometric consistency of pattern similarities. The proposed combination allows efficient utilization of both local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of different pattern types. In the experimental part of the work, we demonstrate that the method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks.
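Combining two similarity metrics for ranking can be illustrated with a simple weighted fusion of per-individual scores; the weight alpha and the scores below are hypothetical, and the paper's actual fusion rule may differ:

```python
import numpy as np

def combined_ranking(appearance_sim, geometric_sim, alpha=0.5):
    """Rank database individuals by a weighted combination of appearance
    and geometric pattern similarity; higher combined score ranks first.
    (alpha is an illustrative weight, not the paper's.)"""
    score = alpha * appearance_sim + (1 - alpha) * geometric_sim
    return np.argsort(-score)  # best match first

# One query compared against 4 known individuals (similarities in [0, 1]).
app = np.array([0.80, 0.40, 0.70, 0.10])
geo = np.array([0.60, 0.90, 0.75, 0.20])
ranking = combined_ranking(app, geo)  # individual 2 wins on the combination
```

The point of the fusion is visible in the toy numbers: individual 2 tops neither metric alone but ranks first once both are considered.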
GorillaVision – Open-set Re-identification of Wild Gorillas
This paper presents GorillaVision, an open-set re-identification system for gorillas in the wild. Open-set re-identification is crucial for identifying and tracking individual gorillas the system has not previously encountered, thereby enhancing our understanding of gorilla behavior and population dynamics in dynamically changing wild environments. The system uses a two-stage approach in which gorilla faces are detected with a YOLOv7 detector and subsequently classified with our model. The classification model is based on a pre-trained Vision Transformer, which is fine-tuned with a triplet loss to compute embeddings of gorilla faces. As in many face-identification tasks, the embeddings provide a similarity measure between individual gorillas. Classification is then performed on these embeddings with a k-nearest neighbors algorithm. We evaluate our approach on two datasets and show that it slightly outperforms the state-of-the-art YOLO detector in a closed-set scenario. In an open-set scenario, our model delivers high-quality results with an accuracy of 60 to 90%, depending on the dataset’s quality and the number of individuals. Our code is available at https://github.com/Lasklu/gorillavision.
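The final classification step described above, k-nearest neighbors over face embeddings with open-set handling, might be sketched as follows; the distance threshold, the names, and the toy 2-D embeddings are illustrative assumptions, not values from the paper:

```python
import numpy as np

def knn_identify(query, gallery, labels, k=3, max_dist=0.8):
    """Classify a face embedding by majority vote among its k nearest
    gallery embeddings; flag it as an unknown individual when even the
    nearest neighbour is farther than max_dist (open-set rejection).
    The threshold value here is illustrative only."""
    dists = np.linalg.norm(gallery - query, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] > max_dist:
        return "unknown"
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Tiny hypothetical gallery: 2-D embeddings of two known gorillas.
gallery = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = ["afia", "afia", "kibo", "kibo"]
id_known = knn_identify(np.array([0.05, 0.02]), gallery, labels)
id_new = knn_identify(np.array([20.0, 20.0]), gallery, labels)
```

The distance threshold is what makes the scheme open-set: embeddings far from every known individual are rejected rather than forced into an existing identity.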
Towards a Multispectral Airborne Light Field Dataset of Forest Animals
Effective monitoring is crucial for conservation efforts, especially in forests, which cover a significant portion of the Earth’s surface and are home to diverse ecosystems. Monitoring terrestrial animals often relies on indirect evidence or localized methods, such as camera traps, which provide limited data. Aerial methods, including drones and satellites, are increasingly used but face challenges in dense forest areas. Despite the existence of multiple public airborne wildlife datasets, the forest ecosystem has not been addressed so far. For this reason, this work introduces a novel multispectral airborne dataset of forest animals that includes spatial information. The dataset is intended to serve as the foundation for the development of an automated wildlife detection process in forests using modern technologies such as airborne light-field sampling. The proposed dataset will consist of geo-referenced RGB and thermal video data from multiple drone flights over forests and wild animal gates, but also in animal parks with near-naturally structured enclosures. So far, 1.62 TB of data (37.53 h of footage) have been recorded between April 2022 and June 2023. The dataset mainly contains videos of species native to Austria such as red deer, chamois, roe deer, and wild boar. Both the data recording and the labelling are still ongoing.
Comparison Between Transformers and Convolutional Models for Fine-grained Classification of Insects
Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when identifying species within the same taxonomic class, because such species often share morphological characteristics that make them difficult to differentiate. We consider the taxonomic class Insecta. Accurate identification of insects is essential in biodiversity monitoring, as they are among the inhabitants at the base of many ecosystems. Citizen science projects are doing brilliant work collecting images of insects in the wild, giving experts the possibility to create improved distribution maps in all countries. Today, we have billions of images that need to be automatically classified, and deep neural networks are one of the main techniques explored for fine-grained tasks. The current deep learning landscape is extremely fruitful, so how does one identify the algorithm to use? In this paper, we focus on the Odonata and Coleoptera orders, and we propose an initial comparative study of the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT 14, a fully transformer-based model, EfficientNet v2, a fully convolutional model, and ViTAEv2, a hybrid model. We analyse the performance of the three models under identical conditions, evaluating the performance per species, per morph together with sex, the inference time, and the overall performance on unbalanced datasets of images from smartphones. Although we observe high performance with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional and fully transformer-based models in accuracy, while the fully transformer-based model outperforms the others in inference speed. This shows the transformer to be robust to the shortage of samples and faster at inference time.
Digital Collectomics – A New Approach Looking Into the Past to Understand the Present and Predict the Future of Biodiversity
Global biodiversity is changing at unprecedented rates in the Anthropocene. Whereas current biodiversity patterns can be observed directly, information from the past is far less easily retrieved, yet highly needed to predict future developments. For plants, herbaria offer a glimpse into the past. Using these collections, we can, on the one hand, evaluate the plant specimen itself and determine attributes like species identity, morphological and phenological traits, and even biotic interactions; on the other hand, we gain useful information from a specimen’s label about the date of collection, its location, and the surrounding biotic and abiotic environment. Current methodological developments in computer vision enable us to extract this information in an automated way and to integrate data from other sources for a much more comprehensive analysis than before. As millions of specimens are already digitized, we can, for example, determine characteristics of species and link them via distribution records to climate change scenarios, which allows us to better predict a given plant’s threat level and to develop scenarios on the consequences of biodiversity change for ecosystem functioning. This contribution reviews existing datasets from herbaria and describes potential avenues to unravel, understand, and cope with Anthropocene biodiversity change.
|Solveig Franziska Bucher|
Comparison of Object Detection Algorithms for Livestock Monitoring of Sheep in UAV Images
This paper presents the EU-funded project SPADE, a European initiative that aims to create an Intelligent Ecosystem utilizing unmanned aerial vehicles (UAVs) to deliver sustainable digital services to various end users in sectors like agriculture, forestry, and livestock. The project’s main goal is to cater to multiple purposes and benefit a wide range of stakeholders. In this paper, we concentrate specifically on the livestock use case and explore how state-of-the-art computer vision algorithms for object detection, tracking, and landscape classification, deployed on edge devices in drones, can offer researchers, conservationists, and farmers a non-intrusive, cost-effective, and efficient method for monitoring livestock, increasing animal welfare, and optimizing livestock management. We present initial findings by comparing the performance of different state-of-the-art object detectors on publicly available UAV images of sheep. The key performance metrics used are average precision, mean average precision, and mean average recall. These findings should enable a better pre-selection of potential object detectors for the presented edge-device use case.
Diving with Penguins: Detecting Penguins and their Prey in Animal-borne Underwater Videos via Deep Learning
African penguins (Spheniscus demersus) are an endangered species. Little is known regarding their underwater hunting strategies and associated predation success rates, yet this is essential for guiding conservation. Modern bio-logging technology has the potential to provide valuable insights, but manually analysing large amounts of data from animal-borne video recorders (AVRs) is time-consuming. In this paper, we publish an animal-borne underwater video dataset of penguins and introduce a ready-to-deploy deep learning system capable of robustly detecting penguins (mAP50@98.0%) as well as instances of fish (mAP50@73.3%). We note that the detectors benefit explicitly from air-bubble learning to improve accuracy. Extending this detector towards a dual-stream behaviour recognition network, we also provide first results for identifying predation behaviour in penguin underwater videos. Whilst the results are promising, further work is required before predation behaviour detection becomes usefully applicable in field scenarios. In summary, we provide a highly reliable underwater penguin detector, a fish detector, and a valuable first attempt towards automated visual detection of complex behaviours in a marine predator. We publish the networks, the 'DivingWithPenguins' video dataset, annotations, splits, and weights for full reproducibility and immediate usability by practitioners.
|Kejia Zhang (virtual)|
The entire workshop was streamed via Zoom (hosted by the University of Jena) and has now concluded.