Urban Scene Understanding
Team
Sven Sickert, Clemens-Alexander Brust, Marcel Simon, and Erik Rodner
Incorporating Spatial Priors in CNNs
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks trained to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate the spatial position of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we achieve state-of-the-art results on the KITTI and LabelMeFacade datasets. Furthermore, our paper offers a guideline for people working in the area who are struggling with the many painstaking details that make training convolutional networks on image patches extremely difficult.
Code and dataset are available on GitHub! More information on the dataset can be found in our section on datasets.
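As a rough illustration of the idea of feeding patch position to the network, the following minimal sketch extracts square patches together with their normalized center coordinates. It is not taken from the released code: the function name `extract_patches_with_position`, the nested-list image representation, and the normalization to [0, 1] are assumptions made for this example only.

```python
def extract_patches_with_position(image, patch_size, stride=1):
    """Return a list of (patch, (y_norm, x_norm)) pairs.

    `image` is a 2D nested list (rows of pixel values). Hypothetical helper:
    the normalized center position is the extra "spatial" input that a patch
    network could receive alongside the patch appearance.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so that a patch has a center pixel")
    height, width = len(image), len(image[0])
    half = patch_size // 2
    samples = []
    for cy in range(half, height - half, stride):
        for cx in range(half, width - half, stride):
            # Cut out the square window centered at (cy, cx).
            patch = [row[cx - half:cx + half + 1]
                     for row in image[cy - half:cy + half + 1]]
            # Normalize the center to [0, 1] so the input is resolution independent.
            position = (cy / max(height - 1, 1), cx / max(width - 1, 1))
            samples.append((patch, position))
    return samples


# Usage: a 5x5 toy image with 3x3 patches yields 9 training samples.
toy_image = [[r * 5 + c for c in range(5)] for r in range(5)]
toy_samples = extract_patches_with_position(toy_image, patch_size=3)
```

Each pair would then provide two inputs for one pixel-wise prediction: the patch for the appearance model and the position for the spatial prior.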
Exploitation of Context Cues using Iterative Context Forests
In this paper, we present a new combined approach for feature extraction, classification, and context modelling in an iterative framework based on random decision trees and a very large number of features. A major focus of this paper is to integrate different kinds of feature types, such as color, geometric context, and auto context features, in a joint, flexible, and fast manner. Furthermore, we perform an in-depth analysis of multiple feature extraction methods and different feature types. Extensive experiments are performed on challenging facade recognition datasets, where we show that our approach significantly outperforms previous approaches with a performance gain of more than 15% on the most difficult dataset.
The method itself is also suitable for anytime classification scenarios, where the challenge is to estimate a label for each pixel in an image while allowing the estimation to be interrupted at any time. This makes the method applicable to time-critical tasks, such as automotive applications, where the available computational resources are limited and not known in advance. Label estimation is done in an iterative manner and includes spatial context right from the beginning.
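The anytime property described above can be pictured with a generic refinement loop that always holds a valid labeling and stops when the time budget runs out. This is only a sketch under stated assumptions, not the Iterative Context Forest itself: `anytime_labeling`, the `refine` callable, and the per-pixel probability lists are hypothetical names and data structures chosen for illustration.

```python
import time


def anytime_labeling(initial_probs, refine, budget_seconds,
                     max_iterations=10, clock=time.monotonic):
    """Iteratively refine per-pixel class probabilities until time runs out.

    `initial_probs[y][x]` is a list of class probabilities for one pixel.
    `refine` is a hypothetical callable that maps the current probability
    maps to improved ones (for example, one more level of context features).
    Returns the current label map and the number of completed iterations.
    """
    deadline = clock() + budget_seconds
    probs = initial_probs
    iterations = 0
    # A usable estimate exists before the loop, so stopping early is always safe.
    while iterations < max_iterations and clock() < deadline:
        probs = refine(probs)
        iterations += 1
    # Pick the most probable class index for every pixel.
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row]
              for row in probs]
    return labels, iterations


# Usage with a deterministic fake clock and an identity refinement step.
ticks = iter(range(100))
demo_labels, demo_iterations = anytime_labeling(
    [[[0.2, 0.8], [0.6, 0.4]]],
    refine=lambda current: current,
    budget_seconds=2.5,
    clock=lambda: next(ticks),
)
```

The `clock` parameter exists only so that the example can be run deterministically; with the default `time.monotonic`, the loop stops after whatever number of refinements fits into the budget.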
Large-scale Gaussian Process Inference for Semantic Segmentation
Semantic interpretation and understanding of images is an important goal of visual recognition research and offers a large variety of possible applications. One step towards this goal is semantic segmentation, which aims at automatically labeling image regions and pixels with category names. Since typical images contain several million pixels, the use of kernel-based methods for semantic segmentation is limited by the computation time involved. In this paper, we overcome this drawback by exploiting efficient kernel calculations using the histogram intersection kernel for fast and exact Gaussian process classification. Our results show that non-parametric Bayesian methods can be used for semantic segmentation without sparse approximation techniques. Furthermore, our experiments show a significant benefit in terms of classification accuracy compared to state-of-the-art methods.
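For reference, the histogram intersection kernel mentioned above is defined as k(x, y) = sum_i min(x_i, y_i) for two histograms x and y. The following minimal sketch computes it directly; the function names are chosen for this example, and the naive quadratic kernel matrix shown here does not include the efficient calculations the paper relies on.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def kernel_matrix(histograms):
    """Naive O(n^2) kernel matrix, for illustration only."""
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]


# Usage: three toy histograms with integer bin counts.
toy_histograms = [[3, 1, 0], [1, 2, 4], [0, 0, 5]]
toy_kernel = kernel_matrix(toy_histograms)
```

The resulting matrix is symmetric, and each diagonal entry equals the total mass of the corresponding histogram, since min(x_i, x_i) = x_i.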
Publications
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding.
International Conference on Computer Vision Theory and Applications (VISAPP). Pages 510-517. 2015.
Efficient Convolutional Patch Networks for Scene Understanding.
CVPR Workshop on Scene Understanding (CVPR-WS). 2015. Poster presentation and extended abstract.