PASCAL VOC Segmentation Challenge 2010

CVC Team

Josep M. Gonfaus, Xavier Boix, Fahad Shahbaz Khan, Joost Van de Weijer, Andrew D. Bagdanov, Marco Pedersoli, Joan Serrat, Jordi Gonzàlez.

Slides can be downloaded from here.

Experimental Results

Our method obtained the best score in 11 of the 20 classes and also finished first in the final mean result (40.1%). More results can be found at the PASCAL VOC2010 workshop page.

The results show that harmony potentials can deal with multiclass images and partial occlusion, and can correctly classify the background.


We use a Conditional Random Field (CRF) approach. The main novelty is the introduction of a new potential, called the harmony potential, which can encode any combination of labels at the global node. This improvement is especially relevant for the larger-scale regions in the image, and allows us to exploit the results of our image classification method. The method is explained in detail in our CVPR 2010 paper.

Our model is a two-level CRF that uses labels, features and classifiers appropriate to each level. The lowest level of nodes represents superpixels labeled with single labels, while a single global node on top of them permits any combination of primitive local node labels. A new consistency potential, which we term the harmony potential, enforces consistency of the local label assignments with the label of the global node. We propose an effective sampling strategy for global node labels that renders the underlying optimization problem tractable.

The harmony potential can encode combinations of labels in the global node, which better models the semantic co-occurrence of objects in the segmentation.
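The two-level model described above can be illustrated with a small sketch. It is not the implementation from the CVPR 2010 paper: it assumes that the harmony potential adds a fixed penalty `gamma` whenever a superpixel label is missing from the global label set, it omits pairwise terms between superpixels, and it replaces the sampling strategy for global labels with exhaustive enumeration, which is only feasible for a handful of classes. All names (`harmony_penalty`, `energy`, `best_configuration`, `local_unary`, `global_unary`, `gamma`) and all numbers are hypothetical.

```python
from itertools import combinations


def harmony_penalty(local_label, global_labels, gamma=1.0):
    """Assumed form: pay gamma when the superpixel label is not in the global set."""
    return 0.0 if local_label in global_labels else gamma


def energy(local_labels, global_labels, local_unary, global_unary, gamma=1.0):
    """Total cost of one joint labeling (lower is better)."""
    cost = sum(local_unary[i][lab] for i, lab in enumerate(local_labels))
    # Hypothetical global unary: one cost per class included in the global set.
    cost += sum(global_unary[c] for c in global_labels)
    cost += sum(harmony_penalty(lab, global_labels, gamma) for lab in local_labels)
    return cost


def all_label_subsets(classes):
    """Every combination of labels the global node may take (the power set)."""
    for k in range(len(classes) + 1):
        for subset in combinations(classes, k):
            yield frozenset(subset)


def best_configuration(classes, local_unary, global_unary, gamma=1.0):
    """Brute-force search over global label sets (stand-in for sampling)."""
    best = None
    for g in all_label_subsets(classes):
        # With no pairwise terms, each superpixel can be solved independently given g.
        labels = [
            min(classes, key=lambda c: u[c] + harmony_penalty(c, g, gamma))
            for u in local_unary
        ]
        e = energy(labels, g, local_unary, global_unary, gamma)
        if best is None or e < best[0]:
            best = (e, labels, g)
    return best


classes = ["background", "cat", "dog"]
# Made-up costs for three superpixels and for the global node.
local_unary = [
    {"background": 0.2, "cat": 1.0, "dog": 1.5},
    {"background": 1.2, "cat": 0.3, "dog": 0.5},
    {"background": 1.0, "cat": 0.6, "dog": 0.5},
]
global_unary = {"background": 0.0, "cat": 0.1, "dog": 2.0}

best_energy, best_labels, best_global = best_configuration(
    classes, local_unary, global_unary
)
print(best_labels, sorted(best_global))
```

In this toy example the third superpixel slightly prefers "dog" on its own, but the global node makes "dog" expensive, so the joint optimum labels it "cat". This is the kind of consistency between local and global labels that the harmony potential is meant to enforce.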

Improved Local Unary potentials

Superpixel predictions are based on small regions of the image, which tends to produce noisy results. In our second submission we improved the local predictions by fusing the output of multiple classifiers.

  • Foreground/Background Learning (33%): distinguishes between the object and its common background
  • Class-Object Learning (23%): distinguishes the object from other objects in the dataset
  • Object Detector* (26%): detects the object based on its global shape
  • Location Prior (20%): essentially produces a center bias in the segmentations

Combined, they produce the final result (40%).
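The page does not state how the four outputs are fused. As one possible illustration, the sketch below assumes that each cue yields a per-class score for a superpixel and that the scores are combined by a weighted average. The function name `fuse_scores`, the cue names, the weights and the scores are all hypothetical.

```python
def fuse_scores(cue_scores, weights=None):
    """Weighted average of per-class scores from several cues (assumed fusion rule).

    cue_scores: dict mapping cue name -> list of per-class scores for one superpixel.
    weights:    optional dict mapping cue name -> non-negative weight (default: equal).
    """
    names = list(cue_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_classes = len(cue_scores[names[0]])
    return [
        sum(weights[name] * cue_scores[name][c] for name in names) / total
        for c in range(n_classes)
    ]


classes = ["background", "cat"]
# Made-up scores for a single superpixel, one list per cue.
cue_scores = {
    "foreground_background": [0.4, 0.6],
    "class_object": [0.3, 0.7],
    "object_detector": [0.6, 0.4],
    "location_prior": [0.5, 0.5],
}

fused = fuse_scores(cue_scores)
predicted = classes[max(range(len(classes)), key=lambda c: fused[c])]
print(fused, predicted)
```

Averaging is only one choice. A learned combination, for example a classifier trained on the stacked cue outputs, would fit the same interface.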

An illustrative example


Pedro Felzenszwalb, Ross Girshick, David McAllester, and Deva Ramanan. "Object Detection with Discriminatively Trained Part Based Models", in TPAMI, vol. 32, no. 9, pp. 1627-1645, September 2010.

Josep M. Gonfaus, Xavier Boix, Joost Van de Weijer, Andrew D. Bagdanov, Joan Serrat, and Jordi Gonzàlez. "Harmony Potentials for Joint Classification and Segmentation", in Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, 2010.

Lubor Ladicky, Chris Russell, Pushmeet Kohli, and Philip H.S. Torr. "Associative hierarchical CRFs for object class image segmentation", in International Conference on Computer Vision (ICCV), Kyoto, Japan, 2009.

Nils Plath, Marc Toussaint, and Shinichi Nakajima. "Multi-class image segmentation using conditional random fields and global classification", in International Conference on Machine Learning (ICML), Montreal, Canada, 2009.