Discriminative Compact Pyramids for Object and Scene Recognition
Problem Domain
Spatial pyramid scheme proposed by [2] have recently proven very successful results. These are formed by representing an image using weighted multi-resolution histograms, which are obtained by repeatedly sub-dividing an image into increasingly finer sub-regions by doubling the number of divisions in each axis direction and computing histograms of features over the resulting sub-regions. When histograms for all sub-regions at all levels have been created, these histograms are concatenated to form the final image representation. For example, a level 2 spatial pyramid is constructed by concatenating a total of 1 + 4 + 16 = 21 histograms.
Although a notable performance gain is achieved by using the spatial pyramid method, the resulting histogram is often a magnitude higher in dimensionality over its standard bag-of-words based counterpart.
Methodology
Here we test Divisive Information Theoretic feature Clustering (DITC) [3] on the problem of finding a discriminatively compact pyramid image representation [1] on the 15 class scenes recognition data set proposed by [2]. This data set contains 15 different scene categories and 4485 images, divided into 1500 training and 2985 testing.
For constructing the spatial pyramid representation, we use the standard k-means to construct a vocabulary of size 1024 followed by building a three level pyramid using the code provided here Spatial Pyramid code. The histogram size in this case is 21504 times 4485 dimensional histogram.
Results
| Method | Size | Score |
|---|---|---|
| Pyramid | 21000 | 84.1 |
| Pyramid DITC | 5000 | 85.4 |
| Pyramid DITC | 1000 | 84.4 |
| Pyramid_DITC | 500 | 84.2 |
DITC Overview
DITC immediately clusters discrete data into the desired number of clusters during its initialization process. The DITC initialization of the k clusters is obtained by first clustering the words into l clusters, where l is the number of classes. Subsequently, we split each cluster arbitrarily into k/l clusters.
After which it iteratively improves the quality of these clusters. Each iteration monotonically reduces the decline in mutual information between the data (w) and the class labels (c), therefore the algorithm is guaranteed to terminate at a local minimum in a finite number of iterations.
Code Available
Data Available
We also provide the compressed histograms for the data sets used in our paper:
- 15 Scenes data set
- Butterflies data set
- Sports Events data set
- Pascal 2007 data set
- Pascal 2009 data set
References
[1] N. Elfiky, F. Shahbaz Khan, J. van de Weijer, J. Gonzà lez. Discriminative Compact Pyramids for Object and Scene Recognition. Pattern Recognition, doi: 10.1016/j.patcog.2011.09.020, 2011.
[2] S. Lazebnik, C. Schmid, J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc.: CVPR, 2006.
[3] I. S. Dhillon, S. Mallela, R. Kumar, I. Guyon, A. Elisseeff. A divisive information-theoretic feature clustering algorithm for text classification. JMLR, 2003.



