Developing Visual Hermeneutics for Image Understanding (2013-2015)

Despite recent and important advances in the field of Computer Vision, the task of detecting arbitrary objects and humans, together with their interactions, in images is still far from solved. Given an image (or an image sequence) of an action, automatically generating semantic labels that describe the objects involved and their relationships to each other is extremely difficult. The primary reasons for this difficulty are: (i) definitions of object classes can be extremely fuzzy for objects with higher-level semantic associations; (ii) even for simpler object categories, intra-class variability is high and appearance varies widely; (iii) for objects with similar appearances, classical issues such as viewpoint changes, illumination effects, and occlusions remain. In spite of all these difficulties, the problem deserves our full attention, because it holds the key both to intelligent systems that can reason about images and scenes and to powerful multimedia search and retrieval systems.

This research project will address this challenge by (i) assuming an ontological knowledge base for visual information that is hierarchically organized, with multiple overlapping structures to represent complex semantic relations; (ii) harnessing state-of-the-art techniques in visual object, action, and concept detection; and (iii) integrating knowledge of visual interactions at different levels, namely object, semantic, social, and user interactions. Our targeted deliverables will be open source, including a database of thousands of action-object instances and an access interface that achieves real-time performance with the processing power available on ordinary computing devices.
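To illustrate point (i), the proposed knowledge base can be pictured as a directed acyclic graph in which a concept may belong to several overlapping hierarchies at once (for example, both a physical taxonomy and a functional one). The following is a minimal sketch under that assumption; the class name, relation names, and example concepts are illustrative, not part of the project's actual design.

```python
from collections import defaultdict

class VisualOntology:
    """Sketch of a hierarchically organized knowledge base in which a
    concept may have multiple parents, i.e., participate in multiple
    overlapping structures (illustrative assumption)."""

    def __init__(self):
        # concept -> set of parent concepts (a DAG, not a strict tree)
        self.parents = defaultdict(set)

    def add_is_a(self, child, parent):
        self.parents[child].add(parent)

    def ancestors(self, concept):
        """All concepts reachable via is-a links, across all hierarchies."""
        seen = set()
        stack = [concept]
        while stack:
            for parent in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

# Hypothetical example: "knife" sits in two overlapping structures --
# a physical taxonomy and a functional one (tools used in cutting actions).
onto = VisualOntology()
onto.add_is_a("knife", "utensil")       # physical hierarchy
onto.add_is_a("knife", "cutting-tool")  # functional hierarchy
onto.add_is_a("utensil", "object")
onto.add_is_a("cutting-tool", "object")
print(sorted(onto.ancestors("knife")))  # ['cutting-tool', 'object', 'utensil']
```

Allowing multiple parents is what lets a single detected object support several semantic relations simultaneously, which a single-inheritance tree cannot express.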