Object Categorization
Foundations and Trends® in Computer Graphics and Vision Volume 1 Issue 4 DOI: 10.1561/0600000003
Axel Pinz
Graz University of Technology, Austria, axel.pinz@tugraz.at
Abstract
This article presents foundations, original research and trends in the field of object categorization by computer vision methods.
The research goals in object categorization are to detect objects in images and to determine the objects’ categories. Categorization
aims for the recognition of generic classes of objects, and thus has also been termed ‘generic object recognition’. This is
in contrast to the recognition of specific, individual objects. While humans are usually better at generic than at specific
recognition, categorization is much harder to achieve for today’s computer architectures and algorithms. Major problems are
related to the concept of a ‘visual category’, where a successful recognition algorithm has to manage large intra-class variabilities
versus sometimes marginal inter-class differences. It turns out that several techniques which are useful for specific recognition
can also be adapted to categorization, but there are also a number of recent developments in learning, representation and
detection that are especially tailored to categorization.
Recent results have established various categorization methods that are based on local salient structures in the images. Some
of these methods use just a ‘bag of keypoints’ model. Others include a certain amount of geometric modeling of 2D spatial
relations between parts, or ‘constellations’ of parts. There is now a certain maturity in these approaches and they achieve
excellent recognition results on rather complex image databases. Further work focused on the description of shape and object
contour for categorization is only just emerging. However, there remain a number of important open questions, which also define
current and future research directions. These issues include localization abilities, required supervision, the handling of
many categories, online and incremental learning, and the use of a ‘visual alphabet’, to name a few. These aspects are illustrated
by the discussion of several current approaches, including our own patch-based system and our boundary-fragment model. The
article closes with a summary and a discussion of promising future research directions.