PhD Thesis

Learning to Recognise 3D Objects from 2D Intensity Images

The problem of three dimensional object recognition is one which has been studied extensively by computer vision researchers over the previous 30 years. Recently, significant research effort has concentrated on solving the 3D object recognition problem using dense range data. However, range data is of limited usefulness over large distances and in outdoor settings, and the human visual system does not make use of this type of data. Alternatively, some work has been done on recognising 3D objects from 2D intensity images, but of the reported systems, many use only a limited database of one or two objects, others utilise ``perfect'' or synthetic input data (eg line drawings obtained from CAD packages), few systems attempt to learn object models, and the problem of recognising objects in complex scenes is often neglected. Also, considerable research has been reported on techniques which extract sparse 3D depth data from 2D images (depth-from-X), but no system incorporates this data into a working object recognition system.

This dissertation describes a 3D object recognition system that learns to recognise 3D objects from 2D intensity images. Input images are segmented using a novel segmentation algorithm, and depth data is extracted using a novel stereo algorithm. Attributes (including depth attributes) are extracted from each of the parts produced by the segmentation algorithm, and these are used by the Fuzzy Conditional Rule Generation (FCRG) machine learning classifier to either learn object models in the training samples, or to classify object models in test scenes containing one or more objects. A final hypothesis verification routine is used to verify the classifications produced by FCRG.

The system has been demonstrated using a database of 18 non-trivial objects, for which a classification rate of up to 78% was achieved. This dissertation demonstrates that learning to recognise 3D objects from 2D intensity images is a viable alternative to recognising objects from dense range data, that sparse range data obtained from depth-from-X techniques (in this case, depth-from-stereo) can be used to aid the recognition process, and that it is possible to recognise objects in complex scenes without prior object partitioning.


Postrcipt Source


Related Code