
Non-linear feature extraction from image data in neural networks

D. de Ridder, R.P.W. Duin

Goal

The goal of this project is to investigate the feature extraction capabilities of neural networks. Traditionally, solving a pattern recognition task involved two steps. First, a set of features describing the object(s) to be classified had to be found. Only after such a set of features had been found would a classification mechanism be chosen and optimized. These two steps are highly interdependent, since the choice of features influences the conditions under which a classifier operates and vice versa.

With the advent of neural networks, however, more and more problems are solved by simply feeding large amounts of 'raw data' (e.g. images, sound signals, stock market index ranges) to a neural network. During training, the network learns which features to extract and how much weight to give them. The exact nature of this feature extraction process is, however, not clear, due to the vast number of interconnections and thus weights in neural networks.

The goal of this project, therefore, is to study the (non-linear) feature extraction processes taking place in neural networks, with special emphasis on image data. Comparisons can be made between man-made and machine-generated filters or templates in different tasks.

Research

Shared weights networks

In the first phase, the network architectures proposed by Le Cun et al. - 2-dimensional Time Delay Neural Networks, i.e. networks using shared weights - were studied. Three of these architectures were tested both on the handwritten digit recognition problem (using a NIST database) and on several artificial datasets containing simple images.
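
To illustrate the shared-weights idea, the sketch below applies a single small kernel of weights at every position of an image to compute one feature map, which is the basic building block of these architectures. It is only a minimal illustration under assumed details: the 5x5 kernel size, the sigmoid activation and the random initialisation are not taken from the thesis.

  import numpy as np

  def shared_weights_layer(image, kernel, bias=0.0):
      # One feature map of a shared-weights (convolutional) layer: the same
      # small kernel of weights is applied at every image position, so the
      # number of free parameters is independent of the image size.
      h, w = image.shape
      kh, kw = kernel.shape
      out = np.zeros((h - kh + 1, w - kw + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
      return 1.0 / (1.0 + np.exp(-out))   # sigmoid activation

  # Example: a 16x16 image and a 5x5 kernel give a 12x12 feature map.
  image = np.random.rand(16, 16)
  kernel = 0.1 * np.random.randn(5, 5)
  feature_map = shared_weights_layer(image, kernel)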

This work resulted in D. de Ridder's M.Sc. thesis. Its main conclusion is that, while this type of network performs well, it is still outperformed by the traditional 1-nearest neighbour method and by large, unconstrained feed-forward neural networks. In light of this, the claim made by many researchers that such a network must be a local shift-invariant feature extraction mechanism because it performs well, i.e.

  local shift-invariant feature extraction <-> good performance

is not valid. The 1-nearest neighbour method performs better, yet it clearly does not extract features. Therefore:

  local shift-invariant feature extraction -> good performance
  good performance -/-> local shift-invariant feature extraction
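
For contrast, a minimal sketch of the 1-nearest neighbour rule used in this comparison is given below: it simply copies the label of the closest training image, measured directly in pixel space, so no features are extracted at any point. The Euclidean distance and the array layout are assumptions made for illustration.

  import numpy as np

  def one_nearest_neighbour(train_images, train_labels, test_image):
      # Classify by copying the label of the closest training image.
      # Distances are computed directly on raw pixels, so nothing that
      # resembles feature extraction takes place.
      diffs = train_images.reshape(len(train_images), -1) - test_image.ravel()
      distances = np.sum(diffs ** 2, axis=1)
      return train_labels[np.argmin(distances)]
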
Furthermore, an attempt was made to study feature extraction using artificially generated images. It was shown that a trade-off exists between the trainability of a simple network and its understandability: the larger the network, the easier to train but the harder to understand.

Image filtering

We investigated how well feed-forward networks can be used for image filtering. Given a standard non-linear image filtering operation, the Kuwahara filter, a regression feed-forward network was trained to generate the desired output from an input consisting of the pixels in a small window of the original image. A problem was that these networks had a tendency to get stuck in a linear approximation to the filter. This was shown to be due to the use of the mean squared error (MSE) as a training criterion, which weighs many small errors the same as a few serious ones. A new performance measure for edge-preserving smoothing was developed and applied to highlight differences between networks. Furthermore, incorporating prior knowledge about the operation to be performed, by constructing a modular network, considerably increased performance. The conclusion was that neural networks can be useful as adaptive filters, but that great care has to be taken in choosing the network architecture and training algorithm.
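
For reference, a minimal sketch of the Kuwahara filter that served as the target operation is given below; the regression network was trained to reproduce its output from the pixels in a small input window. The 5x5 window (k = 2) and the handling of border pixels are assumptions made for this sketch.

  import numpy as np

  def kuwahara(image, k=2):
      # Kuwahara filter (edge-preserving smoothing): for each pixel, take the
      # four overlapping (k+1)x(k+1) subwindows of its (2k+1)x(2k+1)
      # neighbourhood and output the mean of the subwindow with the lowest
      # variance. Border pixels are left unchanged here.
      out = image.astype(float).copy()
      h, w = image.shape
      for y in range(k, h - k):
          for x in range(k, w - k):
              subs = [image[y - k:y + 1, x - k:x + 1],   # top-left
                      image[y - k:y + 1, x:x + k + 1],   # top-right
                      image[y:y + k + 1, x - k:x + 1],   # bottom-left
                      image[y:y + k + 1, x:x + k + 1]]   # bottom-right
              out[y, x] = subs[int(np.argmin([s.var() for s in subs]))].mean()
      return out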

Feature extraction

We studied adaptive feature extraction mechanisms, in particular mixture models of subspaces. As image content can locally be described by a number of parameters far smaller than the number of dimensions (pixels), a subspace approach can be used to model this local image content. To describe an entire image adaptively, mixtures of subspaces can be used. A method, the adaptive subspace map (ASM), was proposed and applied to texture segmentation, object recognition and image database retrieval using PCA subspaces. In an extension of this work, a probabilistic mixture model of ICA subspaces was developed, applied to natural texture segmentation and shown to give results superior to those of the PCA-based models.
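
The sketch below illustrates the core of such a mixture-of-subspaces description under assumed details: image patches are hard-assigned to the PCA subspace that reconstructs them with the smallest error, and the ASM alternates this assignment with refitting each subspace on the patches assigned to it. The function names and the hard (rather than probabilistic) assignment are illustrative choices, not the exact algorithm from the publications.

  import numpy as np

  def fit_pca_subspace(patches, n_components):
      # Fit a linear (PCA) subspace to a set of flattened image patches.
      mean = patches.mean(axis=0)
      _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
      return mean, vt[:n_components]            # offset and basis vectors

  def distance_to_subspace(patch, mean, basis):
      # Squared norm of the reconstruction residual of one patch.
      centred = patch - mean
      residual = centred - basis.T @ (basis @ centred)
      return np.sum(residual ** 2)

  def assign_patches(patches, subspaces):
      # Hard-assign every patch to the subspace that reconstructs it best;
      # the ASM alternates this step with refitting each subspace on the
      # patches assigned to it.
      return np.array([np.argmin([distance_to_subspace(p, m, b)
                                  for m, b in subspaces])
                       for p in patches])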

Publications

An overview of the work published in this project is available.


e-mail: dick@ph.tn.tudelft.nl

Last update: October 23, 2000
