A New Mapping of Decision Trees to Neural Networks

People often want to build prediction models from data examples.
There are dozens of ways of doing this, but decision trees and neural
networks are particularly popular.  Neural networks are nice because
they can be made to generalise remarkably well (if you are very, very
careful).  Unfortunately, they usually have to scan the data examples
hundreds of times to produce a good model.  Decision trees are nice
because they can be built very quickly.  Unfortunately, they can only
model the data by chopping it up into box-shaped regions.

Several authors have proposed initialising neural networks from
decision trees, thereby saving training time.  In this seminar, I will
present a method for doing this which differs from earlier proposals
in several ways:
- The networks produced have only one hidden layer.
- The input layer of the network observes the data examples directly (no
pre-processing into intervals).
- The network doesn't care if it is looking at categorical or continuous
data, or a mixture of the two.
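The flavour of such a mapping can be sketched for the simplest possible
case: a single-split tree (a decision stump) on one continuous feature,
mapped onto a network with one hidden unit.  The function
`stump_to_network` below and the sharpness constant `k` are hypothetical
illustrations, not the construction presented in the seminar, which
handles full trees and mixed data types:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stump_to_network(feature, threshold, left_value, right_value, k=10.0):
    """Map a decision stump (x[feature] <= threshold -> left_value,
    else right_value) onto a one-hidden-unit network.

    The hidden unit's weight picks out the split feature, its bias
    encodes the threshold, and the output weights encode the two leaf
    predictions.  The sharpness k controls how closely the sigmoid
    approximates the tree's hard split; starting from this soft copy,
    the whole network can then be trained further.
    """
    def predict(x):
        # ~0 on the left side of the split, ~1 on the right side
        h = sigmoid(k * (x[feature] - threshold))
        return left_value + (right_value - left_value) * h
    return predict

# Away from the split boundary, the network reproduces the stump.
net = stump_to_network(feature=0, threshold=2.5, left_value=0.0, right_value=1.0)
print(round(net(np.array([1.0])), 3))  # well left of the split -> ~0.0
print(round(net(np.array([4.0])), 3))  # well right of the split -> ~1.0
```

Near the threshold itself the sigmoid smooths the tree's hard boundary,
which is exactly what allows subsequent gradient-based training to move
the split, something the original tree cannot do.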
The resulting networks generalise better than the trees from which they
were produced, and take about an order of magnitude less time to train
to completion.