Gerard Rinkus
Brandeis University
Hierarchical Sparse Distributed Representations of Sequence Recall and Recognition
Tuesday 21st of February 2006 at 05:00pm
5101 Tolman
There is currently no theory of sequence memory possessing all of the
following properties: a) learns sequences from single trials; b) uses
sparse distributed representations for items and sequences; c) can recall
individual sequences as well as recognize novel sequences; d) exhibits
robust invariance to non-uniform time warping; and e) consists of a single,
monolithic substrate in which are stored both the traces of individual
sequences and the higher-order statistics of the set of sequences. I
present a biologically plausible neural network with all of these
properties. The network is hierarchical and its layers are proposed as
analogs of the successive stages of cortical processing, e.g., V1, V2, V4,
IT. Each internal layer is partitioned into a set of winner-take-all
competitive modules (CMs), proposed as analogs of Mountcastle's minicolumn.
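For concreteness, here is a minimal sketch of this one-winner-per-CM code format; the module count Q and cells-per-CM K are illustrative assumptions, not numbers from the talk.

```python
import numpy as np

# Illustrative sketch: a layer of Q winner-take-all competitive modules (CMs),
# each containing K cells.  A code activates exactly one cell per CM, so every
# code is a binary vector with the same fixed sparsity (Q of Q*K cells active).
# Q and K are assumed values chosen only for this example.
Q, K = 8, 10

def random_code(rng):
    """Pick one winner per CM at random and return the flattened binary code."""
    code = np.zeros((Q, K), dtype=np.uint8)
    code[np.arange(Q), rng.integers(0, K, size=Q)] = 1
    # (In the model, winners are chosen by the match-based procedure described
    # later in the abstract, not at random.)
    return code.ravel()

rng = np.random.default_rng(0)
code = random_code(rng)
print(int(code.sum()), "of", code.size, "cells active")   # always Q = 8 active
```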
Hence, the network uses sparse binary representations (codes) in all layers
on all time steps. A Hebbian rule is used during learning to link coactive
codes in adjacent layers, i.e., top-down (TD) and bottom-up (BU) learning,
and also to link successively active codes in any given layer, i.e.,
horizontal (H) learning. Thus, input sequences are mapped to hierarchical,
distributed spatiotemporal memory traces. A critical and novel assumption
is that representational units (cells) in progressively higher layers have
progressively longer activation durations, or persistences. This yields
temporal chunking, in which higher-layer codes become associated with
sequences of subjacent codes. This chunking mechanism provides a basis for
the network's robust, memory-based invariance to non-uniform time warping.
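A toy two-layer sketch of this persistence-based chunking follows; the layer sizes, activity levels, and outer-product Hebbian update are my assumptions, chosen only to make the idea concrete.

```python
import numpy as np

# Toy illustration of temporal chunking: a layer-2 code that persists across
# three layer-1 time steps is Hebbian-linked (top-down) to each of the three
# successive layer-1 codes, and so comes to stand for that whole subsequence.
N1, N2 = 80, 80                            # assumed cell counts in layers 1 and 2
W_td = np.zeros((N1, N2))                  # top-down weights, layer 2 -> layer 1

rng = np.random.default_rng(1)
layer1_codes = [(rng.random(N1) < 0.1).astype(float) for _ in range(3)]
layer2_code = (rng.random(N2) < 0.1).astype(float)   # stays active for all 3 steps

for x1 in layer1_codes:                    # one Hebbian update per layer-1 step
    W_td += np.outer(x1, layer2_code)

# The persistent layer-2 code now provides top-down support for every code in
# the chunked subsequence:
for t, x1 in enumerate(layer1_codes):
    supported = int((((W_td @ layer2_code) > 0) & (x1 > 0)).sum())
    print(f"step {t}: {supported} of {int(x1.sum())} layer-1 cells receive top-down support")
```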
The fundamental operation, performed by every cell on every time step, is to
compute the level of agreement, or match, between its TD, BU, and H input
vectors. If all three match strongly, then the cell will have a high
probability of being chosen winner in its CM. The weaker the match, the
lower the cell's probability of winning. However, the final win
probabilities also depend on a layer-global measure, G, which is the average
of the maximal matches in the layer's CMs. When G is near its maximal value
(1.0), indicating that a layer's current BU input has occurred in the
presence of its current TD and H inputs on some prior occasion, the
cell-local match levels are pushed through a highly expansive sigmoid
nonlinearity that greatly increases the probability that the
maximally-matching cell in each CM wins. As G falls toward zero, indicating
a completely novel situation, the sigmoid is squashed down to a maximally
compressive, constant function, making all cells in a CM equally likely to
win. This G-dependent transform yields the property that the intersection
of two codes increases as a function of the similarity of the spatiotemporal
situations that they represent, which, in turn, underlies the model's
recognition capabilities.
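The winner-selection step can be sketched roughly as follows; the match values, the softmax-style stand-in for the expansive sigmoid, and the gain schedule 20*G are assumptions meant only to convey how G modulates the within-CM choice.

```python
import numpy as np

# Rough sketch of G-modulated winner selection in a layer of Q CMs x K cells.
# 'matches' holds each cell's match level in [0, 1]; G is the mean of the
# per-CM maxima.  High G sharpens each CM's choice toward its best-matching
# cell; G = 0 makes the choice uniform.  The exponential (softmax-style) form
# and the gain factor 20*G stand in for the talk's expansive sigmoid.

def pick_winners(matches, G, rng):
    """matches: (Q, K) per-cell match values; returns the index of one winner per CM."""
    expanded = np.exp(20.0 * G * matches)                  # expansive when G is high
    probs = expanded / expanded.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

rng = np.random.default_rng(2)
matches = rng.random((8, 10))              # toy per-cell match levels
G = matches.max(axis=1).mean()             # layer-global familiarity measure

print("argmax cells:    ", matches.argmax(axis=1))
print("familiar winners:", pick_winners(matches, G, rng))    # usually the argmax cells
print("novel winners:   ", pick_winners(matches, 0.0, rng))  # uniform pick in each CM
```

With G = 0 the within-CM distribution is exactly uniform, matching the completely novel case described above; as G approaches 1 the choice collapses onto the best-matching cell in each CM, so familiar spatiotemporal contexts reactivate largely the same codes.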