
Gerard Rinkus
Brandeis University

Hierarchical Sparse Distributed Representations of Sequence Recall and Recognition

Tuesday 21st of February 2006 at 05:00pm
5101 Tolman

There is currently no theory of sequence memory possessing all of the following properties: a) learns sequences from single trials; b) uses sparse distributed representations for items and sequences; c) can recall individual sequences as well as recognize novel sequences; d) exhibits robust invariance to non-uniform time warping; and e) consists of a single, monolithic substrate that stores both the traces of individual sequences and the higher-order statistics of the set of sequences. I present a biologically plausible neural network with all of these properties.

The network is hierarchical, and its layers are proposed as analogs of the successive stages of cortical processing, e.g., V1, V2, V4, IT. Each internal layer is partitioned into a set of winner-take-all competitive modules (CMs), proposed as analogs of Mountcastle's minicolumn. Hence, the network uses sparse binary representations (codes) in all layers on all time steps. A Hebbian rule is used during learning to link coactive codes in adjacent layers, i.e., top-down (TD) and bottom-up (BU) learning, and also to link successively active codes within any given layer, i.e., horizontal (H) learning. Input sequences are thus mapped to hierarchical, distributed spatiotemporal memory traces. A critical and novel assumption is that representational units (cells) in progressively higher layers have progressively longer activation durations, or persistences. This yields temporal chunking, in which higher-layer codes become associated with sequences of subjacent codes. This chunking mechanism provides the basis for the network's robust, memory-based invariance to non-uniform time warping.

The fundamental operation, performed by every cell on every time step, is to compute the level of agreement, or match, between its TD, BU, and H input vectors. If all three match strongly, the cell has a high probability of being chosen as the winner in its CM; the weaker the match, the lower the cell's probability of winning. However, the final win probabilities also depend on a layer-global measure, G, the average of the maximal matches in the layer's CMs. When G is near its maximal value (1.0), indicating that the layer's current BU input has occurred in the presence of its current TD and H inputs on some prior occasion, the cell-local match levels are pushed through a highly expansive sigmoid nonlinearity that greatly increases the probability that the maximally matching cell in each CM wins. As G falls toward zero, indicating a completely novel situation, the sigmoid flattens to a maximally compressive, constant function, making all cells in a CM equally likely to win. This G-dependent transform yields the property that the intersection of two codes increases with the similarity of the spatiotemporal situations they represent, which, in turn, underlies the model's recognition capabilities.
(video)
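
For readers who want a concrete picture of the winner-selection step described in the abstract, the short Python sketch below illustrates one plausible reading of it: per-cell match values are sharpened or flattened according to the layer-global measure G before one winner is sampled per competitive module. The exponential-gain transform standing in for the expansive sigmoid, and all names and parameters (choose_winners, gamma_max), are illustrative assumptions, not code from the talk.

import numpy as np

rng = np.random.default_rng(0)

def choose_winners(match, gamma_max=10.0):
    """Pick one winner per competitive module (CM), with choice sharpness
    modulated by the layer-global familiarity measure G.

    match : (num_CMs, cells_per_CM) array of per-cell match values in [0, 1],
            e.g. each cell's agreement between its TD, BU, and H inputs.
    """
    # G = average over CMs of the maximal match found in each CM.
    G = match.max(axis=1).mean()

    # Near G = 1 the transform is highly expansive, so the best-matching cell
    # in each CM almost surely wins; near G = 0 it is nearly flat, so all
    # cells in a CM are roughly equally likely to win.  (Exponential gain is
    # an illustrative stand-in for the sigmoid nonlinearity in the abstract.)
    gain = gamma_max * G
    prob = np.exp(gain * match)
    prob /= prob.sum(axis=1, keepdims=True)

    # Sample one winner per CM; the winners form the layer's sparse binary code.
    winners = np.array([rng.choice(len(p), p=p) for p in prob])
    return winners, G

# Example: a layer with 4 CMs of 8 cells each.
match = rng.random((4, 8))
winners, G = choose_winners(match)
print("G =", round(float(G), 3), "winners:", winners)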

