Oriol Vinyals

Machine Translation with Long-Short Term Memory Models

Tuesday 02nd of September 2014 at 11:00am
560 Evans

Supervised large deep neural networks achieved good results on speech recognition and computer vision. Although very successful, deep neural networks can only be applied to problems whose inputs and outputs can be conveniently encoded with vectors of fixed dimensionality - but cannot easily be applied to problems whose inputs and outputs are sequences. In this work, we show how to use a large deep Long Short-Term Memory (LSTM) model to solve domain-agnostic supervised sequence to sequence problems with minimal manual engineering. Our model uses one LSTM to map the input sequence to a vector of a fixed dimensionality and another LSTM to map the vector to the output sequence. We applied our model to a machine translation task and achieved encouraging results. On the WMT'14 translation task from English to French, a model combination of 6 large LSTMs achieves a BLEU score of 32.3 (where a larger score is better). For comparison, a strong standard statistical MT baseline achieves a BLEU score of 33.3. When we use our LSTM to rescore the n-best lists produced by the SMT baseline, we achieve a BLEU score of 36.3, which is a new state of the art. This is joint work with Ilya Sutskever and Quoc Le.

Join Email List

You can subscribe to our weekly seminar email list by sending an email to majordomo@lists.berkeley.edu that contains the words subscribe redwood in the body of the message.
(Note: The subject line can be arbitrary and will be ignored)