Consistency Large Language Models: A Family of Efficient Parallel Decoders

May 11, 2024



The article introduces Consistency Large Language Models (CLLMs), a new family of parallel decoders that decode an n-token sequence per inference step rather than one token at a time, which reduces latency. CLLMs are trained to map any randomly initialized n-token sequence to the same result that autoregressive (AR) decoding yields, in as few steps as possible. The reported gains in generation speed are comparable to those of other fast-inference techniques such as Medusa2 and Eagle, but come without additional memory cost.

The article also discusses Jacobi decoding, which recasts the sequential generation process as a system of n non-linear equations that can be solved in parallel, and details the training process for CLLMs, which involves global consistency (GC) loss, local consistency (LC) loss, and the traditional AR loss. CLLMs achieve significant speedups both in specialized domains and on open-domain conversational challenges, at moderate fine-tuning cost. In addition, CLLMs can predict correct tokens preemptively and, through the consistency generation objective, become proficient in numerous collocations.
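To make the Jacobi decoding idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `greedy_next` is a hypothetical, deterministic stand-in for a language model's greedy next-token choice, and `VOCAB_SIZE`, `ar_decode`, and `jacobi_decode` are names invented for illustration. The sketch starts from a random n-token guess, recomputes every position from the previous iterate (the part a real model would do in one parallel forward pass), and stops once the sequence no longer changes.

```python
import random

VOCAB_SIZE = 50  # assumed toy vocabulary size


def greedy_next(prefix):
    """Hypothetical stand-in for an LLM's greedy next-token choice.

    Any deterministic function of the prefix works for this illustration.
    """
    h = 17
    for token in prefix:
        h = (h * 31 + token + 1) % 1_000_003
    return h % VOCAB_SIZE


def ar_decode(prompt, n):
    """Standard autoregressive decoding: one new token per step."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(greedy_next(seq))
    return seq[len(prompt):]


def jacobi_decode(prompt, n, seed=0):
    """Jacobi (fixed-point) decoding of an n-token block.

    Every position i is updated from the *previous* iterate's tokens
    before i, so all n updates within one iteration are independent.
    """
    rng = random.Random(seed)
    y = [rng.randrange(VOCAB_SIZE) for _ in range(n)]  # random initial guess
    steps = 0
    while True:
        new_y = [greedy_next(list(prompt) + y[:i]) for i in range(n)]
        steps += 1
        if new_y == y:  # fixed point reached: nothing changed
            return y, steps
        y = new_y


if __name__ == "__main__":
    prompt = [3, 1, 4]
    tokens, steps = jacobi_decode(prompt, n=8)
    print("matches AR output:", tokens == ar_decode(prompt, 8))
    print("iterations used:", steps)
```

In this toy, each iteration fixes at least one more leading token, so the loop ends after at most n + 1 iterations and returns the same tokens as greedy AR decoding. With an arbitrary model that bound is rarely beaten by much, which is why the training objective described above matters: a CLLM is fine-tuned so that the iteration reaches the fixed point in far fewer steps.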

