Positional Encodings in Transformers – Types and Comparison

Introduction

Imagine reading a book where every word has been cut out and tossed into a hat. You still have all the words, but the story is gone. This is exactly how a Transformer “sees” language by default.

Unlike Recurrent Neural Networks (RNNs), which process text word-by-word (like a human reading left-to-right), or Convolutional Neural Networks (CNNs), which look at local chunks, Transformers process the entire sequence simultaneously. This makes them incredibly fast, but it leaves them with a peculiar form of amnesia: they have no inherent sense of word order.

To fix this, we use Positional Encodings. These are positional signals injected into each token's embedding so the model knows not just what the word is, but where it sits in the sentence.

Consider these two sentences:

  • Dog bites man
  • Man bites dog

To a raw Transformer (without positional encodings), these sentences are identical because they contain the same tokens. Yet their meanings differ completely. Positional encodings ensure the model treats them as distinct structural sequences.
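The point above can be verified numerically: plain self-attention is permutation-equivariant, so reordering the input rows merely reorders the output rows. The sketch below uses random placeholder embeddings (not from a real model) and a minimal single-head attention without learned projections:

```python
import numpy as np

# Toy demonstration: without positional information, self-attention is
# permutation-equivariant. "dog bites man" and "man bites dog" produce
# the same per-token outputs, just in a different row order.
rng = np.random.default_rng(0)
emb = {"dog": rng.normal(size=4),
       "bites": rng.normal(size=4),
       "man": rng.normal(size=4)}

def self_attention(X):
    # Minimal attention: scaled dot-product scores, row-wise softmax.
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ X

s1 = np.stack([emb["dog"], emb["bites"], emb["man"]])  # "dog bites man"
s2 = np.stack([emb["man"], emb["bites"], emb["dog"]])  # "man bites dog"

out1 = self_attention(s1)
out2 = self_attention(s2)

# Each token gets the same output vector regardless of word order:
assert np.allclose(out1[0], out2[2])  # "dog"
assert np.allclose(out1[1], out2[1])  # "bites"
assert np.allclose(out1[2], out2[0])  # "man"
```

Adding a position-dependent term to each row of `X` before attention breaks this symmetry, which is exactly what positional encodings do.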


Types of Positional Encodings

1. Sinusoidal Positional Encoding: fixed sine and cosine waves of varying frequencies, introduced in the original Transformer paper; requires no learned parameters.

2. Learned Positional Embeddings: a trainable vector for each position, as in BERT; cannot generalize beyond the sequence length seen in training.

3. Relative Positional Encoding: encodes the distance between tokens rather than their absolute positions, as in Transformer-XL and T5.

4. Rotary Positional Embeddings (RoPE): rotates query and key vectors by position-dependent angles so that attention scores depend only on relative distance; used in LLaMA and GPT-NeoX.

5. ALiBi (Attention with Linear Biases): adds a penalty to attention scores proportional to token distance, with no embeddings at all, enabling strong length extrapolation.
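The sinusoidal scheme from the original Transformer is simple enough to write out directly. It follows PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d)):

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encodings from 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    rates = 10000 ** (np.arange(0, d_model, 2) / d_model)  # one frequency per sin/cos pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / rates)  # even dims: sine
    pe[:, 1::2] = np.cos(positions / rates)  # odd dims: cosine
    return pe

pe = sinusoidal_encoding(seq_len=50, d_model=16)
# Position 0 is sin(0) = 0 on even dims and cos(0) = 1 on odd dims.
assert np.allclose(pe[0, 0::2], 0.0) and np.allclose(pe[0, 1::2], 1.0)
```

In practice this matrix is simply added to the token embeddings before the first attention layer; because the frequencies form a geometric progression, nearby positions get similar encodings while distant ones diverge.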


Comparison of Positional Encoding Methods

Method       Parameters   Handles Long Context   Used In
Sinusoidal   No           Good                   Original Transformer
Learned      Yes          Limited                BERT
Relative     Few          Good                   T5, Transformer-XL
RoPE         No           Very Good              LLaMA, GPT-NeoX
ALiBi        No           Excellent              Long-context LLMs
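To make RoPE's "relative" property from the table concrete, here is a minimal sketch: consecutive dimension pairs of a vector are rotated by an angle proportional to the token's position, using the same 10000-based frequency schedule as the sinusoidal encoding. The dot product between a rotated query and key then depends only on their distance:

```python
import numpy as np

def apply_rope(x, pos):
    """Rotate each (even, odd) dimension pair of x by pos * theta_i (RoPE sketch)."""
    d = x.shape[-1]
    theta = 10000 ** (-np.arange(0, d, 2) / d)  # one angle rate per pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # standard 2-D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)

# The score depends only on the relative offset (here 2), not absolute positions:
a = apply_rope(q, 5) @ apply_rope(k, 3)
b = apply_rope(q, 9) @ apply_rope(k, 7)
assert np.isclose(a, b)
```

This relative-distance property is why RoPE extrapolates better than learned absolute embeddings: no position index ever appears as a lookup key, only as a rotation angle.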

This post is licensed under CC BY 4.0 by the author.