Positional Encodings in Transformers – Types and Comparison

Posted Mar 4, 2026 Updated Mar 4, 2026

Introduction

Imagine reading a book where every word has been cut out and tossed into a hat. You still have all the words, but the story is gone. This is exactly how a Transformer “sees” language by default.

Unlike Recurrent Neural Networks (RNNs), which process text token by token (like a human reading left to right), or Convolutional Neural Networks (CNNs), which look at local chunks, Transformers process the entire sequence simultaneously. This makes them easy to parallelize, and therefore fast to train, but it leaves them with a peculiar form of amnesia: they have no inherent sense of word order.

To fix this, we use positional encodings. These are essentially “positional details” injected into each word's representation so the model knows not just what the word is, but where it sits in the sentence.

Consider these two sentences:

Dog bites man
Man bites dog

To a Transformer without positional encodings, these sentences are indistinguishable: they contain the same tokens, and self-attention on its own is permutation-equivariant, so each token ends up with the same representation whatever the order. Yet the two sentences mean very different things. Positional encodings let the model treat them as distinct sequences.
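
The order-blindness described above can be checked with a small sketch. It assumes a single attention head with identity projections (Q = K = V = X) and made-up 3-dimensional embeddings; neither comes from a real model, and the helper names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head self-attention with identity projections (Q = K = V = X)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Toy embeddings (made-up values, for illustration only).
emb = {"dog": [1.0, 0.0, 0.5], "bites": [0.0, 1.0, 0.5], "man": [0.5, 0.5, 1.0]}

def encode(sentence):
    """Return a {token: output vector} map; no positional information is added."""
    tokens = sentence.lower().split()
    return dict(zip(tokens, self_attention([emb[t] for t in tokens])))

a = encode("Dog bites man")
b = encode("Man bites dog")

# Each token gets the same output vector in both sentences (up to float rounding).
same = all(
    math.isclose(x, y, abs_tol=1e-12)
    for token in a
    for x, y in zip(a[token], b[token])
)
print(same)
```

Swapping the word order only permutes the rows of the output; the vector attached to each token does not change, which is why some position signal has to be added.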

Types of Positional Encodings

1. Sinusoidal Positional Encoding
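
As a minimal sketch, the fixed encoding from the original Transformer paper fills even dimensions with a sine and odd dimensions with a cosine of the position, using wavelengths that grow geometrically with the dimension index. The function name and the default `base` argument are choices made for this example.

```python
import math

def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table: sin on even dims, cos on odd dims."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model.
            angle = pos / (base ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row[:d_model])  # trim in case d_model is odd
    return table

pe = sinusoidal_encoding(seq_len=4, d_model=6)
print(pe[0])  # position 0: every sin term is 0 and every cos term is 1
```

The table has no trainable values; it is simply added to the token embeddings before the first layer.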

2. Learned Positional Embeddings
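
A learned scheme can be sketched as a lookup table with one trainable vector per position. The class below is hypothetical and only shows the lookup and the hard length limit; the random initialization stands in for values that training would adjust.

```python
import random

class LearnedPositionalEmbedding:
    """One vector per position up to max_len (training is not shown here)."""

    def __init__(self, max_len, d_model, seed=0):
        rng = random.Random(seed)
        self.max_len = max_len
        # Stand-in initialization; in a real model these are trained parameters.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(max_len)
        ]

    def add_to(self, token_embeddings):
        """Add the position vector for index p to the token embedding at index p."""
        if len(token_embeddings) > self.max_len:
            raise ValueError("sequence is longer than the table built at training time")
        return [
            [t + p for t, p in zip(tok, self.table[pos])]
            for pos, tok in enumerate(token_embeddings)
        ]

pos_emb = LearnedPositionalEmbedding(max_len=4, d_model=3)
out = pos_emb.add_to([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
```

The `ValueError` branch illustrates the usual weakness of this approach: positions beyond `max_len` have no vector at all.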

3. Relative Positional Encoding
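
One simplified way to sketch the idea is a scalar bias, looked up by the clipped offset between query and key positions, that is added to the attention logits. This is an illustration only: T5 and Transformer-XL each use their own, more elaborate formulations, and the bias values below are hypothetical stand-ins for learned parameters.

```python
def relative_index(i, j, max_distance):
    """Map the offset (j - i) to a table index in [0, 2 * max_distance]."""
    offset = max(-max_distance, min(max_distance, j - i))
    return offset + max_distance

def relative_bias_matrix(seq_len, bias_table, max_distance):
    """Build the seq_len x seq_len bias that would be added to attention logits."""
    return [
        [bias_table[relative_index(i, j, max_distance)] for j in range(seq_len)]
        for i in range(seq_len)
    ]

max_distance = 2
bias_table = [-0.2, -0.1, 0.0, 0.1, 0.2]  # hypothetical "learned" values
bias = relative_bias_matrix(4, bias_table, max_distance)

for row in bias:
    print(row)
```

Every entry depends only on `j - i`, so the same small table serves sequences of any length; offsets beyond `max_distance` share the outermost entries.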

4. Rotary Positional Embeddings (RoPE)
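
RoPE rotates pairs of query and key dimensions by an angle proportional to the token position, so that the dot product between a rotated query and a rotated key depends only on their relative offset. The sketch below assumes the adjacent-pair layout and an even vector size; some implementations pair dimensions differently, and the vectors are made-up values.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[i], x[i+1]) by pos * base^(-i/d); d must be even."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]   # made-up query vector
k = [0.5, -1.0, 2.0, 0.0]  # made-up key vector

# Same offset (2) at different absolute positions gives the same score.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
print(math.isclose(score_a, score_b, rel_tol=1e-9, abs_tol=1e-9))
```

No parameters are learned here; the position information lives entirely in the rotation angles.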

5. ALiBi (Attention with Linear Biases)
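
ALiBi adds no position vectors to the embeddings. Instead, each head subtracts a penalty from the attention logits that grows linearly with the distance between query and key. A minimal sketch, assuming a causal model and a power-of-two head count (the function names are hypothetical):

```python
def alibi_slopes(num_heads):
    """Per-head slopes for a power-of-two head count: 2^(-8/n), 2^(-16/n), ..."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias: -slope * (i - j) for keys j <= query i; future keys are masked."""
    neg_inf = float("-inf")
    return [
        [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)          # 1/2, 1/4, ..., 1/256
bias = alibi_bias(4, slopes[0])   # bias matrix for the first head

for row in bias:
    print(row)
```

Because the penalty is a simple function of distance, it is defined for any sequence length, including lengths never seen during training.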

Comparison of Positional Encoding Methods

Method	Learned Parameters	Long-Context Handling	Used In
Sinusoidal	No	Good	Original Transformer
Learned	Yes	Limited	BERT
Relative	Few	Good	T5, Transformer-XL
RoPE	No	Very Good	LLaMA, GPT-NeoX
ALiBi	No	Excellent	BLOOM, MPT

This post is licensed under CC BY 4.0 by the author.