Motifs in Attention Patterns of Large Language Models

This project investigates structure in the attention mechanisms of large language models. Much interpretability work involves manually inspecting attention patterns once heads of interest have been identified. We aim to improve this process by systematically embedding attention patterns in a meaningful latent space, which we in turn use to embed the heads that produce them. We provide a suite of interactive tools for inspecting the patterns produced by different heads, finding heads with similar patterns, exploring the embedding spaces, and examining known classes of heads.

Interface Map

The diagram below shows how the analysis tools fit together. Click on any component to explore that interface.

[Interface relations diagram: components linked by existing and planned connections]

Pipeline

1. Pattern Extraction

Generate attention patterns from multiple language models using diverse text prompts.
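A minimal sketch of this step, assuming the Hugging Face transformers library; the model name and prompt below are placeholders, not the project's actual choices:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; any model that can return attentions
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

prompts = ["The quick brown fox jumps over the lazy dog."]  # placeholder prompt

with torch.no_grad():
    inputs = tokenizer(prompts, return_tensors="pt")
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, n_heads, seq, seq)
patterns = torch.stack(outputs.attentions)  # (n_layers, batch, n_heads, seq, seq)
```

Running this over many diverse prompts yields, for each head, a collection of attention patterns to analyze downstream.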

2. Feature Extraction

Compute handcrafted features from each attention pattern.
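The concrete feature set is a design choice not fixed here; the sketch below computes a few plausible examples (mean row entropy, diagonal mass, attention to the first token, previous-token mass) from a single pattern:

```python
import numpy as np

def pattern_features(A: np.ndarray) -> np.ndarray:
    """Illustrative handcrafted features of one (seq, seq) attention pattern.

    These four features are examples, not the project's actual feature set.
    """
    seq = A.shape[0]
    eps = 1e-12
    entropy = -(A * np.log(A + eps)).sum(axis=-1).mean()  # mean row entropy
    diag = np.trace(A) / seq                              # self-attention mass
    first = A[:, 0].mean()                                # attention to first token
    prev = np.diagonal(A, offset=-1).mean() if seq > 1 else 0.0  # previous-token mass
    return np.array([entropy, diag, first, prev])
```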

3. Feature Analysis

Normalize the feature table and apply principal component analysis (PCA).
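A minimal sketch with scikit-learn, assuming the feature table is a NumPy array with one row per attention pattern; the random data and component count are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# features: (n_patterns, n_features), one row per attention pattern
features = np.random.rand(1000, 4)  # placeholder for the real feature table

X = StandardScaler().fit_transform(features)  # zero mean, unit variance per feature
pca = PCA(n_components=3)                     # component count is a tunable choice
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)          # how much variance each PC captures
```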

4. Distances between heads

Each head now corresponds to a point cloud in PCA space; computing distances between pairs of clouds yields a distance matrix over heads, as sketched below.
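One simple choice of cloud-to-cloud distance, assumed in this sketch, is the Euclidean distance between cloud centroids; richer measures such as the Wasserstein distance drop in the same way:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def centroid_distance_matrix(clouds: dict) -> np.ndarray:
    """Distance matrix over heads from their PCA point clouds.

    clouds maps a head id to an (n_patterns, n_components) array.
    Centroid distance is an assumption, not the only reasonable metric.
    """
    heads = sorted(clouds)
    centroids = np.stack([clouds[h].mean(axis=0) for h in heads])
    return squareform(pdist(centroids))  # (n_heads, n_heads), symmetric
```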

5. Embeddings of heads

Use the distance matrix to embed the heads in a meaningful latent space.
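Any embedding method that accepts a precomputed distance matrix works here; the sketch below assumes metric MDS from scikit-learn, with a random symmetric matrix standing in for the real distances:

```python
import numpy as np
from sklearn.manifold import MDS

# D: (n_heads, n_heads) distance matrix from the previous step (placeholder here)
D = np.random.rand(12, 12)
D = (D + D.T) / 2          # symmetrize
np.fill_diagonal(D, 0.0)   # zero self-distance

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
head_embedding = mds.fit_transform(D)  # (n_heads, 2) coordinates for plotting
```

UMAP or t-SNE with a precomputed metric would serve the same role; MDS is just the simplest drop-in for a distance matrix.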