Motifs in Attention Patterns of Large Language Models

This project investigates structure in the attention mechanisms of large language models. Many interpretability papers rely on manually inspecting attention patterns once heads of interest have been identified. We aim to improve this process by systematically embedding attention patterns in a latent space, which we in turn use to embed the heads that produce them. We provide a suite of interactive tools for inspecting the patterns produced by different heads, finding heads with similar patterns, browsing the embedding spaces, and reviewing known classes of heads.

AttentionPedia

The central hub for exploring attention heads. Find similar heads, view patterns side-by-side, and navigate classifications.

Pattern Embedding Visualization

Explore the 3D space of attention patterns. See how individual patterns relate to each other in the learned embedding.

Head Embedding Table

Various embeddings of attention heads, with known classifications.

Clustering

Dendrogram and grid view of cluster assignments. Supports hierarchical, HDBSCAN, and Leiden clustering methods.

Cluster Trends

Cluster composition trends across models and layers. Analyze how clusters distribute across model families.

Ablation Interface

Explore ablation results for attention heads. Compare loss impact across heads, clusters, and model families.

Pipeline

1. Pattern Extraction

Generate attention patterns from multiple language models using diverse text prompts.

Tools: Pattern Lens, Single Pattern View
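To make concrete what an "attention pattern" is, here is a minimal NumPy sketch of how one head turns queries and keys into a causal attention matrix. It is an illustration only, not the project's extraction code: the function name `causal_attention_pattern`, the random inputs, and the shapes are all assumptions, and real patterns would come from a model's forward pass rather than random vectors.

```python
import numpy as np

def causal_attention_pattern(q, k):
    """Return the (n_tokens, n_tokens) causal attention pattern for one head.

    q, k: arrays of shape (n_tokens, d_head). Hypothetical inputs for illustration.
    """
    n_tokens, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)
    # Mask future positions so each query attends only to itself and earlier tokens.
    future = np.triu(np.ones((n_tokens, n_tokens), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))
pattern = causal_attention_pattern(q, k)
```

Each row of `pattern` is a probability distribution over the current and earlier tokens, which is the object the later pipeline steps summarize.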

2. Feature Extraction

Compute handcrafted features from each attention pattern.

Figures: Full Covariance, Reduced Covariance
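As a rough illustration of what handcrafted features of a pattern could look like, the sketch below computes four simple statistics from a single attention matrix. The specific features (diagonal mass, previous-token mass, first-token mass, mean row entropy) and the function name `pattern_features` are assumptions chosen for the example; the project's actual feature set is not listed here.

```python
import numpy as np

def pattern_features(pattern):
    """Summarize one (n, n) attention pattern with a few scalar features.

    The feature choice is hypothetical; it only illustrates the idea.
    """
    n = pattern.shape[0]
    idx = np.arange(n)
    # Average attention each token pays to itself.
    diagonal_mass = pattern[idx, idx].mean()
    # Average attention each token (except the first) pays to its predecessor.
    previous_token_mass = pattern[idx[1:], idx[1:] - 1].mean()
    # Average attention paid to the first token in the sequence.
    first_token_mass = pattern[:, 0].mean()
    # Mean Shannon entropy of the rows; clipping avoids log(0).
    safe = np.clip(pattern, 1e-12, 1.0)
    mean_row_entropy = -(pattern * np.log(safe)).sum(axis=1).mean()
    return {
        "diagonal_mass": float(diagonal_mass),
        "previous_token_mass": float(previous_token_mass),
        "first_token_mass": float(first_token_mass),
        "mean_row_entropy": float(mean_row_entropy),
    }

# Example input: a uniform causal pattern, where row i spreads mass over tokens 0..i.
n = 4
uniform_causal = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]
features = pattern_features(uniform_causal)
```

Stacking one such feature vector per pattern gives the table of features that the next step normalizes and reduces.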

3. Feature Analysis

Normalize the table of features and apply principal component analysis (PCA).

Tools: Pattern Embedding Visualization

Figures: PCA Analysis, PCA Overview
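The normalization and PCA step can be sketched as follows. This is a minimal version that assumes z-score normalization and PCA via the singular value decomposition; the exact normalization used by the project is not specified here, and the function names and the synthetic feature table are illustrative.

```python
import numpy as np

def standardize(features):
    """Z-score each column; constant columns are left at zero (assumed convention)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (features - mean) / std

def pca(features, n_components):
    """Return PCA scores, component directions, and explained-variance ratios."""
    z = standardize(features)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    components = vt[:n_components]
    scores = z @ components.T
    explained = s**2 / np.sum(s**2)
    return scores, components, explained[:n_components]

# Synthetic feature table: 200 patterns, 5 features, two of them strongly correlated.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
features[:, 1] = 2 * features[:, 0] + 0.1 * rng.normal(size=200)
scores, components, explained = pca(features, n_components=3)
```

Each row of `scores` is one pattern's position in PCA space; grouping rows by the head that produced them yields the per-head point clouds used in the next step.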

4. Distances between heads

Each head now has a point cloud in PCA space, with one point per pattern. Computing a distance between every pair of clouds gives a head-by-head distance matrix.

Tools: AttentionPedia

Figures: Head Distances Heatmap
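One way to turn per-head point clouds into a distance matrix is sketched below. The cloud-to-cloud metric is an assumption: this example uses the energy distance, which compares the average distance between clouds against the average distance within each cloud. The metric the project actually uses may differ, and all names here are illustrative.

```python
import numpy as np

def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (one point from a, one from b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

def energy_distance(a, b):
    """Energy distance between two point clouds (assumed metric for this sketch)."""
    e = (2 * mean_pairwise_distance(a, b)
         - mean_pairwise_distance(a, a)
         - mean_pairwise_distance(b, b))
    # Guard against tiny negative values from floating-point rounding.
    return float(np.sqrt(max(e, 0.0)))

def head_distance_matrix(clouds):
    """Symmetric matrix of pairwise cloud distances, with a zero diagonal."""
    n = len(clouds)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = energy_distance(clouds[i], clouds[j])
    return dist

# Three synthetic heads: two with nearly identical clouds, one far away.
rng = np.random.default_rng(0)
clouds = [
    rng.normal(size=(50, 3)),
    rng.normal(size=(50, 3)) + 0.1,
    rng.normal(size=(50, 3)) + 5.0,
]
dist = head_distance_matrix(clouds)
```

Comparing whole clouds rather than single centroids lets two heads differ in the spread of their patterns as well as in their average pattern.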

5. Embeddings of heads

Use the distance matrix to get embeddings of heads in a meaningful latent space.

Tools: Head Embedding Visualization, Head Embedding Table, Classifications Page

Figures: Head Embeddings
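A distance matrix can be converted into coordinates in several ways. As one example, the sketch below uses classical multidimensional scaling (MDS), which finds points whose Euclidean distances approximate the given matrix. The choice of classical MDS is an assumption made for illustration; the project offers various embeddings, and their methods are not named here.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed items in n_components dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to recover a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist**2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip negatives that arise for non-Euclidean input.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Sanity check on synthetic data: distances between 2D points are recovered exactly.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist, n_components=2)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

The embedding is only defined up to rotation and reflection, so the coordinates themselves are arbitrary while the distances between heads are preserved.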

6. Clustering of heads

Cluster heads using hierarchical, HDBSCAN, and Leiden methods on the head distance matrix. Analyze cluster composition across models and layers.

Tools: Clustering Visualization, Cluster Trends
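Of the three methods named above, hierarchical clustering is the simplest to sketch directly from a distance matrix. The version below is a small average-linkage agglomerative procedure written in plain NumPy; the linkage criterion and the stopping rule (a fixed number of clusters) are assumptions, and HDBSCAN and Leiden are not covered by this example.

```python
import numpy as np

def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering on a distance matrix with average linkage.

    Repeatedly merges the two clusters with the smallest mean inter-cluster
    distance until n_clusters remain. Intended for small inputs only.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Six synthetic heads placed on a line, forming three well-separated groups.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 9.0])
dist = np.abs(positions[:, None] - positions[None, :])
labels = average_linkage_clusters(dist, n_clusters=3)
```

Recording the order of merges instead of stopping at a fixed count would give the tree that a dendrogram view displays.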

7. Ablation

Zero-ablate individual attention heads and measure the impact on model loss across diverse prompts.

Tools: Ablation Interface
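The idea of zero-ablation can be shown on a toy model. In the sketch below, the residual stream is simply the sum of per-head outputs, and ablating a head means replacing its output with zeros before computing cross-entropy loss. The toy model, its shapes, and the names `toy_loss` and `impact` are all assumptions for illustration; a real experiment would intervene inside an actual transformer's forward pass.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_loss(head_outputs, unembed, targets, ablate_head=None):
    """Mean cross-entropy of a toy model whose residual stream sums head outputs.

    head_outputs: (n_heads, n_tokens, d_model); unembed: (d_model, vocab).
    """
    outputs = head_outputs.copy()
    if ablate_head is not None:
        outputs[ablate_head] = 0.0  # zero-ablation: remove this head's contribution
    resid = outputs.sum(axis=0)
    probs = softmax(resid @ unembed)
    return float(-np.log(probs[np.arange(len(targets)), targets]).mean())

rng = np.random.default_rng(0)
n_heads, n_tokens, d_model, vocab = 4, 10, 16, 20
head_outputs = rng.normal(size=(n_heads, n_tokens, d_model))
unembed = rng.normal(size=(d_model, vocab))
targets = rng.integers(0, vocab, size=n_tokens)

baseline = toy_loss(head_outputs, unembed, targets)
# Loss change caused by ablating each head; positive means the head was helping.
impact = {
    h: toy_loss(head_outputs, unembed, targets, ablate_head=h) - baseline
    for h in range(n_heads)
}
```

Averaging `impact` over many prompts, and then grouping heads by cluster or model family, gives the kind of comparison the ablation interface is described as supporting.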