Mechanistic Interpretability Hub

What does a model actually compute when it predicts the next token? This hub maps the answer through feature decomposition, activation patching, and circuit-level interventions on real models. Theory only counts when it runs.

Writing

Videos

Coming soon — check back after May 2026

Resources

Coming soon — check back after May 2026