How Interpretability Research Helps Build Better Models

From Fully Connected 2023. Join Stella Biderman, Executive Director of EleutherAI and Head of

Check out Ajay Thampi's book

What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models ...

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...

A surprising fact about modern large language models is that nobody really knows how they work internally. At Anthropic, the ...

Stanford AI Lab Faculty Lunch, November 7, 2025. Updated version of https://web.stanford.edu/~cgpotts/blog/interp/ 0:59 ...

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

Been Kim (Google Brain) https://simons.berkeley.edu/talks/tbd-72 Frontiers of Deep Learning.

MIT 6.S897 Machine Learning for Healthcare, Spring 2019 Instructor: Peter Szolovits View the complete course: ...

When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved — but it turned out the model had ...

With a growing interest in

CS 7180: Neural Mechanics Spring 2026 Course at Northeastern University Modern AI systems are powerful but opaque: even ...