Neel Nanda Mechanistic Interpretability A - Detailed Analysis & Overview
How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to ...
This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?
This is a talk I gave to my MATS scholars, with a stylised history of the field of ...
Visit our sponsor 80000 hours - grab their free career guide and check out their podcast! Use our ...
Part 1 of a walkthrough of our paper, Progress Measures for Grokking via ...
Art by ...
Clipped from episode 19 of AXRP: Transcript of that episode: ...
When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved, but it turned out the model had ...
A talk I gave to my MATS 9.0 training program about reasoning model ...
Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...
How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope at ...