Mechanistic Interpretability Neel Nanda DeepMind

Mechanistic Interpretability - NEEL NANDA (DeepMind)

http://80000hours.org/mlst Visit our sponsor 80000 hours - grab their free career guide and check out their podcast! Use our ...

An Introduction to Mechanistic Interpretability – Neel Nanda | IASEAI 2025

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to

I lead a Google DeepMind team at 26. If you want to work at an AI company... | Neel Nanda (Part 2)

Most remarkably, he ended up running

Neel Nanda – Mechanistic Interpretability: A Whirlwind Tour

Neel Nanda

What Matters Right Now In Mechanistic Interpretability?

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

We Can Monitor AI’s Thoughts… For Now | Google DeepMind's Neel Nanda

We don't know how AIs think or why they do what they do. Or at least, we don't know much. That fact is only becoming more ...

What is mechanistic interpretability? Neel Nanda explains.

Art by @hamishdoodles. Clipped from episode 19 of AXRP: https://youtu.be/3YbE7zybc5k?t=64 Transcript of that episode: ...

NEURAL NETWORKS ARE WEIRD! - Neel Nanda (DeepMind)

SPONSOR MESSAGES: *** CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide ...

Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop]

When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved — but it turned out the model had ...

How Reasoning Models Break Mechanistic Interpretability Techniques

A talk I gave to my MATS 9.0 training program about reasoning model

The Story of Mech Interp

This is a talk I gave to my MATS scholars, with a stylised history of the field of

Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Neel Nanda