Mech Interp Reading Group ITDA - Detailed Analysis & Overview

Mech Interp Reading Group - ITDA: A Scalable Approach to Interpreting Large Language Models

Join us in this session as we dive into "Inference-Time Decomposition of Activations ( ...

Mech Interp Reading Group - Negation Neglect: When models fail to learn negations in training

Join us in this session as we dive into "Negation Neglect: When models fail to learn negations in training" by Harry Mayne, Lev ...

Mech Interp Reading Group - Tracing Attention Computation Through Feature Interactions

Join us in this session as we dive into "Tracing Attention Computation Through Feature Interactions" by Harish Kamath et al.

Mech Interp Reading Group - The Dead Salmons of AI Interpretability

Join us in this session as we dive into "The Dead Salmons of AI Interpretability" by Maxime Méloux, Giada Dirupo, François Portet, ...

Introduction to Mechanistic Interpretability with David Bau

CS 7180: Neural

Mech Interp Reading Group - Open Problems in Mechanistic interpretability

Join us in this session as we dive into "Open Problems in Mechanistic interpretability" by Lee Sharkey et al.! Read the article ...

The Story of Mech Interp

This is a talk I gave to my MATS scholars, with a stylised history of the field of mechanistic interpretability, as I see it (with a focus ...

An Introduction to Mechanistic Interpretability – Neel Nanda | IASEAI 2025

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...

Advances in Data Process Techniques in Geology & RS Applications in Engg Geology by Ms. Jappji Mehar

IIRS-ISRO.

Interpretability: Understanding how AI models think

What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models ...

Mechanistic Interpretability, Part 1 | ML@P Reading Group | Jinen Setpal

Slides: https://cs.purdue.edu/homes/jsetpal/slides/mechinterp.pdf We covered most of transformer circuits, and will cover ...

EleutherAI Interpretability Reading Group 220604: Attention head bandwidth connectomes of LLMs.

ERRATA: - Scaling DOES change the composition term. We were wrong about the form of scaling, and we're updating the results ...