Media Summary:
Join us in this session as we dive into "Inference-Time Decomposition of Activations ( ...
Join us in this session as we dive into "Negation Neglect: When models fail to learn negations in training" by Harry Mayne, Lev ...
Join us in this session as we dive into "Tracing Attention Computation Through Feature Interactions" by Harish Kamath et al.
Mech Interp Reading Group ITDA - Detailed Analysis & Overview
Join us in this session as we dive into "The Dead Salmons of AI Interpretability" by Maxime Méloux, Giada Dirupo, François Portet, ...
Join us in this session as we dive into "Open Problems in Mechanistic Interpretability" by Lee Sharkey et al.! Read the article ...
This is a talk I gave to my MATS scholars, with a stylised history of the field of mechanistic interpretability, as I see it (with a focus ...
How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...
What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models ...
Slides: We covered most of transformer circuits, and will cover ...
ERRATA: Scaling DOES change the composition term. We were wrong about the form of scaling, and we're updating the results ...