Media Summary: Join us in this session as we dive into "Learning a Generative Meta-Model of LLM Activations" by Grace Luo, Jiahai Feng, Trevor ...
Mech Interp Reading Group Tracing - Detailed Analysis & Overview
Join us in this session as we dive into "Circuit ...
Join us in this session as we dive into "Open Problems in Mechanistic Interpretability" by Lee Sharkey et al.! Read the article ...
From PhD research on grounding and language models to shipping interpretability tools in production at Goodfire, Jack ...
Join us in this session as we dive into "Negation Neglect: When models fail to learn negations in training" by Harry Mayne, Lev ...
Join us in this session as we dive into "The Dead Salmons of AI Interpretability" by Maxime Méloux, Giada Dirupo, François Portet, ...
ERRATA: - Scaling DOES change the composition term. We were wrong about the form of scaling, and we're updating the results ...
We are happy to welcome the next round of our AI paper Slides: We covered most of transformer circuits, and will cover ...
This is a talk I gave to my MATS scholars, with a stylised history of the field of mechanistic interpretability, as I see it (with a focus ...
Paper Link: Most recent ML models in reaction prediction often fail to ...