Mech Interp Reading Group Learning

Mech Interp Reading Group - Learning a Generative Meta-Model of LLM Activations

Join us in this session as we dive into " ...

Mech Interp Reading Group - Subliminal Learning Is Steering Vector Distillation

Join us in this session as we dive into "Subliminal ...

Mech Interp Reading Group - Subliminal Learning: LLMs transmit behavioral traits via hidden signals

Join us in this session as we dive into "Subliminal ...

Mech Interp Reading Group - Towards Understanding Subliminal Learning: Hidden Biases Transfer

Join us in this session as we dive into "Towards Understanding Subliminal ...

Mech Interp Reading Group - There Will Be a Scientific Theory of Deep Learning

Join us in this session as we dive into "There Will Be a Scientific Theory of Deep ...

Mech Interp Reading Group - Tracing Attention Computation Through Feature Interactions

Join us in this session as we dive into "Tracing Attention Computation Through Feature Interactions" by Harish Kamath et al.

Mech Interp Reading Group - Do Sparse Autoencoders Capture Concept Manifolds?

Join us in this session as we dive into "Do Sparse Autoencoders Capture Concept Manifolds?" by Usha Bhalla, Thomas Fel, Can ...

Mech Interp Reading Group - Attribution-based Parameter Decomposition

Join us in this session as we dive into "Attribution-based Parameter Decomposition" by Dan Braun, Lucius Bushnaq, Stefan ...

Mech Interp Reading Group - In-Context Algebra

Join us in this session as we dive into "In-Context Algebra" by Eric Todd, Jannik Brinkmann, Rohit Gandikota, and David Bau!

Mech Interp Reading Group - Formal Mech Interp: Automated Circuit Discovery with Provable Guarantees

Join us in this session as we dive into "Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable ...

Mech Interp Reading Group - The Non-Linear Representation Dilemma: Is Causal Abstraction Enough?

Join us in this session as we dive into "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic ...

Mech Interp Reading Group - Eliciting Secret Knowledge from Language Models

Join us in this session as we dive into "Eliciting Secret Knowledge from Language Models" by Bartosz Cywiński, Emil Ryd, Rowan ...

Mech Interp Reading Group - Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

Join us in this session as we dive into "Beyond Linear Probes: Dynamic Safety Monitoring for Language Models" by James ...