Sparse Autoencoders Unlearn Knowledge In - Detailed Analysis & Overview



Sparse Autoencoders Unlearn Knowledge in LLMs | A Paper-Based Walkthrough

I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying

A Window Into LLMs | Sparse Autoencoders Explained

This has been my favorite video so far to make! I think interpretability is so important both in terms of ensuring safe AI and also ...

What Happened With Sparse Autoencoders?

Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...

VISION SPARSE AUTOENCODERS: Overview + Walkthrough of Running an SAE

In this video, we explore how Vision

Sparse Autoencoders: Progress & Limitations with Joshua Engels

In this talk, Joshua Engels discusses

Hoagy Cunningham — Finding distributed features in LLMs with sparse autoencoders [TAIS 2024]

One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ...

Reading an AI's Mind with Sparse Autoencoders

A visual explanation of how transformers piece concepts together, told in the style of 3Blue1Brown. Introducing SAEs. What truly ...

Introduction to Sparse AutoEncoders | ML@P Reading Group | Jinen Setpal

Slides: https://jinen.setpal.net/slides/sae.pdf.

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders by Kamil Deja

Welcome to AI Safety Poland Talks! A biweekly series where researchers, professionals, and enthusiasts from Poland or ...

Sparse Autoencoders Find Highly Interpretable Features in Language Models

The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ...

UUtah CS 6966 Interpretability of LLMs | Spring 2026 | Sparse autoencoders: Basics

Notes: https://drive.google.com/file/d/1GTIqXS-vEiDz2rAPfdeB_5G5IjBfNkxF/view?usp=sharing.

Sparse signal processing: Occam in the age of abundance

Find out how less data can mean more quality, at the inaugural lecture of Professor Pier Luigi Dragotti (Electrical and Electronic ...

Steering Language Model Refusal with Sparse Autoencoders

The paper explores using