Media Summary: I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying ... This has been my favorite video so far to make! I think interpretability is so important both in terms of ensuring safe AI and also ... Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...
Sparse Autoencoders Unlearn Knowledge In - Detailed Analysis & Overview
One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ... A visual explanation of how transformers piece concepts together, told in the style of 3Blue1Brown. Introducing SAEs. What truly ... Welcome to AI Safety Poland Talks! A biweekly series where researchers, professionals, and enthusiasts from Poland or ...
The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ... Find out how less data can mean more quality, at the inaugural lecture of Professor Pier Luigi Dragotti (Electrical and Electronic ...