How To Fail Interpretability Research

Been Kim (Google Brain) https://simons.berkeley.edu/talks/tba-90 Emerging Challenges in Deep Learning.

A surprising fact about modern large language models is that nobody really knows how they work internally. At Anthropic, the ...

Been Kim (Google Brain) https://simons.berkeley.edu/talks/tbd-72 Frontiers of Deep Learning.

A talk I gave to my MATS 9.0 training program about reasoning model

When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved — but it turned out the model had ...

MLHC 2022 - Been Kim: Don't do it Emmanuel! How to stop worrying about

MIT 6.S897 Machine Learning for Healthcare, Spring 2019 Instructor: Peter Szolovits View the complete course: ...

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...

Solving AI Doomerism: ...

Diving deep into the fascinating world of mechanistic

This talk was recorded at NDC AI in Oslo, Norway.