Media Summary: The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the ... This "Alignment" thing turns out to be even harder than we thought. # Links The Paper: ... Alignment Problem: Mesa-Optimizers and Inner Alignment:
Deceptive Misaligned Mesa Optimisers It - Detailed Analysis & Overview
From my ongoing curation of interesting materials. This is a NotebookLM-generated podcast. If you enjoyed this content and my ... Nino Scherrer, a research scientist at Google, presented recent work on understanding