Steering Language Model Refusal With - Detailed Analysis & Overview
The paper explores using sparse autoencoders to steer ...
Modify the behavior or the personality of a ...
In this AI Research Roundup episode, Alex discusses the paper: 'What Drives Representation ...
The paper introduces Conditional Activation ...
Most of us have encountered situations where someone appears to share our views or values, but is in fact only pretending to do ...
Alessandro Stolfo, PhD Candidate at ETH Zürich and Doctoral Fellow at the Swiss Cyber-Defence (CYD) Campus. Abstract: ...