The one where noobs tried mech interpretability
Contains the method and results of running inputs from two distinct domains (here, C++ and philosophy) through the model. The analysis compares activations across the two domains to measure and visualize how much each layer contributes to the residual stream, which helps identify how different layers respond to domain-specific inputs.
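
To make "contribution of each layer" concrete, here is a minimal sketch of one way to measure it. It is not necessarily the exact method used here. It assumes a layer's contribution is the L2 norm of the update it adds to the residual stream (the state after the layer minus the state before it). The function names and the toy activations are hypothetical and invented for illustration. In practice the states would come from hooks on the real model.

```python
import math

def layer_contributions(residual_states):
    """Return the L2 norm of each layer's update to the residual stream.

    residual_states[i] is the residual vector entering layer i, and the
    last entry is the output of the final layer, so n + 1 states describe
    n layers. (Assumption: contribution = norm of after minus before.)
    """
    norms = []
    for before, after in zip(residual_states, residual_states[1:]):
        delta = [a - b for a, b in zip(after, before)]
        norms.append(math.sqrt(sum(d * d for d in delta)))
    return norms

def compare_domains(states_a, states_b):
    """Per-layer difference in contribution norm (domain A minus domain B)."""
    contrib_a = layer_contributions(states_a)
    contrib_b = layer_contributions(states_b)
    return [a - b for a, b in zip(contrib_a, contrib_b)]

# Toy, made-up residual states for a hypothetical 2-layer model of width 2.
# These are NOT real activations from any model.
cpp_states = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
philosophy_states = [[0.0, 0.0], [0.0, 1.0], [6.0, 9.0]]

cpp_contrib = layer_contributions(cpp_states)
phil_contrib = layer_contributions(philosophy_states)
diff = compare_domains(cpp_states, philosophy_states)

for layer, (c, p, d) in enumerate(zip(cpp_contrib, phil_contrib, diff)):
    print(f"layer {layer}: cpp={c:.2f} philosophy={p:.2f} diff={d:+.2f}")
```

A positive difference means the layer writes more into the residual stream for the C++ input than for the philosophy input in this toy setup, and a negative one means the opposite. Plotting these per-layer values as a bar chart is one simple way to visualize the comparison.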