A subfield within AI Alignment.
I believe the premise is that we want to understand what artificial neural networks are actually doing, i.e. the algorithms they have learned, by inspecting their weights and activations.
One example is examining what individual neurons (channels) in a CNN respond to, such as finding that a unit in an early layer acts as an edge detector.
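A toy sketch of "reading" a CNN unit from its weights. The 3x3 kernel below is hypothetical and hand-set (not taken from a trained model), and brute-forcing all black/white patches is a simplified stand-in for feature visualization, which on real networks is usually done by gradient ascent on the input. The point is that the patch which maximally activates the unit matches what you would guess by just looking at the weights.

```python
import itertools

# Hypothetical 3x3 first-layer kernel, hand-set for illustration (Sobel-like).
# Negative weights on the left, positive on the right: it should respond to
# dark-to-bright vertical edges.
KERNEL = [
    [-1.0, 0.0, 1.0],
    [-2.0, 0.0, 2.0],
    [-1.0, 0.0, 1.0],
]


def activation(patch, kernel):
    """ReLU of the elementwise product sum: the unit's response to one 3x3 patch."""
    total = sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
    return max(0.0, total)


def best_binary_patch(kernel):
    """Try all 2^9 black (0) / white (1) patches; return the first one with the highest activation."""
    best, best_act = None, float("-inf")
    for bits in itertools.product([0.0, 1.0], repeat=9):
        patch = [list(bits[i * 3:(i + 1) * 3]) for i in range(3)]
        act = activation(patch, kernel)
        if act > best_act:
            best, best_act = patch, act
    return best, best_act


if __name__ == "__main__":
    patch, act = best_binary_patch(KERNEL)
    for row in patch:
        print(row)
    print("activation:", act)
```

Running it prints a patch that is dark in the left column and bright in the right column, i.e. a vertical edge, while a horizontal edge leaves the unit at zero. In a real CNN the kernels are learned rather than hand-set, which is why such inspection is needed to find out what they encode.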
Mechanistic Interpretability Quickstart Guide — Neel Nanda
Described to me by ‣.