A subfield within AI Alignment.

I believe the premise is that we want to understand what artificial neural networks are actually computing by inspecting their internals (weights and activations), rather than treating them as black boxes.

An example of this would be examining what individual nodes (neurons or filters) in a CNN respond to, such as an early-layer filter that fires on edges.
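A minimal sketch of that idea, in plain Python with no ML library: take one 3x3 convolutional filter, read its weights, guess what it detects, then check the guess against its activations on probe images. The weights here are an assumption chosen by hand for illustration (a Sobel-like pattern), not taken from any trained network, and the helper names are hypothetical.

```python
# Toy "neuron": a single 3x3 convolution filter with hand-picked weights.
# Reading the weights: negative on the left, positive on the right, so we
# would guess it responds to dark-to-bright changes from left to right.
KERNEL = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def convolve_valid(image, kernel):
    """Slide the kernel over the image (no padding); return the activation map."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def max_abs_activation(image, kernel):
    """Largest absolute activation anywhere in the activation map."""
    return max(abs(v) for row in convolve_valid(image, kernel) for v in row)


# Two 5x5 probe images: one with a vertical edge, one with a horizontal edge.
VERTICAL_EDGE = [[0, 0, 1, 1, 1] for _ in range(5)]
HORIZONTAL_EDGE = [[0] * 5, [0] * 5, [1] * 5, [1] * 5, [1] * 5]

vertical_response = max_abs_activation(VERTICAL_EDGE, KERNEL)
horizontal_response = max_abs_activation(HORIZONTAL_EDGE, KERNEL)

print(vertical_response)    # 4
print(horizontal_response)  # 0
```

The filter responds to the vertical edge and is silent on the horizontal one, which matches the guess made from the weights alone. Real networks are far less tidy, but the loop of inspecting internals, forming a hypothesis, and testing it is the same.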

Links

Mechanistic Interpretability Quickstart Guide — Neel Nanda

People

Described to me by ‣.