Note that “transformer” can refer either to a single attention-plus-feed-forward block inside a Large Language Model, or to the entire model, depending on context.
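A minimal sketch of the distinction, using NumPy with hypothetical names and toy sizes (not any particular library's API): `transformer_block` is the per-layer unit (single-head self-attention plus a feed-forward network, each with a residual connection), while `transformer_model` is “the whole thing” (embedding, a stack of blocks, and an output projection).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, wq, wk, wv, wo, w1, w2):
    # One "transformer" in the narrow sense: self-attention + feed-forward.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    x = x + scores @ v @ wo            # attention sublayer, residual add
    return x + np.maximum(x @ w1, 0) @ w2  # feed-forward sublayer, residual add

def transformer_model(tokens, embed, blocks, unembed):
    # One "transformer" in the broad sense: embed, run every block, project.
    x = embed[tokens]
    for params in blocks:
        x = transformer_block(x, *params)
    return x @ unembed

rng = np.random.default_rng(0)
d, vocab, seq = 8, 16, 5
embed = rng.normal(size=(vocab, d))
blocks = [tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(6))
          for _ in range(2)]  # a 2-block "model"
tokens = rng.integers(0, vocab, seq)
logits = transformer_model(tokens, embed, blocks, embed.T)
print(logits.shape)  # one row of vocab logits per input token
```

A real model adds multi-head attention, layer norms, positional information, and a wider feed-forward layer; the point here is only where the block ends and the model begins.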
See the GitHub repo naklecha/llama3-from-scratch: “llama3 implementation one matrix multiplication at a time”.