Learning Transformers

Now that GPT-4 is taking the world by storm, I’m going to start learning the basics of transformers so I can understand questions such as “What does ‘model weights’ mean?”

I’m going to start with a couple of resources:

Transformers from Scratch
Neural Nets: Zero to Hero by Andrej Karpathy
And of course, asking GPT-4 questions!

The Andrej Karpathy videos seem like a good place to start. I’m going to watch the videos, take notes in Obsidian, and, as I go, come up with fun side projects so I can really internalize this stuff.

A lot of this is FOMO-driven, but part of it isn’t. In the past I didn’t think I had the chops to do work here, but now I’m realizing I’m just a specific type of learner, and the field is full of math terms that are hostile to outsiders. The gatekeeping around that language is dropping now that I can ask GPT-4 to explain it to me, so I’m going to try this out and see what learning is like nowadays.