Learning Transformers
Now that GPT-4 is taking the world by storm, I’m going to start learning the basics of transformers, so I can understand questions such as “What do model weights mean?”
I’m going to start with a few resources:
- Transformers from Scratch
- Neural Nets: Zero to Hero by Andrej Karpathy
- And of course, asking GPT-4 questions!
The Andrej Karpathy videos seem like a good place to start. I’m going to watch the videos, take notes in Obsidian, and come up with fun side projects as I go so I can really internalize this stuff.
A lot of this is FOMO-driven, but part of it isn’t - in the past I didn’t think I had the chops to do work here, but now I’m realizing I’m just a specific type of learner and the field has a lot of math terms that are hostile to outsiders. The gatekeeping on that language is dropping now that I can ask GPT-4 to explain it to me, so I’m going to try this out and see what learning is like nowadays.