
An Introduction to Transformers

Understanding Transformers from the Inside Out

This book teaches you how fairly modern AI systems work by building miniature versions of them yourself. I don't want to hand-wave anything, because I'm learning this as we go too. Real math, straightforward code: that's the goal.

Understanding Gradients

Using only basic Python (no NumPy, no PyTorch), we'll compute every matrix multiplication, every activation function, every gradient. If you want to be pragmatic, you can skip this one and go to the next section. But if you want to reach for the glory of meticulous mathematical matrix multiplications, then get ready to calculate!
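To give a taste of the hand-rolled approach, here is a minimal sketch (all names and numbers are illustrative, not from the book): a one-parameter linear model with a squared-error loss, where we compute the gradient by the chain rule in plain Python and sanity-check it against a finite-difference estimate.

```python
# One training example for a tiny linear model: pred = w * x + b.
x, y = 3.0, 10.0
w, b = 2.0, 1.0  # arbitrary starting parameters

def loss(w, b):
    pred = w * x + b          # forward pass: the model's prediction
    return (pred - y) ** 2    # squared-error loss

# Backward pass by hand, via the chain rule:
#   dL/dpred = 2 * (pred - y),  dpred/dw = x,  dpred/db = 1
pred = w * x + b
grad_w = 2 * (pred - y) * x
grad_b = 2 * (pred - y) * 1

# Sanity check: central finite differences should agree closely.
eps = 1e-6
num_grad_w = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
num_grad_b = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)

print(grad_w, num_grad_w)  # analytic vs. numeric gradient for w
print(grad_b, num_grad_b)  # analytic vs. numeric gradient for b
```

Every gradient in this book reduces to chain-rule steps like these, just with matrices instead of single numbers.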


Building a Transformer

This is our transformer. There are many like it, but this one is ours. This section shows you how to build a complete GPT-style transformer in PyTorch. All the heavy lifting we did in the last section is now hidden behind simple calls like backward(). It covers the architecture that powers modern language models (circa 2023), from embeddings to interpretability tools. In the end, you'll have a new toy.
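For a sense of how much that hiding buys us, here is a minimal sketch (values are illustrative): the gradient of a squared-error loss for a one-parameter linear model, which PyTorch's autograd computes with a single backward() call instead of hand-derived chain-rule arithmetic.

```python
import torch

# One training example: pred = w * x + b, squared-error loss.
x, y = torch.tensor(3.0), torch.tensor(10.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

loss = (w * x + b - y) ** 2  # forward pass
loss.backward()              # autograd fills in w.grad and b.grad

print(w.grad.item(), b.grad.item())
```

The same call scales unchanged to the millions of parameters in a full transformer; that is the entire point of using a framework here.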


Fine-Tuning a Transformer

Fine-tuning really should be called "necessary tuning," because the output of the previous section doesn't look anything like the GPT-style assistants we are used to. As such, this section teaches a baseline pre-trained model to follow instructions. We go into detail on SFT, reward modeling, RLHF with PPO, DPO, and other acronyms (all explained later): the techniques that turn base models into safer assistants.


Reasoning with Transformers

How do models like o1 and DeepSeek-R1 "think"? This section covers the techniques that make transformers reason, from simple prompting tricks to full reinforcement learning pipelines. We'll build chain-of-thought prompting and tree search, and train our own reasoning models.


From Noise to Images

But what if we aren’t generating text? Here we will learn how AI generates images from text prompts. This section builds from flow matching fundamentals to a working latent diffusion model (you’ll know what that means later).