
Text-to-Image

An educational journey through text-to-image generation

Learn how modern AI systems like Stable Diffusion and DALL-E generate images from text prompts by building the key components yourself.

What You’ll Learn

This project teaches the core concepts behind text-to-image generation through 5 progressive phases:

| Phase | Topic | Key Concepts |
|-------|------------------------|---------------------------------------|
| 1 | Flow Matching | Velocity fields, noise-to-data paths |
| 2 | Diffusion Transformer | Patchifying, attention, adaLN |
| 3 | Class Conditioning | Classifier-free guidance (CFG) |
| 4 | Text Conditioning | CLIP embeddings, cross-attention |
| 5 | Latent Space | VAE compression, scaling up |
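To give a feel for Phase 1, here is a minimal sketch of the linear (rectified-flow) noise-to-data path used in flow matching. The function names are illustrative, not taken from this project's code:

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return (1 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Constant velocity along the linear path: d/dt x_t = x1 - x0."""
    return x1 - x0

# A model learns to predict target_velocity(x0, x1) given (x_t, t);
# sampling then integrates the learned velocity field from t=0 (pure
# noise) to t=1 (data).
noise, data = 0.0, 2.0
assert interpolate(noise, data, 0.0) == noise   # path starts at noise
assert interpolate(noise, data, 1.0) == data    # and ends at data
```

The notebooks build this up from scratch, replacing the scalars here with image tensors and the constant velocity with a learned network.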

Getting Started

```bash
# Clone and install
git clone https://github.com/zhubert/text-to-image.git
cd text-to-image
uv sync

# Run the first notebook
uv run jupyter notebook notebooks/01_flow_matching_basics.ipynb
```

Prerequisites

Start with Phase 1: Flow Matching to generate your first images from noise!