# Build the Transformer Before You Use LangChain

## The Problem
Three lines of LangChain and you have a working RAG demo. When it breaks in production — and it will — the error is inside an abstraction you do not understand.

Common failures people cannot debug:
- "Why is the model hallucinating facts that ARE in my documents?" → Do not understand the attention mechanism
- "Why did my token count blow the context window?" → Do not understand subword tokenisation
- "Why is my fine-tuned model worse than the base?" → Do not understand LoRA gradient dynamics
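The tokenisation failure is the easiest to make concrete. A minimal sketch, assuming a toy vocabulary and greedy longest-match segmentation (real BPE merge rules differ, and `subword_tokenise` is an illustrative name, not a library function): rare words split into several subword tokens, which is why character counts are a bad estimate of token counts.

```python
def subword_tokenise(word, vocab):
    """Greedy longest-match-first segmentation over a toy subword vocabulary.
    Unknown spans fall back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, shrinking until a match
        # (or a single character, which always succeeds).
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# A 13-character word becomes 3 tokens under this toy vocab;
# a word fully in-vocabulary would be 1.
vocab = {"token", "isation", "hall", "ucinat", "ing"}
tokens = subword_tokenise("hallucinating", vocab)
```

A prompt budgeted by characters can overshoot the context window badly once rare or domain-specific words each expand into multiple tokens.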

## What Week 60 Teaches You
Build: scaled dot-product attention → multi-head → encoder block → positional encoding → 6-layer Transformer → masked language modelling.
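The first step in that sequence fits in a few lines. A minimal sketch in pure Python (plain lists instead of tensors, so the arithmetic is visible; `attention` and the list-based layout are choices made here, not the week's exact code): softmax over scaled query-key dot products, then a weighted sum of the values.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; -inf scores become weight 0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention over lists of row vectors (seq_len x d)."""
    d = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Score query i against every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        if mask is not None:
            # mask[i][j] is True where query i may attend to key j.
            scores = [s if mask[i][j] else float("-inf")
                      for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Output i is the attention-weighted mix of the value rows.
        outputs.append([sum(wj * V[j][t] for j, wj in enumerate(w))
                        for t in range(len(V[0]))])
    return outputs, weights

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
out, w = attention(Q, K, V)
```

Multi-head attention is this function run h times on learned projections of Q, K, V with the results concatenated; once you have written this loop, the rest of the build sequence is composition.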

After this:
- Attention masks are trivial — you wrote the masking code
- The O(n²) context window cost is obvious — you understand why
- Residual connections make sense — you implemented them
- LangChain is no longer magic — it is just orchestrating the architecture you built
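Two of those claims can be made concrete in a few lines. A sketch, assuming a decoder-style causal mask (the `causal_mask` name and the boolean-list representation are choices made here): the mask is a lower-triangular boolean grid, and the score matrix it covers has n × n entries, which is the quadratic context-window cost.

```python
def causal_mask(n):
    # Row i may attend to columns 0..i: itself and earlier tokens only.
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(4)

# Attention computes one score per (query, key) pair, so a context of
# n tokens means n * n scores: doubling n quadruples compute and memory.
n = 1024
num_scores = n * n
```

Writing this once is why the mask shape in any framework's error message stops being mysterious: it is always this triangle, broadcast over batch and heads.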

## The Roadmap Tradeoff
Phase 3 (Months 12–17) builds the foundations. Phase 4 (Months 18–23) builds on them. Cost: 6 months before using LangChain. Benefit: you use it as an engineer, not as a user.