DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence