Tianyu Zhou

// building LLMs from scratch to understand them · Shanghai · open to PhD 2026–27
// To democratize cutting-edge knowledge through interactive, narrative-driven AI platforms that bridge theory and practice.

PIE0.1 — 0.2B Dense LLM from Scratch [GitHub]
  • Designed and trained a 0.2B-parameter Transformer: custom Rust BPE tokenizer (PyO3), GQA, SwiGLU, RoPE, Flash Attention, bf16 mixed precision.
  • Trained on ~7B tokens across 4 × RTX 4090 via DDP. Data mix: SkyPile 45%, FineWeb 35%, StarCoder 14%, NuminaMath 6%.
  • Traced a CPU-sync bottleneck in MoE routing at this scale and disabled MoE in favor of a dense model; documented the decision process across 23 public video episodes.
  • Built custom Triton GPU kernels; studied the KV-cache quantization literature (KIVI, QJL, PolarQuant, TurboQuant) for inference optimization.
Rust · PyTorch DDP · Flash Attention · Triton · GQA · RoPE
passionie.uk — Interactive AI Textbook ↗
  • Full-stack platform: Next.js frontend, FastAPI backend, Supabase vector DB, RAG pipeline with local llama.cpp inference (Qwen3.5 27B, Q4_K_M).
  • Bilingual (EN/ZH) interactive textbook covering LLM theory; accessible to international readers without authentication.
  • Prototype toward a long-term goal: a teaching agent that models each learner's knowledge state and adapts how content is presented, addressing the bottleneck that written material is updated far more slowly than new knowledge is produced.
Next.js · FastAPI · RAG · llama.cpp · Docker
General Vision-Language Infrastructure & Interpretability

SJTU Biomedical Engineering · Advisor: Suncheng Xiang

  • Built Med-Vision-Agent, a 5-module pipeline (data collection → annotation → YOLO → inference → Qwen3-VL-8B report generation) for automated polyp report generation; a patent application is in preparation.
  • Designed the pipeline to be domain-agnostic, with medical polyp datasets serving as a high-precision validation benchmark; the framework is intended to adapt to other vision-to-text tasks.
  • Contributed to manuscript writing and experiment reproduction; preprint: [arXiv:2512.10750].
  • Designed a controlled mechanistic interpretability (MI) experiment on a self-trained 30M-parameter dense model to isolate when memorization transitions to compositional generalization, a question that is hard to study with full interpretability at the scale of large labs' models.
Decoupled Architecture · LoRA · DPO · Vision-Language Alignment · Mechanistic Interp.
Douyin / Bilibili — "Handmaking LLMs" Series
  • 23-episode bilingual series covering Transformer internals, BPE tokenization, DDP training, and inference, produced alongside the actual development of PIE0.1.
  • ~11k followers, 1M+ total views; 800+-member technical community; inline comments in the code cross-reference the corresponding episodes.
Teaching Assistant — Deep Learning (High School)

Qingpu High School · co-taught with Suncheng Xiang

  • Designed a GUI-based YOLO experiment workflow that lets students with no programming background run object detection without writing code.