Hi, I'm

Focused on video generation, image generation, and multimodal AI research. Also a passionate content creator making AI-powered short films.

Full-Stack Builder 🧠 AI Researcher 🎬 Content Creator

AI Researcher × Full-Stack Builder × Content Creator

Career Direction

AI Engineer / Machine Learning Engineer

Research Areas

Video Models & Generation · Image Generation · Temporal Models · Multimodal Learning

Triple Identity

Full-Stack Builder

End-to-end capability: product design, algorithm R&D, engineering, testing, and deployment

AI Researcher

Deep dives into video generation and multimodal AI, tracking cutting-edge papers

Content Creator

Directing, filming, and editing AI short films and cinematic driving footage

Cross-disciplinary skill stack

Current Skill Positioning

Full-Stack Builder focused on LLM fine-tuning, agent systems, RAG architecture, and production-oriented backend delivery, differentiated by causal inference and measurement skills.

LangGraph · MCP Protocol · GraphRAG · QLoRA · GRPO / DPO · FastAPI · Causal Inference

LLM & GenAI Engineering

Core strengths around model integration, fine-tuning, alignment, and inference optimization.

OpenAI / Claude / DeepSeek / Gemini / Qwen APIs · vLLM · SGLang · LoRA / QLoRA fine-tuning · GRPO / DPO alignment · Unsloth · LLaMA-Factory · KV Cache optimization · Flash Attention · MoE architectures · DeepSeek V3 / R1 techniques · HuggingFace Transformers

Agent Systems

Product-oriented agent orchestration, tool use, workflow automation, and guardrail design.

LangGraph · OpenAI Agents SDK · MCP Protocol (SSE / Stdio / HTTP) · Function Calling · ReAct · Multi-agent orchestration · Dify · Coze · N8N · GuardRails

RAG & Knowledge Systems

Retrieval, knowledge organization, query transformation, and context engineering across document AI systems.

GraphRAG · Milvus · ChromaDB · Faiss · BGE / M3 embeddings · Query Transformation · Reranking · Mem0 · DSPy Context Engineering · RAGFlow

Machine Learning & Multimodal

A combined view of classical ML, deep learning, and multimodal modeling that matches an applied-AI profile.

PyTorch · CNN Architectures · LSTM / GRU / Informer · XGBoost / LightGBM / CatBoost · Optuna · Feature Engineering · Model Fusion · Transfer Learning · CLIP · Vision Transformer (ViT) · LLaVA · Swin Transformer · OpenCV · Image Augmentation

Optimization, Infra & MLOps

Distributed training, inference optimization, service APIs, and deployment-minded engineering support.

DeepSpeed (ZeRO 1 / 2 / 3) · DDP / FSDP · Tensor / Pipeline Parallelism · Mixed Precision (fp16 / bf16 / fp8) · Megatron-LM · TensorRT · Quantization (GPTQ / AWQ / GGUF) · NCCL · FastAPI · Docker / Kubernetes · LangSmith · Wandb · Pydantic · SQLAlchemy / Alembic · MongoDB · GraphQL / RESTful API

Causal Inference & Analytics

My strongest differentiator: showing that I can measure impact, not just build models or workflows.

Differentiator
A/B Testing · PSM · DID · DML · DAGs / do-calculus · IV / 2SLS · Sensitivity Analysis · RCT Design · SQL (Window Functions / CTEs / Joins) · Pandas · NumPy · Tableau · RFM / AARRR / Funnel / Cohort Analysis · Business Metrics

Full-stack AI platforms, document intelligence systems, and model-tuning workflows

Multi-Model AI Studio
Platform · Featured

A full-stack AI workspace that unifies hosted and self-hosted LLMs with chat, streaming responses, multimodal input, and batch inference.

React · TypeScript · FastAPI · SSE · LLM Platform
Multimodal Document RAG Platform
Document AI · Featured

A multimodal RAG system for PDF parsing, vector retrieval, and document-grounded chat, built as one integrated upload-to-answer experience.

React · FastAPI · LangChain · Milvus · RAG
CLIP Cross-Modal Retrieval RAG
Document AI

CLIP encodes text and images into a shared 512-dim embedding space, enabling text→image and image→image retrieval. Built on LlamaIndex, the system evolves from a CLIP MVP to VLM captioning plus BM25 hybrid retrieval fused with RRF, persisted in Milvus.

CLIP · LlamaIndex · Multimodal · Milvus · RRF
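The RRF step can be sketched in a few lines. Each retriever contributes 1 / (k + rank) for every document it returns, and the summed scores decide the fused order. This is a minimal sketch. It assumes each retriever returns a best-first list of document IDs and uses the commonly cited constant k = 60. The function name and image IDs are hypothetical and are not the project's LlamaIndex code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of document IDs with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hits: dense (CLIP / caption embeddings) vs. sparse (BM25).
dense_hits = ["img_07", "img_02", "img_11"]
bm25_hits = ["img_02", "img_19", "img_07"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])  # ['img_02', 'img_07', 'img_19', 'img_11']
```

RRF uses only ranks, so CLIP similarity scores and BM25 scores never need to share a common scale. That is the usual reason to choose it for hybrid retrieval.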
Structured Extraction and Retrieval QA Platform
Document AI

A document intelligence platform that combines structured extraction, vector search, and grounded QA across radiology, medication, finance, and news workflows.

FastAPI · Qdrant · Chroma · LangChain · DeepSeek
Agentic GraphRAG (Vertical Domain)
Agent · Featured

No Neo4j: LangExtract extracts entities and relations into a Python-dict knowledge graph that sits alongside a Chroma vector store. A three-tool agent chooses vector, graph, or hybrid retrieval, including multi-hop graph traversal. Every extraction carries a char_interval, so answers can be traced back to the source text.

LangExtract · GraphRAG · LangChain · DeepSeek · Knowledge Graph
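A dict-based knowledge graph with multi-hop traversal can be sketched as an adjacency map plus a depth-limited breadth-first search. This is a minimal sketch. `add_relation`, `multi_hop`, `max_hops`, and the medical triples are hypothetical illustrations. It leaves out the LangExtract extraction step, char_interval provenance, and the Chroma side of hybrid retrieval.

```python
from collections import deque

# Hypothetical layout: entity -> list of (relation, neighbor entity) edges.
Graph = dict[str, list[tuple[str, str]]]


def add_relation(graph: Graph, head: str, relation: str, tail: str) -> None:
    """Store a directed (head) -[relation]-> (tail) edge."""
    graph.setdefault(head, []).append((relation, tail))
    graph.setdefault(tail, [])


def multi_hop(graph: Graph, start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Return every (head, relation, tail) triple reachable within max_hops of start."""
    triples = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples


kg: Graph = {}
add_relation(kg, "Metformin", "treats", "Type 2 diabetes")
add_relation(kg, "Type 2 diabetes", "risk_factor", "Obesity")
add_relation(kg, "Obesity", "associated_with", "Hypertension")

for triple in multi_hop(kg, "Metformin", max_hops=2):
    print(triple)
```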
Enterprise NL2SQL Fine-Tuning System
Model Tuning · Featured

An enterprise NL2SQL pipeline that generates schema-aware training data, then supports tuning, validation, and evaluation for natural-language SQL workflows.

LoRA · QLoRA · FastAPI · WebSocket · SQL
NL2SQL Data-Analysis Agent
Agent

A ReAct agent forked from Vanna that turns a one-sentence question into SQL, runs it on MySQL, and returns a table, a chart, and an explanation. Accuracy comes from RAG over three Milvus collections (DDL, business docs, and historical SQL).

LangChain · Vanna · Jina Embeddings · Milvus · NL2SQL
RL-Tuned Function-Calling Agent Pipeline
Agent

A function-calling agent pipeline for preference data generation and evaluation, designed to improve tool selection and argument quality.

DPO · Function Calling · Evaluation · FastAPI · Agents
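DPO trains directly on preference pairs without a separate reward model. For this project, a pair could be a correct tool call (chosen) versus one with the wrong tool or wrong arguments (rejected). This is a minimal plain-Python sketch of the per-pair loss. It assumes the summed response log-probabilities under the policy and a frozen reference model are already computed. beta = 0.1 is only a common starting value, and the numbers in the usage lines are made up.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability log pi(y | x) of one full response.
    """
    chosen_margin = policy_chosen - ref_chosen        # policy's shift toward y_w vs. reference
    rejected_margin = policy_rejected - ref_rejected  # policy's shift toward y_l vs. reference
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) = log(1 + exp(-z)); branch so exp never overflows.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Policy moved probability toward the chosen tool call: loss drops below log 2.
print(dpo_loss(-10.0, -17.0, -12.0, -15.0) < math.log(2))  # True
```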
Qwen3-VL Visual RL with Unsloth + GSPO
Reinforcement Learning · Featured

Single-GPU visual RL: Unsloth + GSPO fine-tunes Qwen3-VL 8B on MathVista visual math, lifting output-format compliance from 77% to 84% with format + correctness rewards.

Qwen3-VL · GSPO · Unsloth · LoRA · TRL · MathVista
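The main difference from GRPO is where the importance ratio lives. GSPO computes one length-normalized ratio per sampled sequence and clips that, rather than clipping per-token ratios. This is a minimal plain-Python sketch of the objective for one prompt's group, using made-up log-probabilities. In the actual project these quantities would be tensors inside the Unsloth/TRL trainer. The clip value here is illustrative, not the setting used.

```python
import math
import statistics


def sequence_ratio(new_token_logps: list[float], old_token_logps: list[float]) -> float:
    """Length-normalized sequence ratio (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    computed as exp of the mean per-token log-ratio."""
    diffs = [new - old for new, old in zip(new_token_logps, old_token_logps)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(new_logps: list[list[float]], old_logps: list[list[float]],
                   rewards: list[float], clip_eps: float) -> float:
    """Clipped surrogate (to be maximized) for one group of responses to the same prompt."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards) or 1.0  # guard: all rewards equal -> zero std
    total = 0.0
    for new, old, reward in zip(new_logps, old_logps, rewards):
        advantage = (reward - mean_r) / std_r  # group-relative, as in GRPO
        ratio = sequence_ratio(new, old)       # one ratio for the whole sequence
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        total += min(ratio * advantage, clipped * advantage)
    return total / len(rewards)


# One prompt, two sampled answers: the first earned reward 1, the second 0.
old = [[-1.2, -0.8, -0.5], [-1.0, -1.1]]
new = [[-1.1, -0.7, -0.4], [-1.0, -1.3]]
print(gspo_objective(new, old, rewards=[1.0, 0.0], clip_eps=0.2))
```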
GRPO Reasoning Trainer (GSM8K · Qwen2.5-0.5B)
Reinforcement Learning · Featured

Reproducing DeepSeek-R1's GRPO with TRL's GRPOTrainer on Qwen2.5-0.5B: five verifiable rewards teach the model to emit a reasoning chain before its answer on GSM8K. Runs on one GPU.

GRPO · TRL · DeepSeek-R1 · GSM8K · Qwen2.5
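Verifiable rewards are ordinary functions over the sampled text. GRPO turns them into advantages by standardizing each reward within its group of completions for the same prompt, with no critic model. This minimal sketch shows two such rewards (format and correctness) plus the group normalization. The `<reasoning>`/`<answer>` tags and the 1.0/2.0 weights are assumptions, not the project's five rewards. TRL's GRPOTrainer also expects reward functions with its own call signature.

```python
import re

# Hypothetical completion format: a reasoning block, then an answer block.
FORMAT_RE = re.compile(
    r"^<reasoning>\n.+?\n</reasoning>\n<answer>\n(.+?)\n</answer>\n?$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """1.0 when the completion follows the reasoning-then-answer layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def correctness_reward(completion: str, gold_answer: str) -> float:
    """2.0 when the extracted answer equals the GSM8K gold answer, else 0.0."""
    match = FORMAT_RE.match(completion)
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == gold_answer.strip() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO advantage: each reward standardized against its own sampling group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


good = "<reasoning>\n48 / 2 = 24 and 48 + 24 = 72.\n</reasoning>\n<answer>\n72\n</answer>"
bad = "The answer is 72."
rewards = [format_reward(c) + correctness_reward(c, "72") for c in (good, bad)]
print(rewards)  # [3.0, 0.0]
print(group_advantages(rewards))
```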
veRL PPO Training
Reinforcement Learning

Classic PPO on a single GPU with ByteDance's veRL (HybridFlow). The four PPO roles (Actor / Critic / Reference / Reward) are orchestrated with Ray and trained on GSM8K, using a rule-based reward that regex-matches the answer after ####. Paired with a close reading of InstructGPT's three-stage RLHF recipe.

veRL · PPO · RLHF · Ray · vLLM
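The rule reward stands in for a learned reward model. GSM8K reference solutions end in `#### <answer>`, so a regex can extract and compare final answers. This is a minimal sketch that assumes the policy is prompted to end its response the same way. The helper names and the normalization (dropping thousands separators and a trailing period) are hypothetical, not veRL's bundled GSM8K scorer.

```python
import re
from typing import Optional

# GSM8K reference solutions end with "#### <number>"; capture that number.
ANSWER_RE = re.compile(r"####\s*(-?[0-9.,]+)")


def extract_answer(text: str) -> Optional[str]:
    """Last '#### <number>' answer in text, normalized; None if there is none."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    # Drop thousands separators and a sentence-final period.
    return matches[-1].replace(",", "").rstrip(".")


def rule_reward(response: str, reference_solution: str) -> float:
    """1.0 if the response's final answer matches the reference answer, else 0.0."""
    predicted = extract_answer(response)
    return 1.0 if predicted is not None and predicted == extract_answer(reference_solution) else 0.0


print(rule_reward("48 + 24 = 72 clips.\n#### 72", "... so 48 + 24 = 72\n#### 72"))  # 1.0
print(rule_reward("The answer is 72.", "#### 72"))  # 0.0
```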
Train LLaMA from Scratch
Model Tuning · Featured

No API, no pretrained weights: rebuild LLaMA's decoder-only architecture (RMSNorm / RoPE / GQA / SwiGLU / KV cache) from scratch in PyTorch and train it. The foundation under everything else.

LLaMA · Transformer · RoPE · RMSNorm · PyTorch
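Of the components listed, RoPE is the least obvious. Rather than adding a position vector, it rotates each pair of query/key dimensions by a position-dependent angle, so the attention score depends only on the relative offset between tokens. This is a minimal pure-Python sketch for one head vector. It uses base 10000 and the interleaved pairing (2i, 2i+1). Some LLaMA ports pair dimension i with i + d/2 instead, and a PyTorch version would vectorize the loop.

```python
import math


def apply_rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotary position embedding for one head vector x (even length).

    Each pair (x[2i], x[2i+1]) is rotated by the angle position * base ** (-2i / d).
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos_t - b * sin_t      # 2-D rotation of the pair
        out[2 * i + 1] = a * sin_t + b * cos_t
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.6, 0.9]
# Same relative offset (3) at different absolute positions gives the same score.
print(math.isclose(dot(apply_rope(q, 5), apply_rope(k, 2)),
                   dot(apply_rope(q, 13), apply_rope(k, 10)), abs_tol=1e-9))  # True
```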
AI Document Review Agent v2.0
Agent · Featured

Full-stack document review: MinerU parses the PDF, and a LangChain v1.1 + DeepSeek pipeline flags grammar issues and overly definitive language, streaming each finding onto the PDF at its bounding box. Custom rules and human-in-the-loop review are supported.

LangChain · FastAPI · React · DeepSeek · MinerU · SSE
OpenClaw Skill Development
Developer Tools

A practical study of OpenClaw's Skill system (teach an agent via SKILL.md, not code plugins), a complete Daily Briefing skill built from scratch, and a Lobster workflow chaining search → summarize → approve → push.

OpenClaw · Agent Skills · SKILL.md · Lobster · Bash
Harness Engineering in Practice
Developer Tools · Featured

Output quality = model capability × design level. Covers the four pillars of engineering an agent runtime (codebase-as-truth / mechanized constraints / feedback loops / entropy management). Measured result: with the model unchanged, the harness alone lifts Terminal Bench from 52.8% to 66.5%.

Harness Engineering · Claude Code · Hooks · Agent Runtime
Agent Long/Short-Term Memory System
Agent

Short-term memory is a SessionManager that truncates at MAX_HISTORY=20 and keeps a rolling summary. Long-term memory is a MEMORY.md file that switches to RAG once it exceeds 2000 tokens. A MemoryManager hub unifies both. The production version swaps in mem0 (an LLM judge that chooses ADD / UPDATE / DELETE / NONE) backed by Milvus.

mem0 · Milvus · LangChain · LlamaIndex · Memory
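The short-term layer can be sketched as a bounded message window whose evicted turns are folded into a rolling summary. This is a minimal sketch. It assumes messages are plain strings and that `summarize` is an LLM call you supply; the stand-in below just concatenates. Only MAX_HISTORY = 20 comes from the description. The MEMORY.md layer, the 2000-token switch to RAG, and mem0 are out of scope.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_HISTORY = 20  # value from the project description


@dataclass
class SessionManager:
    """Short-term memory: keep the last max_history messages verbatim and fold
    anything older into a rolling summary."""

    summarize: Callable[[str, list[str]], str]  # hypothetical LLM summarizer hook
    max_history: int = MAX_HISTORY
    summary: str = ""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > self.max_history:
            overflow = self.messages[: -self.max_history]
            self.messages = self.messages[-self.max_history:]
            # Merge the evicted turns into the existing summary.
            self.summary = self.summarize(self.summary, overflow)

    def context(self) -> list[str]:
        """Messages to send to the model: summary first (if any), then recent turns."""
        prefix = [f"Summary of earlier conversation: {self.summary}"] if self.summary else []
        return prefix + self.messages


def concat_summarizer(previous: str, evicted: list[str]) -> str:
    # Stand-in for an LLM summarizer: it only concatenates text.
    return " ".join([previous, *evicted]).strip()


session = SessionManager(summarize=concat_summarizer)
for i in range(25):
    session.add(f"m{i}")
print(len(session.messages), session.summary)  # 20 m0 m1 m2 m3 m4
```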
Context Engineering Middleware
Agent

Fighting context rot: six context modules × five strategies (Write / Select / Compress / Isolate / Cache), all implemented as stackable LangChain middleware. Cache is applied first (roughly 90% off cached input tokens), then Isolate.

Context Engineering · LangChain · Middleware · Prompt Cache
OpenClaw Multi-Agent Orchestration
Agent

Multi-agent orchestration reduced to three MCP primitives (spawn / send / history), with six modes built on top (Hub-Spoke / Pipeline / Hierarchical / Routing / P2P / Fleet). Explains the Hub's one-directional dispatch, why sessions_send is banned at the subagent layer, and why P2P has no production use cases.

OpenClaw · Multi-Agent · MCP · Hub-Spoke · Orchestration
Enterprise Deep Research Agent (Dify)
Agent · Featured

A Deep Research system as a Dify workflow: intent gate → topic decomposition → a ReAct agent iteratively searches/extracts evidence via Tavily → DeepSeek/Qwen writes a footnote-cited Markdown report.

Dify · DeepSeek · Qwen · Tavily · ReAct · Workflow
Dify Long-Form Content Agent
Agent

A Dify advanced-chat workflow that breaks long-form writing into a controllable iterative loop: outline → section-by-section expansion → style-checker tool.

Dify · DeepSeek · Workflow · Iteration
End-to-End Data Analysis Agent (DeepSeek-OCR + vLLM)
Document AI

Drop in a PDF or a table image: DeepSeek-OCR parses it into structured data, a FastAPI service runs the analysis, and charts render automatically. OCR, analysis, and visualization are three swappable layers.

DeepSeek-OCR · vLLM · Data Analysis · Visualization · FastAPI
Multimodal Fine-Tuning for Chinese Chart VQA
Model Tuning

Fine-tuning a general VLM into a Chinese chart-VQA specialist using LlamaFactory and a zh-train chart dataset. The data-generation tool is a companion React + FastAPI project.

Multimodal · LlamaFactory · Qwen-VL · Chart VQA · Fine-tuning
Multimodal Vision LLM (PandaGPT)
Model Tuning · Featured

ImageBind binds six modalities into one embedding space, and a single linear projection feeds those embeddings into Vicuna. PandaGPT is trained only on image-text pairs, yet it shows emergent understanding of audio and depth. Also covers VPT (visual prompt tuning) for downstream transfer to pathology.

ImageBind · PandaGPT · Vicuna · Multimodal · VPT
Coze Multimodal Video Generation Agent
Agent

An end-to-end text → image → video short-video pipeline on Coze (ByteDance's Kouzi), built from five interconnected workflows, each shipped as its own workflow zip.

Coze · Workflow · Image Generation · Video Generation · Multimodal
TensorRT Inference Optimization
Model Tuning

Shipping a trained model to the edge: ONNX → TensorRT engine build → layer/tensor fusion (Conv+BN+ReLU collapses into one CBR kernel) → INT8/FP16 PTQ calibration → a custom NMS plugin (IPluginV2) → SSD object-detection inference. It covers the senior-level MLSys ground that the rest of the portfolio does not.

TensorRT · INT8 · Layer Fusion · CUDA Plugin · ONNX
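The Conv+BN part of that fusion is plain algebra. At inference time BatchNorm is a fixed per-channel affine map, so it can be folded into the convolution's weights and bias, leaving conv + ReLU as one kernel. This is a minimal single-channel sketch in plain Python, with a 1-D convolution standing in for the real kernel and arbitrary numbers. TensorRT's builder applies fusions like this itself; the code only illustrates why the output is unchanged.

```python
import math


def conv1d(x: list[float], w: list[float], b: float) -> list[float]:
    """Valid 1-D cross-correlation for a single output channel."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b for i in range(len(x) - k + 1)]


def batchnorm(y: list[float], gamma: float, beta: float, mean: float, var: float,
              eps: float = 1e-5) -> list[float]:
    """Inference-mode BatchNorm for one channel, using running statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in y]


def fold_bn(w: list[float], b: float, gamma: float, beta: float, mean: float, var: float,
            eps: float = 1e-5) -> tuple[list[float], float]:
    """w' = w * s and b' = (b - mean) * s + beta, where s = gamma / sqrt(var + eps)."""
    scale = gamma / math.sqrt(var + eps)
    return [wi * scale for wi in w], (b - mean) * scale + beta


def relu(y: list[float]) -> list[float]:
    return [max(0.0, v) for v in y]


x = [0.5, -1.0, 2.0, 0.3, -0.7]
w, b = [0.2, -0.4, 0.1], 0.05
bn = dict(gamma=1.3, beta=-0.2, mean=0.1, var=0.8)

unfused = relu(batchnorm(conv1d(x, w, b), **bn))  # three separate ops
w_f, b_f = fold_bn(w, b, **bn)
fused = relu(conv1d(x, w_f, b_f))  # one "CBR" kernel: conv with folded BN, then ReLU
print(all(math.isclose(u, f, abs_tol=1e-12) for u, f in zip(unfused, fused)))  # True
```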
YOLOv12 Steel Surface Defect Detection
Model Tuning

An Ultralytics YOLOv12 detector trained on NEU-DET (6 defect classes, ~5000 images), with a full train → val → predict pipeline for automated steel quality inspection. Includes a reproducible recipe and an illustrative inference demo.

YOLOv12 · Ultralytics · Object Detection · NEU-DET · Industrial CV
AI Analyst: an LLM that builds its own models
Agent

An LLM acting as an analyst. It orchestrates tools via function calling: Text2SQL (create_sql_agent) pulls features from MySQL, then it fits interpretable models on the fly (a linear regression to decompose spend and a decision tree to find drivers) and returns an actionable recommendation. What is new here is an LLM that builds its own models, rather than the usual NL→SQL→chart flow.

Function Calling · Text2SQL · LangChain · scikit-learn · DeepSeek
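Function calling reduces to the model emitting a tool name plus JSON arguments, and the host program looking up and running that tool. This is a minimal dispatcher sketch. The `tool` registry, the `fit_line` tool (a one-variable least-squares fit standing in for the scikit-learn models), and the tool-call dict shape are all hypothetical. None of it reflects the LangChain wiring the project uses.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def fit_line(xs: list[float], ys: list[float]) -> dict[str, float]:
    """Ordinary least squares for y = slope * x + intercept (hypothetical modeling tool)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def dispatch(tool_call: dict[str, str]) -> str:
    """Execute one model-issued call {'name': ..., 'arguments': '<json>'} and return JSON."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {tool_call['name']}"})
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))


# A call the LLM might emit after pulling spend vs. revenue via Text2SQL (made-up numbers).
call = {"name": "fit_line", "arguments": json.dumps({"xs": [1, 2, 3, 4], "ys": [3, 5, 7, 9]})}
print(dispatch(call))  # {"slope": 2.0, "intercept": 1.0}
```

The JSON result goes back to the model as the tool output, so the model can phrase a recommendation from the fitted coefficients.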
PF-Net 3D Point-Cloud Completion
Model Tuning

A different data modality: unordered 3D point sets. GAN-based completion with PF-Net (Point Fractal Network) on ShapeNet-Part. A self-supervised 512-point crop serves as ground truth, and a multi-scale FPS encoder (1920-d) plus a residual pyramid decoder fills the hole coarse (64) → center2 (128) → fine (512), constrained by Chamfer Distance and an adversarial loss.

PF-Net · Point Cloud · GAN · Chamfer Distance · PointNet
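Chamfer Distance lets the loss compare unordered point sets without any point-to-point correspondence: every point is matched to its nearest neighbor in the other set, in both directions. This is a minimal O(N·M) sketch of one common variant, the mean squared nearest-neighbor distance each way; papers differ on squaring and averaging. Real training uses a batched GPU implementation, and the toy point sets here are made up.

```python
Point = tuple[float, float, float]


def squared_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(s1: list[Point], s2: list[Point]) -> float:
    """Symmetric Chamfer Distance: mean nearest-neighbor squared distance
    from s1 to s2 plus the same from s2 to s1."""
    forward = sum(min(squared_dist(p, q) for q in s2) for p in s1) / len(s1)
    backward = sum(min(squared_dist(q, p) for p in s1) for q in s2) / len(s2)
    return forward + backward


pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # gt has a point pred missed
print(round(chamfer_distance(pred, gt), 4))  # 0.3333
```

The missing point (0, 1, 0) is penalized only by the ground-truth → prediction term. That is why both directions are needed: the prediction → ground-truth term alone would not penalize an incomplete prediction.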
Cross-Platform Spatial Interaction Layer (Quest + Vision Pro)
Spatial Computing

A study-derived case from the SpatialXR Unity video courses. OpenXR sits at the base, a device layer forks per platform (Meta XR SDK vs PolySpatial/Metal), and a common XR Interaction Toolkit layer sits on top. A single interaction chain runs hand skeleton → pinch → ray → grab → poke on World-Space UI. A companion runnable Unity project is open-sourced.

OpenXR · XR Interaction Toolkit · Meta XR SDK · PolySpatial · Vision Pro
Colocated Large-Space Multiplayer MR
Spatial Computing

A study-derived case from the SpatialXR video courses that tackles the core hard problem of colocated multiplayer MR: converging each headset's local frame to ONE shared origin via spatial anchors and alignment. It also covers player/object state sync over a public-internet relay, targeting Pico large-space setups. The Netcode SDK is unverified (video-only), and this is not a shipped Unity app.

Spatial Anchors · Colocation · Spatial Alignment · Pico · Multiplayer
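Headset tracking frames are typically gravity-aligned, so converging them to one shared origin reduces to a floor-plane rigid transform (x, z, yaw). Each headset observes the same spatial anchor in its own local frame. Inverting that observed pose maps local coordinates into the anchor's frame. This is a minimal plain-Python sketch. `Pose2D`, `to_shared`, `b_to_a`, and the poses are hypothetical; a Unity build would use the engine's transforms and the platform's anchor APIs instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Gravity-aligned pose on the floor plane: position (x, z) and yaw in radians."""
    x: float
    z: float
    yaw: float

    def apply(self, px: float, pz: float) -> tuple[float, float]:
        """Map a point from this pose's local frame into the parent (tracking) frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (self.x + c * px - s * pz, self.z + s * px + c * pz)

    def inverse(self) -> "Pose2D":
        """Pose that maps parent-frame points back into this local frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return Pose2D(-(c * self.x + s * self.z), -(-s * self.x + c * self.z), -self.yaw)


def to_shared(anchor_in_local: Pose2D, px: float, pz: float) -> tuple[float, float]:
    """Express a point from one headset's tracking frame in the shared anchor frame."""
    return anchor_in_local.inverse().apply(px, pz)


def b_to_a(anchor_in_a: Pose2D, anchor_in_b: Pose2D, px: float, pz: float) -> tuple[float, float]:
    """Map a point tracked by headset B into headset A's frame through the shared anchor."""
    return anchor_in_a.apply(*to_shared(anchor_in_b, px, pz))


# The same physical anchor, as each headset happens to see it in its own frame.
anchor_in_a = Pose2D(2.0, 1.0, 0.0)
anchor_in_b = Pose2D(-1.0, 3.0, math.pi / 2)
# A player standing 1 m along the anchor's x axis, as tracked by headset B.
player_in_b = anchor_in_b.apply(1.0, 0.0)
print(to_shared(anchor_in_b, *player_in_b))            # approximately (1.0, 0.0)
print(b_to_a(anchor_in_a, anchor_in_b, *player_in_b))  # approximately (3.0, 1.0)
```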

Academic research and technical exploration

Efficient Video Generation with Diffusion Models

CVPR 2026

Your Name, et al.

A novel efficient video diffusion architecture that significantly reduces computational cost while maintaining generation quality.

Video Generation · Diffusion Model · Efficiency

A Unified Framework for Multimodal Temporal Understanding

NeurIPS 2025

Your Name, et al.

A unified multimodal temporal understanding framework integrating visual, language, and audio signals for temporal reasoning.

Multimodal · Temporal · Understanding

Technical insights and reflections

AI Short Films · Cinematic Driving · Visual Stories

AI Film

AI-Generated Cyber City

A cyberpunk city short film generated with Sora and Runway

Driving

Mountain Road Sunset Drive

4K cinematic driving footage capturing sunset on mountain roads

AI Film

AI × Traditional Animation

A traditional Chinese animation short made with AI tools

Driving

City Night Cruise

Night driving through the city with neon lights and traffic

Quick answers to a few common questions

I'm open to AI Engineer and Machine Learning Engineer roles, focused on video generation, image generation, and multimodal systems. Full-time positions or high-impact contract work are both welcome.

Let's connect

Sent securely via Web3Forms; I won't share your details.