Enterprise NL2SQL Fine-Tuning System
Case Study

An enterprise NL2SQL system that spans schema-aware data generation, LoRA tuning, validation, and execution-aware evaluation.

LoRA · QLoRA · FastAPI · WebSocket · SQL · Evaluation · vLLM

This project broadens the portfolio beyond RAG and shows that the same engineering mindset can be applied to model adaptation, evaluation, and enterprise data workflows.

Overview

Generic text-to-SQL systems tend to break down in enterprise settings because they do not understand private schema design, organization-specific terminology, or production validation constraints.

This project addresses that gap by treating NL2SQL as a full pipeline problem:

  • generate training samples from database metadata
  • tune models with LoRA-style adaptation
  • validate SQL automatically
  • evaluate not just text similarity, but execution behavior

That makes the project much closer to a deployable enterprise workflow than a one-off fine-tuning experiment.
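The first pipeline step, generating training samples from database metadata, can be sketched with a minimal template-based generator. This is an illustrative assumption about how such a generator might work, using SQLite's `PRAGMA table_info` for metadata; the function name `generate_pairs` and the single question template are hypothetical, and a real generator would use many richer templates per schema element.

```python
import sqlite3


def generate_pairs(conn: sqlite3.Connection) -> list[dict]:
    """Derive simple NL/SQL training pairs from table metadata (illustrative)."""
    pairs = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for col in columns:
            # One templated pair per column; production generators vary phrasing,
            # add filters, joins, and aggregations.
            pairs.append({
                "question": f"List the {col} of every row in {table}.",
                "sql": f"SELECT {col} FROM {table};",
            })
    return pairs


# Example against a tiny in-memory schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
samples = generate_pairs(conn)
```

The same idea scales to real metadata: the more of the schema (foreign keys, comments, enum-like columns) the generator reads, the more organization-specific the resulting training pairs become.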

Product and System Shape

The system includes both model-related and operator-facing layers:

  • a data-generation application with configuration controls
  • schema-aware sample creation across database structures
  • progress tracking with FastAPI and WebSocket updates
  • LoRA and QLoRA training workflows
  • evaluation that includes execution-oriented checks

This matters because it shows that model tuning was treated as an engineering workflow, not only as notebook research.
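As a sketch of what the LoRA training workflow configures, the fragment below uses the Hugging Face `peft` library, assuming a causal-LM base model. The rank, scaling, and target modules shown are placeholder values; appropriate choices depend on the base model and the size of the NL2SQL dataset.

```python
from peft import LoraConfig

# Illustrative adapter settings; actual values are model- and dataset-dependent.
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by model
    task_type="CAUSAL_LM",
)
```

For the QLoRA variant, the same adapter config is combined with a 4-bit quantized base model, which keeps memory low enough for single-GPU tuning.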

Why This Project Stands Out

Among portfolio projects, this one is valuable because it signals:

  • enterprise workflow understanding
  • data-centric thinking
  • model adaptation experience beyond prompting
  • evaluation discipline tied to real task quality

For applied AI and AI engineer roles, that helps balance a portfolio that might otherwise look too RAG-heavy.

Deployment and Demo Strategy

The best way to present this project publicly is not to expose unrestricted training runs. A stronger demo format is to:

  • show a sample schema
  • generate example NL2SQL pairs
  • run a small validation workflow
  • display evaluation summaries and a few query examples

That gives visitors something concrete to interact with while keeping the heavier training path private or offline.
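The evaluation step above hinges on execution behavior rather than text similarity: two queries that look different can return identical results. A minimal sketch of that check, assuming a SQLite backing store (the function name `execution_match` and the order-insensitive comparison are illustrative choices, not the project's exact implementation):

```python
import sqlite3


def execution_match(conn: sqlite3.Connection,
                    predicted_sql: str, gold_sql: str) -> bool:
    """Compare result sets (order-insensitive) instead of SQL text."""
    try:
        predicted = sorted(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # invalid predicted SQL counts as a miss
    gold = sorted(conn.execute(gold_sql).fetchall())
    return predicted == gold


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Textually different, execution-equivalent queries still match:
ok = execution_match(conn,
                     "SELECT id FROM orders WHERE total > 10",
                     "SELECT id FROM orders WHERE NOT total <= 10")
```

String-based metrics would penalize the second query; an execution-aware check scores it correctly, which is why this style of evaluation tracks real task quality more closely.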

Demo Strategy

Recommended public demo format

A public demo should focus on schema preview, sample data generation, and a lightweight validation run instead of full open training. That keeps the experience interactive while avoiding long-running or expensive model-tuning behavior in the portfolio itself.
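A lightweight validation run like the one described here can check generated SQL against a schema without executing anything expensive. One way to sketch this, assuming SQLite is used for the demo schema: `EXPLAIN` compiles and plans a statement without running it, so schema errors surface safely. The helper name `validate_sql` and its return shape are hypothetical.

```python
import sqlite3


def validate_sql(schema_ddl: str, sql: str) -> tuple[bool, str]:
    """Check a query against a schema without executing it, via SQLite's EXPLAIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_ddl)
    try:
        conn.execute(f"EXPLAIN {sql}")  # plans the query; no rows are touched
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)  # e.g. unknown table or column
    finally:
        conn.close()


schema = "CREATE TABLE orders (id INTEGER, total REAL);"
valid, _ = validate_sql(schema, "SELECT id FROM orders")
invalid, reason = validate_sql(schema, "SELECT nope FROM orders")
```

Because the database is empty and in-memory, this stays cheap enough to run on every visitor interaction while still catching real schema mismatches.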

A public preview can be enabled later without redesigning the case-study layout.

What This Project Signals

  • schema-aware AI system design
  • full-stack support around model workflows
  • execution-aware evaluation mindset
  • practical adaptation of LLMs to enterprise data problems