End-to-End Data Analysis Agent (DeepSeek-OCR + vLLM)
Drop in a PDF report or a table image: DeepSeek-OCR-2 turns it into structured data, the FastAPI backend runs the analysis, and charts are generated automatically. OCR, analysis, and visualization are three swappable layers.
This is the "no more manual Excel" workflow: feed in a PDF or a screenshot of tabular data and get back analysis and charts. DeepSeek-OCR-2 handles the parsing (it copes with financial and scientific tables far better than legacy OCR), a FastAPI backend orchestrates analysis and chart generation, and vLLM serves the OCR model at production latency.
Three layers, swappable
backend/
├── core/
│ ├── ocr/ # DeepSeek-OCR-2 client + table-structure extraction
│ ├── analysis/ # pandas-style stats + LLM-driven summarization
│ └── visualization/ # matplotlib / plotly chart generation
├── services/
│ ├── ocr_service.py
│ ├── analysis_service.py
│ ├── visualization_service.py
│ └── integration_service.py # top-level orchestrator
├── api/ # FastAPI routes
└── main.py
Each layer is one service module + one core implementation. Want to swap DeepSeek-OCR for
PaddleOCR-VL? Replace core/ocr/ and services/ocr_service.py keeps its interface. Want to add D3
charts? Drop another renderer into core/visualization/.
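A minimal sketch of what "keeps its interface" could look like, assuming the service depends only on a small protocol. The names OcrEngine, extract_cells, and the two stub engines are hypothetical and not taken from the repository; the stubs return canned cells instead of calling a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Cell:
    row: int
    col: int
    text: str


class OcrEngine(Protocol):
    """Hypothetical contract that any core/ocr implementation would satisfy."""

    def extract_cells(self, document: bytes) -> list[Cell]: ...


class StubDeepSeekEngine:
    """Stand-in for a DeepSeek-OCR-2 backed engine (returns canned cells)."""

    def extract_cells(self, document: bytes) -> list[Cell]:
        return [Cell(0, 0, "Revenue"), Cell(0, 1, "120")]


class StubPaddleEngine:
    """Stand-in for a PaddleOCR-VL backed engine (returns canned cells)."""

    def extract_cells(self, document: bytes) -> list[Cell]:
        return [Cell(0, 0, "Revenue"), Cell(0, 1, "120")]


class OcrService:
    """Plays the role of services/ocr_service.py: it knows only the protocol."""

    def __init__(self, engine: OcrEngine) -> None:
        self._engine = engine

    def parse(self, document: bytes) -> list[Cell]:
        return self._engine.extract_cells(document)


# Swapping the engine does not change how the service is called.
cells_a = OcrService(StubDeepSeekEngine()).parse(b"%PDF-")
cells_b = OcrService(StubPaddleEngine()).parse(b"%PDF-")
```

Because the service takes the engine as a constructor argument, replacing the contents of core/ocr/ only changes which object is passed in.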
Why DeepSeek-OCR-2 specifically
Tables in real-world PDFs (financial reports, academic papers, lab reports) break legacy OCR: merged cells, multi-line headers, and footnotes mixed into rows. DeepSeek-OCR-2 ships with dedicated table-structure recognition that emits cells with row/col coordinates, not just text, which makes downstream pandas ingestion painless.
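To illustrate why coordinate-tagged cells are easy to ingest, here is a small sketch that rebuilds a table grid from (row, col, text) cells. The Cell shape and the sample values are assumptions for illustration, not the model's actual output format.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: int
    col: int
    text: str


def cells_to_grid(cells):
    """Place each cell at its (row, col) position; missing cells become ''."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


# Hypothetical OCR output for a 3x2 table with one unreadable cell.
cells = [
    Cell(0, 0, "Quarter"), Cell(0, 1, "Revenue"),
    Cell(1, 0, "Q1"), Cell(1, 1, "120"),
    Cell(2, 0, "Q2"),  # the revenue cell for Q2 is missing
]

grid = cells_to_grid(cells)
header, rows = grid[0], grid[1:]
# One dict per data row, keyed by the header row.
records = [dict(zip(header, row)) for row in rows]
```

A list of dicts like records can be passed directly to pandas.DataFrame(records); the sketch itself stays on the standard library so it runs without extra dependencies.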
vLLM for serving
OCR runs inside an interactive loop (the user uploads a file, then the system reads, analyzes, and renders it), so latency matters. vLLM takes the OCR model from ~3s/page (raw transformers) to ~0.6s/page on the same GPU through continuous batching, roughly a 5x speedup.
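Continuous batching pays off most when several pages are in flight at once, so a client would typically submit pages concurrently instead of one by one. The sketch below shows that pattern with asyncio; ocr_page is a hypothetical placeholder that sleeps instead of calling a real server, and max_in_flight is an assumed tuning knob.

```python
from __future__ import annotations

import asyncio


async def ocr_page(page_id: int, sem: asyncio.Semaphore) -> str:
    """Placeholder for an HTTP request to the OCR server (no network here)."""
    async with sem:  # cap the number of requests in flight
        await asyncio.sleep(0.01)  # simulated server time
        return f"page-{page_id}: <table markup>"


async def ocr_document(n_pages: int, max_in_flight: int = 8) -> list[str]:
    """Submit all pages concurrently; results come back in page order."""
    sem = asyncio.Semaphore(max_in_flight)
    tasks = (ocr_page(i, sem) for i in range(n_pages))
    return await asyncio.gather(*tasks)


results = asyncio.run(ocr_document(4))
```

asyncio.gather preserves submission order, so results[i] corresponds to page i even if the server finishes pages out of order.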
What this signals
- You design layered AI systems, not monoliths — each capability (OCR / analysis / viz) is one service + one core
- You pick the right OCR for the job — table-structure-aware (DeepSeek-OCR-2) over plain text-only OCR
- You care about inference latency — vLLM serving instead of raw transformers
What the demo replays
The interactive demo replays the three-layer pipeline on a sample financial table: ocr_service parses cells with coordinates → analysis_service computes KPIs and flags anomalies → visualization_service lets the LLM pick the chart type, then renders it. It shows the real per-layer behavior of core/ocr, core/analysis, and core/visualization, but makes no live vLLM/LLM calls.
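The replayed flow can be approximated offline as follows. This is a sketch under stated assumptions, not the demo's code: the sample figures are made up, the anomaly rule is a simple z-score threshold, and pick_chart is a rule-based stand-in for the LLM's chart-type choice.

```python
from statistics import mean, pstdev


def analyze(values, z_threshold=1.5):
    """Compute simple KPIs and flag values far from the mean (z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    anomalies = [
        i for i, v in enumerate(values)
        if sigma > 0 and abs(v - mu) / sigma > z_threshold
    ]
    return {"total": sum(values), "mean": mu, "anomalies": anomalies}


def pick_chart(n_points, is_time_series):
    """Rule-based stand-in for the LLM's chart-type decision."""
    if is_time_series:
        return "line"
    return "bar" if n_points <= 12 else "histogram"


# Canned "OCR output": (label, value) pairs for a made-up quarterly table.
ocr_rows = [("Q1", 100), ("Q2", 104), ("Q3", 98), ("Q4", 240)]
labels = [label for label, _ in ocr_rows]
values = [value for _, value in ocr_rows]

report = analyze(values)  # analysis layer
chart = pick_chart(len(values), is_time_series=True)  # visualization layer
flagged = [labels[i] for i in report["anomalies"]]
```

With these sample values only Q4 exceeds the threshold, and the quarterly series maps to a line chart.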