End-to-End Data Analysis Agent (DeepSeek-OCR + vLLM)
Case Study

Drop in a PDF report or a table image: DeepSeek-OCR turns it into structured data, the FastAPI backend runs the analysis, and charts are generated automatically. OCR, analysis, and visualization are three swappable layers.

DeepSeek-OCR · vLLM · Data Analysis · Visualization · FastAPI

This is the "no more manual Excel" workflow: feed in a PDF or screenshot of tabular data, get back analysis and charts. DeepSeek-OCR-2 handles the parsing (much better than legacy OCR on financial / scientific tables), a FastAPI backend orchestrates analysis + visualization generation, and vLLM keeps the OCR model serving at production latency.

Three layers, swappable

backend/
├── core/
│   ├── ocr/             # DeepSeek-OCR-2 client + table-structure extraction
│   ├── analysis/        # pandas-style stats + LLM-driven summarization
│   └── visualization/   # matplotlib / plotly chart generation
├── services/
│   ├── ocr_service.py
│   ├── analysis_service.py
│   ├── visualization_service.py
│   └── integration_service.py    # top-level orchestrator
├── api/                  # FastAPI routes
└── main.py

Each layer is one service module + one core implementation. Want to swap DeepSeek-OCR for PaddleOCR-VL? Replace core/ocr/; services/ocr_service.py keeps its interface. Want to add D3 charts? Drop another renderer into core/visualization/.
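One way to picture the swap is a service that depends only on a small interface, with each OCR engine living behind it. The sketch below is illustrative, not the project's actual code: `Cell`, `OcrBackend`, `OcrService`, and both fake backends are hypothetical names, and the backends return canned cells instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Cell:
    """One recognized table cell (hypothetical shape)."""
    row: int
    col: int
    text: str


class OcrBackend(Protocol):
    """What the service layer expects from anything in core/ocr/ (assumed)."""

    def extract_cells(self, document: bytes) -> list[Cell]: ...


class OcrService:
    """Stand-in for services/ocr_service.py: it only knows the interface."""

    def __init__(self, backend: OcrBackend) -> None:
        self._backend = backend

    def parse(self, document: bytes) -> list[Cell]:
        return self._backend.extract_cells(document)


class FakeDeepSeekBackend:
    """Placeholder for a DeepSeek-OCR client; returns canned cells."""

    def extract_cells(self, document: bytes) -> list[Cell]:
        return [Cell(0, 0, "Revenue"), Cell(0, 1, "120")]


class FakePaddleBackend:
    """Placeholder for a PaddleOCR-VL client; same contract, same output shape."""

    def extract_cells(self, document: bytes) -> list[Cell]:
        return [Cell(0, 0, "Revenue"), Cell(0, 1, "120")]


# Swapping the engine changes one constructor argument, not the service.
deepseek_cells = OcrService(FakeDeepSeekBackend()).parse(b"")
paddle_cells = OcrService(FakePaddleBackend()).parse(b"")
```

Because `OcrBackend` is a structural `Protocol`, a new engine only has to provide a matching `extract_cells` method; it does not need to inherit from anything.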

Why DeepSeek-OCR-2 specifically

Tables in real-world PDFs (financial reports, academic papers, lab reports) break legacy OCR: merged cells, multi-line headers, footnotes mixed into rows. DeepSeek-OCR-2 ships with dedicated table-structure recognition that emits cells with row/col coordinates, not just text, which makes downstream pandas ingestion painless.
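To see why coordinates help, here is a minimal sketch that turns coordinate-tagged cells into a row-major grid. The cell format (dicts with `row`, `col`, `text` keys) is an assumption made for illustration; the model's real output schema may differ, and merged cells would need extra handling.

```python
def cells_to_grid(cells: list[dict]) -> list[list[str]]:
    """Place each cell at its (row, col) position; missing cells stay empty."""
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    return grid


# Hypothetical OCR output for a two-column table, deliberately out of order.
cells = [
    {"row": 1, "col": 1, "text": "120"},
    {"row": 0, "col": 0, "text": "Quarter"},
    {"row": 2, "col": 0, "text": "Q2"},
    {"row": 0, "col": 1, "text": "Revenue"},
    {"row": 1, "col": 0, "text": "Q1"},
    {"row": 2, "col": 1, "text": "135"},
]

grid = cells_to_grid(cells)
header, *body = grid  # first row is treated as the header
```

From here, `pandas.DataFrame(body, columns=header)` would give a table ready for analysis, with no guessing about which text fragment belongs to which column.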

vLLM for serving

OCR runs in a real-time loop (user uploads → reads → analyzes → renders), so latency matters. vLLM gets the OCR model from ~3s/page (raw transformers) to ~0.6s/page on the same GPU through continuous batching.
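As a rough back-of-the-envelope check on what those per-page figures mean for a whole document, the sketch below assumes processing time scales linearly with page count. The 20-page report is a hypothetical example, not a measured benchmark.

```python
# Per-page figures quoted above; the linear scaling model is an assumption.
RAW_SECONDS_PER_PAGE = 3.0    # raw transformers
VLLM_SECONDS_PER_PAGE = 0.6   # vLLM with continuous batching


def document_seconds(pages: int, seconds_per_page: float) -> float:
    """Estimated OCR time for a document, assuming linear scaling."""
    return pages * seconds_per_page


speedup = RAW_SECONDS_PER_PAGE / VLLM_SECONDS_PER_PAGE
raw_total = document_seconds(20, RAW_SECONDS_PER_PAGE)
vllm_total = document_seconds(20, VLLM_SECONDS_PER_PAGE)

print(f"speedup: {speedup:.1f}x")
print(f"20-page report: {raw_total:.0f}s -> {vllm_total:.0f}s")
```

Under that assumption the quoted numbers work out to roughly a 5x speedup: about a minute of waiting for a 20-page report becomes about 12 seconds.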

What this signals

  • You design layered AI systems, not monoliths — each capability (OCR / analysis / viz) is one service + one core
  • You pick the right OCR for the job — table-structure-aware (DeepSeek-OCR-2) over plain text-only OCR
  • You care about inference latency — vLLM serving instead of raw transformers

Demo strategy

What the demo replays

The interactive demo replays the three-layer pipeline on a sample financial table: ocr_service parses cells with coordinates → analysis_service computes KPIs and flags anomalies → visualization_service lets the LLM pick the chart type, then renders it. It shows the real per-layer behavior of core/ocr · core/analysis · core/visualization — no live vLLM/LLM calls.
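The replayed flow can be sketched with canned data and no model calls. Everything below is a hypothetical stand-in: the function names, the sample figures, the "more than twice the median" anomaly rule, and the rule-based chart choice (which takes the place of the LLM's pick) are assumptions for illustration only.

```python
from statistics import mean, median


def replay_ocr() -> list[dict]:
    """Canned cells with coordinates, standing in for ocr_service output."""
    cells = [
        {"row": 0, "col": 0, "text": "Quarter"},
        {"row": 0, "col": 1, "text": "Revenue"},
    ]
    sample = [("Q1", "120"), ("Q2", "135"), ("Q3", "128"), ("Q4", "410")]
    for i, (label, value) in enumerate(sample, start=1):
        cells.append({"row": i, "col": 0, "text": label})
        cells.append({"row": i, "col": 1, "text": value})
    return cells


def analyze(cells: list[dict]) -> dict:
    """Compute simple KPIs and flag values above twice the median."""
    labels = {c["row"]: c["text"] for c in cells if c["col"] == 0 and c["row"] > 0}
    values = {c["row"]: float(c["text"]) for c in cells if c["col"] == 1 and c["row"] > 0}
    rows = sorted(values)
    threshold = 2 * median(values.values())
    return {
        "labels": [labels[r] for r in rows],
        "values": [values[r] for r in rows],
        "total": sum(values.values()),
        "mean": mean(values.values()),
        "anomalies": [labels[r] for r in rows if values[r] > threshold],
    }


def choose_chart(analysis: dict) -> dict:
    """Rule-based stand-in for the LLM's chart choice; returns a spec, not an image."""
    chart_type = "line" if len(analysis["values"]) >= 4 else "bar"
    return {
        "type": chart_type,
        "x": analysis["labels"],
        "y": analysis["values"],
        "highlight": analysis["anomalies"],
    }


def replay_pipeline() -> dict:
    """OCR -> analysis -> visualization, each stage fed by the previous one."""
    analysis = analyze(replay_ocr())
    return {"analysis": analysis, "chart": choose_chart(analysis)}


result = replay_pipeline()
```

Each stage consumes only the previous stage's output, so a replay like this can exercise the layer boundaries without a GPU or a live model endpoint.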

Public preview can be enabled later without redesigning the case-study layout