RL-Tuned Function-Calling Agent Pipeline
A preference-data and evaluation pipeline for function-calling agents, focused on improving tool-use decisions instead of raw text formatting.
This project also works as a compact but high-signal portfolio item: it demonstrates advanced thinking about agent quality through traces, preference pairs, and evaluation criteria tied to decision quality.
Overview
Function-calling agents are not only judged by the text they generate. They are judged by whether they choose the right tool, pass the right arguments, and take efficient action sequences.
This project was designed around that idea. Instead of stopping at supervised examples, it builds a data-generation and evaluation loop for agent optimization:
- collect multi-turn traces from tool-using runs
- construct chosen and rejected preference pairs
- export DPO-ready datasets
- compare base and tuned models on tool-use behavior
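The pair-construction and export steps above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the `to_dpo_record` helper, the field names, and the flight-booking task are all hypothetical, chosen only to show the common prompt/chosen/rejected shape that DPO trainers expect.

```python
import json

# Hypothetical record schema: a DPO pair ties one prompt (the task plus
# tool context) to a preferred and a dispreferred tool-call turn.
def to_dpo_record(task_prompt, chosen_trace, rejected_trace):
    """Convert a chosen/rejected pair of tool-call turns into a
    DPO-style training record with prompt/chosen/rejected strings."""
    return {
        "prompt": task_prompt,
        "chosen": json.dumps(chosen_trace),
        "rejected": json.dumps(rejected_trace),
    }

# Illustrative pair: the rejected call omits arguments the task requires.
record = to_dpo_record(
    "Book the cheapest flight from SFO to JFK on May 3.",
    {"tool": "search_flights",
     "args": {"from": "SFO", "to": "JFK", "date": "2025-05-03", "sort": "price"}},
    {"tool": "search_flights",
     "args": {"from": "SFO", "to": "JFK"}},  # missing date and sort
)
print(json.dumps(record, indent=2))
```

Records like this can be written one per line to a JSONL file, which is the format most DPO training libraries accept directly.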
Why This Is a Valuable Portfolio Project
Many AI portfolios show agents that can call tools. Far fewer show a workflow for measuring and improving how those agents make decisions.
That is what makes this project useful in interviews. It signals that I am thinking about agent quality as a systems problem, not just a prompt-design problem.
Pipeline Design
The system is organized around modular stages:
- task generation for agent scenarios
- trace collection from tool-using runs
- validation and logging
- chosen / rejected pair construction
- evaluation of tool-call correctness and argument quality
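The final evaluation stage can be sketched with a toy scorer. The function below is an assumed, simplified rubric (exact-match argument accuracy against a reference call); the real pipeline's criteria may weight arguments differently or use a judge model.

```python
def score_tool_call(predicted, gold):
    """Score one predicted tool call against a reference call.

    Returns (tool_correct, argument_accuracy), where argument accuracy
    is the fraction of reference arguments reproduced exactly.
    """
    tool_correct = predicted.get("tool") == gold.get("tool")
    gold_args = gold.get("args", {})
    if not gold_args:
        return tool_correct, 1.0 if tool_correct else 0.0
    pred_args = predicted.get("args", {})
    hits = sum(1 for k, v in gold_args.items() if pred_args.get(k) == v)
    return tool_correct, hits / len(gold_args)

# Example: right tool, but one of three reference arguments is missing.
ok, acc = score_tool_call(
    {"tool": "search_flights", "args": {"from": "SFO", "to": "JFK"}},
    {"tool": "search_flights",
     "args": {"from": "SFO", "to": "JFK", "date": "2025-05-03"}},
)
```

Separating tool choice from argument quality keeps the two failure modes visible: an agent can pick the right tool and still pass poor arguments.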
FastAPI and WebSocket-based progress reporting make the workflow easier to operate than a purely offline script bundle.
Best Demo Format
The strongest public demo for this project would be:
- a small set of sample tasks
- replay views of tool traces
- side-by-side comparison of weaker vs. stronger agent behavior
- simple metrics or judge summaries for tool-call quality
That gives visitors a concrete way to understand what “agent tuning” means in practice, and it makes the evaluation story visible without exposing a full training environment.
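The “simple metrics” item above could be as small as a per-task side-by-side summary. The scores and task names below are made-up placeholder data used only to show the shape of such a comparison.

```python
from statistics import mean

# Hypothetical per-task argument-accuracy scores for a weaker (base)
# and stronger (tuned) run over the same task set.
base_scores = {"task-1": 0.50, "task-2": 0.67, "task-3": 1.00}
tuned_scores = {"task-1": 1.00, "task-2": 0.67, "task-3": 1.00}

def summarize(base, tuned):
    """Build side-by-side rows per task plus mean scores per run."""
    rows = [(t, base[t], tuned[t]) for t in sorted(base)]
    return rows, mean(base.values()), mean(tuned.values())

rows, base_mean, tuned_mean = summarize(base_scores, tuned_scores)
for task, b, t in rows:
    print(f"{task}: base={b:.2f} tuned={t:.2f}")
print(f"mean: base={base_mean:.2f} tuned={tuned_mean:.2f}")
```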
What This Project Signals
- advanced agent workflow understanding
- evaluation-first thinking
- data generation for preference optimization
- applied AI engineering beyond standard prompt-and-demo projects