Coze Multimodal Video Generation Agent
Case Study

An end-to-end short-video generation pipeline on Coze (ByteDance Kouzi), built from 5 interconnected workflows that go text → image → video. Each workflow ships as its own zip export.

Coze, Workflow, Image Generation, Video Generation, Multimodal

Coze (扣子) is ByteDance's no-code agent platform. For content pipelines like "turn a brief into a finished short video", long-form code is overkill; a node-based workflow is the better fit. This project is a real, runnable 5-workflow chain that takes a topic brief and produces a finished video.

How the 5 workflows fit together

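工作流合集/  # directory name, "workflow collection"
├── Workflow-produce-draft-1308.zip         # Top level: picks which sub-workflows to run based on the brief
├── Workflow-get_produce-draft-1319.zip     # Copywriting: title / storyboard / narration
├── Workflow-create_image-draft-1329.zip    # Image generation: one image per storyboard shot
├── Workflow-create_video-draft-1324.zip    # Video generation: shot image + narration → video clip
└── Workflow-get_video-draft-1314.zip       # Video merge: concatenates all clips into the final video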

Each workflow is a Coze JSON/YAML export, packaged as a zip, that can be imported into any other Coze workspace.
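
As a rough illustration of how the listing can be read programmatically, the sketch below parses each export file name into a workflow name and draft ID. The `Workflow-<name>-draft-<id>.zip` pattern is inferred from the five names above, not from any documented Coze convention, and `WorkflowExport` and `parse_export` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# File names copied from the listing above, in pipeline order.
FILES = [
    "Workflow-produce-draft-1308.zip",
    "Workflow-get_produce-draft-1319.zip",
    "Workflow-create_image-draft-1329.zip",
    "Workflow-create_video-draft-1324.zip",
    "Workflow-get_video-draft-1314.zip",
]

# Assumed naming pattern (inferred from the listing): Workflow-<name>-draft-<id>.zip
PATTERN = re.compile(r"^Workflow-(?P<name>\w+)-draft-(?P<draft_id>\d+)\.zip$")


class WorkflowExport(NamedTuple):
    """Hypothetical record for one exported workflow."""
    name: str
    draft_id: int


def parse_export(filename: str) -> WorkflowExport:
    """Split an export file name into workflow name and draft ID."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected export name: {filename}")
    return WorkflowExport(match["name"], int(match["draft_id"]))


EXPORTS = [parse_export(f) for f in FILES]
```

A manifest like `EXPORTS` could be used to check that all five exports are present before importing them into a workspace.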

What this signals

  • You can build production content pipelines in no-code platforms when long-form code would be overkill
  • You understand modular workflow composition: split by capability, compose by ID reference
  • You can pick the right product format: Coze for content pipelines, Dify for chat, LangChain for custom code

Demo strategy

What the demo replays

The interactive demo replays the 5-workflow chain on the sample brief ('Shenzhen rush-hour metro, 60s'): produce routes the brief → get_produce writes the title, 6 storyboard shots, and narration → create_image generates one image per shot → create_video makes one clip per shot → get_video merges the clips into the final cut. The 5 .zip files (with their draft IDs) come from 案例10 工作流合集 ("Case 10 workflow collection"); no real image or video models run in the browser.
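
The data flow of that replay can be approximated with stubbed stages. This is a minimal sketch, not the actual Coze workflows. The function names mirror the five workflow names, but their bodies, the `Shot` record, and the placeholder file names are assumptions made for illustration, and the routing step in `produce` is reduced to a fixed chain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shot:
    """Hypothetical record for one storyboard shot."""
    index: int
    narration: str
    image: str = ""  # placeholder reference, not a real image
    clip: str = ""   # placeholder reference, not a real clip


def get_produce(brief: str, n_shots: int = 6) -> Tuple[str, List[Shot]]:
    """Stub for copywriting: returns a title and n_shots narrated shots."""
    title = f"Draft: {brief}"
    shots = [Shot(i, f"narration for shot {i}") for i in range(1, n_shots + 1)]
    return title, shots


def create_image(shot: Shot) -> Shot:
    """Stub for image generation: attaches one placeholder image per shot."""
    shot.image = f"image_{shot.index}.png"
    return shot


def create_video(shot: Shot) -> Shot:
    """Stub for video generation: attaches one placeholder clip per shot."""
    shot.clip = f"clip_{shot.index}.mp4"
    return shot


def get_video(shots: List[Shot]) -> str:
    """Stub for merging: joins the clip names in shot order."""
    return "+".join(s.clip for s in shots)


def produce(brief: str) -> dict:
    """Top-level stub: runs the four sub-stages in a fixed order."""
    title, shots = get_produce(brief)
    shots = [create_video(create_image(s)) for s in shots]
    return {"title": title, "shots": shots, "final": get_video(shots)}


result = produce("Shenzhen rush-hour metro, 60s")
```

Because every stage is a stub, `result` only records which step produced which placeholder, which is the same kind of trace the browser demo replays without calling real models.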

Public preview can be enabled later without redesigning the case-study layout.