Workflow-chain replay

Coze Multimodal Video Generation Agent

An end-to-end short-video pipeline on Coze (ByteDance Kouzi) — text → image → video — with 5 interconnected workflows, each shipping as its own workflow zip.

A guided replay of the 5 Coze workflows on a sample brief: produce routes the job, get_produce writes the title + 6 storyboard shots, create_image / create_video generate per shot, and get_video merges the final cut.

Coze · Workflow · Image Generation · Video Generation · Multimodal

Why this local version exists

This replays the real workflow chain (the 5 zips with their draft IDs) on the sample brief from the case. It generates no real media — the point is the modular workflow design, not live image/video models.

Interactive Preview

Run the 5-workflow short-video pipeline

Replays the Coze workflow chain on a sample brief: produce → get_produce → create_image → create_video → get_video, from brief to finished cut.

Brief

A 60-second short video about "Shenzhen rush-hour metro"

produce

Master: reads the brief, routes to sub-workflows

Workflow-produce-draft-1308.zip

get_produce

Copywriting: title / storyboard shots / narration

Workflow-get_produce-draft-1319.zip

create_image

Image gen: one image per shot

Workflow-create_image-draft-1329.zip

create_video

Video gen: image + narration → clip

Workflow-create_video-draft-1324.zip

get_video

Merge: clips + BGM + subtitles → final video

Workflow-get_video-draft-1314.zip
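
The five stages above can be sketched as plain data plus a loop. This is a minimal illustration and not the Coze export format: the `Workflow` class and the `draft_id` and `replay` helpers are hypothetical names, and only the workflow names, roles, and zip filenames come from the list above. Like the demo, it emits log lines and generates no media.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workflow:
    name: str     # workflow name as shown in the chain
    role: str     # one-line description of what the stage does
    archive: str  # zip filename as listed above


# Order matters: each stage hands its output to the next one.
CHAIN = [
    Workflow("produce", "read the brief, route to sub-workflows",
             "Workflow-produce-draft-1308.zip"),
    Workflow("get_produce", "write title, storyboard shots, narration",
             "Workflow-get_produce-draft-1319.zip"),
    Workflow("create_image", "generate one image per shot",
             "Workflow-create_image-draft-1329.zip"),
    Workflow("create_video", "turn image + narration into a clip",
             "Workflow-create_video-draft-1324.zip"),
    Workflow("get_video", "merge clips, BGM, subtitles into the final video",
             "Workflow-get_video-draft-1314.zip"),
]


def draft_id(wf: Workflow) -> int:
    """Parse the trailing draft number out of the archive filename."""
    return int(wf.archive.rsplit("-", 1)[1].split(".")[0])


def replay(brief: str) -> List[str]:
    """Return the activity log for one simulated run (no real media)."""
    log = [f"brief: {brief}"]
    for step, wf in enumerate(CHAIN, start=1):
        log.append(f"[{step}/{len(CHAIN)}] {wf.name} (draft {draft_id(wf)}): {wf.role}")
    return log
```

Calling `replay('A 60-second short video about "Shenzhen rush-hour metro"')` yields one line for the brief followed by one line per workflow, in hand-off order.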

Activity log

Run the pipeline to watch the 5 workflows hand off in sequence.

Storyboard → image → clip

0/6
Shots appear as get_produce runs; image and clip badges light up as the later workflows run.
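
The badge behaviour described above can be modelled with a small per-shot state object. This is a sketch under assumptions: the `Shot` class and its helpers are hypothetical, and it assumes the counter tracks shots whose clip is finished, which the page does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    index: int
    has_image: bool = False  # set once create_image has run for this shot
    has_clip: bool = False   # set once create_video has run for this shot


def new_storyboard(count: int = 6) -> List[Shot]:
    """Shots as they appear after get_produce: no image or clip yet."""
    return [Shot(index=i) for i in range(1, count + 1)]


def attach_image(shot: Shot) -> None:
    shot.has_image = True


def attach_clip(shot: Shot) -> None:
    # A clip is built from the shot's image, so the image must exist first.
    if not shot.has_image:
        raise ValueError(f"shot {shot.index} has no image yet")
    shot.has_clip = True


def progress(shots: List[Shot]) -> str:
    """Assumed meaning of the counter: finished clips over total shots."""
    done = sum(1 for s in shots if s.has_clip)
    return f"{done}/{len(shots)}"
```

Enforcing image-before-clip in `attach_clip` mirrors the order in which the badges light up.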

What to try

Run the pipeline and watch the 5 workflows hand off: produce → get_produce → create_image → create_video → get_video.

Notice that each shot lights up an image badge, then a clip badge, as the later workflows run.

See why the chain is split into 5 zips: failures stay isolated to one workflow, a model can be swapped per stage, finished stages can be cached, and each workflow can be debugged on its own.
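
Two of the reasons listed above, independent failure and caching, can be illustrated with a small stage runner. This is a hedged sketch, not how Coze executes workflows: `run_stage`, `run_chain`, the in-memory cache, and the three stub stages are all hypothetical stand-ins, and the failure in `create_video` is simulated.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[Tuple[str, str], str]

# Call counters so a rerun can show which stages actually executed.
calls = {"get_produce": 0, "create_image": 0, "create_video": 0}


def get_produce(brief: str) -> str:
    calls["get_produce"] += 1
    return f"storyboard({brief})"


def create_image(storyboard: str) -> str:
    calls["create_image"] += 1
    return f"images({storyboard})"


def create_video(images: str) -> str:
    calls["create_video"] += 1
    if calls["create_video"] == 1:
        # Simulated failure on the first attempt only.
        raise RuntimeError("simulated video model failure")
    return f"clips({images})"


STAGES: List[Tuple[str, Callable[[str], str]]] = [
    ("get_produce", get_produce),
    ("create_image", create_image),
    ("create_video", create_video),
]


def run_stage(name: str, fn: Callable[[str], str], payload: str, cache: Cache) -> str:
    """Run one stage, reusing a cached result for the same input."""
    key = (name, payload)
    if key not in cache:
        # If fn raises, nothing is stored, so earlier results stay valid.
        cache[key] = fn(payload)
    return cache[key]


def run_chain(brief: str, cache: Cache) -> str:
    payload = brief
    for name, fn in STAGES:
        payload = run_stage(name, fn, payload, cache)
    return payload
```

If the first `run_chain` call fails in `create_video`, a second call with the same cache skips `get_produce` and `create_image` and retries only the failed stage. Swapping a model would amount to replacing one function in `STAGES` without touching the others.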

What this demo proves

You know when a low-code platform beats hand-written code: content pipelines can lean on built-in image/video plugins.

You design modular workflows split by capability and composed by reference (draft IDs), not one mega-flow.

You can choose between Coze / Dify / LangChain by scenario: content vs conversational vs custom logic.

5 workflows

produce · get_produce · create_image · create_video · get_video

Why Coze

ByteDance image/video models ship as built-in plugins — no API keys, no rate-limit plumbing

Best signal

Modular orchestration + right-tool-for-the-job platform judgment