Coze Multimodal Video Generation Agent
An end-to-end short-video pipeline on Coze (ByteDance Kouzi) — text → image → video — with 5 interconnected workflows, each shipping as its own workflow zip.
A guided replay of the 5 Coze workflows on a sample brief: produce routes the job, get_produce writes the title + 6 storyboard shots, create_image / create_video generate per shot, and get_video merges the final cut.
Why this local version exists
This replays the real workflow chain (the 5 zips, referenced by their draft IDs) on the sample brief from the case study. It generates no real media: the point is the modular workflow design, not live image or video models.
Run the 5-workflow short-video pipeline
Replays the Coze workflow chain on a sample brief: produce → get_produce → create_image → create_video → get_video, from brief to finished cut.
Brief
A 60-second short video about "Shenzhen rush-hour metro"
produce
Master: reads the brief, routes to sub-workflows
Workflow-produce-draft-1308.zip
get_produce
Copywriting: title / storyboard shots / narration
Workflow-get_produce-draft-1319.zip
create_image
Image gen: one image per shot
Workflow-create_image-draft-1329.zip
create_video
Video gen: image + narration → clip
Workflow-create_video-draft-1324.zip
get_video
Merge: clips + BGM + subtitles → final video
Workflow-get_video-draft-1314.zip
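The hand-off order of the five cards above can be sketched as a small replay loop. This is a minimal illustration, not Coze's API: the `Workflow` and `Shot` classes and the `replay` function are hypothetical names, and, like the demo, it generates no media. It only tracks which stage has run and which badge each of the 6 shots would show.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Workflow:
    name: str
    role: str
    package: str  # zip file name as listed on the cards above


# The five workflows, in hand-off order.
WORKFLOWS = [
    Workflow("produce", "route the brief to sub-workflows",
             "Workflow-produce-draft-1308.zip"),
    Workflow("get_produce", "write title, storyboard shots, narration",
             "Workflow-get_produce-draft-1319.zip"),
    Workflow("create_image", "one image per shot",
             "Workflow-create_image-draft-1329.zip"),
    Workflow("create_video", "image + narration -> clip",
             "Workflow-create_video-draft-1324.zip"),
    Workflow("get_video", "merge clips + BGM + subtitles",
             "Workflow-get_video-draft-1314.zip"),
]


@dataclass
class Shot:
    index: int
    has_image: bool = False  # "image badge" in the demo
    has_clip: bool = False   # "clip badge" in the demo


def replay(brief: str, shot_count: int = 6) -> Tuple[List[str], List[Shot]]:
    """Walk the chain in order; hypothetical stand-in for the real run."""
    log: List[str] = [f"brief: {brief}"]
    shots: List[Shot] = []
    for wf in WORKFLOWS:
        log.append(f"{wf.name}: {wf.role} ({wf.package})")
        if wf.name == "get_produce":
            # The copywriting stage is what creates the storyboard shots.
            shots = [Shot(i + 1) for i in range(shot_count)]
        elif wf.name == "create_image":
            for shot in shots:
                shot.has_image = True
        elif wf.name == "create_video":
            for shot in shots:
                # A clip is only possible once the shot has its image.
                shot.has_clip = shot.has_image
    return log, shots


log, shots = replay('A 60-second short video about "Shenzhen rush-hour metro"')
for line in log:
    print(line)
```

Each shot gains its image flag before its clip flag, which mirrors the order in which the badges light up during a run.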
Activity log
Run the pipeline to watch the 5 workflows hand off in sequence.
Storyboard → image → clip
What to try
Run the pipeline and watch the 5 workflows hand off: produce → get_produce → create_image → create_video → get_video.
Notice each shot lights up an image badge, then a clip badge, as the later workflows run.
See why the chain is split into 5 zips: each workflow can fail independently, swap its model, cache its output, and be debugged on its own.
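The reasons for splitting the chain can be illustrated with a rough sketch of a per-stage runner. Everything here is an assumption made for illustration: `run_stage`, the fake stage functions, and the simulated timeout are hypothetical and stand in for the real Coze workflows. The sketch shows that a failure in one stage for one shot does not discard the other results, that cached outputs are not regenerated on retry, and that swapping a model amounts to passing a different function.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cache = Dict[Tuple[str, int], str]


def run_stage(stage: str, shot: int, fn: Callable[[int], str],
              cache: Cache, errors: List[str]) -> Optional[str]:
    """Run one stage for one shot, reusing cached output and isolating failures."""
    key = (stage, shot)
    if key in cache:
        return cache[key]  # caching: finished work is never redone
    try:
        result = fn(shot)
    except Exception as exc:
        # Independent failure: record it and let the other shots continue.
        errors.append(f"{stage} shot {shot}: {exc}")
        return None
    cache[key] = result
    return result


calls = {"image": 0}  # counts how often the fake image stage really runs


def fake_image(shot: int) -> str:
    calls["image"] += 1
    return f"image-{shot}"


def flaky_video(shot: int) -> str:
    if shot == 3:
        raise RuntimeError("simulated model timeout")
    return f"clip-{shot}"


def stable_video(shot: int) -> str:
    # Model swap: a different function behind the same stage name.
    return f"clip-{shot}"


cache: Cache = {}
errors: List[str] = []
for shot in range(1, 7):
    run_stage("create_image", shot, fake_image, cache, errors)
    run_stage("create_video", shot, flaky_video, cache, errors)

# Debugging: only the failed shot is retried, and its image comes from cache.
run_stage("create_image", 3, fake_image, cache, errors)
retried = run_stage("create_video", 3, stable_video, cache, errors)
print(errors, retried, calls["image"])
```

In a single mega-flow, the same timeout on one shot would typically force the whole run to restart; here only one stage of one shot is repeated.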
What this demo proves
You know when a low-code platform beats hand-written code: content pipelines can lean on built-in image/video plugins instead of custom integration code.
You design modular workflows split by capability and composed by reference (draft IDs), not one mega-flow.
You can choose between Coze / Dify / LangChain by scenario: content vs conversational vs custom logic.
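Composition by reference can be sketched as a lookup table keyed by draft ID. This is a loose illustration under two assumptions: that the numeric suffix in each zip file name is that workflow's draft ID, and that the master workflow holds an ordered list of those IDs. The names `parse_package`, `REGISTRY`, `MASTER_CHAIN`, and `resolve` are hypothetical and not part of Coze.

```python
import re
from typing import Dict, List, Tuple

PACKAGES = [
    "Workflow-produce-draft-1308.zip",
    "Workflow-get_produce-draft-1319.zip",
    "Workflow-create_image-draft-1329.zip",
    "Workflow-create_video-draft-1324.zip",
    "Workflow-get_video-draft-1314.zip",
]


def parse_package(filename: str) -> Tuple[str, int]:
    """Split a zip name into (workflow name, draft ID); assumes this naming scheme."""
    match = re.fullmatch(r"Workflow-(\w+)-draft-(\d+)\.zip", filename)
    if match is None:
        raise ValueError(f"unexpected package name: {filename}")
    return match.group(1), int(match.group(2))


# Draft ID -> workflow name, built from the file names alone.
REGISTRY: Dict[int, str] = {
    draft_id: name for name, draft_id in map(parse_package, PACKAGES)
}

# Assumed: the master (produce, draft 1308) refers to sub-workflows by ID only.
MASTER_CHAIN: List[int] = [1319, 1329, 1324, 1314]


def resolve(chain: List[int]) -> List[str]:
    """Turn a list of draft-ID references into workflow names."""
    return [REGISTRY[draft_id] for draft_id in chain]


print(resolve(MASTER_CHAIN))
```

Because the master holds only references, replacing one sub-workflow means re-pointing a single ID rather than editing a large combined flow.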
5 workflows
produce · get_produce · create_image · create_video · get_video
Why Coze
ByteDance image/video models ship as built-in plugins — no API keys, no rate-limit plumbing
Best signal
Modular orchestration + right-tool-for-the-job platform judgment