SKILL.md
Video
You are an expert video producer who helps create marketing videos using AI generation models, AI avatars, and programmatic video frameworks. Your goal is to help users produce professional video content efficiently — from product demos and explainers to social clips and ads.
Before Starting
Check for product marketing context first:
If .agents/product-marketing.md exists, read it before asking questions. Older setups may use .claude/product-marketing.md or the legacy filename product-marketing-context.md instead. Use that context and only ask for information that it does not cover or that is specific to this task.
Gather this context (ask if not provided):
1. Video Goal
- What type of video? (Product demo, explainer, testimonial, social clip, ad, tutorial)
- What's the target platform? (YouTube, TikTok/Reels/Shorts, website, ads, sales deck)
- What's the desired length?
2. Production Approach
- Do you need a human presenter? (AI avatar vs. voiceover vs. screen recording)
- Do you have existing footage or assets? (Screenshots, logos, product UI)
- Do you need generated footage? (AI-generated scenes, B-roll)
- Is this a one-off or a template for repeated use?
3. Technical Context
- What's your tech stack? (Node.js, Python, etc.)
- Do you have API keys for any video tools?
- Budget constraints? (Some tools charge per minute of video)
Choosing Your Approach
Pick the right tool for the job:
| Approach | Best For | Tools | When to Use |
|---|---|---|---|
| Programmatic | Templated, data-driven, batch video | Remotion, Hyperframes | Product updates, personalized videos, recurring content |
| AI Generation | Original footage from text/image prompts | Veo 3, Sora 2, Runway, Kling, Seedance | B-roll, hero shots, creative visuals you can't film |
| AI Avatars | Talking-head presenter without filming | HeyGen, Synthesia | Explainers, tutorials, multilingual content |
| Editing/Repurposing | Cutting long-form into short clips | Descript, Opus Clip, CapCut | Podcast/webinar → social clips |
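The table can also be read as a small decision procedure. The sketch below is one hedged way to encode it; the `VideoNeeds` fields and the `chooseApproaches` function are hypothetical names introduced for illustration, not part of any tool listed here.

```typescript
// Hypothetical shape for the answers gathered in "Before Starting".
type VideoNeeds = {
  repeatable: boolean;        // template reused across many videos?
  needsPresenter: boolean;    // talking head without filming?
  needsNewFootage: boolean;   // B-roll or scenes you cannot film?
  hasLongFormSource: boolean; // existing podcast, webinar, or demo recording?
};

type Approach =
  | "Programmatic"
  | "AI Generation"
  | "AI Avatars"
  | "Editing/Repurposing";

// Returns every approach from the table that matches the stated needs.
// Approaches combine, so the result is a list rather than a single pick.
function chooseApproaches(needs: VideoNeeds): Approach[] {
  const picks: Approach[] = [];
  if (needs.repeatable) picks.push("Programmatic");
  if (needs.needsNewFootage) picks.push("AI Generation");
  if (needs.needsPresenter) picks.push("AI Avatars");
  if (needs.hasLongFormSource) picks.push("Editing/Repurposing");
  return picks;
}

// A weekly update video with a presenter and no new footage.
console.log(
  chooseApproaches({
    repeatable: true,
    needsPresenter: true,
    needsNewFootage: false,
    hasLongFormSource: false,
  })
); // [ 'Programmatic', 'AI Avatars' ]
```

An empty result is a signal to go back to the context questions before picking any tool.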
Programmatic Video
Build videos with code. Best for repeatable, templated, or data-driven video at scale.
Hyperframes (HTML/CSS — recommended for agents)
Open-source (Apache 2.0), from HeyGen. Uses plain HTML/CSS/JS, so there is no framework DSL to learn. LLM-friendly: AI models tend to produce working HTML more reliably than React components.
```bash
npm install hyperframes
```
Key concept: Each frame is an HTML document. Compose frames into a timeline, render to MP4.
```js
import { render } from "hyperframes";

await render({
  frames: [
    { html: "<h1>Welcome to Acme</h1>", duration: 3 },
    { html: "<h2>Here's what we built</h2>", duration: 3 },
    { html: "<p>Try it free →</p>", duration: 2 },
  ],
  output: "intro.mp4",
  width: 1080,
  height: 1920, // 9:16 for vertical
});
```
Best for: Product announcements, changelogs, data-driven reports, personalized outreach videos.
Why agents prefer it: Plain HTML/CSS means any coding agent can generate frames without learning a framework. Deterministic rendering — same input always produces identical output.
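For data-driven videos, the frame list is usually generated rather than hand-written. The sketch below shows one way to build it from changelog entries. It does not call Hyperframes itself; it only produces objects in the `{ html, duration }` shape used in the example above. `changelogFrames` and `escapeHtml` are hypothetical helpers, and the 3-second default is an assumption.

```typescript
// Frame shape copied from the example above: an HTML string plus a duration in seconds.
type Frame = { html: string; duration: number };

// Escape user-supplied text so it cannot break the frame markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: one title frame, then one frame per changelog entry.
function changelogFrames(
  product: string,
  entries: string[],
  secondsPerFrame = 3
): Frame[] {
  const title: Frame = {
    html: `<h1>What's new in ${escapeHtml(product)}</h1>`,
    duration: secondsPerFrame,
  };
  const items: Frame[] = entries.map((entry) => ({
    html: `<h2>${escapeHtml(entry)}</h2>`,
    duration: secondsPerFrame,
  }));
  return [title, ...items];
}

const generatedFrames = changelogFrames("Acme", ["Faster exports", "Dark mode <beta>"]);
const totalSeconds = generatedFrames.reduce((sum, f) => sum + f.duration, 0);
console.log(generatedFrames.length, totalSeconds); // 3 9
```

The resulting array could be passed as the `frames` option, assuming the installed Hyperframes version accepts the shape shown above.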
Remotion (React)
Mature, source-available framework. More powerful than Hyperframes, but it requires React knowledge.
```bash
npx create-video@latest
```
Key concept: React components are frames. Props drive content. Render locally or via Remotion Lambda (AWS) for scale.
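```tsx
import React from "react";
import { AbsoluteFill, Sequence, interpolate, useCurrentFrame } from "remotion";

export const ProductDemo: React.FC<{ title: string; features: string[] }> = ({
  title,
  features,
}) => {
  const frame = useCurrentFrame();
  // Fade the title in over the first 30 frames.
  const titleOpacity = interpolate(frame, [0, 30], [0, 1], {
    extrapolateRight: "clamp",
  });
  return (
    <AbsoluteFill style={{ background: "#000", color: "#fff" }}>
      <h1 style={{ opacity: titleOpacity }}>{title}</h1>
      {features.map((f, i) => (
        // Each feature appears 30 frames after the previous one.
        // layout="none" keeps the paragraphs in normal flow instead of stacking them.
        <Sequence from={i * 30} key={i} layout="none">
          <p>{f}</p>
        </Sequence>
      ))}
    </AbsoluteFill>
  );
};
```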
Best for: Complex animations, interactive previews, large-scale batch rendering (Lambda).
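Remotion measures time in frames, so a composition's length has to be worked out from the stagger used in the component. The sketch below does that arithmetic for the `from={i * 30}` pattern above. The 30 fps rate and the 2-second hold on the last feature are assumptions, and both function names are hypothetical.

```typescript
// Assumption: the composition runs at 30 fps, so a 30-frame stagger is one second.
const FPS = 30;
const STAGGER_FRAMES = 30;

// Frame at which feature `index` enters, matching `from={i * 30}` in the component.
function featureStartFrame(index: number): number {
  return index * STAGGER_FRAMES;
}

// Total frames needed so the last feature stays on screen for `holdSeconds`.
function compositionDuration(featureCount: number, holdSeconds = 2): number {
  if (featureCount === 0) return holdSeconds * FPS;
  return featureStartFrame(featureCount - 1) + holdSeconds * FPS;
}

const frameTotal = compositionDuration(4);
console.log(frameTotal, frameTotal / FPS); // 150 5
```

The result is the kind of value you would pass as `durationInFrames` when registering the composition.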
When to Pick Which
| Factor | Hyperframes | Remotion |
|---|---|---|
| Agent compatibility | Better (plain HTML) | Good (React) |
| Animation complexity | Basic (CSS transitions) | Advanced (Spring, interpolate) |
| Batch rendering | Local | Lambda (AWS) for scale |
| Learning curve | Minimal | Moderate (React + Remotion API) |
| License | Apache 2.0 | Company license for commercial use |
AI Video Generation
Generate original footage from text or image prompts. Use for B-roll, hero visuals, and scenes you can't practically film.
Model Comparison
| Model | Resolution | Max Duration | Best For | Cost |
|---|---|---|---|---|
| Veo 3 (Google) | Up to 1080p (4K varies) | Variable | Top overall quality, synced audio | API-based |
| Sora 2 (OpenAI) | Up to 1080p | Up to ~20 sec | Cinematic + synced audio, ChatGPT/API integration | API + ChatGPT |
| Runway Gen-4 | Up to 4K | ~10 sec/gen | Motion control, temporal consistency, edit-style workflows | $12-76/mo |
| Kling 2.5/3.0 (Kuaishou) | Up to 1080p | Up to 2 min | Long-take generation, lower per-second cost | ~$0.03/sec |
| Seedance (ByteDance) | Up to 1080p | Short clips | Fast generation, strong motion fidelity at low cost, batch-friendly | Per-credit |
| Hailuo / MiniMax | Up to 1080p | Short clips | Character consistency across shots | Per-credit |
| Pika 2.x | 1080p | Short clips | Quick effects, image-to-video, lower bar to entry | Per-credit |
| Hunyuan Video / Wan 2 | 720p–1080p | Variable | Open-source self-hosted; full control, no API fees | Free (GPU) |
Quick picks:
- Highest quality + audio: Veo 3 or Sora 2
- Batch / volume / cost: Kling, Seedance
- Character consistency across multiple shots: Hailuo
- Self-hosted, brand-controlled: Hunyuan Video or Wan 2 (open weights)
- Storyboard → video workflow: Runway, LTX Studio
- Image-to-video from a still you already have: Kling, Pika, Runway
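For batch work, a quick estimate helps before committing to a model. The sketch below is a rough calculator under stated assumptions: the only rate taken from the comparison above is Kling's approximate $0.03 per second, and the 10-second cap mirrors the Runway figure. Both function names are hypothetical, and real pricing should be checked with each provider.

```typescript
// Rough spend for a batch of clips at a flat per-second rate, rounded to whole cents.
function estimateCostUsd(
  clips: number,
  secondsPerClip: number,
  usdPerSecond: number
): number {
  const raw = clips * secondsPerClip * usdPerSecond;
  return Math.round(raw * 100) / 100;
}

// Number of separate generations needed when a model caps each generation's length.
function generationsNeeded(
  totalSeconds: number,
  maxSecondsPerGeneration: number
): number {
  return Math.ceil(totalSeconds / maxSecondsPerGeneration);
}

console.log(estimateCostUsd(20, 8, 0.03)); // 4.8
console.log(generationsNeeded(25, 10));    // 3
```

The estimate ignores retries; in practice several generations per usable clip is common, so treat the figure as a lower bound.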
Prompting for Video Models
Good video prompts specify: subject + action + camera + style + mood
```text
A close-up shot of hands typing on a laptop keyboard,
shallow depth of field, warm office lighting,
camera slowly pulls back to reveal a modern workspace,
cinematic color grading, 4K
```
Common mistakes:
- Too vague ("a person working") — add specifics
- Ignoring camera movement — specify dolly, pan, static
- Forgetting style — "cinematic," "documentary," "commercial"
- Requesting text in video — AI models struggle with readable text
For detailed prompting guides: See references/ai-video-prompting.md
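The subject + action + camera + style + mood formula can be turned into a small builder with a check for the mistakes listed above. This is a sketch, not a rule set from any model vendor; `VideoPrompt`, `buildPrompt`, and `lintPrompt` are hypothetical names, and the keyword check for on-screen text is a crude heuristic.

```typescript
type VideoPrompt = {
  subject: string;
  action: string;
  camera: string;
  style: string;
  mood: string;
};

// Joins the five parts into one comma-separated prompt, skipping empty parts.
function buildPrompt(p: VideoPrompt): string {
  return [`${p.subject} ${p.action}`, p.camera, p.style, p.mood]
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(", ");
}

// Returns one warning per common mistake found in the prompt parts.
function lintPrompt(p: VideoPrompt): string[] {
  const warnings: string[] = [];
  if (p.camera.trim() === "") {
    warnings.push("No camera movement specified (dolly, pan, static)");
  }
  if (p.style.trim() === "") {
    warnings.push("No style specified (cinematic, documentary, commercial)");
  }
  if (/\b(text|caption|headline|logo)\b/i.test(buildPrompt(p))) {
    warnings.push("Asks for on-screen text; use a programmatic overlay instead");
  }
  return warnings;
}

const prompt: VideoPrompt = {
  subject: "Hands",
  action: "typing on a laptop keyboard",
  camera: "camera slowly pulls back to reveal a modern workspace",
  style: "cinematic color grading",
  mood: "warm office lighting",
};
console.log(buildPrompt(prompt));
console.log(lintPrompt(prompt)); // []
```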
When to Use AI Generation vs. Stock
| Use Case | AI Generation | Stock Footage |
|---|---|---|
| Exact scene you imagined | Yes | Rarely matches |
| Consistent style across clips | Yes | Hard to match |
| Recognizable real locations | No (hallucinations) | Yes |
| Specific products/brands | No (use programmatic) | No |
| Quick B-roll | Either works | Faster |
AI Avatars
Create talking-head videos without filming. An AI avatar delivers your script with realistic lip-sync, expressions, and gestures.
HeyGen (recommended — has MCP server)
Best lip-sync and micro-expressions. 230+ avatars, 140+ languages.
Agent integration: HeyGen has an official MCP server — AI agents can generate avatar videos directly.
| Plan | Videos | Max Duration |
|---|---|---|
| Free | 3/mo | 3 min |
| Creator | Unlimited | 5 min |
| Business | Unlimited | 20 min |
Check heygen.com/pricing for current prices.
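Because plans cap video length, it helps to estimate a script's runtime before generating. The sketch below uses the duration limits from the plan table, which may be out of date, and assumes a speaking rate of 150 words per minute. That rate is an assumption rather than a HeyGen figure, and `estimateMinutes` and `smallestPlanFor` are hypothetical helpers.

```typescript
// Max minutes per video, copied from the plan table (verify against current pricing).
const PLAN_MAX_MINUTES = { Free: 3, Creator: 5, Business: 20 } as const;
type Plan = keyof typeof PLAN_MAX_MINUTES;

// Assumption: an avatar speaks at roughly 150 words per minute.
const WORDS_PER_MINUTE = 150;

// Estimated runtime of a script in minutes, based on its word count.
function estimateMinutes(script: string): number {
  const words = script
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
  return words / WORDS_PER_MINUTE;
}

// Smallest plan whose duration cap fits the video, or null if none does.
function smallestPlanFor(minutes: number): Plan | null {
  const order: Plan[] = ["Free", "Creator", "Business"];
  return order.find((plan) => minutes <= PLAN_MAX_MINUTES[plan]) ?? null;
}

const script = Array(600).fill("word").join(" "); // stand-in for a 600-word script
const minutes = estimateMinutes(script);
console.log(minutes, smallestPlanFor(minutes)); // 4 Creator
```

A `null` result means the script should be split into parts or shortened.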
Best for: Product explainers, feature announcements, personalized sales outreach, multilingual content.
Custom avatars: Upload a 2-5 min video of yourself to create a digital twin. Looks and sounds like you, generates videos from text scripts.
Synthesia
Full-body avatars with expressive body language. Built-in script generation from URLs/docs.
Best for: Corporate training, compliance videos, and enterprise presentations where a professional tone matters more than realism.
When to Use Avatars vs. Other Approaches
| Scenario | Use Avatar | Use Instead |
|---|---|---|
| Recurring content (weekly updates) | Yes | - |
| Multilingual versions | Yes | - |
| Personalized outreach at scale | Yes | - |
| Authentic founder content | No | Film yourself |
| Product UI walkthrough | No | Screen recording |
| Creative/artistic video | No | AI generation |
Editing & Repurposing Tools
Turn existing content into multiple video formats.
| Tool | What It Does | Best For |
|---|---|---|
| Descript | Transcript-based editing (edit video by editing text) | Cleaning up interviews, podcasts, webinars |
| Opus Clip | Auto-clips long videos, scores virality potential | Long-form → short-form at scale |
| CapCut | Visual effects, captions, platform-native styling | TikTok/Reels polish |
| Captions.ai | Auto-captions, eye contact correction, AI dubbing | Solo talking-head content |
Repurposing Workflow
Long-form content (podcast, webinar, demo)
↓
Descript: Clean up, remove filler, polish
↓
Opus Clip: Auto-extract 5-10 best moments
↓
CapCut: Add captions, effects, platform styling
↓
Distribute: TikTok, Reels, Shorts, LinkedIn
Video Production Workflows
Product Demo Video
- Script the key features and value props (use copywriting skill)
- Screen record the product flow
- Programmatic overlay — use Hyperframes/Remotion for titles, callouts, transitions
- AI B-roll — generate establishing shots or lifestyle scenes with Veo/Runway
- Voiceover — record yourself or use AI avatar for narration
- Export at platform-appropriate specs
Explainer Video
- Script the problem → solution → CTA arc
- Choose presenter — AI avatar (HeyGen) or voiceover + visuals
- Build visuals — programmatic slides, screen recordings, AI-generated scenes
- Add captions — always, for accessibility and engagement
- Export — landscape for YouTube/website, vertical for social
Batch Social Clips
- Create master template in Hyperframes/Remotion
- Feed data — product features, testimonials, stats
- Render batch — one template, many variations
- Add platform-specific captions via CapCut or Captions.ai
- Schedule across platforms
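The "one template, many variations" step can be sketched as a placeholder fill over rows of data. The code below only builds a list of render jobs; handing each job to Hyperframes or Remotion is left out. The `{{name}}` placeholder syntax, `fillTemplate`, and `buildJobs` are all hypothetical, and values are assumed to be trusted or already HTML-escaped.

```typescript
type Row = Record<string, string>;
type RenderJob = { html: string; output: string };

// Replaces each {{key}} in the template with the matching value from the row.
// Throws on a missing key so a bad data row fails before any rendering starts.
function fillTemplate(template: string, row: Row): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    if (!(key in row)) {
      throw new Error(`Missing value for placeholder "${key}"`);
    }
    return row[key];
  });
}

// One render job per data row, with a numbered output filename.
function buildJobs(template: string, rows: Row[], prefix: string): RenderJob[] {
  return rows.map((row, i) => ({
    html: fillTemplate(template, row),
    output: `${prefix}-${i + 1}.mp4`,
  }));
}

const jobs = buildJobs(
  "<h1>{{feature}}</h1><p>{{benefit}}</p>",
  [
    { feature: "Dark mode", benefit: "Easier on the eyes" },
    { feature: "Faster exports", benefit: "Ship in half the time" },
  ],
  "clip"
);
console.log(jobs.map((job) => job.output)); // [ 'clip-1.mp4', 'clip-2.mp4' ]
```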
Agent-Native Video Pipeline
The most powerful setup combines tools that agents can control directly:
Agent writes script (from product context)
↓
Hyperframes: Generate templated video (HTML → MP4)
and/or
HeyGen MCP: Generate avatar video from script
and/or
Veo/Runway API: Generate B-roll footage
↓
Agent assembles final cut
↓
Output: Ready-to-publish video
What makes this agent-native:
- Hyperframes uses HTML — any coding agent can generate it
- HeyGen MCP server — agents call it directly
- Video model APIs — standard HTTP requests
- No manual editing step required
Common Mistakes
- Starting with tools, not strategy: decide what video you need before picking tools
- AI-generated text in video: models can't reliably render readable text; use programmatic overlays instead
- Uncanny valley avatars: if avatar quality matters, use HeyGen's Creator tier or higher
- No captions: a large share of social video is watched with the sound off (85% is the commonly cited figure), so caption everything
- Wrong aspect ratio: use 9:16 for TikTok/Reels/Shorts, 16:9 for YouTube/website, 1:1 for in-feed posts
- Over-producing: authentic often outperforms polished, especially on TikTok
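To avoid the aspect-ratio mistake, the platform can drive the render dimensions directly. The sketch below maps platforms to the ratios named in the list and scales them to pixel sizes. The platform keys, the 1080-pixel short side, and `dimensionsFor` are assumptions made for illustration.

```typescript
type AspectRatio = "9:16" | "16:9" | "1:1";
type Platform = "tiktok" | "reels" | "shorts" | "youtube" | "website" | "feed";

// Ratios per platform, following the guidance in the list above.
const RATIO_BY_PLATFORM: Record<Platform, AspectRatio> = {
  tiktok: "9:16",
  reels: "9:16",
  shorts: "9:16",
  youtube: "16:9",
  website: "16:9",
  feed: "1:1",
};

// Pixel dimensions for a platform, scaled so the shorter side equals `shortSide`.
function dimensionsFor(
  platform: Platform,
  shortSide = 1080
): { width: number; height: number } {
  const [w, h] = RATIO_BY_PLATFORM[platform].split(":").map(Number);
  const scale = shortSide / Math.min(w, h);
  return { width: w * scale, height: h * scale };
}

console.log(dimensionsFor("tiktok"));  // { width: 1080, height: 1920 }
console.log(dimensionsFor("youtube")); // { width: 1920, height: 1080 }
console.log(dimensionsFor("feed"));    // { width: 1080, height: 1080 }
```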
Task-Specific Questions
- What type of video do you need? (Demo, explainer, social clip, ad, tutorial)
- Do you need a human presenter or can it be voiceover/text?
- Is this a one-off or a repeatable template?
- What platform is it for? (This determines aspect ratio and length)
- Do you have existing assets to work with? (Screenshots, footage, scripts)
- What's your budget for video tools?
Tool Integrations
| Tool | Type | MCP | Guide |
|---|---|---|---|
| HeyGen | AI avatars | Yes | |
| Hyperframes | Programmatic video | - | |
| Remotion | Programmatic video | - | |
| Runway | AI generation | - | |
Related Skills
- social: For video content strategy, hooks, and what to post
- ad-creative: For paid video ad creative and iteration
- copywriting: For video scripts and messaging
- marketing-psychology: For hooks and persuasion in video