videodb

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable…

INSTALLATION
npx skills add https://github.com/video-db/skills --skill videodb
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

VideoDB Skill

Perception + memory + actions for video, live streams, and desktop sessions.

Use this skill when you need to:

1) Desktop Perception

  • Start/stop a desktop session capturing screen, mic, and system audio
  • Stream live context and store episodic session memory
  • Run real-time alerts/triggers on what’s spoken and what's happening on screen
  • Produce session summaries, a searchable timeline, and playable evidence links

2) Video ingest + stream

  • Ingest a file or URL and return a playable web stream link
  • Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio

3) Index + search (timestamps + evidence)

  • Build visual, spoken, and keyword indexes
  • Search and return exact moments with timestamps and playable evidence
  • Auto-create clips from search results

4) Timeline editing + generation

  • Subtitles: generate, translate, burn-in
  • Overlays: text/image/branding, motion captions
  • Audio: background music, voiceover, dubbing
  • Programmatic composition and exports via timeline operations

5) Live streams (RTSP) + monitoring

  • Connect RTSP/live feeds
  • Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows

Common inputs

  • Local file path, public URL, or RTSP URL
  • Desktop capture request: start / stop / summarize session
  • Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

Common outputs

  • Stream URL — make it playable: https://console.videodb.io/player?url={STREAM_URL}
  • Search results with timestamps and evidence links
  • Generated assets: subtitles, audio, images, clips
  • Event/alert payloads for live streams
  • Desktop session summaries and memory entries

Canonical prompts (examples)

  • “Start desktop capture and alert when a password field appears.”
  • “Record my session and produce an actionable summary when it ends.”
  • “Ingest this file and return a playable stream link.”
  • “Index this folder and find every scene with people, return timestamps.”
  • “Generate subtitles, burn them in, and add light background music.”
  • “Connect this RTSP URL and alert when a person enters the zone.”

Running Python code

CRITICAL: Always cd to the user's project directory before running Python code. This ensures load_dotenv(".env") finds the correct .env file.

from dotenv import load_dotenv

load_dotenv(".env")

import videodb

conn = videodb.connect()

This reads VIDEO_DB_API_KEY from:

  • Environment (if already exported)
  • Project's .env file in current directory

If the key is missing, videodb.connect() raises AuthenticationError automatically.

Do NOT write a script file when a short inline command works.

When writing inline Python (python -c "..."), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

python << 'EOF'

from dotenv import load_dotenv

load_dotenv(".env")

import videodb

conn = videodb.connect()

coll = conn.get_collection()

print(f"Videos: {len(coll.get_videos())}")

EOF

Setup

When the user asks to "setup videodb" or similar:

1. Install SDK

pip install "videodb[capture]" python-dotenv

If videodb[capture] fails on Linux, install without the capture extra:

pip install videodb python-dotenv

2. Configure API key

The user must set VIDEO_DB_API_KEY using either method:

  • Export in terminal (recommended): export VIDEO_DB_API_KEY=your-key
  • **Project .env file**: Save VIDEO_DB_API_KEY=your-key in the project's .env file

Get a free API key at https://console.videodb.io (50 free uploads, no credit card).

Do NOT read, write, or handle the API key yourself. Always let the user set it.

Quick Reference

Upload media

# URL

video = coll.upload(url="https://example.com/video.mp4")

# YouTube

video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file

video = coll.upload(file_path="/path/to/video.mp4")

Transcript + subtitle

# force=True skips the error if the video is already indexed

video.index_spoken_words(force=True)

text = video.get_transcript_text()

stream_url = video.add_subtitle()

Search inside videos

from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.

# Always wrap in try/except and treat "No results found" as empty.

try:

    results = video.search("product demo")

    shots = results.get_shots()

    stream_url = results.compile()

except InvalidRequestError as e:

    if "No results found" in str(e):

        shots = []

    else:

        raise

Scene search

import re

from videodb import SearchType, IndexType, SceneExtractionType

from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter — it raises an error if a scene

# index already exists. Extract the existing index ID from the error.

try:

    scene_index_id = video.index_scenes(

        extraction_type=SceneExtractionType.shot_based,

        prompt="Describe the visual content in this scene.",

    )

except Exception as e:

    match = re.search(r"id\s+([a-f0-9]+)", str(e))

    if match:

        scene_index_id = match.group(1)

    else:

        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)

try:

    results = video.search(

        query="person writing on a whiteboard",

        search_type=SearchType.semantic,

        index_type=IndexType.scene,

        scene_index_id=scene_index_id,

        score_threshold=0.3,

    )

    shots = results.get_shots()

    stream_url = results.compile()

except InvalidRequestError as e:

    if "No results found" in str(e):

        shots = []

    else:

        raise

Timeline editing

Use the Editor API to compose videos, images, audio, and text. See reference/editor.md for full workflow.

from videodb.editor import Timeline, Track, Clip, VideoAsset, ImageAsset, AudioAsset, Fit

timeline = Timeline(conn)

timeline.resolution = "1280x720"

video_track = Track()

video_track.add_clip(0, Clip(asset=VideoAsset(id=video.id, start=10), duration=20))

audio_track = Track()

audio_track.add_clip(0, Clip(asset=AudioAsset(id=music.id, volume=0.2), duration=20))

timeline.add_track(video_track)

timeline.add_track(audio_track)

stream_url = timeline.generate_stream()

Transcode video (resolution / quality change)

from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side

job_id = conn.transcode(

    source="https://example.com/video.mp4",

    callback_url="https://example.com/webhook",

    mode=TranscodeMode.economy,

    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),

    audio_config=AudioConfig(mute=False),

)

Reframe aspect ratio (for social platforms)

Warning: reframe() is a slow server-side operation. For long videos it can take

several minutes and may time out. Best practices:

  • Always limit to a short segment using start/end when possible
  • For full-length videos, use callback_url for async processing
  • Trim the video on a Timeline first, then reframe the shorter result
from videodb import ReframeMode

# Always prefer reframing a short segment:

reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):

video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)

reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions

reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})

Generative media

image = coll.generate_image(

    prompt="a sunset over mountains",

    aspect_ratio="16:9",

)

Error handling

from videodb.exceptions import AuthenticationError, InvalidRequestError

try:

    conn = videodb.connect()

except AuthenticationError:

    print("Check your VIDEO_DB_API_KEY")

try:

    video = coll.upload(url="https://example.com/video.mp4")

except InvalidRequestError as e:

    print(f"Upload failed: {e}")

Common pitfalls

Scenario

Error message

Solution

Indexing an already-indexed video

Spoken word index for video already exists

Use video.index_spoken_words(force=True) to skip if already indexed

Scene index already exists

Scene index with id XXXX already exists

Extract the existing scene_index_id from the error with re.search(r"id\s+([a-f0-9]+)", str(e))

Search finds no matches

InvalidRequestError: No results found

Catch the exception and treat as empty results (shots = [])

Reframe times out

Blocks indefinitely on long videos

Use start/end to limit segment, or pass callback_url for async

Negative timestamps on Timeline

Silently produces broken stream

Always validate start >= 0 before creating VideoAsset

generate_video() / create_collection() fails

Operation not allowed or maximum limit

Plan-gated features — inform the user about plan limits

Additional docs

Reference documentation is in the reference/ directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.

  • reference/index.md - Scene indexing &#x26; extraction workflow (index, extract frames, read back, manage)

Screen Recording (Desktop Capture)

Use ws_listener.py to capture WebSocket events during recording sessions. Desktop capture supports macOS only.

Quick Start

  • Start listener: python scripts/ws_listener.py --cwd=<PROJECT_ROOT> &#x26;
  • Get WebSocket ID: cat /tmp/videodb_ws_id
  • Run capture code (see reference/capture.md for full workflow)
  • Events written to: /tmp/videodb_events.jsonl

Query Events

import json

events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")]

# Get all transcripts

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]

# Get visual descriptions from last 5 minutes

import time

cutoff = time.time() - 300

recent_visual = [e for e in events

                 if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]

Utility Scripts

For complete capture workflow, see reference/capture.md.

Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, volume control, fade transitions, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (speed changes, crop/zoom, colour grading, keyframe animation).

When to use what

Problem

VideoDB solution

Platform rejects video aspect ratio or resolution

video.reframe() or conn.transcode() with VideoConfig

Need to resize video for Twitter/Instagram/TikTok

video.reframe(target="vertical") or target="square"

Need to change resolution (e.g. 1080p → 720p)

conn.transcode() with VideoConfig(resolution=720)

Need to overlay audio/music on video

AudioAsset on an Editor Timeline with volume control

Need to add subtitles

video.add_subtitle() or CaptionAsset on Editor Timeline

Need to combine/trim clips

VideoAsset on an Editor Timeline

Need to compose images with voiceover

ImageAsset + AudioAsset on separate Editor tracks

Need to generate voiceover, music, or SFX

coll.generate_voice(), generate_music(), generate_sound_effect()

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card