SKILL.md
$27
Region is auto-detected. Override with --region global or --region cn.
Agent Flags
Always use these flags in non-interactive (agent/CI) contexts:
Flag
Purpose
--non-interactive
Fail fast on missing args instead of prompting
--quiet
Suppress spinners/progress; stdout is pure data
--output json
Machine-readable JSON output
--async
Return task ID immediately (video generation)
--dry-run
Preview the API request without executing
--yes
Skip confirmation prompts
Commands
text chat
Chat completion. Default model: MiniMax-M2.7.
mmx text chat --message <text> [flags]
Flag
Type
Description
--message <text>
string, required, repeatable
Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello")
--messages-file <path>
string
JSON file with messages array. Use - for stdin
--system <text>
string
System prompt
--model <model>
string
Model ID (default: MiniMax-M2.7)
--max-tokens <n>
number
Max tokens (default: 4096)
--temperature <n>
number
Sampling temperature (0.0, 1.0]
--top-p <n>
number
Nucleus sampling threshold
--stream
boolean
Stream tokens (default: on in TTY)
--tool <json-or-path>
string, repeatable
Tool definition JSON or file path
# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet
# Multi-turn
mmx text chat \
--system "You are a coding assistant." \
--message "user:Write fizzbuzz in Python" \
--output json
# From file
cat conversation.json | mmx text chat --messages-file - --output json
stdout: response text (text mode) or full response object (json mode).
image generate
Generate images. Model: image-01.
mmx image generate --prompt <text> [flags]
Flag
Type
Description
--prompt <text>
string, required
Image description
--aspect-ratio <ratio>
string
e.g. 16:9, 1:1. Ignored if --width and --height are both set
--n <count>
number
Number of images (default: 1)
--seed <n>
number
Random seed for reproducible generation
--width <px>
number
Width in pixels (512–2048, multiple of 8). Requires --height
--height <px>
number
Height in pixels (512–2048, multiple of 8). Requires --width
--prompt-optimizer
boolean
Optimize prompt before generation
--aigc-watermark
boolean
Embed AI-generated content watermark
--subject-ref <params>
string
Subject reference: type=character,image=path-or-url
--response-format <format>
string
url (default) or base64. Base64 bypasses CDN download
--out-dir <dir>
string
Download images to directory
--out-prefix <prefix>
string
Filename prefix (default: image)
mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)
mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
video generate
Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task — by default it polls until completion.
mmx video generate --prompt <text> [flags]
Flag
Type
Description
--prompt <text>
string, required
Video description
--model <model>
string
MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast
--first-frame <path-or-url>
string
First frame image
--callback-url <url>
string
Webhook URL for completion
--download <path>
string
Save video to specific file
--async
boolean
Return task ID immediately
--no-wait
boolean
Same as --async
--poll-interval <seconds>
number
Polling interval (default: 5)
# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}
# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4
video task get
Query status of a video generation task.
mmx video task get --task-id <id> [--output json]
video download
Download a completed video by task ID.
mmx video download --file-id <id> [--out <path>]
speech synthesize
Text-to-speech. Default model: speech-2.8-hd. Max 10k chars.
mmx speech synthesize --text <text> [flags]
Flag
Type
Description
--text <text>
string
Text to synthesize
--text-file <path>
string
Read text from file. Use - for stdin
--model <model>
string
speech-2.8-hd (default), speech-2.6, speech-02
--voice <id>
string
Voice ID (default: English_expressive_narrator)
--speed <n>
number
Speed multiplier
--volume <n>
number
Volume level
--pitch <n>
number
Pitch adjustment
--format <fmt>
string
Audio format (default: mp3)
--sample-rate <hz>
number
Sample rate (default: 32000)
--bitrate <bps>
number
Bitrate (default: 128000)
--channels <n>
number
Audio channels (default: 1)
--language <code>
string
Language boost
--subtitles
boolean
Download and save subtitles as .srt file (alongside --out audio file). API must support subtitles for the selected model.
--pronunciation <from/to>
string, repeatable
Custom pronunciation
--sound-effect <effect>
string
Add sound effect
--out <path>
string
Save audio to file
--stream
boolean
Stream raw audio to stdout
mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3
mmx speech synthesize --text "Hello" --subtitles --out hello.mp3
# saves hello.mp3 + hello.srt (SRT subtitle file)
echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
music generate
Generate music. Responds well to rich, structured descriptions.
Model: music-2.6-free — unlimited for API key users, RPM = 3.
mmx music generate --prompt <text> [--lyrics <text>] [flags]
Flag
Type
Description
--prompt <text>
string
Music style description (can be detailed)
--lyrics <text>
string
Song lyrics with structure tags. Required unless --instrumental or --lyrics-optimizer is used.
--lyrics-file <path>
string
Read lyrics from file. Use - for stdin
--lyrics-optimizer
boolean
Auto-generate lyrics from prompt. Cannot be used with --lyrics or --instrumental.
--instrumental
boolean
Generate instrumental music (no vocals). Cannot be used with --lyrics.
--vocals <text>
string
Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies"
--genre <text>
string
Music genre, e.g. folk, pop, jazz
--mood <text>
string
Mood or emotion, e.g. warm, melancholic, uplifting
--instruments <text>
string
Instruments to feature, e.g. "acoustic guitar, piano"
--tempo <text>
string
Tempo description, e.g. fast, slow, moderate
--bpm <number>
number
Exact tempo in beats per minute
--key <text>
string
Musical key, e.g. C major, A minor, G sharp
--avoid <text>
string
Elements to avoid in the generated music
--use-case <text>
string
Use case context, e.g. "background music for video", "theme song"
--structure <text>
string
Song structure, e.g. "verse-chorus-verse-bridge-chorus"
--references <text>
string
Reference tracks or artists, e.g. "similar to Ed Sheeran"
--extra <text>
string
Additional fine-grained requirements
--aigc-watermark
boolean
Embed AI-generated content watermark
--format <fmt>
string
Audio format (default: mp3)
--sample-rate <hz>
number
Sample rate (default: 44100)
--bitrate <bps>
number
Bitrate (default: 256000)
--out <path>
string
Save audio to file
--stream
boolean
Stream raw audio to stdout
At least one of --prompt or --lyrics is required.
# With lyrics
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet
# Auto-generate lyrics from prompt
mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out summer.mp3 --quiet
# Instrumental
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet
# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
--vocals "male and female duet, harmonies in chorus" \
--instruments "acoustic guitar, piano" \
--bpm 95 \
--lyrics-file song.txt \
--out duet.mp3
music cover
Generate a cover version of a song based on reference audio.
Model: music-cover-free — unlimited for API key users, RPM = 3.
mmx music cover --prompt <text> (--audio <url> | --audio-file <path>) [flags]
Flag
Type
Description
--prompt <text>
string, required
Target cover style, e.g. "Indie folk, acoustic guitar, warm male vocal"
--audio <url>
string
URL of reference audio (mp3, wav, flac, etc. — 6s to 6min, max 50MB)
--audio-file <path>
string
Local reference audio file (auto base64-encoded)
--lyrics <text>
string
Cover lyrics. If omitted, extracted from reference audio via ASR.
--lyrics-file <path>
string
Read lyrics from file. Use - for stdin
--seed <number>
number
Random seed 0–1000000 for reproducible results
--format <fmt>
string
Audio format: mp3, wav, pcm (default: mp3)
--sample-rate <hz>
number
Sample rate (default: 44100)
--bitrate <bps>
number
Bitrate (default: 256000)
--channel <n>
number
Channels: 1 (mono) or 2 (stereo, default)
--out <path>
string
Save audio to file
--stream
boolean
Stream raw audio to stdout
# Cover from URL
mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
--audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet
# Cover from local file with custom lyrics
mmx music cover --prompt "Jazz, piano, slow" \
--audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet
# Reproducible result with seed
mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3
vision describe
Image understanding via VLM. Provide either --image or --file-id, not both.
mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]
Flag
Type
Description
--image <path-or-url>
string
Local path or URL (auto base64-encoded)
--file-id <id>
string
Pre-uploaded file ID (skips base64)
--prompt <text>
string
Question about the image (default: "Describe the image.")
mmx vision describe --image photo.jpg --prompt "What breed?" --output json
stdout: description text (text mode) or full response (json mode).
search query
Web search via MiniMax.
mmx search query --q <query>
Flag
Type
Description
--q <query>
string, required
Search query
mmx search query --q "MiniMax AI" --output json --quiet
quota show
Display Token Plan usage and remaining quotas.
mmx quota show [--output json]
Tool Schema Export
Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:
# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema
# Single command
mmx config export-schema --command "video generate"
Use this to dynamically register mmx commands as tools in your agent framework.
Exit Codes
Code
Meaning
0
Success
1
General error
2
Usage error (bad flags, missing args)
3
Authentication error
4
Quota exceeded
5
Timeout
10
Content filter triggered
Piping Patterns
# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'
# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null
# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet
# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
mmx video download --task-id "$TASK" --out robot.mp4
Configuration Precedence
CLI flags → environment variables → ~/.mmx/config.json → defaults.
# Persistent config
mmx config set --key region --value cn
mmx config show
# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
Default Model Configuration
Set per-modality defaults so you don't need --model every time:
# Set defaults
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
mmx config set --key default-music-model --value music-2.6
# Use without --model
mmx text chat --message "Hello"
mmx speech synthesize --text "Hello" --out hello.mp3
mmx video generate --prompt "Ocean waves"
mmx music generate --prompt "Upbeat pop" --instrumental
# --model still overrides per-call
mmx text chat --model MiniMax-M2.7 --message "Hello"
Resolution priority: --model flag > config default > hardcoded fallback.