muapi-nano-banana

Reasoning-driven image generation using structured creative briefs (Gemini 3 style). Generates high-fidelity images via muapi.ai with logic-based prompting.

INSTALLATION
npx skills add https://github.com/samuraigpt/generative-media-skills --skill muapi-nano-banana
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

🍌 Nano-Banana Expert Skill (Gemini 3 Style)

A specialized skill for AI Agents to leverage "Reasoning-Driven" image generation.

Based on the advanced prompting architecture of Google's Gemini 3 (Nano Banana Pro), this skill moves beyond keyword stuffing to structured, logic-based creative briefs.

Core Competencies

  • Reasoning-Driven Prompting: Using natural language logic to define physics, lighting, and spatial relationships.
  • Structured Creative Briefs: Implementing the "Perfect Prompt" formula: Subject + Action + Context + Composition + Lighting + Style.
  • Text Rendering Precision: Explicitly defining typography and signifiers for legible text integration.
  • Contextual Grounding: Using "Search Grounding" logic (simulated) to anchor generations in real-world accuracy.

🏗️ Technical Specification

1. The "Perfect Prompt" Formula

| Component | Description | Example |
| --- | --- | --- |
| Subject | Detailed entity description | "A stoic robot barista with exposed copper wiring" |
| Action | Dynamic interaction | "Pouring a latte art leaf with mechanical precision" |
| Context | Environment & Atmosphere | "Inside a neon-lit cyberpunk cafe at midnight" |
| Composition | Camera & Lens choice | "Close-up, 85mm lens, f/1.8 aperture" |
| Lighting | Mood & Direction | "Volumetric blue rim light, warm cafe glow" |
| Style | Aesthetic anchor | "Cinematic, photorealistic, 4K production value" |

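The six components can be joined into one natural-language brief. The sketch below is a minimal illustration only: `build_brief` is a hypothetical helper, and its sentence template is an assumption rather than what the skill's script actually emits.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): joins the six formula
# components into one descriptive brief made of full sentences.
# Argument order: subject, action, context, composition, lighting, style.
build_brief() {
  printf '%s %s %s. The shot is framed as %s. The scene is lit by %s. The overall style is %s.\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage with the example values from the formula table.
build_brief \
  "A stoic robot barista with exposed copper wiring" \
  "pouring a latte art leaf with mechanical precision" \
  "inside a neon-lit cyberpunk cafe at midnight" \
  "a close-up with an 85mm lens at f/1.8" \
  "a volumetric blue rim light and a warm cafe glow" \
  "cinematic and photorealistic"
```

Keeping each component as its own sentence preserves the relationships between elements instead of collapsing them into a comma-separated list.
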
2. Advanced Features

  • Negative Constraint Logic: Instead of "no blurry," use "Ensure sharp focus on the subject's eyes."
  • Identity Consistency (simulated): Add a directive such as "Maintain consistent facial structure across variations."
  • Text Integration: Use double quotes for specific text: The sign reads "OPEN 24/7".
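
Two of these features can be sketched as small helpers. Both function names are hypothetical, and only the "no blurry" mapping comes from the text above; any further mappings would have to be supplied by the user.

```shell
#!/bin/sh
# Hypothetical helper: rewrites a negative constraint as a positive directive.
# Only the "no blurry" mapping is taken from the skill description.
positive_constraint() {
  case "$1" in
    "no blurry")
      printf '%s\n' "Ensure sharp focus on the subject's eyes." ;;
    *)
      # Unknown phrases pass through unchanged so nothing is silently dropped.
      printf '%s\n' "$1" ;;
  esac
}

# Hypothetical helper: wraps literal text in double quotes for text rendering.
sign_text() {
  printf 'The sign reads "%s".\n' "$1"
}

positive_constraint "no blurry"
sign_text "OPEN 24/7"
```
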

🧠 Prompt Optimization Protocol (Agent Instruction)

Before calling the script, the Agent MUST rewrite the user's prompt into a logic-driven Reasoning Brief:

  • NO KEYWORD SOUP: Remove "8k, masterpiece, ultra-detailed." Use full, descriptive sentences.
  • PHYSICAL CONSISTENCY: Describe how elements interact (e.g., "The light from the crystal shards casts caustic patterns across the obsidian floor").
  • TEXT PRECISION: If the user wants text, define it precisely: featuring a sign that says "STORE NAME" in a weathered serif font.
  • OPTICAL DIRECTIVES: Specify lens behavior: Shallow Depth of Field (f/1.8), Macro Lens, Anamorphic Flare.

🚀 Protocol: Using Nano-Banana

Step 1: Define the Creative Logic

Provide the agent with a subject and a specific scenario.

Step 2: Invoke the Script

The generate-nano-art.sh script translates the logic into a structured Gemini 3-style prompt.

# Generating a reasoning-driven image
bash scripts/generate-nano-art.sh \
  --subject "a glass chess piece" \
  --action "shattering into liquid shards" \
  --context "on an obsidian table" \
  --style "macro photography"

⚠️ Constraints & Guardrails

  • No Keyword Soup: MANDATORY - Do not use "trending on artstation, masterpiece, 8k". Use natural language descriptions.
  • Physics Logic: Ensure the prompt describes physically possible lighting and reflection interactions.
  • Full Sentences: The model parses relationships; use "light reflecting off the water" instead of "water, reflection".
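
A pre-flight check can enforce the first guardrail before any generation call is made. This is a sketch under stated assumptions: `check_guardrails` is a hypothetical function, and the pattern covers only the three phrases listed above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: rejects prompts that contain the banned
# keyword-soup phrases. Returns 0 when the prompt passes, 1 otherwise.
check_guardrails() {
  if printf '%s\n' "$1" | grep -q -i -E 'trending on artstation|masterpiece|8k'; then
    echo "rejected: keyword soup detected" >&2
    return 1
  fi
  return 0
}

check_guardrails "light reflecting off the water at dawn" && echo "prompt accepted"
```

The physics and full-sentence rules are left out because they need the agent's judgment and do not reduce to a simple pattern match.
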

⚙️ Implementation Details

This skill applies a "Logic Wrapper" around the core/media/generate-image.sh primitive, converting fragmented inputs into a coherent, reasoning-ready narrative prompt.
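
As a rough illustration of what such a wrapper might look like, the sketch below parses the four flags shown earlier and prints a single narrative prompt. It is not the actual generate-nano-art.sh: the function name `nano_wrapper` and the sentence template are assumptions, and the hand-off to core/media/generate-image.sh is omitted because that script's arguments are not documented here.

```shell
#!/bin/sh
# Hypothetical sketch of a logic wrapper; not the skill's real script.
nano_wrapper() {
  subject="" action="" context="" style=""
  while [ "$#" -gt 0 ]; do
    # Every flag takes a value, so at least two arguments must remain.
    [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
    case "$1" in
      --subject) subject="$2" ;;
      --action)  action="$2" ;;
      --context) context="$2" ;;
      --style)   style="$2" ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
    shift 2
  done
  [ -n "$subject" ] || { echo "--subject is required" >&2; return 2; }

  # Build one sentence from the fragments, skipping any that are empty.
  prompt="$subject"
  [ -n "$action" ]  && prompt="$prompt $action"
  [ -n "$context" ] && prompt="$prompt $context"
  [ -n "$style" ]   && prompt="$prompt, rendered as $style"
  printf '%s.\n' "$prompt"
  # A real wrapper would pass the prompt to core/media/generate-image.sh here.
}

nano_wrapper \
  --subject "a glass chess piece" \
  --action "shattering into liquid shards" \
  --context "on an obsidian table" \
  --style "macro photography"
```
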
