gemini-image

Analyze images using Gemini's vision capabilities for OCR, UI analysis, and visual understanding. Supports PNG, JPEG, GIF, and WebP images, including screenshots, diagrams, charts, and code snippets. Built-in analysis templates cover common tasks: text extraction, code recovery, UI/UX feedback, error diagnosis, and data extraction from charts. Handles single images, or compares multiple images in one request. Requires the Google Generative AI Python library (google-generativeai) and a valid GEMINI_API_KEY environment variable.

INSTALLATION
npx skills add https://github.com/johnlindquist/claude --skill gemini-image
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Gemini Image Analysis

Analyze images using Gemini Pro's vision capabilities.

Prerequisites

pip install google-generativeai

export GEMINI_API_KEY=your_api_key
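A quick preflight check can catch a missing key before any of the commands in the CLI Reference run. This is a plain POSIX shell sketch; `gemini_preflight` is a hypothetical helper name. It only checks that `GEMINI_API_KEY` is non-empty and that a `gemini` command is on `PATH`. It does not verify that the key is valid.

```sh
# Hypothetical preflight helper: returns non-zero (with a message on stderr)
# if the API key is missing or no `gemini` command can be found.
gemini_preflight() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v gemini >/dev/null 2>&1; then
    echo "gemini command not found on PATH" >&2
    return 1
  fi
}
```

For example: `gemini_preflight && gemini -m pro -f image.png "Describe this image in detail"`.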

CLI Reference

Basic Image Analysis

Analyze an image

gemini -m pro -f /path/to/image.png "Describe this image in detail"

With a specific question

gemini -m pro -f screenshot.png "What error message is shown?"

Multiple images

gemini -m pro -f image1.png -f image2.png "Compare these two images"
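When passing several images, a small wrapper can check each file before the request is sent. This is a sketch, assuming the `gemini` CLI accepts repeated `-f` flags as in the example above; `gemini_images` and `is_supported_image` are hypothetical names. The extension check is only a heuristic for the PNG, JPEG, GIF, and WebP formats listed in the overview and does not inspect file contents.

```sh
# Hypothetical format check: succeeds if the file name ends in a supported
# image extension (case-insensitive).
is_supported_image() {
  case "$1" in
    *.[Pp][Nn][Gg] | *.[Jj][Pp][Gg] | *.[Jj][Pp][Ee][Gg]) return 0 ;;
    *.[Gg][Ii][Ff] | *.[Ww][Ee][Bb][Pp]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: gemini_images PROMPT IMAGE...
# The body runs in a subshell so `set --` does not touch the caller's arguments.
gemini_images() (
  prompt=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: gemini_images PROMPT IMAGE..." >&2
    exit 2
  fi
  # Replace the remaining arguments with "-f IMAGE" pairs, validating each.
  for img do
    shift
    if [ ! -f "$img" ]; then
      echo "not a file: $img" >&2
      exit 1
    fi
    if ! is_supported_image "$img"; then
      echo "unsupported format: $img" >&2
      exit 1
    fi
    set -- "$@" -f "$img"
  done
  gemini -m pro "$@" "$prompt"
)
```

`gemini_images "Compare these two images" image1.png image2.png` builds the same command as the multi-image example, but stops early if a file is missing or has an unexpected extension.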

## Analysis Operations

### General Description

gemini -m pro -f image.png "Describe this image comprehensively:

  1. Main subject/content
  2. Colors and composition
  3. Text visible (if any)
  4. Context and purpose
  5. Notable details"

### Extract Text (OCR)

gemini -m pro -f screenshot.png "Extract all text from this image.

Format as plain text, preserving layout where possible.

Include any text in buttons, labels, or UI elements."
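To run OCR over a whole folder of screenshots, the same prompt can be looped. A minimal sketch: `ocr_dir` is a hypothetical helper that writes each result to a `.txt` file next to its PNG. Other formats would need their own glob.

```sh
# Hypothetical helper: ocr_dir [DIR] runs the OCR prompt on every *.png in DIR
# (default: current directory) and writes the text to a matching .txt file.
ocr_dir() (
  dir=${1:-.}
  prompt="Extract all text from this image.

Format as plain text, preserving layout where possible.

Include any text in buttons, labels, or UI elements."
  for img in "$dir"/*.png; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$img" ] || continue
    if ! gemini -m pro -f "$img" "$prompt" > "${img%.png}.txt"; then
      echo "failed: $img" >&2
    fi
  done
)
```

Files are processed one at a time, so a failure on one image is reported on stderr without stopping the rest.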


### Code from Screenshot

gemini -m pro -f code-screenshot.png "Extract the code from this screenshot.

Provide as properly formatted code with correct indentation.

Note any parts that are unclear or partially visible."
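The recovered code may come back wrapped in Markdown fences, depending on the model's output (an assumption, not something the CLI guarantees). A one-line filter can drop those fence lines before saving. `strip_fences` is a hypothetical helper, and it removes every line that starts with three backticks, including any that were part of the original code.

```sh
# Hypothetical filter: drop every line that begins with three backticks
# (Markdown fence markers), passing all other lines through unchanged.
strip_fences() {
  awk '!/^```/'
}
```

For example: `gemini -m pro -f code-screenshot.png "Extract the code from this screenshot." | strip_fences > recovered.py` (the output filename is illustrative).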


### UI Analysis

gemini -m pro -f ui-screenshot.png "Analyze this UI:

  1. What application/website is this?
  2. What page/screen is shown?
  3. Main UI elements and their purpose
  4. User flow/actions available
  5. Any UX issues or suggestions"

### Error Analysis

gemini -m pro -f error-screenshot.png "Analyze this error:

  1. What error is shown?
  2. What is the likely cause?
  3. How to fix it?
  4. Any related information visible?"

### Diagram Understanding

gemini -m pro -f diagram.png "Explain this diagram:

  1. What type of diagram is this?
  2. Main components and their relationships
  3. Data/process flow
  4. Key takeaways"
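Each operation in this section is the same command with a different prompt, so the prompts can be kept in one place and selected by name. A sketch in POSIX shell: `analysis_prompt` and `gemini_analyze` are hypothetical helpers, and only three of the templates are copied in. The others can be added as further `case` branches.

```sh
# Hypothetical prompt table: prints the prompt for a template name,
# or fails with a message for an unknown name.
analysis_prompt() {
  case "$1" in
    ocr) printf '%s\n' "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements." ;;
    error) printf '%s\n' "Analyze this error:
1. What error is shown?
2. What is the likely cause?
3. How to fix it?
4. Any related information visible?" ;;
    diagram) printf '%s\n' "Explain this diagram:
1. What type of diagram is this?
2. Main components and their relationships
3. Data/process flow
4. Key takeaways" ;;
    *) echo "unknown template: $1" >&2; return 1 ;;
  esac
}

# Hypothetical dispatcher: gemini_analyze TEMPLATE IMAGE
gemini_analyze() (
  prompt=$(analysis_prompt "$1") || exit 1
  gemini -m pro -f "$2" "$prompt"
)
```

For example, `gemini_analyze error error-screenshot.png` sends the Error Analysis prompt for that screenshot.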
## Specific Use Cases

### Debug Screenshot

gemini -m pro -f debug-screen.png "I'm debugging an issue. From this screenshot:

  1. What is the current state?
  2. What errors or warnings are visible?
  3. What should I look at?
  4. Suggested next steps"

### Compare Before/After

gemini -m pro -f before.png -f after.png "Compare these before and after images:

  1. What changed?
  2. Is this an improvement?
  3. Any issues in the 'after' version?
  4. Anything missing?"

### Design Feedback

gemini -m pro -f design.png "Provide design feedback:

  1. Visual hierarchy
  2. Color usage
  3. Typography
  4. Spacing and alignment
  5. Accessibility concerns
  6. Suggestions for improvement"

### Data Extraction

gemini -m pro -f chart.png "Extract data from this chart:

  1. Chart type
  2. Data series and values
  3. Axes labels and ranges
  4. Key trends or insights
  5. Output as structured data if possible"
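"Output as structured data if possible" leaves the format up to the model. One option is to ask explicitly for CSV and sanity-check its shape before using it. `csv_is_rectangular` is a hypothetical helper: it only confirms that every non-empty line has the same number of comma-separated fields, and it does not handle quoted fields that contain commas.

```sh
# Hypothetical check: exit status 0 only if the input has at least one
# non-empty line and every non-empty line has the same number of
# comma-separated fields. Naive: commas inside quoted fields are counted.
csv_is_rectangular() {
  awk -F',' '
    NF == 0 { next }        # skip blank lines
    n == "" { n = NF }      # remember the first row width
    NF != n { bad = 1 }
    END     { exit (bad || n == "") }
  '
}
```

For example: `gemini -m pro -f chart.png "Extract the data from this chart as CSV with a header row. Output only the CSV." > chart.csv && csv_is_rectangular < chart.csv`. The prompt wording is an assumption; the model may still add prose or fences, so a failed check is a cue to inspect the file by hand.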
### Form Analysis

gemini -m pro -f form.png "Analyze this form:

  1. Form purpose
  2. Fields and their types
  3. Required vs optional
  4. Validation rules visible
  5. UX suggestions"

## Workflow Patterns

### Screenshot to Issue

Capture screenshot (macOS)

screencapture -i /tmp/bug.png

Analyze and format as issue

gemini -m pro -f /tmp/bug.png "Create a bug report from this screenshot:

Summary

[One-line description]

Steps to Reproduce

[Inferred from screenshot]

Expected Behavior

[What should happen]

Actual Behavior

[What the screenshot shows]

Environment

[Any visible system info]"
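The same prompt can be wrapped so each report lands in a file. A sketch: `bug_report` is a hypothetical helper that takes a screenshot path and an optional output path (defaulting to the image path plus `.md`), and it removes the output file if the `gemini` command fails.

```sh
# Hypothetical helper: bug_report IMAGE [OUTPUT]
# Runs the bug-report prompt on IMAGE, saves the reply to OUTPUT
# (default: IMAGE with ".md" appended), then prints the output path.
bug_report() (
  img=$1
  out=${2:-${img}.md}
  if [ ! -f "$img" ]; then
    echo "not a file: $img" >&2
    exit 1
  fi
  prompt="Create a bug report from this screenshot:

Summary
[One-line description]
Steps to Reproduce
[Inferred from screenshot]
Expected Behavior
[What should happen]
Actual Behavior
[What the screenshot shows]
Environment
[Any visible system info]"
  if ! gemini -m pro -f "$img" "$prompt" > "$out"; then
    rm -f "$out"   # do not leave a partial report behind
    exit 1
  fi
  printf '%s\n' "$out"
)
```

On macOS, `screencapture -i /tmp/bug.png && bug_report /tmp/bug.png` writes `/tmp/bug.png.md`. If the GitHub CLI is installed, `gh issue create --title "Bug from screenshot" --body-file /tmp/bug.png.md` can file it as an issue.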


### UI to Code

gemini -m pro -f ui-design.png "Generate React component code that recreates this UI:

  • Use Tailwind CSS for styling
  • Make it responsive
  • Include proper TypeScript types
  • Add appropriate accessibility attributes"

### Documentation

gemini -m pro -f app-screen.png "Write user documentation for this screen:

  • What this screen is for
  • How to use each feature
  • Common tasks
  • Tips and notes"