gemini-image

Name: gemini-image
Author: johnlindquist

Analyze images using Gemini's vision capabilities for OCR, UI analysis, and visual understanding.

- Supports PNG, JPEG, GIF, and WebP images, including screenshots, diagrams, charts, and code snippets
- Built-in analysis templates for common tasks: text extraction, code recovery, UI/UX feedback, error diagnosis, and data extraction from charts
- Handles single images and multi-image comparisons in a single request
- Requires the Google Generative AI library and a valid GEMINI_API_KEY environment variable

INSTALLATION

npx skills add https://github.com/johnlindquist/claude --skill gemini-image

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

# Gemini Image Analysis

Analyze images using Gemini Pro's vision capabilities.

## Prerequisites

pip install google-generativeai

export GEMINI_API_KEY=your_api_key
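Before running any of the commands below, it can help to confirm that the key is actually visible to the process. The following is a minimal Python sketch of such a check. The helper name `require_api_key` is hypothetical; only the variable name `GEMINI_API_KEY` comes from this page.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key, or raise if it is missing or blank.

    `require_api_key` is a hypothetical helper, not part of any Gemini tool.
    `env` defaults to the real process environment; pass a dict to test it.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before running gemini")
    return key
```

Calling `require_api_key()` at the top of a script fails early with a readable message instead of a less obvious authentication error later on.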

## CLI Reference

### Basic Image Analysis

# Analyze an image
gemini -m pro -f /path/to/image.png "Describe this image in detail"

# With specific question
gemini -m pro -f screenshot.png "What error message is shown?"

# Multiple images
gemini -m pro -f image1.png -f image2.png "Compare these two images"
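When these commands are generated from a script, the repeated `-f` flags are easy to get wrong. The sketch below assembles the argument list in Python, using only the `-m` and `-f` flags shown above. The function name `build_gemini_command` is hypothetical, and the suffix list is an assumption derived from the formats named in the skill description (PNG, JPEG, GIF, WebP).

```python
from pathlib import Path

# Assumed file suffixes for the formats named in the skill description.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def build_gemini_command(images, prompt, model="pro"):
    """Build the argv list for `gemini -m MODEL -f IMG [-f IMG ...] PROMPT`."""
    if not images:
        raise ValueError("at least one image is required")
    argv = ["gemini", "-m", model]
    for image in images:
        # Reject files whose suffix is not in the assumed supported set.
        if Path(image).suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported image type: {image}")
        argv += ["-f", str(image)]
    argv.append(prompt)
    return argv
```

Passing the resulting list to `subprocess.run(argv, check=True)` avoids shell quoting problems with multi-line prompts.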

## Analysis Operations

### General Description

gemini -m pro -f image.png "Describe this image comprehensively:
Main subject/content
Colors and composition
Text visible (if any)
Context and purpose
Notable details"


### Extract Text (OCR)

gemini -m pro -f screenshot.png "Extract all text from this image.
Format as plain text, preserving layout where possible.
Include any text in buttons, labels, or UI elements."


### Code from Screenshot

gemini -m pro -f code-screenshot.png "Extract the code from this screenshot.
Provide as properly formatted code with correct indentation.
Note any parts that are unclear or partially visible."
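Model replies to a prompt like this often wrap the recovered code in Markdown fences with commentary around them, although that output shape is an assumption and not something this page guarantees. A small Python sketch for pulling out just the fenced bodies follows. The name `extract_code_blocks` is hypothetical.

```python
import re

# Built at runtime so this sketch can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence with an optional language tag, the body, then a closing fence.
_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response_text):
    """Return the bodies of all fenced code blocks found in a model reply."""
    return [body.rstrip("\n") for body in _BLOCK_RE.findall(response_text)]
```

An empty list means the reply contained no fences, in which case the whole reply may need manual review.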


### UI Analysis

gemini -m pro -f ui-screenshot.png "Analyze this UI:
What application/website is this?
What page/screen is shown?
Main UI elements and their purpose
User flow/actions available
Any UX issues or suggestions"


### Error Analysis

gemini -m pro -f error-screenshot.png "Analyze this error:
What error is shown?
What is the likely cause?
How to fix it?
Any related information visible?"


### Diagram Understanding

gemini -m pro -f diagram.png "Explain this diagram:
What type of diagram is this?
Main components and their relationships
Data/process flow
Key takeaways"
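The operations above all follow one pattern: an opening instruction followed by a list of points to cover. One way to reuse them from a script is a small lookup table, sketched here in Python. The keys and the `render_prompt` helper are hypothetical names; the opening lines are copied from the commands in this section.

```python
# Opening lines copied from the Analysis Operations commands above.
TEMPLATES = {
    "describe": "Describe this image comprehensively:",
    "ocr": "Extract all text from this image.",
    "code": "Extract the code from this screenshot.",
    "ui": "Analyze this UI:",
    "error": "Analyze this error:",
    "diagram": "Explain this diagram:",
}


def render_prompt(operation, points=()):
    """Return the opener for `operation`, followed by one line per extra point."""
    try:
        opener = TEMPLATES[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
    return "\n".join([opener, *points])
```

For example, `render_prompt("error", ["What error is shown?", "How to fix it?"])` produces a three-line prompt that can be passed as the final argument to `gemini`.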


## Specific Use Cases

### Debug Screenshot

gemini -m pro -f debug-screen.png "I'm debugging an issue. From this screenshot:
What is the current state?
What errors or warnings are visible?
What should I look at?
Suggested next steps"


### Compare Before/After

gemini -m pro -f before.png -f after.png "Compare these before and after images:
What changed?
Is this an improvement?
Any issues in the 'after' version?
Anything missing?"
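Since the prerequisites install `google-generativeai`, the same comparison can presumably be made from Python without the CLI. The sketch below is one way to do that, not the skill's own implementation. It assumes Pillow is installed for loading images, and the model name `gemini-1.5-pro` is an assumption that may need to change for your account. The function names are hypothetical.

```python
import os


def build_contents(prompt, images):
    """Order the request parts: the prompt first, then each image."""
    return [prompt, *images]


def compare_images(before_path, after_path, model_name="gemini-1.5-pro"):
    """Send two images with a comparison prompt and return the reply text.

    Imports are local so this module loads without the optional dependencies.
    The default model name is an assumption; use one your key can access.
    """
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    images = [Image.open(before_path), Image.open(after_path)]
    prompt = "Compare these before and after images: what changed?"
    response = model.generate_content(build_contents(prompt, images))
    return response.text
```

Keeping the part ordering in `build_contents` separate from the network call makes that step testable without an API key.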


### Design Feedback

gemini -m pro -f design.png "Provide design feedback:
Visual hierarchy
Color usage
Typography
Spacing and alignment
Accessibility concerns
Suggestions for improvement"


### Data Extraction

gemini -m pro -f chart.png "Extract data from this chart:
Chart type
Data series and values
Axes labels and ranges
Key trends or insights
Output as structured data if possible"
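The prompt asks for structured data "if possible", so any downstream script has to cope with replies that mix prose and data. The Python sketch below assumes the prompt was extended to request a JSON object, which this page does not specify, and recovers that object from the reply. The name `parse_chart_data` is hypothetical.

```python
import json


def parse_chart_data(response_text):
    """Parse a JSON object from a model reply, tolerating surrounding prose.

    Returns the decoded object, or None when no valid JSON object is found.
    """
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Returning `None` instead of raising lets the caller fall back to showing the raw reply when the model did not produce parseable output.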


### Form Analysis

gemini -m pro -f form.png "Analyze this form:
Form purpose
Fields and their types
Required vs optional
Validation rules visible
UX suggestions"


## Workflow Patterns

### Screenshot to Issue

# Capture screenshot (macOS)
screencapture -i /tmp/bug.png

# Analyze and format as issue
gemini -m pro -f /tmp/bug.png "Create a bug report from this screenshot:

Summary
[One-line description]

Steps to Reproduce
[Inferred from screenshot]

Expected Behavior
[What should happen]

Actual Behavior
[What the screenshot shows]

Environment
[Any visible system info]"
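The two steps of this workflow can be chained from a script. The Python sketch below builds both commands from the flags shown above and runs them in order. The function names are hypothetical, and it assumes the `gemini` command prints its answer to standard output, which this page does not confirm.

```python
import subprocess


def bug_report_commands(screenshot_path, prompt):
    """Return the two argv lists for the workflow: capture, then analyze."""
    capture = ["screencapture", "-i", screenshot_path]
    analyze = ["gemini", "-m", "pro", "-f", screenshot_path, prompt]
    return [capture, analyze]


def run_bug_report(screenshot_path="/tmp/bug.png",
                   prompt="Create a bug report from this screenshot"):
    """Run both steps in order and return whatever the analyze step printed."""
    capture, analyze = bug_report_commands(screenshot_path, prompt)
    subprocess.run(capture, check=True)  # macOS only; opens interactive selection
    result = subprocess.run(analyze, check=True, capture_output=True, text=True)
    return result.stdout
```

Separating command construction from execution keeps the argument lists easy to inspect or log before anything is run.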


### UI to Code

gemini -m pro -f ui-design.png "Generate React component code that recreates this UI:
Use Tailwind CSS for styling
Make it responsive
Include proper TypeScript types
Add appropriate accessibility attributes"


### Documentation

gemini -m pro -f app-screen.png "Write user documentation for this screen:
What this screen is for
How to use each feature
Common tasks
Tips and notes"