image-to-text

Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design…

INSTALLATION
npx skills add https://github.com/pascalorg/skills --skill image-to-text
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Image to Text

Extract all readable text from an image using OCR (Tesseract). Returns the full text content along with word-level bounding boxes and confidence scores.

When to Use

  • Reading text content from a screenshot or design mockup
  • Extracting UI copy (labels, buttons, headings) so you don't have to retype it
  • Getting text positions and bounding boxes from a design image

How It Works

  • The image is passed to Tesseract.js for optical character recognition
  • Tesseract segments the image into lines and words
  • Returns the full text plus word-level details (position, confidence)

Usage

bash <skill-path>/scripts/image-to-text.sh <image-path> [language]

Arguments:

  • image-path — Path to the image file (required)
  • language — OCR language code (optional, defaults to eng). Common: eng, fra, deu, spa, chi_sim, jpn

Examples:

# Extract text from a screenshot

bash <skill-path>/scripts/image-to-text.sh ./screenshot.png

# Extract French text

bash <skill-path>/scripts/image-to-text.sh ./mockup.png fra

Output

{

  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",

  "confidence": 87.4,

  "words": [

    {

      "text": "Request",

      "confidence": 94.2,

      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }

    },

    {

      "text": "work",

      "confidence": 96.1,

      "bbox": { "x0": 274, "y0": 180, "x1": 332, "y1": 204 }

    }

  ],

  "lines": [

    {

      "text": "Request work",

      "confidence": 95.1,

      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }

    }

  ]

}

Field

Type

Description

text

String

Full extracted text, newline-separated

confidence

Number

Overall confidence score (0-100)

words

Array

Each word with text, confidence, and bounding box

lines

Array

Each line with text, confidence, and bounding box

Present Results to User

After extracting text, present the content grouped by lines:

Extracted text (87.4% confidence):

  Request work

  Suggestions

  Plumbing

  HVAC

  Cleaning

  Electrical

Found 6 lines, 6 words.

Use the extracted text directly when implementing UI copy from a design.

Troubleshooting

Low confidence / garbled text — Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.

Wrong language — Pass the correct language code as the second argument. Tesseract needs the right language model to recognize characters.

First run is slow — Tesseract downloads language data (~4MB for English) on the first run. Subsequent runs are faster.

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card