markitdown

Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP,…

INSTALLATION
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill markitdown
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

MarkItDown - File to Markdown Conversion

Overview

MarkItDown is a Python tool developed by Microsoft for converting various file formats to Markdown. It's particularly useful for converting documents into LLM-friendly text format, as Markdown is token-efficient and well-understood by modern language models.

Key Benefits:

  • Convert documents to clean, structured Markdown
  • Token-efficient format for LLM processing
  • Supports 15+ file formats
  • Optional AI-enhanced image descriptions
  • OCR for images and scanned documents
  • Speech transcription for audio files

Visual Enhancement with Scientific Schematics

When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.

If your document does not already contain schematics or diagrams:

  • Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
  • Simply describe your desired diagram in natural language
  • Nano Banana Pro will automatically generate, review, and refine the schematic

For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.

How to generate schematics:

python scripts/generate_schematic.py "your diagram description" -o figures/output.png

The AI will automatically:

  • Create publication-quality images with proper formatting
  • Review and refine through multiple iterations
  • Ensure accessibility (colorblind-friendly, high contrast)
  • Save outputs in the figures/ directory

When to add schematics:

  • Document conversion workflow diagrams
  • File format architecture illustrations
  • OCR processing pipeline diagrams
  • Integration workflow visualizations
  • System architecture diagrams
  • Data flow diagrams
  • Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.

Supported Formats

| Format | Description | Notes |
| --- | --- | --- |
| PDF | Portable Document Format | Full text extraction |
| DOCX | Microsoft Word | Tables, formatting preserved |
| PPTX | PowerPoint | Slides with notes |
| XLSX | Excel spreadsheets | Tables and data |
| Images | JPEG, PNG, GIF, WebP | EXIF metadata + OCR |
| Audio | WAV, MP3 | Metadata + transcription |
| HTML | Web pages | Clean conversion |
| CSV | Comma-separated values | Table format |
| JSON | JSON data | Structured representation |
| XML | XML documents | Structured format |
| ZIP | Archive files | Iterates contents |
| EPUB | E-books | Full text extraction |
| YouTube | Video URLs | Fetch transcriptions |

Quick Start

Installation

# Install with all features
pip install 'markitdown[all]'

# Or from source
git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'

Command-Line Usage

# Basic conversion
markitdown document.pdf > output.md

# Specify output file
markitdown document.pdf -o output.md

# Pipe content
cat document.pdf | markitdown > output.md

# Enable plugins
markitdown --list-plugins  # List available plugins
markitdown --use-plugins document.pdf -o output.md
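When a Python script or agent needs to call the CLI, building the argument list in Python avoids shell-quoting problems with file names that contain spaces. The sketch below uses only flags that appear in this document (-o, --use-plugins, and the -d/-e Document Intelligence flags from the Azure section); the helper names build_markitdown_cmd and run_markitdown are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_markitdown_cmd(
    source: Path,
    output: Optional[Path] = None,
    use_plugins: bool = False,
    docintel_endpoint: Optional[str] = None,
) -> List[str]:
    """Assemble a markitdown CLI invocation as an argument list (no shell involved)."""
    cmd = ["markitdown", str(source)]
    if output is not None:
        cmd += ["-o", str(output)]
    if use_plugins:
        cmd.append("--use-plugins")
    if docintel_endpoint:
        # -d enables Document Intelligence, -e supplies the endpoint
        cmd += ["-d", "-e", docintel_endpoint]
    return cmd


def run_markitdown(source: Path, output: Path) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_markitdown_cmd(source, output), check=True)
```

Passing a list to subprocess.run (without shell=True) hands each argument to the CLI unchanged, so paths like "my paper.pdf" need no extra quoting.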

Python API

from markitdown import MarkItDown

# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)

# Convert from stream
with open("document.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
    print(result.text_content)
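Because convert_stream takes a binary file-like object, bytes that never touched disk (for example, a download held in memory) can be wrapped in io.BytesIO. A minimal sketch; convert_bytes is a hypothetical helper, and md is assumed to be a MarkItDown instance.

```python
import io
from typing import Any


def convert_bytes(md: Any, data: bytes, extension: str) -> str:
    """Convert an in-memory document to Markdown via convert_stream.

    md is expected to be a MarkItDown instance; io.BytesIO supplies the
    binary stream that convert_stream requires, without a temporary file.
    """
    stream = io.BytesIO(data)
    result = md.convert_stream(stream, file_extension=extension)
    return result.text_content


# Usage (requires markitdown):
# from markitdown import MarkItDown
# with open("document.pdf", "rb") as f:
#     text = convert_bytes(MarkItDown(), f.read(), ".pdf")
```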

Advanced Features

1. AI-Enhanced Image Descriptions

Use LLMs via OpenRouter to generate detailed image descriptions (for PPTX and image files):

from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client (OpenAI-compatible API)
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-opus-4.5",  # recommended for scientific vision
    llm_prompt="Describe this image in detail for scientific documentation"
)

result = md.convert("presentation.pptx")
print(result.text_content)
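Rather than hard-coding the API key in scripts, you can read it from an environment variable. A minimal sketch; the variable name OPENROUTER_API_KEY and the helper name openrouter_client_kwargs are assumptions, not part of MarkItDown or OpenRouter.

```python
import os
from typing import Dict


def openrouter_client_kwargs(env_var: str = "OPENROUTER_API_KEY") -> Dict[str, str]:
    """Build keyword arguments for OpenAI(...) pointed at OpenRouter.

    Fails loudly if the (assumed) environment variable is missing instead
    of sending an empty key.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before enabling AI image descriptions")
    return {"api_key": api_key, "base_url": "https://openrouter.ai/api/v1"}


# Usage (requires the openai package):
# from openai import OpenAI
# client = OpenAI(**openrouter_client_kwargs())
```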

2. Azure Document Intelligence

For enhanced PDF conversion with Azure Document Intelligence (requires the [az-doc-intel] extra and a Document Intelligence endpoint):

# Command line
markitdown document.pdf -o output.md -d -e "<document_intelligence_endpoint>"

# Python API
from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("complex_document.pdf")
print(result.text_content)

3. Plugin System

MarkItDown supports third-party plugins for extending functionality. Plugins are disabled by default and must be enabled explicitly:

# List installed plugins
markitdown --list-plugins

# Enable plugins
markitdown --use-plugins file.pdf -o output.md

To find plugins, search GitHub for the hashtag #markitdown-plugin.

Optional Dependencies

Install only the dependencies for the file formats you need:

# Install specific formats
pip install 'markitdown[pdf, docx, pptx]'

# All available options:
# [all]                   - All optional dependencies
# [pptx]                  - PowerPoint files
# [docx]                  - Word documents
# [xlsx]                  - Excel spreadsheets
# [xls]                   - Older Excel files
# [pdf]                   - PDF documents
# [outlook]               - Outlook messages
# [az-doc-intel]          - Azure Document Intelligence
# [audio-transcription]   - WAV and MP3 transcription
# [youtube-transcription] - YouTube video transcription
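To work out which extras a particular batch of files needs, you can map file extensions to the option names listed above. A sketch; the extension table (including .msg for Outlook messages) is an assumption based on that list, and formats left out of it, such as HTML, CSV, and JSON, are assumed to need no extra. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Set

# Extension -> optional-dependency group, based on the extras list above.
# Extensions not listed are assumed to need no extra.
EXTRA_FOR_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".msg": "outlook",
    ".wav": "audio-transcription",
    ".mp3": "audio-transcription",
}


def required_extras(paths: Iterable[str]) -> Set[str]:
    """Return the set of extras needed for the given file paths."""
    extras: Set[str] = set()
    for p in paths:
        extra = EXTRA_FOR_EXTENSION.get(Path(p).suffix.lower())
        if extra:
            extras.add(extra)
    return extras


def pip_command(extras: Set[str]) -> str:
    """Format a pip install command; extras are sorted for a stable result."""
    return f"pip install 'markitdown[{','.join(sorted(extras))}]'"
```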

Common Use Cases

1. Convert Scientific Papers to Markdown

from markitdown import MarkItDown

md = MarkItDown()

# Convert PDF paper
result = md.convert("research_paper.pdf")

# Explicit UTF-8 avoids encoding errors on platforms with a different default
with open("paper.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)

2. Extract Data from Excel for Analysis

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data.xlsx")

# Result will be in Markdown table format
print(result.text_content)

3. Process Multiple Documents

from pathlib import Path

from markitdown import MarkItDown

md = MarkItDown()

# Process all PDFs in a directory
pdf_dir = Path("papers/")
output_dir = Path("markdown_output/")
output_dir.mkdir(exist_ok=True)

for pdf_file in pdf_dir.glob("*.pdf"):
    result = md.convert(str(pdf_file))
    output_file = output_dir / f"{pdf_file.stem}.md"
    output_file.write_text(result.text_content, encoding="utf-8")
    print(f"Converted: {pdf_file.name}")
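Re-running the loop above converts every PDF again. For a growing papers/ folder you can skip files whose Markdown output is already newer than the source. A sketch that relies on file modification times (an assumption that a timestamp check is good enough for your workflow; a content hash is stricter). The function name is hypothetical, and md is expected to be a MarkItDown instance.

```python
from pathlib import Path
from typing import Any, List


def convert_new_or_changed(md: Any, src_dir: Path, out_dir: Path, pattern: str = "*.pdf") -> List[Path]:
    """Convert only files whose .md output is missing or older than the source.

    md is expected to be a MarkItDown instance (any object whose
    .convert(path) returns something with .text_content works).
    Returns the output paths written on this run.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for src in sorted(src_dir.glob(pattern)):
        out = out_dir / f"{src.stem}.md"
        # Skip when the output exists and is at least as new as its source
        if out.exists() and out.stat().st_mtime >= src.stat().st_mtime:
            continue
        result = md.convert(str(src))
        out.write_text(result.text_content, encoding="utf-8")
        written.append(out)
    return written
```

Usage: convert_new_or_changed(MarkItDown(), Path("papers/"), Path("markdown_output/")) returns an empty list when everything is already up to date.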

4. Convert PowerPoint with AI Descriptions

from markitdown import MarkItDown
from openai import OpenAI

# Use OpenRouter for access to multiple AI models
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-opus-4.5",  # recommended for presentations
    llm_prompt="Describe this slide image in detail, focusing on key visual elements and data"
)

result = md.convert("presentation.pptx")

with open("presentation.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)

5. Batch Convert with Different Formats

from pathlib import Path

from markitdown import MarkItDown

md = MarkItDown()

# Files to convert
files = [
    "document.pdf",
    "spreadsheet.xlsx",
    "presentation.pptx",
    "notes.docx",
]

for file in files:
    try:
        result = md.convert(file)
        output = Path(file).stem + ".md"
        with open(output, "w", encoding="utf-8") as f:
            f.write(result.text_content)
        print(f"✓ Converted {file}")
    except Exception as e:
        print(f"✗ Error converting {file}: {e}")
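Printing errors is fine for a quick run, but for larger batches it helps to collect failures so they can be retried or reported at the end. A minimal sketch, assuming md is a MarkItDown instance; convert_all is a hypothetical name.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Tuple


def convert_all(md: Any, files: Iterable[str], out_dir: Path) -> Tuple[Dict[str, Path], Dict[str, str]]:
    """Convert each file and collect failures instead of only printing them.

    Returns (converted, failed): source -> output path, and
    source -> "ExceptionType: message".
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    converted: Dict[str, Path] = {}
    failed: Dict[str, str] = {}
    for name in files:
        try:
            result = md.convert(name)
        except Exception as exc:  # keep going; report everything at the end
            failed[name] = f"{type(exc).__name__}: {exc}"
            continue
        out = out_dir / f"{Path(name).stem}.md"
        out.write_text(result.text_content, encoding="utf-8")
        converted[name] = out
    return converted, failed
```

Outputs are named by file stem, as in the loop above, so report.pdf and report.docx would overwrite each other; include the original extension in the output name if your inputs can collide.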

6. Extract YouTube Video Transcription

from markitdown import MarkItDown

md = MarkItDown()

# Fetch the video transcript (requires the [youtube-transcription] extra)
result = md.convert("https://www.youtube.com/watch?v=VIDEO_ID")
print(result.text_content)
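The example above uses the standard watch?v= URL. Links are often shared as youtu.be short links or Shorts URLs, so a small helper can rewrite them first. This is a sketch that assumes the converter recognizes the watch form used above; normalize_youtube_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def normalize_youtube_url(url: str) -> Optional[str]:
    """Rewrite common YouTube link forms to https://www.youtube.com/watch?v=ID.

    Returns None when no video ID can be extracted.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    video_id = None
    if host in ("youtu.be", "www.youtu.be"):
        # Short links carry the ID as the first path segment
        video_id = parsed.path.lstrip("/").split("/")[0] or None
    elif host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            video_id = parsed.path.split("/")[2] or None
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```

Usage: url = normalize_youtube_url(link), then call md.convert(url) only when url is not None.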

Docker Usage

# Build image (run from the root of the cloned repository)
docker build -t markitdown:latest .

# Run conversion
docker run --rm -i markitdown:latest < ~/document.pdf > output.md

Best Practices

1. Choose the Right Conversion Method

  • Simple documents: Use basic MarkItDown()
  • Complex PDFs: Use Azure Document Intelligence
  • Visual content: Enable AI image descriptions
  • Scanned documents: Ensure OCR dependencies are installed

2. Handle Errors Gracefully

from markitdown import MarkItDown

md = MarkItDown()

try:
    result = md.convert("document.pdf")
    print(result.text_content)
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Conversion error: {e}")

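3. Handle Large Files

from markitdown import MarkItDown

md = MarkItDown()

# convert_stream accepts any binary file-like object. The complete Markdown
# is still returned as one string, so streaming the input does not by itself
# cap memory use.
with open("large_file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")

# Write the result straight to disk instead of printing it
with open("output.md", "w", encoding="utf-8") as out:
    out.write(result.text_content)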

4. Optimize for Token Efficiency

Markdown output is already token-efficient, but you can:

  • Remove excessive whitespace
  • Consolidate similar sections
  • Strip metadata if not needed

import re

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")

# Collapse runs of three or more newlines into a single blank line
clean_text = re.sub(r'\n{3,}', '\n\n', result.text_content)
clean_text = clean_text.strip()

print(clean_text)
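Even after cleanup, a long paper may not fit in one LLM context window. One common approach is to split at Markdown headings and pack sections into chunks under a size budget. A sketch that uses character count as a rough proxy for tokens (an assumption; use a real tokenizer if you need exact limits); chunk_markdown is a hypothetical helper.

```python
import re
from typing import List


def chunk_markdown(text: str, max_chars: int = 4000) -> List[str]:
    """Split Markdown at headings, then pack sections into chunks of at most max_chars.

    A single section longer than max_chars is kept whole rather than cut
    mid-paragraph, so that chunk may exceed the budget.
    """
    # Zero-width split before each line starting with 1-6 '#' and a space
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for section in sections:
        # Flush the current chunk if adding this section would exceed the budget
        if current and len(current) + len(section) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += section
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Because the split is line-based, a line starting with "# " inside a fenced code block is also treated as a heading; that is usually acceptable for converted papers but worth knowing.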

Integration with Scientific Workflows

Convert Literature for Review

from pathlib import Path

from markitdown import MarkItDown

md = MarkItDown()

# Convert all papers in literature folder
papers_dir = Path("literature/pdfs")
output_dir = Path("literature/markdown")
output_dir.mkdir(parents=True, exist_ok=True)

for paper in papers_dir.glob("*.pdf"):
    result = md.convert(str(paper))

    # Save with metadata
    output_file = output_dir / f"{paper.stem}.md"
    content = f"# {paper.stem}\n\n"
    content += f"**Source**: {paper.name}\n\n"
    content += "---\n\n"
    content += result.text_content
    output_file.write_text(content, encoding="utf-8")

# For AI-enhanced conversion with figures
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md_ai = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-opus-4.5",
    llm_prompt="Describe scientific figures with technical precision"
)

# LLM descriptions apply to PPTX and image files (see Advanced Features),
# so pass slide decks or extracted figure images to md_ai.convert(...)

Extract Tables for Analysis

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data_tables.xlsx")

# Markdown tables can be parsed or used directly
print(result.text_content)
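If you need cell values rather than rendered text, the pipe tables in the output can be parsed with the standard library. This sketch assumes the usual layout of a header row, a separator row such as | --- | --- |, and body rows; parse_markdown_tables is a hypothetical helper and does not handle escaped pipes inside cells.

```python
from typing import Dict, List


def parse_markdown_tables(text: str) -> List[List[Dict[str, str]]]:
    """Parse pipe-style Markdown tables into lists of {column: cell} dicts."""
    tables: List[List[Dict[str, str]]] = []
    lines = text.splitlines()
    i = 0
    while i < len(lines) - 1:
        header, sep = lines[i].strip(), lines[i + 1].strip()
        # A table starts with a pipe row followed by a separator of |, -, :, spaces
        if header.startswith("|") and set(sep) <= set("|-: ") and "-" in sep:
            columns = [c.strip() for c in header.strip("|").split("|")]
            rows: List[Dict[str, str]] = []
            i += 2
            while i < len(lines) and lines[i].strip().startswith("|"):
                cells = [c.strip() for c in lines[i].strip().strip("|").split("|")]
                rows.append(dict(zip(columns, cells)))
                i += 1
            tables.append(rows)
        else:
            i += 1
    return tables


# Usage: rows = parse_markdown_tables(result.text_content)[0]  # first table
```

All values come back as strings; convert numeric columns yourself (for example with float()) before analysis.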

Troubleshooting

Common Issues

  • Missing dependencies: Install feature-specific packages

    pip install 'markitdown[pdf]'  # For PDF support

  • Binary file errors: Ensure files are opened in binary mode

    with open("file.pdf", "rb") as f:  # Note the "rb"
        result = md.convert_stream(f, file_extension=".pdf")

  • OCR not working: Install tesseract

    # macOS
    brew install tesseract

    # Ubuntu
    sudo apt-get install tesseract-ocr
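When a conversion fails, it can help to check what is actually installed before reinstalling anything. The sketch below uses importlib.util.find_spec for Python packages and shutil.which for executables such as tesseract. The module name pdfminer for PDF support is an assumption (markitdown[pdf] is expected to pull in pdfminer.six), and check_environment is a hypothetical helper.

```python
import importlib.util
import shutil
from typing import Dict, Mapping


def check_environment(modules: Mapping[str, str], binaries: Mapping[str, str]) -> Dict[str, bool]:
    """Report which Python modules are importable and which executables are on PATH.

    Both mappings go from a display label to a module or executable name,
    so the caller decides what to check.
    """
    status: Dict[str, bool] = {}
    for label, module in modules.items():
        try:
            status[label] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec imports parent packages of dotted names and raises if they are missing
            status[label] = False
    for label, exe in binaries.items():
        status[label] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    # Module names below are assumptions about what the extras install
    report = check_environment(
        {"markitdown": "markitdown", "PDF support (pdfminer.six)": "pdfminer"},
        {"tesseract": "tesseract"},
    )
    for label, ok in report.items():
        print(f"{'OK     ' if ok else 'MISSING'} {label}")
```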

Performance Considerations

  • PDF files: Large PDFs may take time; consider page ranges if supported
  • Image OCR: OCR processing is CPU-intensive
  • Audio transcription: Requires additional compute resources
  • AI image descriptions: Requires API calls (costs may apply)

Next Steps

  • See references/api_reference.md for complete API documentation
  • Check references/file_formats.md for format-specific details
  • Review scripts/batch_convert.py for automation examples
  • Explore scripts/convert_with_ai.py for AI-enhanced conversions

Resources

  • MCP Server: markitdown-mcp (for Claude Desktop integration)
  • Plugin Development: See packages/markitdown-sample-plugin