transformers-js

Use Transformers.js to run state-of-the-art machine learning models directly in JavaScript/TypeScript. Supports NLP (text classification, translation,…

INSTALLATION
npx skills add https://github.com/huggingface/skills --skill transformers-js
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Transformers.js - Machine Learning for JavaScript

Transformers.js enables running state-of-the-art machine learning models directly in JavaScript across browsers and server-side runtimes (Node.js, Bun, Deno), with no Python server required.

When to Use This Skill

Use this skill when you need to:

  • Run ML models for text analysis, generation, or translation in JavaScript
  • Perform image classification, object detection, or segmentation
  • Implement speech recognition or audio processing
  • Build multimodal AI applications (text-to-image, image-to-text, etc.)
  • Run models client-side in the browser without a backend

Installation

NPM Installation

npm install @huggingface/transformers

Browser Usage (CDN)

<script type="module">

  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

</script>

Core Concepts

1. Pipeline API

The pipeline API is the easiest way to use models. It groups together preprocessing, model inference, and postprocessing:

import { pipeline } from '@huggingface/transformers';

// Create a pipeline for a specific task

const pipe = await pipeline('sentiment-analysis');

// Use the pipeline

const result = await pipe('I love transformers!');

// Output: [{ label: 'POSITIVE', score: 0.999817686 }]

// IMPORTANT: Always dispose when done to free memory

await pipe.dispose();

⚠️ Memory Management: All pipelines must be disposed with pipe.dispose() when finished to prevent memory leaks. See examples in Code Examples for cleanup patterns across different environments.

2. Model Selection

You can specify a custom model as the second argument:

const pipe = await pipeline(

  'sentiment-analysis',

  'Xenova/bert-base-multilingual-uncased-sentiment'

);

Finding Models:

Browse available Transformers.js models on Hugging Face Hub:

  • By task: Add pipeline_tag parameter

Tip: Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.

3. Device Selection

Choose where to run the model:

// Run on CPU (default for WASM)

const cpuPipe = await pipeline('sentiment-analysis', 'model-id');

// Run on GPU (WebGPU)

const gpuPipe = await pipeline('sentiment-analysis', 'model-id', {

  device: 'webgpu',

});

4. Quantization Options

Control model precision vs. performance:

// Use quantized model (faster, smaller)

const pipe = await pipeline('sentiment-analysis', 'model-id', {

  dtype: 'q4',  // Options: 'fp32', 'fp16', 'q8', 'q4'

});

Supported Tasks

Note: All examples below show basic usage.

Natural Language Processing

#### Text Classification

const classifier = await pipeline('text-classification');

const result = await classifier('This movie was amazing!');

#### Named Entity Recognition (NER)

const ner = await pipeline('token-classification');

const entities = await ner('My name is John and I live in New York.');

#### Question Answering

const qa = await pipeline('question-answering');

const answer = await qa({

  question: 'What is the capital of France?',

  context: 'Paris is the capital and largest city of France.'

});

#### Text Generation

const generator = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX');

const text = await generator('Once upon a time', {

  max_new_tokens: 100,

  temperature: 0.7

});

For streaming and chat: See Text Generation Guide for:

  • Streaming token-by-token output with TextStreamer
  • Chat/conversation format with system/user/assistant roles
  • Generation parameters (temperature, top_k, top_p)
  • Browser and Node.js examples
  • React components and API endpoints
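
As a rough illustration of the chat format with system/user/assistant roles, the sketch below assembles a message list from earlier turns. The helper name `buildMessages` and the `[userText, assistantText]` history layout are assumptions made for this example, not part of the library.

```javascript
// Build a chat-style message list with system/user/assistant roles.
// `history` is an array of [userText, assistantText] pairs from earlier turns
// (a layout chosen for this sketch).
function buildMessages(systemPrompt, history, userText) {
  const messages = [{ role: 'system', content: systemPrompt }];
  for (const [user, assistant] of history) {
    messages.push({ role: 'user', content: user });
    messages.push({ role: 'assistant', content: assistant });
  }
  messages.push({ role: 'user', content: userText });
  return messages;
}

const messages = buildMessages(
  'You are a helpful assistant.',
  [['Hi', 'Hello!']],
  'Tell me a joke.'
);
console.log(messages.map((m) => m.role).join(', ')); // system, user, assistant, user
```

With a chat-tuned model, an array like this would typically be passed to the generator in place of the plain prompt string; check the Text Generation Guide for the exact call and output shape.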

#### Translation

const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');

const output = await translator('Hello, how are you?', {

  src_lang: 'eng_Latn',

  tgt_lang: 'fra_Latn'

});

#### Summarization

const summarizer = await pipeline('summarization');

const summary = await summarizer(longText, {

  max_length: 100,

  min_length: 30

});

#### Zero-Shot Classification

const classifier = await pipeline('zero-shot-classification');

const result = await classifier('This is a story about sports.', ['politics', 'sports', 'technology']);

Computer Vision

#### Image Classification

const classifier = await pipeline('image-classification');

const result = await classifier('https://example.com/image.jpg');

// Or with a local file path

const localResult = await classifier('image.jpg');

#### Object Detection

const detector = await pipeline('object-detection');

const objects = await detector('https://example.com/image.jpg');

// Returns: [{ label: 'person', score: 0.95, box: { xmin, ymin, xmax, ymax } }, ...]
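
Detection results usually need filtering before display. The sketch below assumes the result shape shown above (`label`, `score`, `box`) and keeps only detections at or above a minimum score; `filterDetections` and the sample data are hypothetical.

```javascript
// Keep only detections at or above a minimum confidence score, best first.
// `detections` is assumed to follow the shape shown above:
// [{ label, score, box: { xmin, ymin, xmax, ymax } }, ...]
function filterDetections(detections, minScore = 0.5) {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Hand-written sample data standing in for real detector output.
const sample = [
  { label: 'person', score: 0.95, box: { xmin: 10, ymin: 20, xmax: 110, ymax: 220 } },
  { label: 'dog', score: 0.32, box: { xmin: 5, ymin: 5, xmax: 50, ymax: 60 } },
  { label: 'car', score: 0.71, box: { xmin: 200, ymin: 80, xmax: 400, ymax: 180 } },
];

console.log(filterDetections(sample, 0.5).map((d) => d.label)); // [ 'person', 'car' ]
```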

#### Image Segmentation

const segmenter = await pipeline('image-segmentation');

const segments = await segmenter('https://example.com/image.jpg');

#### Depth Estimation

const depthEstimator = await pipeline('depth-estimation');

const depth = await depthEstimator('https://example.com/image.jpg');

#### Zero-Shot Image Classification

const classifier = await pipeline('zero-shot-image-classification');

const result = await classifier('image.jpg', ['cat', 'dog', 'bird']);

Audio Processing

#### Automatic Speech Recognition

const transcriber = await pipeline('automatic-speech-recognition');

const result = await transcriber('audio.wav');

// Returns: { text: 'transcribed text here' }

#### Audio Classification

const classifier = await pipeline('audio-classification');

const result = await classifier('audio.wav');

#### Text-to-Speech

const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');

const audio = await synthesizer('Hello, this is a test.', {

  speaker_embeddings: speakerEmbeddings

});

Multimodal

#### Image-to-Text (Image Captioning)

const captioner = await pipeline('image-to-text');

const caption = await captioner('image.jpg');

#### Document Question Answering

const docQA = await pipeline('document-question-answering');

const answer = await docQA('document-image.jpg', 'What is the total amount?');

#### Zero-Shot Object Detection

const detector = await pipeline('zero-shot-object-detection');

const objects = await detector('image.jpg', ['person', 'car', 'tree']);

Feature Extraction (Embeddings)

const extractor = await pipeline('feature-extraction');

const embeddings = await extractor('This is a sentence to embed.');

// Returns: tensor of shape [1, sequence_length, hidden_size]

// For sentence embeddings (mean pooling)

const sentenceExtractor = await pipeline('feature-extraction', 'onnx-community/all-MiniLM-L6-v2-ONNX');

const sentenceEmbeddings = await sentenceExtractor('Text to embed', { pooling: 'mean', normalize: true });
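
Sentence embeddings are typically compared with cosine similarity. The helper below is a minimal, library-independent sketch that works on plain numeric arrays; the toy vectors are placeholders, not real model output.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

When embeddings are produced with `normalize: true`, the dot product alone already equals the cosine similarity; the full formula is kept here so the helper also works on unnormalized vectors.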

Finding and Choosing Models

Browsing the Hugging Face Hub

Discover compatible Transformers.js models on Hugging Face Hub:

Base URL (all models):

https://huggingface.co/models?library=transformers.js&sort=trending

Filter by task using the pipeline_tag parameter:

| Task | URL |
| --- | --- |
| Text Generation | https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending |
| Text Classification | https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js&sort=trending |
| Translation | https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=trending |
| Summarization | https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending |
| Question Answering | https://huggingface.co/models?pipeline_tag=question-answering&library=transformers.js&sort=trending |
| Image Classification | https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending |
| Object Detection | https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js&sort=trending |
| Image Segmentation | https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js&sort=trending |
| Speech Recognition | https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending |
| Audio Classification | https://huggingface.co/models?pipeline_tag=audio-classification&library=transformers.js&sort=trending |
| Image-to-Text | https://huggingface.co/models?pipeline_tag=image-to-text&library=transformers.js&sort=trending |
| Feature Extraction | https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=trending |
| Zero-Shot Classification | https://huggingface.co/models?pipeline_tag=zero-shot-classification&library=transformers.js&sort=trending |

Sort options:

  • &sort=trending - Most popular recently
  • &sort=downloads - Most downloaded overall
  • &sort=likes - Most liked by community
  • &sort=modified - Recently updated
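
The search URLs above all follow one pattern, so they can be generated instead of hand-edited. This is a small sketch using the standard `URL` API; the function name `buildHubSearchUrl` is made up for this example.

```javascript
// Build a Hub search URL for Transformers.js-compatible models.
// `task` is a pipeline_tag value; `sort` is one of the sort options listed above.
function buildHubSearchUrl({ task, sort = 'trending' } = {}) {
  const url = new URL('https://huggingface.co/models');
  if (task) url.searchParams.set('pipeline_tag', task);
  url.searchParams.set('library', 'transformers.js');
  url.searchParams.set('sort', sort);
  return url.toString();
}

console.log(buildHubSearchUrl({ task: 'translation', sort: 'downloads' }));
// https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=downloads
```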

Choosing the Right Model

Consider these factors when selecting a model:

1. Model Size

  • Small (< 100MB): Fast, suitable for browsers, limited accuracy
  • Medium (100MB - 500MB): Balanced performance, good for most use cases
  • Large (> 500MB): High accuracy, slower, better for Node.js or powerful devices

2. Quantization

Models are often available in different quantization levels:

  • fp32 - Full precision (largest, most accurate)
  • fp16 - Half precision (smaller, still accurate)
  • q8 - 8-bit quantized (much smaller, slight accuracy loss)
  • q4 - 4-bit quantized (smallest, noticeable accuracy loss)
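
To get a feel for how these levels affect download size, the arithmetic can be sketched as parameter count times bytes per weight. The per-dtype byte counts below are a simplifying assumption: real files also carry metadata, and some exports keep certain layers at higher precision, so treat the numbers as rough estimates only.

```javascript
// Approximate bytes per weight for each dtype (simplifying assumption:
// ignores metadata and any layers an export keeps at higher precision).
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Rough weight size in megabytes (1 MB = 1e6 bytes) for a parameter count and dtype.
function estimateSizeMB(paramCount, dtype) {
  const bytes = BYTES_PER_PARAM[dtype];
  if (bytes === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (paramCount * bytes) / 1e6;
}

// Example: a model with about 270 million parameters.
for (const dtype of Object.keys(BYTES_PER_PARAM)) {
  console.log(`${dtype}: ~${estimateSizeMB(270e6, dtype)} MB`);
}
// fp32: ~1080 MB
// fp16: ~540 MB
// q8: ~270 MB
// q4: ~135 MB
```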

3. Task Compatibility

Check the model card for:

  • Supported tasks (some models support multiple tasks)
  • Input/output formats
  • Language support (multilingual vs. English-only)
  • License restrictions

4. Performance Metrics

Model cards typically show:

  • Accuracy scores
  • Benchmark results
  • Inference speed
  • Memory requirements

Example: Finding a Text Generation Model

// 1. Visit: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending

// 2. Browse and select a model (e.g., onnx-community/gemma-3-270m-it-ONNX)

// 3. Check model card for:

//    - Model size: ~270M parameters

//    - Quantization: q4 available

//    - Language: English

//    - Use case: Instruction-following chat

// 4. Use the model:

import { pipeline } from '@huggingface/transformers';

const generator = await pipeline(

  'text-generation',

  'onnx-community/gemma-3-270m-it-ONNX',

  { dtype: 'q4' } // Use quantized version for faster inference

);

const output = await generator('Explain quantum computing in simple terms.', {

  max_new_tokens: 100

});

await generator.dispose();

Tips for Model Selection

  • Start Small: Test with a smaller model first, then upgrade if needed
  • Check ONNX Support: Ensure the model has ONNX files (look for onnx folder in model repo)
  • Read Model Cards: Model cards contain usage examples, limitations, and benchmarks
  • Test Locally: Benchmark inference speed and memory usage in your environment
  • Version Pin: Use specific git commits in production for stability:
const pipe = await pipeline('task', 'model-id', { revision: 'abc123' });

Advanced Configuration

Environment Configuration ( env )

The env object provides comprehensive control over Transformers.js execution, caching, and model loading.

Quick Overview:

import { env, LogLevel } from '@huggingface/transformers';

// View version

console.log(env.version); // e.g., '4.x'

// Common settings

env.allowRemoteModels = true;  // Allow loading from the Hugging Face Hub

env.allowLocalModels = false;  // Do not load from the local file system

env.localModelPath = '/models/'; // Local model directory

env.useFSCache = true;         // Cache models on disk (Node.js)

env.useBrowserCache = true;    // Cache models in browser

env.cacheDir = './.cache';     // Cache directory location

// Optional: override logging level (default is LogLevel.WARNING)

env.logLevel = LogLevel.INFO;

// Optional: custom fetch for auth headers, retries, abort signals, etc.

env.fetch = (url, options) =>

  fetch(url, {

    ...options,

    headers: {

      ...options?.headers,

      Authorization: `Bearer ${HF_TOKEN}`,

    },

  });

Configuration Patterns:

// Development: Fast iteration with remote models

env.allowRemoteModels = true;

env.useFSCache = true;

// Production: Local models only

env.allowRemoteModels = false;

env.allowLocalModels = true;

env.localModelPath = '/app/models/';

// Custom CDN

env.remoteHost = 'https://cdn.example.com/models';

// Disable caching (testing)

env.useFSCache = false;

env.useBrowserCache = false;
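
One way to keep these patterns in one place is a small function that returns the overrides for a named mode. This is a sketch: the mode names and `envPreset` are hypothetical, and the keys simply mirror the `env` settings shown above.

```javascript
// Return env overrides for a named deployment mode.
// The keys mirror the env settings shown above; the mode names are hypothetical.
function envPreset(mode, localModelPath = '/app/models/') {
  switch (mode) {
    case 'development':
      return { allowRemoteModels: true, useFSCache: true };
    case 'production':
      return { allowRemoteModels: false, allowLocalModels: true, localModelPath };
    case 'test':
      return { useFSCache: false, useBrowserCache: false };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

console.log(JSON.stringify(envPreset('production')));
// {"allowRemoteModels":false,"allowLocalModels":true,"localModelPath":"/app/models/"}
```

The returned object could then be applied with `Object.assign(env, envPreset('production'))` after importing `env`, before any pipeline is created.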

For complete documentation on all configuration options, caching strategies, cache management, pre-downloading models, and more, see:

Configuration Reference

ModelRegistry (v4)

ModelRegistry gives you visibility and control over model assets before loading a pipeline. Use it to estimate download size, check cache status, inspect available dtypes, and clear cached artifacts for a specific task/model/options tuple.

import { ModelRegistry } from '@huggingface/transformers';

const task = 'feature-extraction';

const modelId = 'onnx-community/all-MiniLM-L6-v2-ONNX';

const modelOptions = { dtype: 'fp32' };

// List required files for this pipeline

const files = await ModelRegistry.get_pipeline_files(task, modelId, modelOptions);

// Check if assets are already cached

const cached = await ModelRegistry.is_pipeline_cached(task, modelId, modelOptions);

// Inspect precision formats available for this model

const dtypes = await ModelRegistry.get_available_dtypes(modelId);

console.log({ files: files.length, cached, dtypes });

For production patterns and full API coverage, see ModelRegistry Reference.

Standalone Tokenization ( @huggingface/tokenizers )

For tokenization-only workflows, use @huggingface/tokenizers. It is a separate lightweight package useful when you need fast tokenization/encoding without loading full model inference pipelines.

npm install @huggingface/tokenizers

import { Tokenizer } from '@huggingface/tokenizers';

Working with Tensors

import { AutoTokenizer, AutoModel } from '@huggingface/transformers';

// Load tokenizer and model separately for more control

const tokenizer = await AutoTokenizer.from_pretrained('bert-base-uncased');

const model = await AutoModel.from_pretrained('bert-base-uncased');

// Tokenize input

const inputs = await tokenizer('Hello world!');

// Run model

const outputs = await model(inputs);

Batch Processing

const classifier = await pipeline('sentiment-analysis');

// Process multiple texts

const results = await classifier([

  'I love this!',

  'This is terrible.',

  'It was okay.'

]);
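
For large input lists, passing everything in one call can exhaust memory. A simple approach is to split the inputs into fixed-size batches and call the pipeline once per batch. The `chunk` helper below is a generic sketch, not a library function.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(chunk(['a', 'b', 'c', 'd', 'e'], 2));
// [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

Each batch can then be passed to the classifier in a loop, collecting the results as they come back.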

Runtime-Specific Considerations

WebGPU Usage

WebGPU provides GPU acceleration in browsers and server-side runtimes (when supported):

const pipe = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX', {

  device: 'webgpu',

  dtype: 'fp32'

});

Note: Use webgpu when available and fall back to WASM/CPU when not supported in the current runtime.
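
A minimal way to apply this fallback is to check for the WebGPU entry point (`navigator.gpu`) and only set `device` when it exists. The helper name `deviceOptions` is an assumption for this sketch; when WebGPU is absent it returns an empty object so the library picks its default backend.

```javascript
// Return pipeline options that request WebGPU only when the runtime exposes
// navigator.gpu; otherwise return {} so the library uses its default backend.
// `nav` defaults to the global navigator and can be replaced in tests.
function deviceOptions(nav = globalThis.navigator) {
  return nav && nav.gpu ? { device: 'webgpu' } : {};
}

console.log(JSON.stringify(deviceOptions({ gpu: {} }))); // {"device":"webgpu"}
console.log(JSON.stringify(deviceOptions({})));          // {}
```

The result can be spread into the pipeline options, for example `{ ...deviceOptions(), dtype: 'fp32' }`. Note that the presence of `navigator.gpu` does not guarantee a usable adapter (`navigator.gpu.requestAdapter()` can resolve to `null`), so loading should still be wrapped in error handling.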

WASM Performance

WASM is the most compatible execution backend across runtimes:

// Optimized for browsers with quantization

const pipe = await pipeline('sentiment-analysis', 'model-id', {

  dtype: 'q8'  // or 'q4' for even smaller size

});

Progress Tracking & Loading Indicators

Models can be large (ranging from a few MB to several GB) and consist of multiple files. Track download progress by passing a callback to the pipeline() function:

import { pipeline } from '@huggingface/transformers';

// Track progress for each file

const fileProgress = {};

function onProgress(info) {

  if (info.status === 'progress_total') {

    console.log(`Total: ${info.progress.toFixed(1)}%`);

    return;

  }

  console.log(`${info.status}: ${info.file ?? ''}`);

  if (info.status === 'progress') {

    fileProgress[info.file] = info.progress;

    console.log(`${info.file}: ${info.progress.toFixed(1)}%`);

  }

  if (info.status === 'done') {

    console.log(`✓ ${info.file} complete`);

  }

}

// Pass callback to pipeline

const classifier = await pipeline('sentiment-analysis', null, {

  progress_callback: onProgress

});

Progress Info Properties:

interface ProgressInfo {

  status: 'initiate' | 'download' | 'progress' | 'progress_total' | 'done' | 'ready';

  name: string;      // Model id or path

  file?: string;     // File being processed (per-file events)

  progress?: number; // Percentage (0-100, for 'progress' and 'progress_total')

  loaded?: number;   // Bytes downloaded (only for 'progress' status)

  total?: number;    // Total bytes (only for 'progress' status)

}
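
If a single overall percentage is needed and only per-file events are available, the `loaded` and `total` fields can be summed across files. This is a sketch under that assumption; `createProgressTracker` is a hypothetical helper, and the percentage may dip when a new file is first reported.

```javascript
// Aggregate per-file 'progress' events into one overall percentage.
// Uses the `file`, `loaded`, and `total` fields from ProgressInfo above.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total }
  return function track(info) {
    if (info.status === 'progress' && info.file) {
      files.set(info.file, { loaded: info.loaded ?? 0, total: info.total ?? 0 });
    }
    let loaded = 0;
    let total = 0;
    for (const f of files.values()) {
      loaded += f.loaded;
      total += f.total;
    }
    return total > 0 ? (loaded / total) * 100 : 0;
  };
}

// Hand-written events standing in for real callback invocations.
const track = createProgressTracker();
track({ status: 'progress', file: 'config.json', loaded: 100, total: 100 });
console.log(track({ status: 'progress', file: 'model.onnx', loaded: 50, total: 300 })); // 37.5
```

The returned function can be passed as `progress_callback`, with its return value forwarded to a progress bar.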

For complete examples including browser UIs, React components, CLI progress bars, and retry logic, see:

Pipeline Options - Progress Callback

Error Handling

try {

  const pipe = await pipeline('sentiment-analysis', 'model-id');

  const result = await pipe('text to analyze');

} catch (error) {

  if (error.message.includes('fetch')) {

    console.error('Model download failed. Check internet connection.');

  } else if (error.message.includes('ONNX')) {

    console.error('Model execution failed. Check model compatibility.');

  } else {

    console.error('Unknown error:', error);

  }

}

Performance Tips

  • Reuse Pipelines: Create pipeline once, reuse for multiple inferences
  • Use Quantization: Start with q8 or q4 for faster inference
  • Batch Processing: Process multiple inputs together when possible
  • Cache Models: Models are cached automatically (see Caching Reference for details on browser Cache API, Node.js filesystem cache, and custom implementations)
  • WebGPU for Large Models: Use WebGPU for models that benefit from GPU acceleration
  • Prune Context: For text generation, limit max_new_tokens to avoid memory issues
  • Clean Up Resources: Call pipe.dispose() when done to free memory
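
The "Reuse Pipelines" tip can be sketched as a small cache keyed by task and model. The names `createPipelineCache` and `getPipeline` are hypothetical; `loader` stands for whatever creates the pipeline, such as `(task, model) => pipeline(task, model)`.

```javascript
// Cache pipeline promises so each (task, model) pair is loaded only once.
// `loader` is whatever creates the pipeline, e.g. (task, model) => pipeline(task, model).
function createPipelineCache(loader) {
  const cache = new Map();
  return function getPipeline(task, model = null) {
    const key = `${task}::${model ?? 'default'}`;
    if (!cache.has(key)) {
      const promise = Promise.resolve(loader(task, model)).catch((err) => {
        cache.delete(key); // do not cache failures, so a later call can retry
        throw err;
      });
      cache.set(key, promise);
    }
    return cache.get(key);
  };
}

// Demo with a stand-in loader instead of a real model download.
let loads = 0;
const getPipeline = createPipelineCache(async (task) => {
  loads += 1;
  return { task };
});
const first = getPipeline('sentiment-analysis');
const second = getPipeline('sentiment-analysis');
console.log(first === second, loads); // true 1
```

Caching the promise rather than the resolved pipeline means concurrent callers share one in-flight load instead of starting duplicates.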

Memory Management

IMPORTANT: Always call pipe.dispose() when finished to prevent memory leaks.

const pipe = await pipeline('sentiment-analysis');

const result = await pipe('Great product!');

await pipe.dispose();  // ✓ Free memory (100MB - several GB per model)

When to dispose:

  • Application shutdown or component unmount
  • Before loading a different model
  • After batch processing in long-running apps

Models consume significant memory and hold GPU/CPU resources. Disposal is critical for browser memory limits and server stability.
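
For short-lived use, a try/finally wrapper makes disposal hard to forget. The sketch below uses a stand-in pipeline object to show the control flow; `withPipeline` and `createFakePipe` are hypothetical names, and `createPipe` would be something like `() => pipeline('sentiment-analysis')` in real code.

```javascript
// Run `fn` with a pipeline and always dispose it afterwards, even if `fn` throws.
// `createPipe` is any async factory, e.g. () => pipeline('sentiment-analysis').
async function withPipeline(createPipe, fn) {
  const pipe = await createPipe();
  try {
    return await fn(pipe);
  } finally {
    await pipe.dispose();
  }
}

// Stand-in pipeline: a callable with a dispose() method that records its call.
function createFakePipe(log) {
  const pipe = async (text) => [{ label: 'POSITIVE', text }];
  pipe.dispose = async () => {
    log.push('disposed');
  };
  return pipe;
}

const log = [];
withPipeline(async () => createFakePipe(log), (pipe) => pipe('Great product!'))
  .then((result) => console.log(result[0].label, log)); // POSITIVE [ 'disposed' ]
```

This pattern suits one-off jobs; long-running servers that reuse a pipeline should dispose it at shutdown instead.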

For detailed patterns (React cleanup, servers, browser), see Code Examples

Troubleshooting

Model Not Found

  • Verify model exists on Hugging Face Hub
  • Check model name spelling
  • Ensure model has ONNX files (look for onnx folder in model repo)

Memory Issues

  • Use smaller models or quantized versions (dtype: 'q4')
  • Reduce batch size
  • Limit sequence length with max_length

WebGPU Errors

  • Check browser compatibility (Chrome 113+, Edge 113+)
  • Try dtype: 'fp16' if fp32 fails
  • Fall back to WASM if WebGPU unavailable
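
The steps above can be sketched as a loop that tries configurations in order and returns the first one that loads. `loadWithFallback` is a hypothetical helper; `load` stands for something like `(opts) => pipeline(task, modelId, opts)`, and the demo uses a stand-in loader that pretends WebGPU is unavailable.

```javascript
// Try pipeline configurations in order and return the first that loads.
// `load` is any async factory, e.g. (opts) => pipeline(task, modelId, opts).
async function loadWithFallback(load, candidates) {
  const errors = [];
  for (const options of candidates) {
    try {
      return { pipe: await load(options), options };
    } catch (err) {
      errors.push(err); // remember the failure and try the next configuration
    }
  }
  throw new AggregateError(errors, 'All pipeline configurations failed');
}

// Order mirrors the troubleshooting list: WebGPU fp32, WebGPU fp16, then defaults.
const candidates = [
  { device: 'webgpu', dtype: 'fp32' },
  { device: 'webgpu', dtype: 'fp16' },
  {}, // no device set: let the library pick its default backend
];

// Stand-in loader that pretends WebGPU is unavailable.
const fakeLoad = async (opts) => {
  if (opts.device === 'webgpu') throw new Error('WebGPU not supported');
  return { name: 'fake-pipeline' };
};

loadWithFallback(fakeLoad, candidates)
  .then(({ options }) => console.log(JSON.stringify(options))); // {}
```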

Reference Documentation

This Skill

  • Pipeline Options - Configure pipeline() with progress_callback, device, dtype, etc.
  • Caching Reference - Browser Cache API, Node.js filesystem cache, and custom cache implementations
  • Code Examples - Real-world implementations for different runtimes

Official Transformers.js

Best Practices

  • Always Dispose Pipelines: Call pipe.dispose() when done - critical for preventing memory leaks
  • Start with Pipelines: Use the pipeline API unless you need fine-grained control
  • Test Locally First: Test models with small inputs before deploying
  • Monitor Model Sizes: Be aware of model download sizes for web applications
  • Handle Loading States: Show progress indicators for better UX
  • Version Pin: Pin specific model versions for production stability
  • Error Boundaries: Always wrap pipeline calls in try-catch blocks
  • Progressive Enhancement: Provide fallbacks for unsupported browsers
  • Reuse Models: Load once, use many times - don't recreate pipelines unnecessarily
  • Graceful Shutdown: Dispose models on SIGTERM/SIGINT in servers

Quick Reference: Task IDs

| Task | Task ID |
| --- | --- |
| Text classification | text-classification or sentiment-analysis |
| Token classification | token-classification or ner |
| Question answering | question-answering |
| Fill mask | fill-mask |
| Summarization | summarization |
| Translation | translation |
| Text generation | text-generation |
| Text-to-text generation | text2text-generation |
| Zero-shot classification | zero-shot-classification |
| Image classification | image-classification |
| Image segmentation | image-segmentation |
| Object detection | object-detection |
| Depth estimation | depth-estimation |
| Image-to-image | image-to-image |
| Zero-shot image classification | zero-shot-image-classification |
| Zero-shot object detection | zero-shot-object-detection |
| Automatic speech recognition | automatic-speech-recognition |
| Audio classification | audio-classification |
| Text-to-speech | text-to-speech or text-to-audio |
| Image-to-text | image-to-text |
| Document question answering | document-question-answering |
| Feature extraction | feature-extraction |
| Sentence similarity | sentence-similarity |

This skill enables you to integrate state-of-the-art machine learning capabilities directly into JavaScript applications without requiring separate ML servers or Python environments.
