SKILL.md

Type4Me macOS Voice Input

Skill by ara.so — Daily 2026 Skills collection.

Type4Me is a macOS voice input tool that captures audio via a global hotkey, transcribes it with local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes the text with an LLM, and injects the result into any app. All credentials and history are stored locally, with no telemetry and no cloud sync.

Architecture Overview

Type4Me/

├── ASR/                    # ASR engine abstraction

│   ├── ASRProvider.swift          # Provider enum + protocols

│   ├── ASRProviderRegistry.swift  # Plugin registry

│   ├── Providers/                 # Per-vendor config files

│   ├── SherpaASRClient.swift      # Local streaming ASR

│   ├── SherpaOfflineASRClient.swift

│   ├── VolcASRClient.swift        # Volcengine streaming ASR

│   └── DeepgramASRClient.swift    # Deepgram streaming ASR

├── Bridge/                 # SherpaOnnx C API Swift bridge

├── Audio/                  # Audio capture

├── Session/                # Core state machine: record→ASR→inject

├── Input/                  # Global hotkey management

├── Services/               # Credentials, hotwords, model manager

├── Protocol/               # Volcengine WebSocket codec

└── UI/                     # SwiftUI (FloatingBar + Settings)

Installation

Prerequisites

# Xcode Command Line Tools

xcode-select --install

# CMake (for local ASR engine)

brew install cmake

Build & Deploy from Source

git clone https://github.com/joewongjc/type4me.git

cd type4me

# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)

bash scripts/build-sherpa.sh

# Step 2: Build, bundle, sign, install to /Applications, and launch

bash scripts/deploy.sh

Download Pre-built App

Download Type4Me-v1.2.3.dmg from releases (cloud ASR only, no local engine):

https://github.com/joewongjc/type4me/releases/tag/v1.2.3

If macOS blocks the app:

xattr -d com.apple.quarantine /Applications/Type4Me.app

Download Local ASR Models

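# Create the models directory, then extract ONE of the archives below

# (each command assumes the archive has already been downloaded to ~/Downloads)

mkdir -p ~/Library/Application\ Support/Type4Me/Models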

# Option A: Lightweight ~20MB

tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \

    -C ~/Library/Application\ Support/Type4Me/Models/

# Option B: Balanced ~236MB (recommended)

tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \

    -C ~/Library/Application\ Support/Type4Me/Models/

# Option C: Bilingual Chinese+English ~1GB

tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \

    -C ~/Library/Application\ Support/Type4Me/Models/

Expected structure for Paraformer model:

~/Library/Application Support/Type4Me/Models/

└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/

    ├── encoder.int8.onnx

    ├── decoder.int8.onnx

    └── tokens.txt

Key Protocols

SpeechRecognizer Protocol

Every ASR client must implement this protocol:

protocol SpeechRecognizer: AnyObject {

    /// Start a new recognition session

    func startRecognition() async throws

    /// Feed raw PCM audio data

    func appendAudio(_ buffer: AVAudioPCMBuffer) async

    /// Stop and get final result

    func stopRecognition() async throws -> String

    /// Cancel without result

    func cancelRecognition() async

    /// Streaming partial results (optional)

    var partialResultHandler: ((String) -> Void)? { get set }

}

ASRProviderConfig Protocol

Each vendor describes its credential fields, validation, and client factory by conforming to this protocol:

protocol ASRProviderConfig {

    /// Unique identifier string

    static var providerID: String { get }

    /// Display name in Settings UI

    static var displayName: String { get }

    /// Credential fields shown in Settings

    static var credentialFields: [CredentialField] { get }

    /// Validate credentials before use

    static func validate(_ credentials: [String: String]) -> Bool

    /// Create the recognizer instance

    static func createClient(

        credentials: [String: String],

        config: RecognitionConfig

    ) throws -> SpeechRecognizer

}

Adding a New ASR Provider

Step 1: Create Provider Config

Create Type4Me/ASR/Providers/OpenAIWhisperProvider.swift:

import Foundation

struct OpenAIWhisperProvider: ASRProviderConfig {

    static let providerID = "openai_whisper"

    static let displayName = "OpenAI Whisper"

    static let credentialFields: [CredentialField] = [

        CredentialField(

            key: "api_key",

            label: "API Key",

            placeholder: "sk-...",

            isSecret: true

        ),

        CredentialField(

            key: "model",

            label: "Model",

            placeholder: "whisper-1",

            isSecret: false

        )

    ]

    static func validate(_ credentials: [String: String]) -> Bool {

        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {

            return false

        }

        return apiKey.hasPrefix("sk-")

    }

    static func createClient(

        credentials: [String: String],

        config: RecognitionConfig

    ) throws -> SpeechRecognizer {

        guard let apiKey = credentials["api_key"] else {

            throw ASRError.missingCredential("api_key")

        }

        let model = credentials["model"] ?? "whisper-1"

        return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)

    }

}

Step 2: Implement the ASR Client

Create Type4Me/ASR/OpenAIWhisperASRClient.swift:

import Foundation

import AVFoundation

final class OpenAIWhisperASRClient: SpeechRecognizer {

    var partialResultHandler: ((String) -> Void)?

    private let apiKey: String

    private let model: String

    private let config: RecognitionConfig

    private var audioData: Data = Data()

    init(apiKey: String, model: String, config: RecognitionConfig) {

        self.apiKey = apiKey

        self.model = model

        self.config = config

    }

    func startRecognition() async throws {

        audioData = Data()

    }

    func appendAudio(_ buffer: AVAudioPCMBuffer) async {

        // Convert PCM buffer to raw bytes and accumulate

        guard let channelData = buffer.floatChannelData?[0] else { return }

        let frameCount = Int(buffer.frameLength)

        let bytes = UnsafeBufferPointer(start: channelData, count: frameCount)

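        // Convert Float32 PCM to Int16 for the Whisper API. Clamp in floating point

        // first so out-of-range or non-finite samples cannot trap during conversion.

        let int16Samples = bytes.map { sample -> Int16 in

            let clamped = sample.isFinite ? max(-1.0, min(1.0, sample)) : 0

            return Int16(clamped * 32767)

        }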

        int16Samples.withUnsafeBytes { ptr in

            audioData.append(contentsOf: ptr)

        }

    }

    func stopRecognition() async throws -> String {

        // Build multipart form request to Whisper API

        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)

        request.httpMethod = "POST"

        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        let boundary = UUID().uuidString

        request.setValue("multipart/form-data; boundary=\(boundary)",

                        forHTTPHeaderField: "Content-Type")

        var body = Data()

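        // Append audio file part.

        // NOTE: the transcription endpoint expects a container format such as WAV.

        // Headerless PCM labeled audio.raw is likely to be rejected, so wrap audioData

        // in a WAV header (filename audio.wav, Content-Type audio/wav) before sending.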

        body.append("--\(boundary)\r\n".data(using: .utf8)!)

        body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.raw\"\r\n".data(using: .utf8)!)

        body.append("Content-Type: audio/raw\r\n\r\n".data(using: .utf8)!)

        body.append(audioData)

        body.append("\r\n".data(using: .utf8)!)

        // Append model part

        body.append("--\(boundary)\r\n".data(using: .utf8)!)

        body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)

        body.append("\(model)\r\n".data(using: .utf8)!)

        body.append("--\(boundary)--\r\n".data(using: .utf8)!)

        request.httpBody = body

        let (data, response) = try await URLSession.shared.data(for: request)

        guard let httpResponse = response as? HTTPURLResponse,

              httpResponse.statusCode == 200 else {

            throw ASRError.networkError("Whisper API returned error")

        }

        let result = try JSONDecoder().decode(WhisperResponse.self, from: data)

        return result.text

    }

    func cancelRecognition() async {

        audioData = Data()

    }

}

private struct WhisperResponse: Codable {

    let text: String

}
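Because the client above accumulates headerless Int16 samples, the upload needs a container around them. The following is a minimal sketch of a WAV wrapper, not code from Type4Me: `makeWAVData` is a hypothetical helper, and the 16 kHz mono default is an assumption that should be replaced by whatever format the audio capture layer actually produces.

```swift
import Foundation

/// Hypothetical helper (not part of Type4Me): wraps little-endian 16-bit PCM
/// in a minimal 44-byte RIFF/WAVE header.
/// Assumption: 16 kHz mono unless the caller passes the real capture format.
func makeWAVData(pcm: Data, sampleRate: UInt32 = 16_000, channels: UInt16 = 1) -> Data {
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)
    let byteRate = sampleRate * UInt32(blockAlign)

    // Serializes an integer as little-endian bytes, as the WAV format requires.
    func le<T: FixedWidthInteger>(_ value: T) -> Data {
        withUnsafeBytes(of: value.littleEndian) { Data($0) }
    }

    var wav = Data()
    wav.append(Data("RIFF".utf8))
    wav.append(le(UInt32(36 + pcm.count)))   // remaining file size after this field
    wav.append(Data("WAVE".utf8))
    wav.append(Data("fmt ".utf8))
    wav.append(le(UInt32(16)))               // size of the fmt chunk
    wav.append(le(UInt16(1)))                // audio format 1 = linear PCM
    wav.append(le(channels))
    wav.append(le(sampleRate))
    wav.append(le(byteRate))
    wav.append(le(blockAlign))
    wav.append(le(bitsPerSample))
    wav.append(Data("data".utf8))
    wav.append(le(UInt32(pcm.count)))        // size of the sample data
    wav.append(pcm)
    return wav
}
```

With a helper like this, `stopRecognition()` could send `makeWAVData(pcm: audioData)` as `audio.wav` with `Content-Type: audio/wav` instead of the raw buffer.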

Step 3: Register the Provider

In Type4Me/ASR/ASRProviderRegistry.swift, add to the all array:

struct ASRProviderRegistry {

    static let all: [any ASRProviderConfig.Type] = [

        SherpaParaformerProvider.self,

        VolcengineProvider.self,

        DeepgramProvider.self,

        OpenAIWhisperProvider.self,   // ← Add your provider here

    ]

}
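How the registry is consumed is not shown here, so the following is only a sketch of two operations an array of provider types makes easy: lookup by ID and a duplicate-ID check. `ProviderConfig`, the stub providers, `ProviderLookup`, and the `"deepgram"` ID are simplified stand-ins invented for this example, not Type4Me's real types or values.

```swift
// Simplified stand-in for ASRProviderConfig (assumption: only the two static
// properties needed for lookup are modeled).
protocol ProviderConfig {
    static var providerID: String { get }
    static var displayName: String { get }
}

// Stub providers for illustration only; the "deepgram" ID is an assumption.
struct DeepgramStub: ProviderConfig {
    static let providerID = "deepgram"
    static let displayName = "Deepgram"
}

struct WhisperStub: ProviderConfig {
    static let providerID = "openai_whisper"
    static let displayName = "OpenAI Whisper"
}

// Hypothetical registry wrapper mirroring the `all` array pattern.
enum ProviderLookup {
    static let all: [any ProviderConfig.Type] = [DeepgramStub.self, WhisperStub.self]

    /// Returns the provider type registered under `id`, if any.
    static func provider(for id: String) -> (any ProviderConfig.Type)? {
        all.first { $0.providerID == id }
    }

    /// Returns the IDs that appear more than once in the registry, sorted.
    static func duplicateIDs() -> [String] {
        var seen = Set<String>()
        var duplicates = Set<String>()
        for provider in all where !seen.insert(provider.providerID).inserted {
            duplicates.insert(provider.providerID)
        }
        return duplicates.sorted()
    }
}
```

A check like `duplicateIDs()` can run in a debug build or unit test to catch two providers accidentally sharing one ID.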

Credentials Storage

Credentials are stored at ~/Library/Application Support/Type4Me/credentials.json with permissions 0600. Never hardcode secrets — always load via CredentialStore:

// Reading credentials

let store = CredentialStore.shared

let apiKey = store.get(providerID: "openai_whisper", key: "api_key")

// Writing credentials

store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)

// Checking if configured

let isConfigured = store.isConfigured(providerID: "openai_whisper",

                                       fields: OpenAIWhisperProvider.credentialFields)
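The internals of `CredentialStore` are not shown, so the sketch below only illustrates how a JSON file restricted to mode 0600 can be written with Foundation. `writeCredentials`, `readCredentials`, and the nested `[providerID: [key: value]]` layout are assumptions made for this example, not Type4Me's actual schema.

```swift
import Foundation

/// Hypothetical helper: encodes credentials as JSON, writes them atomically,
/// then restricts the file to owner read/write (0600).
/// Assumption: a nested [providerID: [key: value]] layout.
func writeCredentials(_ credentials: [String: [String: String]], to url: URL) throws {
    let data = try JSONEncoder().encode(credentials)
    try data.write(to: url, options: .atomic)
    try FileManager.default.setAttributes(
        [.posixPermissions: NSNumber(value: 0o600)],
        ofItemAtPath: url.path
    )
}

/// Hypothetical helper: reads the same layout back from disk.
func readCredentials(from url: URL) throws -> [String: [String: String]] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([String: [String: String]].self, from: data)
}
```

Note that this sketch sets the permissions after the write, so the file briefly exists with default permissions; a production store may want to close that window.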

Custom Processing Modes with Prompt Variables

Processing modes use LLM post-processing with three context variables:

| Variable | Value |
| --- | --- |
| {text} | Recognized speech text |
| {selected} | Text selected in active app at record start |
| {clipboard} | Clipboard content at record start |

Example custom mode prompts:

// Translate selection using voice command

let translatePrompt = """

The user selected this text: {selected}

Voice command: {text}

Execute the command on the selected text. Output only the result.

"""

// Code review via voice

let codeReviewPrompt = """

Code to review:

{clipboard}

Review instruction: {text}

Provide focused feedback addressing the instruction.

"""

// Email reply drafting

let emailPrompt = """

Original email: {selected}

My reply intent (spoken): {text}

Write a professional email reply. Output only the email body.

"""
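How Type4Me expands these templates is not documented here. One plausible approach is sketched below, with `PromptContext` and `renderPrompt` as hypothetical names. It substitutes in a single left-to-right pass, so a spoken phrase that happens to contain `{selected}` is inserted literally rather than expanded a second time.

```swift
/// Hypothetical context type holding the three template variables.
struct PromptContext {
    var text: String
    var selected: String
    var clipboard: String
}

/// Substitutes {text}, {selected}, and {clipboard} in one pass over the template.
/// Substituted values are never rescanned, and unknown {names} are left untouched.
func renderPrompt(_ template: String, with context: PromptContext) -> String {
    let values = [
        "{text}": context.text,
        "{selected}": context.selected,
        "{clipboard}": context.clipboard,
    ]
    var output = ""
    var rest = Substring(template)
    while let open = rest.firstIndex(of: "{") {
        // Copy everything before the brace unchanged.
        output.append(contentsOf: rest[..<open])
        let tail = rest[open...]
        if let match = values.first(where: { tail.hasPrefix($0.key) }) {
            output.append(contentsOf: match.value)
            rest = tail.dropFirst(match.key.count)
        } else {
            // Not a known placeholder: keep the brace and continue scanning.
            output.append("{")
            rest = tail.dropFirst()
        }
    }
    output.append(contentsOf: rest)
    return output
}
```

Sequential `replacingOccurrences` calls would be shorter, but they would re-expand placeholders that appear inside already substituted text.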

Built-in Processing Modes

enum ProcessingMode {

    case fast           // Direct ASR output, no post-processing step

    case performance    // Dual-channel: streaming + offline refinement

    case englishTranslation  // Chinese speech → English text

    case promptOptimize // Raw prompt → optimized prompt via LLM

    case command        // Voice command + selected/clipboard context → LLM action

    case custom(prompt: String)  // User-defined prompt template

}

Session State Machine

The core recording flow in Session/:

[Idle]

  → hotkey pressed → [Recording] → audio streams to ASR client

  → hotkey released/pressed again → [Processing]

  → ASR returns text → [LLM Post-processing] (if mode requires)

  → [Injecting] → text injected into active app

  → [Idle]
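The flow above can be modeled as a small transition function. This is a sketch under assumptions, not the code in Session/: the state names follow the diagram, while `SessionEvent` and `nextState` are hypothetical, and cancellation is not modeled.

```swift
/// States named after the flow above.
enum SessionState: Equatable {
    case idle, recording, processing, postProcessing, injecting
}

/// Hypothetical events that drive the transitions.
enum SessionEvent {
    case hotkeyPressed
    case hotkeyReleased
    case asrFinished(needsLLM: Bool)
    case llmFinished
    case injectionFinished
}

/// Returns the next state, or nil when the event is not valid in the current state.
func nextState(from state: SessionState, on event: SessionEvent) -> SessionState? {
    switch (state, event) {
    case (.idle, .hotkeyPressed):
        return .recording
    case (.recording, .hotkeyReleased), (.recording, .hotkeyPressed):
        // Covers both hold-to-talk (release) and toggle (second press).
        return .processing
    case (.processing, .asrFinished(let needsLLM)):
        // Skip LLM post-processing when the mode does not require it.
        return needsLLM ? .postProcessing : .injecting
    case (.postProcessing, .llmFinished):
        return .injecting
    case (.injecting, .injectionFinished):
        return .idle
    default:
        return nil
    }
}
```

Returning nil for invalid pairs keeps stray events, such as a hotkey press while text is being injected, from corrupting the session.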

Updating After Source Changes

cd type4me

git pull

bash scripts/deploy.sh

# SherpaOnnx does NOT need recompiling unless engine version changed

Troubleshooting

App won't open (security warning)

xattr -d com.apple.quarantine /Applications/Type4Me.app

Local model not recognized in Settings

Verify the directory structure exactly matches:

ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/

# Must show: encoder.int8.onnx  decoder.int8.onnx  tokens.txt
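The same check can be done programmatically. The sketch below uses a hypothetical helper, `missingModelFiles`, whose default file list is taken from the Paraformer layout shown earlier; other models may ship different files.

```swift
import Foundation

/// Hypothetical helper: returns the expected model files that are missing from `directory`.
/// The default list matches the Paraformer layout described in this document.
func missingModelFiles(
    in directory: URL,
    expected: [String] = ["encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt"]
) -> [String] {
    expected.filter { name in
        !FileManager.default.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}
```

An empty result means all three files are present; anything else names exactly what still needs to be extracted into the model folder.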

SherpaOnnx build fails

# Ensure cmake is installed

brew install cmake

# Clean and retry

rm -rf Frameworks/

bash scripts/build-sherpa.sh

New ASR provider not appearing in Settings

Confirm the provider type is added to ASRProviderRegistry.all

Ensure providerID is unique across all providers

Clean build: swift package clean && bash scripts/deploy.sh

Audio not captured / no floating bar

Grant microphone permission: System Settings → Privacy & Security → Microphone → Type4Me ✓

Grant Accessibility permission for text injection: System Settings → Privacy & Security → Accessibility → Type4Me ✓

Credentials not saving

# Check file exists and has correct permissions

ls -la ~/Library/Application\ Support/Type4Me/credentials.json

# Should show: -rw------- (0600)

# Fix permissions if needed:

chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json

Export history to CSV

Open Settings → History → select date range → Export CSV. The SQLite database is at:

~/Library/Application\ Support/Type4Me/history.db

# Direct query:

sqlite3 ~/Library/Application\ Support/Type4Me/history.db \

  "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"

System Requirements

macOS 14.0 (Sonoma) or later

Apple Silicon (M1/M2/M3/M4) recommended for local ASR inference

Xcode Command Line Tools + CMake for source builds

Internet connection only needed for cloud ASR providers
