natural-language

Tokenize, tag, and analyze natural language text using Apple's NaturalLanguage framework and translate between languages with the Translation framework. Use…

INSTALLATION
npx skills add https://github.com/dpearson2699/swift-ios-skills --skill natural-language
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

NaturalLanguage + Translation

Analyze natural language text for tokenization, part-of-speech tagging, named

entity recognition, sentiment analysis, language identification, and word/sentence

embeddings. Translate text between languages with the Translation framework.

Targets Swift 6.3 / iOS 26+.

This skill covers two related frameworks: NaturalLanguage (NLTokenizer, NLTagger, NLEmbedding) for on-device text analysis, and Translation (TranslationSession, LanguageAvailability) for language translation.

Contents

  • [Setup](#setup)
  • [Tokenization](#tokenization)
  • [Language Identification](#language-identification)
  • [Part-of-Speech Tagging](#part-of-speech-tagging)
  • [Named Entity Recognition](#named-entity-recognition)
  • [Sentiment Analysis](#sentiment-analysis)
  • [Text Embeddings](#text-embeddings)
  • [Translation](#translation)
  • [Common Mistakes](#common-mistakes)
  • [Review Checklist](#review-checklist)
  • [References](#references)

Setup

Import NaturalLanguage for text analysis and Translation for language

translation. No special entitlements or capabilities are required for

NaturalLanguage. Translation requires iOS 17.4+ / macOS 14.4+.

import NaturalLanguage

import Translation

NaturalLanguage classes (NLTokenizer, NLTagger) are not thread-safe.

Use each instance from one thread or dispatch queue at a time.

Tokenization

Segment text into words, sentences, or paragraphs with NLTokenizer.

import NaturalLanguage

func tokenizeWords(in text: String) -> [String] {

    let tokenizer = NLTokenizer(unit: .word)

    tokenizer.string = text

    let range = text.startIndex..<text.endIndex

    return tokenizer.tokens(for: range).map { String(text[$0]) }

}

Token Units

Unit

Description

.word

Individual words

.sentence

Sentences

.paragraph

Paragraphs

.document

Entire document

Enumerating with Attributes

Use enumerateTokens(in:using:) to detect numeric or emoji tokens.

let tokenizer = NLTokenizer(unit: .word)

tokenizer.string = text

tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, attributes in

    if attributes.contains(.numeric) {

        print("Number: \(text[range])")

    }

    return true // continue enumeration

}

Language Identification

Detect the dominant language of a string with NLLanguageRecognizer.

func detectLanguage(for text: String) -> NLLanguage? {

    NLLanguageRecognizer.dominantLanguage(for: text)

}

// Multiple hypotheses with confidence scores

func languageHypotheses(for text: String, max: Int = 5) -> [NLLanguage: Double] {

    let recognizer = NLLanguageRecognizer()

    recognizer.processString(text)

    return recognizer.languageHypotheses(withMaximum: max)

}

Constrain the recognizer to expected languages for better accuracy on short text.

let recognizer = NLLanguageRecognizer()

recognizer.languageConstraints = [.english, .french, .spanish]

recognizer.processString(text)

let detected = recognizer.dominantLanguage

Part-of-Speech Tagging

Identify nouns, verbs, adjectives, and other lexical classes with NLTagger.

func tagPartsOfSpeech(in text: String) -> [(String, NLTag)] {

    let tagger = NLTagger(tagSchemes: [.lexicalClass])

    tagger.string = text

    var results: [(String, NLTag)] = []

    let range = text.startIndex..<text.endIndex

    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in

        if let tag {

            results.append((String(text[tokenRange]), tag))

        }

        return true

    }

    return results

}

Common Tag Schemes

Scheme

Output

.lexicalClass

Part of speech (noun, verb, adjective)

.nameType

Named entity type (person, place, organization)

.nameTypeOrLexicalClass

Combined NER + POS

.lemma

Base form of a word

.language

Per-token language

.sentimentScore

Sentiment polarity score

Named Entity Recognition

Extract people, places, and organizations.

func extractEntities(from text: String) -> [(String, NLTag)] {

    let tagger = NLTagger(tagSchemes: [.nameType])

    tagger.string = text

    var entities: [(String, NLTag)] = []

    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    tagger.enumerateTags(

        in: text.startIndex..<text.endIndex,

        unit: .word,

        scheme: .nameType,

        options: options

    ) { tag, tokenRange in

        if let tag, tag != .other {

            entities.append((String(text[tokenRange]), tag))

        }

        return true

    }

    return entities

}

// NLTag values: .personalName, .placeName, .organizationName

Sentiment Analysis

Score text sentiment from -1.0 (negative) to +1.0 (positive).

func sentimentScore(for text: String) -> Double? {

    let tagger = NLTagger(tagSchemes: [.sentimentScore])

    tagger.string = text

    let (tag, _) = tagger.tag(

        at: text.startIndex,

        unit: .paragraph,

        scheme: .sentimentScore

    )

    return tag.flatMap { Double($0.rawValue) }

}

Text Embeddings

Measure semantic similarity between words or sentences with NLEmbedding.

func wordSimilarity(_ word1: String, _ word2: String) -> Double? {

    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return nil }

    return embedding.distance(between: word1, and: word2, distanceType: .cosine)

}

func findSimilarWords(to word: String, count: Int = 5) -> [(String, Double)] {

    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return [] }

    return embedding.neighbors(for: word, maximumCount: count, distanceType: .cosine)

}

Sentence embeddings compare entire sentences.

func sentenceSimilarity(_ s1: String, _ s2: String) -> Double? {

    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return nil }

    return embedding.distance(between: s1, and: s2, distanceType: .cosine)

}

Translation

System Translation Overlay

Show the built-in translation UI with .translationPresentation().

import SwiftUI

import Translation

struct TranslatableView: View {

    @State private var showTranslation = false

    let text = "Hello, how are you?"

    var body: some View {

        Button { showTranslation = true } label: {

            Text(text)

        }

        .buttonStyle(.plain)

        .translationPresentation(

            isPresented: $showTranslation,

            text: text

        )

    }

}

Programmatic Translation

Use .translationTask() for programmatic translations within a view context.

struct TranslatingView: View {

    @State private var translatedText = ""

    @State private var configuration: TranslationSession.Configuration?

    var body: some View {

        VStack {

            Text(translatedText)

            Button("Translate") {

                configuration = .init(source: Locale.Language(identifier: "en"),

                                      target: Locale.Language(identifier: "es"))

            }

        }

        .translationTask(configuration) { session in

            let response = try await session.translate("Hello, world!")

            translatedText = response.targetText

        }

    }

}

Batch Translation

Translate multiple strings in a single session.

.translationTask(configuration) { session in

    let requests = texts.enumerated().map { index, text in

        TranslationSession.Request(sourceText: text,

                                    clientIdentifier: "\(index)")

    }

    let responses = try await session.translations(from: requests)

    for response in responses {

        print("\(response.sourceText) -> \(response.targetText)")

    }

}

Checking Language Availability

let availability = LanguageAvailability()

let status = await availability.status(

    from: Locale.Language(identifier: "en"),

    to: Locale.Language(identifier: "ja")

)

switch status {

case .installed: break    // Ready to translate offline

case .supported: break    // Needs download

case .unsupported: break  // Language pair not available

}

Common Mistakes

DON'T: Share NLTagger/NLTokenizer across threads

These classes are not thread-safe and will produce incorrect results or crash.

// WRONG

let sharedTagger = NLTagger(tagSchemes: [.lexicalClass])

DispatchQueue.concurrentPerform(iterations: 10) { _ in

    sharedTagger.string = someText  // Data race

}

// CORRECT

await withTaskGroup(of: Void.self) { group in

    for _ in 0..<10 {

        group.addTask {

            let tagger = NLTagger(tagSchemes: [.lexicalClass])

            tagger.string = someText

            // process...

        }

    }

}

DON'T: Confuse NaturalLanguage with Core ML

NaturalLanguage provides built-in linguistic analysis. Use Core ML for custom

trained models. They complement each other via NLModel.

// WRONG: Trying to do NER with raw Core ML

let coreMLModel = try MLModel(contentsOf: modelURL)

// CORRECT: Use NLTagger for built-in NER

let tagger = NLTagger(tagSchemes: [.nameType])

// Or load a custom Core ML model via NLModel

let nlModel = try NLModel(mlModel: coreMLModel)

tagger.setModels([nlModel], forTagScheme: .nameType)

DON'T: Assume embeddings exist for all languages

Not all languages have word or sentence embeddings available on device.

// WRONG: Force unwrap

let embedding = NLEmbedding.wordEmbedding(for: .japanese)!

// CORRECT: Handle nil

guard let embedding = NLEmbedding.wordEmbedding(for: .japanese) else {

    // Embedding not available for this language

    return

}

DON'T: Create a new tagger per token

Creating and configuring a tagger is expensive. Reuse it for the same text.

// WRONG: New tagger per word

for word in words {

    let tagger = NLTagger(tagSchemes: [.lexicalClass])

    tagger.string = word

}

// CORRECT: Set string once, enumerate

let tagger = NLTagger(tagSchemes: [.lexicalClass])

tagger.string = fullText

tagger.enumerateTags(in: fullText.startIndex..<fullText.endIndex,

                     unit: .word, scheme: .lexicalClass, options: []) { tag, range in

    return true

}

DON'T: Ignore language hints for short text

Language detection on short strings (under ~20 characters) is unreliable.

Set constraints or hints to improve accuracy.

// WRONG: Detect language of a single word

let lang = NLLanguageRecognizer.dominantLanguage(for: "chat")  // French or English?

// CORRECT: Provide context

let recognizer = NLLanguageRecognizer()

recognizer.languageHints = [.english: 0.8, .french: 0.2]

recognizer.processString("chat")

Review Checklist

  • NLTokenizer and NLTagger instances used from a single thread
  • Tagger created once per text, not per token
  • Language detection uses constraints/hints for short text
  • NLEmbedding availability checked before use (returns nil if unavailable)
  • Translation LanguageAvailability checked before attempting translation
  • .translationTask() used within a SwiftUI view hierarchy
  • Batch translation uses clientIdentifier to match responses to requests
  • Sentiment scores handled as optional (may return nil for unsupported languages)
  • .joinNames option used with NER to keep multi-word names together
  • Custom ML models loaded via NLModel, not raw Core ML

References

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card