parallel-data-enrichment

Name: parallel-data-enrichment
Author: parallel-web

Bulk enrichment of company, people, or product data with web-sourced fields like CEO names, funding, and contact info. Accepts inline JSON data or CSV files; outputs enriched results to CSV Runs asynchronously with progress tracking via monitoring URL and polling commands Requires parallel-cli tool and internet access; handles large datasets with configurable timeouts Supports flexible field requests through natural language intent descriptions (e.g., "CEO name and founding year")

INSTALLATION

npx skills add https://github.com/parallel-web/parallel-agent-skills --skill parallel-data-enrichment

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$2a

Step 1: Start the enrichment

Use ONE of these command patterns (substitute user's actual data):

For inline data:

parallel-cli enrich run --data '[{"company": "Google"}, {"company": "Microsoft"}]' --intent "CEO name and founding year" --target "output.csv" --no-wait --json

For CSV file:

parallel-cli enrich run --source-type csv --source "input.csv" --target "output.csv" --source-columns '[{"name": "company", "description": "Company name"}]' --intent "CEO name and founding year" --no-wait --json

If this is a follow-up to a previous research task and you have its interaction_id, add context chaining:

parallel-cli enrich run --data '...' --intent "..." --target "output.csv" --no-wait --json --previous-interaction-id "$INTERACTION_ID"

The enrichment will run with the full context of that prior research — so you can enrich entities discovered earlier without restating what was already found. Note: enrichment does not itself produce a new interaction_id, so you cannot chain a further follow-up off of an enrichment.

IMPORTANT: Always include --no-wait so the command returns immediately instead of blocking.

Parse the --json output to extract taskgroup_id and url. The output is {taskgroup_id, url, num_runs} — there is no interaction_id field, do not look for one. Immediately tell the user:

Enrichment has been kicked off

The monitoring URL where they can track progress

Tell them they can background the polling step to continue working while it runs.

Step 2: Poll for results

Pick a concrete output path (e.g., /tmp/enrichment-acme.json). Note: the file is JSON regardless of the extension you choose — it's an array of {input, output} objects, not a CSV. Name it .json to avoid confusing yourself or the user.

parallel-cli enrich poll "$TASKGROUP_ID" --timeout 540 --output "/tmp/enrichment-<descriptive-name>.json"

Important:

Use --timeout 540 (9 minutes) to stay within tool execution limits

The --target from step 1 is unused in --no-wait mode — only --output here determines where results are saved, and the file is always JSON

If the poll times out

Enrichment of large datasets can take longer than 9 minutes. If the poll exits without completing:

Tell the user the enrichment is still running server-side

Re-run the same parallel-cli enrich poll command to continue waiting

Response format

After step 1: Share the monitoring URL (for tracking progress).

After step 2:

Report number of rows enriched

Preview first few rows from the output file (it's a JSON array of {input, output} objects)

Tell the user the full path to the output file

Do NOT re-share the monitoring URL after completion — the results are in the output file.

Setup

Requires parallel-cli (installed and authenticated). If parallel-cli --version fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.