Apify SDK Integration
Add Apify Actor execution to an existing application. This skill covers the apify-client package for JS/TS and Python, plus the REST API for other languages.
When to Use This Skill
- Adding web scraping or automation to an existing app
- Calling Apify Actors programmatically from application code
- Building a product that uses Apify as a backend service
- Integrating Actor results into a data pipeline
Critical: Package Naming
**apify-client is the API client for calling** Actors from your app.
**apify is the SDK for building** Actors (wrong package for this use case).
Always install apify-client. Never install apify for integration work.
Prerequisites
The user needs an APIFY_TOKEN. Direct them to Console > Settings > Integrations at https://console.apify.com/settings/integrations to create one. If they don't have an account: https://console.apify.com/sign-up (free, no credit card).
Store the token securely — environment variable or secrets manager, never hardcoded.
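A minimal sketch of reading the token from the environment and failing early with a readable message. The helper name `load_apify_token` is hypothetical and not part of any Apify package; only the `APIFY_TOKEN` variable name comes from the text above.

```python
import os


def load_apify_token(env=None):
    """Return the Apify API token from the environment.

    Hypothetical helper for illustration; not part of apify-client.
    `env` defaults to os.environ and can be replaced with a dict in tests.
    """
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        # Fail before any API call is attempted, with an actionable message.
        raise RuntimeError(
            "APIFY_TOKEN is not set. Create a token in the Apify Console "
            "and export it as an environment variable."
        )
    return token
```

Passing the returned value to the client constructor keeps the token out of source code and out of version control.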
Finding the Right Actor
Before writing integration code, find the Actor that fits the user's needs. Use the MCP tools if available:
- search-actors: search the Apify Store by keyword
- fetch-actor-details: get the Actor's input schema, output format, and pricing
Alternatively, browse https://apify.com/store. Append .md to any Actor's Store URL to get its docs in markdown.
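The ".md" trick can be sketched as a small URL builder. This assumes the Store URL has the form https://apify.com/username/actor-name, and `actor_docs_url` is a hypothetical helper name used only for illustration.

```python
def actor_docs_url(actor_name):
    """Build the markdown docs URL for an Actor given 'username/actor-name'.

    Hypothetical helper; assumes Store URLs look like
    https://apify.com/<username>/<actor-name>.
    """
    username, sep, name = actor_name.strip().partition("/")
    if not sep or not username or not name:
        raise ValueError(f"expected 'username/actor-name', got {actor_name!r}")
    # Appending .md to the Store URL returns the Actor docs as markdown.
    return f"https://apify.com/{username}/{name}.md"
```

For example, `actor_docs_url("apify/web-scraper")` yields `https://apify.com/apify/web-scraper.md`.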
JavaScript / TypeScript
Install
npm install apify-client
Synchronous Execution (wait for results)
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('apify/web-scraper').call({
  startUrls: [{ url: 'https://example.com' }],
  maxPagesPerCrawl: 10,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
.call() blocks until the Actor finishes. Use it for short-running Actors (runs that finish in under a few minutes).
Asynchronous Execution (start and poll/retrieve later)
const run = await client.actor('apify/web-scraper').start({
  startUrls: [{ url: 'https://example.com' }],
});
// Poll for completion
const finishedRun = await client.run(run.id).waitForFinish();
// Retrieve results
const { items } = await client.dataset(finishedRun.defaultDatasetId).listItems();
Use .start() + .waitForFinish() for long-running Actors or when you need the run ID immediately.
Retrieving Results
// Dataset items (structured data from pushData)
const { items } = await client.dataset(run.defaultDatasetId).listItems({
  limit: 100,
  offset: 0,
});
// Key-value store (files, screenshots, etc.)
const record = await client.keyValueStore(run.defaultKeyValueStoreId).getRecord('OUTPUT');
Error Handling
try {
  // input is the Actor-specific input object, built elsewhere
  const run = await client.actor('apify/web-scraper').call(input);
  if (run.status !== 'SUCCEEDED') {
    // Fetch the run log so the error message explains what went wrong
    const log = await client.log(run.id).get();
    throw new Error(`Actor failed with status ${run.status}: ${log}`);
  }
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
} catch (error) {
  if (error.message?.includes('not found')) {
    // Actor ID is wrong or Actor was deleted
  } else if (error.statusCode === 401) {
    // Invalid or missing APIFY_TOKEN
  }
  throw error;
}
Python
Install
pip install apify-client
Synchronous Execution
from apify_client import ApifyClient
import os
client = ApifyClient(token=os.environ['APIFY_TOKEN'])
run = client.actor('apify/web-scraper').call(run_input={
    'startUrls': [{'url': 'https://example.com'}],
    'maxPagesPerCrawl': 10,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
Asynchronous Execution
run = client.actor('apify/web-scraper').start(run_input={
    'startUrls': [{'url': 'https://example.com'}],
})
# Poll for completion
finished_run = client.run(run['id']).wait_for_finish()
items = client.dataset(finished_run['defaultDatasetId']).list_items().items
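The Python examples above do not check how the run ended. A minimal sketch of that check follows, mirroring the status test in the JavaScript error-handling example. `ActorRunError` and `ensure_succeeded` are hypothetical names, and the sketch assumes the run is a dict with `id`, `status`, and `defaultDatasetId` keys, as in the snippets above.

```python
class ActorRunError(RuntimeError):
    """Hypothetical exception type for runs that did not succeed."""


def ensure_succeeded(run):
    """Return the default dataset ID if the run succeeded, else raise.

    `run` is the dict returned by call() or wait_for_finish() in the
    examples above; None is treated as a failure.
    """
    if run is None:
        raise ActorRunError("no run object was returned")
    status = run.get("status")
    if status != "SUCCEEDED":
        # FAILED, TIMED-OUT, and ABORTED runs may still hold partial data,
        # but this sketch treats them all as errors.
        raise ActorRunError(
            f"Actor run {run.get('id')} ended with status {status}"
        )
    return run["defaultDatasetId"]
```

A typical use would be `dataset_id = ensure_succeeded(finished_run)` before listing items, so a failed run is reported instead of silently producing an empty result.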
Async Client (asyncio)
import asyncio
import os
from apify_client import ApifyClientAsync

async def main():
    client = ApifyClientAsync(token=os.environ['APIFY_TOKEN'])
    run = await client.actor('apify/web-scraper').call(run_input={
        'startUrls': [{'url': 'https://example.com'}],
    })
    # list_items() is a coroutine here, so await it before reading .items
    return (await client.dataset(run['defaultDatasetId']).list_items()).items

items = asyncio.run(main())
REST API (Any Language)
For languages without an official client, use the REST API directly.
Start a Run
POST https://api.apify.com/v2/acts/{actorId}/runs
Authorization: Bearer <APIFY_TOKEN>
Content-Type: application/json
{ "startUrls": [{ "url": "https://example.com" }] }
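As an illustration, the request above could be assembled with the Python standard library as follows. This is a sketch only: `build_start_run_request` is a hypothetical helper, it builds the request without sending it, and it assumes an Actor named `username/actor-name` is written as `username~actor-name` in the URL path.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_start_run_request(actor_id, token, run_input):
    """Build (but do not send) the POST request that starts an Actor run.

    Hypothetical helper for illustration; not part of any Apify package.
    """
    # A slash in the Actor name would be read as a path separator,
    # so the path uses the tilde form instead.
    path_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{path_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`. The response body is JSON, and the run object is expected under its `data` key.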
Get Run Status
GET https://api.apify.com/v2/acts/{actorId}/runs/{runId}
Authorization: Bearer <APIFY_TOKEN>
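Without a client library, waiting for a run means calling the status endpoint repeatedly. A minimal polling sketch follows. `poll_until_terminal` is a hypothetical helper, and `fetch_status` is a caller-supplied function, assumed here to perform the GET above and return the run's `status` string.

```python
import time

# Statuses after which a run will not change any more.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"})


def poll_until_terminal(fetch_status, interval_secs=5.0, timeout_secs=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the run reaches a terminal status.

    `sleep` and `clock` are injectable so the loop can be tested
    without real delays. Raises TimeoutError if the deadline passes.
    """
    deadline = clock() + timeout_secs
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"run still {status} after {timeout_secs} seconds"
            )
        sleep(interval_secs)
```

The fixed interval keeps the sketch short. A production loop might back off between attempts to reduce the number of API calls.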
Get Dataset Items
GET https://api.apify.com/v2/datasets/{datasetId}/items?format=json
Authorization: Bearer <APIFY_TOKEN>
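The items URL accepts paging parameters in the query string. The sketch below builds such a URL. `dataset_items_url` is a hypothetical helper, and it assumes the `limit` and `offset` query parameters behave like the client options of the same names.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id, limit=None, offset=None, fmt="json"):
    """Build the dataset items URL with optional paging parameters.

    Hypothetical helper for illustration; parameters left as None
    are omitted from the query string.
    """
    params = {"format": fmt}
    if offset is not None:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

The token still travels in the Authorization header, as in the request above, rather than in the URL.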
Full API reference: https://docs.apify.com/api/v2
Best Practices
- Set timeouts: Pass a run timeout (the timeout option in the JS client, in seconds) or waitSecs in the options of .call() to avoid indefinite waits. Both are run options passed alongside the input, not fields inside the Actor input.
- Paginate large datasets: Use limit and offset when retrieving dataset items. Default limit is 250K items.
- Reuse clients: Create one ApifyClient instance and reuse it across calls.
- Handle Actor-specific input: Every Actor has its own input schema. Use the fetch-actor-details MCP tool or append .md to the Actor's Store URL to get the schema before constructing input.
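The pagination advice above can be sketched as a generator that walks a dataset page by page. `iter_dataset_items` is a hypothetical helper, and `list_page` is a caller-supplied function taking `(offset, limit)` and returning a list of items, so the sketch does not depend on a particular client.

```python
def iter_dataset_items(list_page, page_size=1000):
    """Yield every item by requesting pages of at most page_size items.

    `list_page(offset, limit)` must return a list; an empty or short
    page signals the end of the dataset.
    """
    offset = 0
    while True:
        items = list_page(offset, page_size)
        if not items:
            return
        yield from items
        if len(items) < page_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(items)
```

With the Python client from the examples above, `list_page` could be `lambda offset, limit: client.dataset(dataset_id).list_items(offset=offset, limit=limit).items`, where `dataset_id` is the run's default dataset ID.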
Documentation
- Apify API client for JS: https://docs.apify.com/api/client/js
- Apify API client for Python: https://docs.apify.com/api/client/python
- REST API reference: https://docs.apify.com/api/v2
- Apify docs (LLM-friendly): https://docs.apify.com/llms.txt
- Apify docs (full): https://docs.apify.com/llms-full.txt
If the Apify MCP server is available, use search-apify-docs and fetch-apify-docs tools for contextual documentation lookups during development.