SKILL.md
- Create a prediction, store its id from the response, and poll until completion.
- Set a `Prefer: wait` header when creating a prediction to get a blocking, synchronous response. This is only recommended for very fast models, and the wait is capped at 60 seconds.
- Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
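
The create-then-poll option above can be sketched roughly as follows. This is a minimal illustration, not a reference client: the base URL, the `Authorization: Bearer` header, the `version` body field, and the `GET /v1/predictions/{id}` polling endpoint are assumptions that should be checked against the API reference, and the helper names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.replicate.com/v1"  # assumed base URL
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def build_create_request(token, model, model_input, webhook=None, wait=False):
    """Return (url, headers, body) for creating a prediction. No network I/O."""
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    if wait:
        # Blocking mode: only sensible for very fast models (max 60 seconds).
        headers["Prefer"] = "wait"
    # Assumption: the model identifier is sent in a "version" field.
    body = {"version": model, "input": model_input}
    if webhook is not None:
        body["webhook"] = webhook
    return f"{API_BASE}/predictions", headers, body


def http_json(method, url, headers, body=None):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until_done(fetch, prediction_id, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Call fetch(prediction_id) until the status is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(prediction_id)
        if prediction["status"] in TERMINAL_STATES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} did not finish in time")
        sleep(interval)


# Usage sketch (performs real requests, so it is left commented out):
# url, headers, body = build_create_request(token, "owner/name", {"prompt": "a cat"})
# created = http_json("POST", url, headers, body)
# final = poll_until_done(
#     lambda pid: http_json("GET", f"{API_BASE}/predictions/{pid}", headers),
#     created["id"],
# )
```

Passing `fetch` and `sleep` in as arguments keeps the polling loop testable without network access.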
Guidelines
- Use the `POST /v1/predictions` endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (`minimum`, `maximum`, `enum` values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.
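
As one way to apply the schema guidelines above, the sketch below checks a set of inputs against a model's input schema before sending them. It assumes the schema has already been fetched and reduced to the JSON Schema object describing the inputs (a dict with `properties` and `required`); where that object lives in the model's OpenAPI document is not covered here. The function name is hypothetical, and only inline `minimum`, `maximum`, and `enum` constraints are handled; constraints behind a `$ref` are not resolved.

```python
def validate_input(input_schema, values):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    properties = input_schema.get("properties", {})

    # Required inputs must be present.
    for name in input_schema.get("required", []):
        if name not in values:
            problems.append(f"missing required input: {name}")

    for name, value in values.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown input: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}={value!r} is not one of {spec['enum']}")
        # Only compare numbers; bool is excluded because it subclasses int.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{name}={value!r} is below minimum {spec['minimum']}")
        if is_number and "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{name}={value!r} is above maximum {spec['maximum']}")
    return problems


# Hypothetical schema fragment, for illustration only.
example_schema = {
    "required": ["prompt"],
    "properties": {
        "prompt": {"type": "string"},
        "steps": {"type": "integer", "minimum": 1, "maximum": 50},
        "format": {"type": "string", "enum": ["png", "jpg"]},
    },
}
print(validate_input(example_schema, {"prompt": "a cat", "steps": 100}))
```

Returning a list of problems rather than raising on the first one makes it easier to fix every invalid value in a single pass.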
Predictions
- A prediction goes through these states: `starting` -> `processing` -> `succeeded`/`failed`/`canceled`.
- Official models use `owner/name` format. Community models require `owner/name:version_id`.
- The `POST /v1/predictions` endpoint handles both.
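
The two identifier formats and the state list above can be captured in a couple of small helpers. This is an illustrative sketch only; `ModelRef`, `parse_model_ref`, and `is_terminal` are hypothetical names, not part of any client library.

```python
from typing import NamedTuple, Optional

TERMINAL_STATES = {"succeeded", "failed", "canceled"}


class ModelRef(NamedTuple):
    owner: str
    name: str
    version_id: Optional[str]  # None for the owner/name form


def parse_model_ref(ref):
    """Split 'owner/name' or 'owner/name:version_id' into its parts."""
    path, colon, version_id = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not slash or not owner or not name or "/" in name:
        raise ValueError(f"expected owner/name[:version_id], got {ref!r}")
    if colon and not version_id:
        raise ValueError(f"empty version id in {ref!r}")
    return ModelRef(owner, name, version_id or None)


def is_terminal(status):
    """True once a prediction has stopped: succeeded, failed, or canceled."""
    return status in TERMINAL_STATES


print(parse_model_ref("owner/name"))
print(parse_model_ref("owner/name:version_id"))
```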
Webhooks
- Set `webhook` to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
- Filter events with `webhook_events_filter`: `start`, `output`, `logs`, `completed`.
- Validate webhook signatures using the `Webhook-ID`, `Webhook-Timestamp`, and `Webhook-Signature` headers. Get the signing secret from `GET /v1/webhooks/default/secret`.
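
A possible shape for the signature check is sketched below. The list above names only the three headers and the secret endpoint, so the rest is an assumption to verify against the webhook documentation: a secret of the form `whsec_<base64 key>`, signed content of `{id}.{timestamp}.{body}`, HMAC-SHA256 with base64 output, and a signature header holding space-separated `v1,<signature>` entries. Header names are expected as lowercase dict keys, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import time


def compute_signature(secret, webhook_id, timestamp, body):
    """Base64 HMAC-SHA256 over 'id.timestamp.body' (assumed signing scheme)."""
    key = base64.b64decode(secret.removeprefix("whsec_"))  # assumed secret format
    signed_content = f"{webhook_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_webhook(secret, headers, body, tolerance_seconds=300, now=None):
    """Return True if any signature in the header matches and the timestamp is fresh."""
    try:
        webhook_id = headers["webhook-id"]
        timestamp = headers["webhook-timestamp"]
        signatures = headers["webhook-signature"]
        current = time.time() if now is None else now
        age = abs(current - int(timestamp))
    except (KeyError, ValueError):
        return False
    if age > tolerance_seconds:
        return False  # reject stale or replayed deliveries
    expected = compute_signature(secret, webhook_id, timestamp, body).encode()
    for entry in signatures.split():
        _, _, candidate = entry.partition(",")  # drop the "v1" version prefix
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected, candidate.encode()):
            return True
    return False
```

Verification should run against the raw request body bytes, before any JSON parsing, since re-serializing the payload can change the bytes that were signed.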
Prediction lifetime
- Set `lifetime` to auto-cancel predictions that run too long (e.g. `30s`, `5m`, `1h`). The lifetime is measured from creation time.
Streaming
- Language models that support streaming include a `stream` URL in the response. Use SSE (server-sent events) to receive incremental output.
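
To illustrate consuming such a stream, here is a small parser for the server-sent events wire format. It follows the generic SSE framing (fields separated by `:`, events terminated by a blank line) and makes no assumption about which event names the stream emits; the `output` and `done` names in the example are placeholders. Opening the `stream` URL itself, typically with an `Accept: text/event-stream` header, is left out.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; events without data are dropped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # Other fields such as "id" and "retry" are ignored in this sketch.


# Example with made-up event names and payloads.
sample = [
    "event: output", "id: 1", "data: Hello", "",
    ": keep-alive",
    "event: output", "data:  world", "",
    "event: done", "data: {}", "",
]
for name, payload in parse_sse(sample):
    print(name, repr(payload))
```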
File handling
- Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
- Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
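
A minimal sketch of saving an output file before its URL expires is shown below. It writes to a local directory using only the standard library; uploading to object storage would replace the local write. `save_output` is a hypothetical helper, and it assumes the URL can be fetched without extra authentication headers.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def save_output(url, dest_dir, filename=None):
    """Download url into dest_dir and return the path of the saved file."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Fall back to the last path segment of the URL, then to a generic name.
    name = filename or Path(urllib.parse.urlparse(url).path).name or "output.bin"
    target = dest_dir / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # streams in chunks, not all in memory
    return target


# Usage sketch (the URL is a placeholder, so the call is left commented out):
# saved = save_output("https://example.com/path/to/output.png", "backups")
```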
Multi-model workflows
- Chain models by passing output URLs as file inputs to the next model.
- Start all independent predictions in parallel, then collect results.
- Output URLs are valid for 1 hour, which is enough for pipeline steps.
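
The parallel and chained patterns above might look like the following. `run_prediction` stands in for whatever function creates a prediction and waits for its output; it is a hypothetical callable supplied by the caller. The chaining helper assumes the first model's output is a single URL string, which will not hold for models that return lists or objects.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(run_prediction, jobs, max_workers=8):
    """Start every (model, inputs) job concurrently; return results in job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_prediction, model, inputs) for model, inputs in jobs]
        return [future.result() for future in futures]


def chain(run_prediction, first_job, second_model, file_input_name, extra_inputs=None):
    """Run first_job, then pass its output URL as a file input to second_model."""
    first_model, first_inputs = first_job
    output_url = run_prediction(first_model, first_inputs)
    second_inputs = dict(extra_inputs or {})
    second_inputs[file_input_name] = output_url  # the URL is passed through as-is
    return run_prediction(second_model, second_inputs)


# Stand-in for a real client call, so the sketch runs without network access.
def fake_run(model, inputs):
    return f"https://files.example/{model}.png"


print(run_all(fake_run, [("a/one", {}), ("b/two", {})]))
print(chain(fake_run, ("a/gen", {"prompt": "a cat"}), "b/upscale", "image"))
```

Threads suit this workload because each job spends nearly all of its time waiting on the network.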