ai-api
Replicate
Run and fine-tune AI models in the cloud — pay-per-second GPU
Run thousands of open-source AI models (FLUX, Stable Diffusion, LLMs) through an API with per-second GPU billing. Package your own models with the Cog framework, then deploy and fine-tune them.
Pricing
| Tier | Price | Notes |
|---|---|---|
| Pay-as-you-go | No base fee (usage-based) | Per-second GPU billing with no minimum spend. Public models are billed by processing time or by tokens. |
| Enterprise | Custom | Dedicated capacity, private deployments, SOC 2, and HIPAA on request. |
Limits
| Item | Rate | Notes |
|---|---|---|
| CPU small | $0.000025/sec | 1 vCPU, 2 GB |
| CPU standard | $0.000100/sec | 4 vCPU, 8 GB |
| GPU T4 | $0.000225/sec (~$0.81/hr) | Nvidia T4 |
| GPU L40S 48GB | $0.000975/sec (~$3.51/hr) | Nvidia L40S |
| GPU A100 80GB | $0.001400/sec (~$5.04/hr) | Nvidia A100 80GB |
| GPU H100 80GB | $0.001525/sec (~$5.49/hr) | Nvidia H100 80GB |
| Claude 3.7 Sonnet | $3/M input + $15/M output tokens | Token-billed example |
| FLUX 1.1 Pro | $0.04 per output image | Image model example |
| Fast-boot fine-tunes | Only active processing time billed | Fine-tune billing |
| Private models | Dedicated hardware billed for setup + idle + active time | Private model billing |
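The arithmetic behind these rates can be sketched as follows. This is an illustrative calculator, not an official billing formula: the rates are copied from the table above, while the hardware keys (such as `gpu-a100-80gb`) and the function names are made up for this example. It assumes private-model cost is simply the per-second rate multiplied by setup, idle, and active time combined.

```python
# Per-second rates in USD, copied from the table above.
# The dictionary keys are illustrative labels, not official hardware identifiers.
RATES_PER_SECOND = {
    "cpu-small": 0.000025,
    "cpu-standard": 0.000100,
    "gpu-t4": 0.000225,
    "gpu-l40s-48gb": 0.000975,
    "gpu-a100-80gb": 0.001400,
    "gpu-h100-80gb": 0.001525,
}


def hourly_rate(hardware: str) -> float:
    """Convert a per-second rate to its hourly equivalent, rounded to cents."""
    return round(RATES_PER_SECOND[hardware] * 3600, 2)


def public_run_cost(hardware: str, active_seconds: float) -> float:
    """Public models: only active processing time is billed."""
    return RATES_PER_SECOND[hardware] * active_seconds


def private_run_cost(hardware: str, setup_seconds: float,
                     idle_seconds: float, active_seconds: float) -> float:
    """Private models on dedicated hardware: setup, idle, and active time all count."""
    billed_seconds = setup_seconds + idle_seconds + active_seconds
    return RATES_PER_SECOND[hardware] * billed_seconds


def token_cost(input_tokens: int, output_tokens: int,
               input_per_million: float = 3.0,
               output_per_million: float = 15.0) -> float:
    """Token-billed models; defaults match the Claude 3.7 Sonnet row."""
    return (input_tokens / 1_000_000 * input_per_million
            + output_tokens / 1_000_000 * output_per_million)


if __name__ == "__main__":
    print(hourly_rate("gpu-a100-80gb"))
    print(public_run_cost("gpu-t4", 10))
    print(private_run_cost("gpu-a100-80gb", 60, 600, 30))
    print(token_cost(2000, 500))
```

The same 30 seconds of active A100 time costs far more on a private model that also sat idle for ten minutes, which is the practical difference between the two billing modes.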
Features
- 10k+ Models: Public catalog of image, video, audio, LLM, embedding, and speech models.
- Batch Predictions: Parallel batch execution.
- Cog: Open-source tool for containerizing ML models; the standard packaging format on Replicate.
- Deployments: Private model endpoints on dedicated GPUs.
- File Storage: Temporary hosting for output files.
- Fine-Tuning: Fine-tune FLUX, SDXL, and Llama 2/3 on your own data.
- Per-Second Billing: Pay only while a model runs. Public models incur no idle cost.
- Playground: Interactive UI for every public model.
- Predictions API: Asynchronous, synchronous, and streaming predictions.
- Streaming Outputs: Server-sent events (SSE) streaming for LLM and audio models.
- Webhooks: Notifications when predictions complete.
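To make the streaming feature concrete, here is a minimal sketch of how a client might read an SSE stream. The parser follows the generic server-sent events line format (`event:` and `data:` fields, with a blank line ending each event). The event names `output` and `done` in the sample are assumptions made for illustration, so check the vendor documentation for the actual event types. `parse_sse` is a hypothetical helper, not part of any Replicate SDK.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips exactly one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Hand-written sample stream; event names are assumed, not taken from the source.
sample = [
    "event: output\n", "data: Hello\n", "\n",
    "event: output\n", "data:  world\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]

# Concatenate only the output chunks to rebuild the generated text.
text = "".join(d for e, d in parse_sse(sample) if e == "output")
```

In a real client the `lines` argument would be the decoded lines of an open HTTP response rather than a list.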
Developer interfaces
| Slug | Name | Kind | Version |
|---|---|---|---|
| cog | Cog (package models) | cli | 0.x |
| sdk-go | replicate-go | sdk | 1.x |
| mcp | Replicate MCP | mcp | — |
| sdk-node | replicate (Node) | sdk | 1.x |
| sdk-python | replicate-python | sdk | 1.x |
| rest-api | Replicate REST API | rest | v1 |
| webhooks | Webhooks | other | — |
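As a rough illustration of how the REST API and webhooks fit together, the sketch below builds a prediction request without sending it. It uses only the Python standard library. It assumes the v1 predictions endpoint at `https://api.replicate.com/v1/predictions`, bearer-token authentication, a JSON body with `version`, `input`, `webhook`, and `webhook_events_filter` fields, and a `Prefer: wait` header for synchronous mode. Verify each of these against the official API reference before relying on them. The token, version placeholder, and webhook URL are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for creating predictions; confirm in the official API reference.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, version, model_input,
                             webhook_url=None, wait=False):
    """Build (but do not send) a POST request that creates a prediction."""
    body = {"version": version, "input": model_input}
    if webhook_url:
        # Ask for a callback only when the prediction finishes.
        body["webhook"] = webhook_url
        body["webhook_events_filter"] = ["completed"]
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Assumed header for synchronous mode: hold the connection for the result.
        headers["Prefer"] = "wait"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_prediction_request(
    token="r8_example_token",          # hypothetical token
    version="<model-version-id>",      # placeholder, fill in a real version ID
    model_input={"prompt": "an astronaut riding a horse"},
    webhook_url="https://example.com/replicate-webhook",  # hypothetical URL
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

The official SDKs listed in the table wrap this same request and response cycle, so most projects would use one of them instead of raw HTTP.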
Compare Replicate with
- Replicate vs Anthropic API
- Replicate vs AssemblyAI
- Replicate vs Deepgram
- Replicate vs ElevenLabs
- Replicate vs Google Gemini API
- Replicate vs Groq
- Replicate vs OpenAI API
- Replicate vs Together AI
Staxly is an independent catalog of developer platforms. Outbound links to Replicate are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.