ai-api

Replicate

Run and fine-tune AI models in the cloud — pay-per-second GPU

Run thousands of open-source AI models (FLUX, Stable Diffusion, LLMs) through an API with per-second GPU billing. Package your own models with the Cog framework, then deploy and fine-tune them.


Pricing

Tier	Price	Notes
Pay-as-you-go	No base fee (usage-based)	Per-second GPU billing. No minimum. Public models billed by processing time or tokens.
Enterprise	Custom	Dedicated capacity, private deployments, SOC 2, HIPAA on request.

Limits

Resource	Rate	Notes
CPU small	$0.000025/sec	1 vCPU, 2 GB
CPU standard	$0.000100/sec	4 vCPU, 8 GB
Nvidia T4 GPU	$0.000225/sec	~$0.81/hr
Nvidia L40S GPU (48 GB)	$0.000975/sec	~$3.51/hr
Nvidia A100 GPU (80 GB)	$0.001400/sec	~$5.04/hr
Nvidia H100 GPU (80 GB)	$0.001525/sec	~$5.49/hr
Claude 3.7 Sonnet	$3/M input tokens + $15/M output tokens	Token-billed example
FLUX 1.1 Pro	$0.04 per output image	Image model example
Fast-boot fine-tunes	Only active processing time billed	Fine-tune billing
Private models	Dedicated hardware billed for setup + idle + active time	Private model billing
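As a rough illustration of how the per-second and per-token rates above turn into a bill, here is a minimal Python sketch. The rates are copied from the table; the dictionary keys and function names are hypothetical labels chosen for this example and are not Replicate identifiers. Treat the results as estimates and reconfirm current rates with the vendor.

```python
# Per-second rates in USD, copied from the table above.
# The keys are illustrative labels, not official hardware identifiers.
RATES_PER_SECOND = {
    "cpu-small": 0.000025,
    "cpu-standard": 0.000100,
    "gpu-t4": 0.000225,
    "gpu-l40s-48gb": 0.000975,
    "gpu-a100-80gb": 0.001400,
    "gpu-h100-80gb": 0.001525,
}


def hourly_rate(hardware: str) -> float:
    """Convert a per-second rate to an hourly rate, rounded to cents."""
    return round(RATES_PER_SECOND[hardware] * 3600, 2)


def run_cost(hardware: str, seconds: float, runs: int = 1) -> float:
    """Estimate the cost of `runs` predictions that each take `seconds`."""
    return RATES_PER_SECOND[hardware] * seconds * runs


def token_cost(input_tokens: int, output_tokens: int,
               input_per_million: float = 3.0,
               output_per_million: float = 15.0) -> float:
    """Estimate a token-billed cost; defaults match the Claude 3.7 Sonnet row."""
    return (input_tokens / 1_000_000 * input_per_million
            + output_tokens / 1_000_000 * output_per_million)


if __name__ == "__main__":
    print(f"A100 hourly: ${hourly_rate('gpu-a100-80gb'):.2f}")
    print(f"1,000 runs of 12 s on A100: ${run_cost('gpu-a100-80gb', 12, 1000):.2f}")
    print(f"200k input + 50k output tokens: ${token_cost(200_000, 50_000):.2f}")
```

The sketch covers active processing time only. Private models on dedicated hardware are also billed for setup and idle time, so their real cost will be higher than `run_cost` suggests.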

Features

10k+ Models — Public catalog of image, video, audio, LLM, embedding, speech models.
Batch Predictions — Parallel batch execution.
Cog — Open-source tool for containerizing ML models; the standard packaging format on Replicate.
Deployments — Private model endpoints with dedicated GPUs.
File Storage — Temporary output file hosting.
Fine-Tuning — Fine-tune FLUX, SDXL, Llama 2/3 with your data.
Per-Second Billing — Pay only while model runs. No idle cost for public models.
Playground — Interactive UI for every public model.
Predictions API — Async + sync + streaming predictions.
Streaming Outputs — SSE streaming for LLMs + audio.
Webhooks — Notify when predictions complete.
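To make the Predictions API and Webhooks entries more concrete, the sketch below builds (but does not send) a request to create a prediction. It assumes the v1 REST endpoint `https://api.replicate.com/v1/predictions`, bearer-token authentication, the `version`, `input`, `webhook`, and `webhook_events_filter` body fields, and the `Prefer: wait` header for synchronous mode, as I understand them from Replicate's public docs; verify each against the current API reference. The function name, token, version ID, and webhook URL are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed v1 endpoint for creating predictions; confirm in the official docs.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, version, model_input,
                             webhook_url=None, wait=False):
    """Build a POST request that would create a prediction (not sent here)."""
    payload = {"version": version, "input": model_input}
    if webhook_url:
        # Ask to be notified only when the prediction reaches a final state.
        payload["webhook"] = webhook_url
        payload["webhook_events_filter"] = ["completed"]

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Assumed sync mode: hold the connection open until the model finishes.
        headers["Prefer"] = "wait"

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_prediction_request(
        token="YOUR_API_TOKEN",            # placeholder
        version="MODEL_VERSION_ID",        # placeholder
        model_input={"prompt": "an astronaut riding a horse"},
        webhook_url="https://example.com/replicate-webhook",  # placeholder
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` with a valid token would submit it. In async mode the response describes a pending prediction that you poll or receive through the webhook; the official SDKs listed under Developer interfaces wrap the same flow.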

Developer interfaces

Slug	Name	Kind	Version
cog	Cog (package models)	cli	0.x
sdk-go	replicate-go	sdk	1.x
mcp	Replicate MCP	mcp	—
sdk-node	replicate (Node)	sdk	1.x
sdk-python	replicate-python	sdk	1.x
rest-api	Replicate REST API	rest	v1
webhooks	Webhooks	other	—

Compare Replicate with

Side-by-side breakdowns:

Replicate vs Anthropic API
Replicate vs AssemblyAI
Replicate vs Deepgram
Replicate vs ElevenLabs
Replicate vs Google Gemini API
Replicate vs Groq
Replicate vs OpenAI API
Replicate vs Together AI

Staxly is an independent catalog of developer platforms. Outbound links to Replicate are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.