Staxly › ai-api

Replicate

Run and fine-tune AI models in the cloud — pay-per-second GPU

Run thousands of open-source AI models (FLUX, Stable Diffusion, LLMs) via a single API, with per-second GPU billing. Package your own models with the Cog framework, then deploy and fine-tune them.

Replicate website · Docs ↗

Pricing

| Tier | Price | Notes |
| --- | --- | --- |
| Pay-as-you-go | Free | Per-second GPU billing, no minimum. Public models billed by processing time or tokens. |
| Enterprise | Custom | Dedicated capacity, private deployments, SOC 2; HIPAA on request. |

Limits

| Metric | Value | Notes |
| --- | --- | --- |
| cpu small | $0.000025/sec (1 vCPU, 2 GB) | CPU small |
| cpu standard | $0.000100/sec (4 vCPU, 8 GB) | CPU standard |
| fast boot fine tunes | Only active processing time billed | Fine-tune billing |
| gpu a100 80gb | $0.001400/sec (~$5.04/hr) | Nvidia A100 80 GB |
| gpu h100 80gb | $0.001525/sec (~$5.49/hr) | Nvidia H100 80 GB |
| gpu l40s 48gb | $0.000975/sec (~$3.51/hr) | Nvidia L40S |
| gpu t4 | $0.000225/sec (~$0.81/hr) | Nvidia T4 |
| model claude sonnet | $3/M input + $15/M output tokens (Claude 3.7 Sonnet) | Token-billed example |
| model flux pro | $0.04 per output image (FLUX 1.1 Pro) | Image model example |
| private model billing | Dedicated hardware billed for setup + idle + active time | Private model billing |
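The rates above translate directly into run costs: per-second hardware is billed only for active processing time, and token-billed models charge separately for input and output tokens. A minimal sketch of that arithmetic, using example rates from this page (confirm current pricing on replicate.com before relying on it):

```python
def gpu_cost(rate_per_sec: float, seconds: float) -> float:
    """Cost of a run billed per second of active processing."""
    return rate_per_sec * seconds


def token_cost(input_tokens: int, output_tokens: int,
               in_per_million: float, out_per_million: float) -> float:
    """Cost of a token-billed model call (separate input/output rates)."""
    return (input_tokens / 1e6) * in_per_million \
         + (output_tokens / 1e6) * out_per_million


# 90 seconds on an H100 80GB at $0.001525/sec:
h100 = gpu_cost(0.001525, 90)                   # $0.13725

# 10k input + 2k output tokens on Claude 3.7 Sonnet ($3/M in, $15/M out):
claude = token_cost(10_000, 2_000, 3.0, 15.0)   # $0.06

print(f"H100 run:    ${h100:.5f}")
print(f"Claude call: ${claude:.2f}")
```

The hourly figures in the table are just the per-second rate times 3600 (e.g. $0.001525 × 3600 ≈ $5.49/hr for the H100).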

Features

Developer interfaces

| Slug | Name | Kind | Version |
| --- | --- | --- | --- |
| cog | Cog (package models) | cli | 0.x |
| sdk-go | replicate-go | sdk | 1.x |
| mcp | Replicate MCP | mcp | |
| sdk-node | replicate (Node) | sdk | 1.x |
| sdk-python | replicate-python | sdk | 1.x |
| rest-api | Replicate REST API | rest | v1 |
| webhooks | Webhooks | other | |


Staxly is an independent catalog of developer platforms. Outbound links to Replicate are plain references to their official pages. Pricing is verified at publication time; reconfirm on the vendor site before buying.