ai-api
Together AI
Open-source LLM infra — inference + fine-tuning + dedicated GPUs + image/video/audio
Full-stack infra for open-source AI: serverless inference (200+ models), dedicated GPU endpoints, fine-tuning, image/video/audio models, Code Sandbox.
Pricing
| Tier | Price | Notes |
|---|---|---|
| Pay-as-you-go | Usage-based | Per-token pricing for serverless inference. No minimum. |
| Dedicated Endpoints | Hourly | Single-tenant GPU endpoints billed by the hour. |
| Batch API | 50% off serverless rates | Discounted async batch processing on most serverless models. |
| Reserved GPU Clusters | Custom | 6+ day commitments at discounted reserved rates. |
| Enterprise | Custom | Custom. Private deployments, VPC, SLAs, dedicated support. |
Limits
| Item | Rate |
|---|---|
| Whisper Large v3 (audio transcription) | $0.0015/min |
| Code Interpreter | $0.03 per 60-min session |
| DeepSeek-R1 | $3/M input + $7/M output tokens |
| Fine-tuning, 70B-100B models | $2.90-$8.00 per 1M tokens |
| Fine-tuning, models up to 16B | $0.48-$1.35 per 1M tokens |
| Gemma 3n E4B (cheapest) | $0.06/M input + $0.12/M output |
| Gemma 4 31B | $0.20/M input + $0.50/M output |
| GLM-5.1 | $1.40/M input + $4.40/M output |
| Dedicated GPU, 1x B200 180GB | $9.95/hr |
| Dedicated GPU, 1x H100 80GB | $3.99/hr |
| FLUX.2 pro (image) | $0.03/image |
| FLUX.1 schnell (image) | $0.0027/image |
| Llama 3.3 70B | $0.88/M tokens (input and output) |
| Qwen3.5 397B | $0.60/M input + $3.60/M output |
| Qwen3.5 9B (budget) | $0.10/M input + $0.15/M output |
| Storage | $0.16/GiB/month |
| Cartesia Sonic-3 (TTS) | $65 per 1M characters |
| Google Veo 3.0 (video) | $1.60/video |
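To make the per-token rates above concrete, here is a minimal cost-estimation sketch. The rates come from the table; the token counts are made-up examples, and the 50% batch discount is taken from the Pricing section, not a vendor-published formula.

```python
# Rough cost estimate from the published per-token rates above.
# Rates are USD per 1M tokens; token counts below are illustrative only.

RATES = {
    # model: (input $/1M tokens, output $/1M tokens)
    "deepseek-r1": (3.00, 7.00),
    "qwen3.5-9b": (0.10, 0.15),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimate one request's cost; batch=True applies the 50% batch discount."""
    rate_in, rate_out = RATES[model]
    cost = (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out
    return cost * 0.5 if batch else cost

# Example: a 4k-token prompt with a 1k-token reply on DeepSeek-R1.
print(f"${request_cost('deepseek-r1', 4_000, 1_000):.4f}")              # ≈ $0.0190
print(f"${request_cost('deepseek-r1', 4_000, 1_000, batch=True):.4f}")  # ≈ $0.0095
```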
Features
- Audio (ASR + TTS) — Whisper Large v3 + Cartesia Sonic-3.
- Batch API — 50% discount for async processing.
- Code Interpreter — LLM with integrated code execution.
- Code Sandbox — Secure Python execution environment.
- Dedicated Endpoints — Single-tenant GPU endpoints for consistent latency.
- Embeddings — BGE + nomic + mxbai embedding models.
- Fine-Tuning — LoRA + full fine-tune + DPO on Llama, Qwen, Mistral.
- Image Generation — FLUX.2, SD3, Ideogram, etc.
- OpenAI-Compat API — Drop-in OpenAI SDK replacement (see the sketch after this list).
- Private Deploy — Dedicated tenant + VPC.
- Reranker — Rerank model for RAG retrieval refinement.
- Reserved Clusters — Discounted GPU clusters for committed use.
- Serverless Inference — 200+ open models. OpenAI-compatible API.
- Video Generation — Veo 3.0, Kling 2.1, Vidu 2.0.
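Because serverless inference is OpenAI-compatible, the stock OpenAI SDK can be pointed at Together's endpoint. A minimal sketch, assuming the `openai` Python package, the `https://api.together.xyz/v1` base URL, and the example model slug shown below; verify both against Together's current docs and model list.

```python
import os
from openai import OpenAI  # pip install openai

# Point the stock OpenAI client at Together's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # assumed base URL; confirm in the docs
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model slug
    messages=[{"role": "user", "content": "Summarize LoRA fine-tuning in two sentences."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```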
Developer interfaces
| Slug | Name | Kind | Version |
|---|---|---|---|
| code-sandbox | Code Sandbox / Interpreter | rest | v1 |
| dedicated-endpoints | Dedicated Endpoints | rest | v1 |
| cli | Together CLI | cli | 1.x |
| sdk-node | together-js | sdk | 0.x |
| sdk-python | together-python | sdk | 1.x |
| rest-api | Together REST API (OpenAI-compat) | rest | v1 |
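The native `together-python` SDK (sdk-python above) exposes a similar chat surface plus token streaming. A minimal sketch, assuming the 1.x client interface and the same example model slug as above:

```python
import os
from together import Together  # pip install together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Stream tokens as they are generated (assumed 1.x streaming interface).
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model slug
    messages=[{"role": "user", "content": "List three uses for a reranker in RAG."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```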
Compare Together AI with
- Together AI vs Anthropic API
- Together AI vs AssemblyAI
- Together AI vs Deepgram
- Together AI vs ElevenLabs
- Together AI vs Google Gemini API
- Together AI vs Groq
- Together AI vs OpenAI API
- Together AI vs Replicate
Staxly is an independent catalog of developer platforms. Outbound links to Together AI are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.