ai-api
Together AI
Open-source LLM infra — inference + fine-tuning + dedicated GPUs + image/video/audio
Full-stack infra for open-source AI: serverless inference (200+ models), dedicated GPU endpoints, fine-tuning, image/video/audio models, Code Sandbox.
Pricing
| Tier | Price | Notes |
|---|---|---|
| Pay-as-you-go | Usage-based | Per-token pricing for serverless inference. No minimum. |
| Dedicated Endpoints | Hourly | Single-tenant GPU endpoints billed by the hour. |
| Batch API | 50% off serverless rates | Discounted async batch processing on most serverless models. |
| Reserved GPU Clusters | Custom | 6+ day commitments at discounted reserved rates. |
| Enterprise | Custom | Custom. Private deployments, VPC, SLAs, dedicated support. |
Limits
| Item | Rate |
|---|---|
| Whisper Large v3 (audio transcription) | $0.0015/min |
| Code Interpreter | $0.03 per 60-min session |
| DeepSeek-R1 | $3/M input + $7/M output tokens |
| Fine-tuning, 70B-100B models | $2.90-$8.00 per 1M tokens |
| Fine-tuning, models up to 16B | $0.48-$1.35 per 1M tokens |
| Gemma 3n E4B (cheapest) | $0.06/M input + $0.12/M output |
| Gemma 4 31B | $0.20/M input + $0.50/M output |
| GLM-5.1 | $1.40/M input + $4.40/M output |
| Dedicated GPU, 1x B200 180GB | $9.95/hr |
| Dedicated GPU, 1x H100 80GB | $3.99/hr |
| FLUX.2 pro (image) | $0.03/image |
| FLUX.1 schnell (image) | $0.0027/image |
| Llama 3.3 70B | $0.88/M tokens (input and output) |
| Qwen3.5 397B | $0.60/M input + $3.60/M output |
| Qwen3.5 9B (budget) | $0.10/M input + $0.15/M output |
| Storage | $0.16/GiB/month |
| Cartesia Sonic-3 (TTS) | $65 per 1M characters |
| Google Veo 3.0 (video) | $1.60/video |
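To make the per-token rates above concrete, here is a minimal cost-estimation sketch. The rates come from the table; the token counts are made-up examples, and the 50% batch discount is taken from the Pricing section, not a vendor-published formula.

```python
# Rough cost estimate from the published per-token rates above.
# Rates are USD per 1M tokens; token counts below are illustrative only.

RATES = {
    # model: (input $/1M tokens, output $/1M tokens)
    "deepseek-r1": (3.00, 7.00),
    "qwen3.5-9b": (0.10, 0.15),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimate one request's cost; batch=True applies the 50% batch discount."""
    rate_in, rate_out = RATES[model]
    cost = (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out
    return cost * 0.5 if batch else cost

# Example: a 4k-token prompt with a 1k-token reply on DeepSeek-R1.
print(f"${request_cost('deepseek-r1', 4_000, 1_000):.4f}")              # ≈ $0.0190
print(f"${request_cost('deepseek-r1', 4_000, 1_000, batch=True):.4f}")  # ≈ $0.0095
```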
Features
- Audio (ASR + TTS) — Whisper Large v3 + Cartesia Sonic-3.
- Batch API — 50% discount for async processing.
- Code Interpreter — LLM with integrated code execution.
- Code Sandbox — Secure Python execution environment.
- Dedicated Endpoints — Single-tenant GPU endpoints for consistent latency.
- Embeddings — BGE + nomic + mxbai embedding models.
- Fine-Tuning — LoRA + full fine-tune + DPO on Llama, Qwen, Mistral.
- Image Generation — FLUX.2, SD3, Ideogram, etc.
- OpenAI-Compat API — Drop-in OpenAI SDK replacement (see the sketch after this list).
- Private Deploy — Dedicated tenant + VPC.
- Reranker — Rerank model for RAG retrieval refinement.
- Reserved Clusters — Discounted GPU clusters for committed use.
- Serverless Inference — 200+ open models. OpenAI-compatible API.
- Video Generation — Veo 3.0, Kling 2.1, Vidu 2.0.
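Because serverless inference is OpenAI-compatible, the stock OpenAI SDK can be pointed at Together's endpoint. A minimal sketch, assuming the `openai` Python package, the `https://api.together.xyz/v1` base URL, and the example model slug shown below; verify both against Together's current docs and model list.

```python
import os
from openai import OpenAI  # pip install openai

# Point the stock OpenAI client at Together's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # assumed base URL; confirm in the docs
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model slug
    messages=[{"role": "user", "content": "Summarize LoRA fine-tuning in two sentences."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```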
Developer interfaces
| Slug | Name | Kind | Version |
|---|---|---|---|
| code-sandbox | Code Sandbox / Interpreter | rest | v1 |
| dedicated-endpoints | Dedicated Endpoints | rest | v1 |
| cli | Together CLI | cli | 1.x |
| sdk-node | together-js | sdk | 0.x |
| sdk-python | together-python | sdk | 1.x |
| rest-api | Together REST API (OpenAI-compat) | rest | v1 |
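The native `together-python` SDK (sdk-python above) exposes a similar chat surface plus token streaming. A minimal sketch, assuming the 1.x client interface and the same example model slug as above:

```python
import os
from together import Together  # pip install together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Stream tokens as they are generated (assumed 1.x streaming interface).
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model slug
    messages=[{"role": "user", "content": "List three uses for a reranker in RAG."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```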
Compare Together AI with
- Together AI vs Anthropic API
- Together AI vs AssemblyAI
- Together AI vs Deepgram
- Together AI vs ElevenLabs
- Together AI vs Google Gemini API
- Together AI vs Groq
- Together AI vs OpenAI API
- Together AI vs Replicate
Staxly is an independent catalog of developer platforms. Outbound links to Together AI are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.