
Groq

Fastest LLM inference — LPU-powered (300-1000+ tokens/sec)

Inference infrastructure built on Groq's LPU (Language Processing Unit). Hosts Llama, Mixtral, gpt-oss, and Whisper models behind an OpenAI-compatible API, serving 300-1,000+ tokens/sec depending on model.

Groq website ↗ · Docs ↗

Pricing

| Tier | Price | Notes |
| --- | --- | --- |
| Free Tier | Free | Generous free RPM / TPM limits per model. Great for dev and small apps. |
| On-Demand (paid) | Pay-as-you-go | Billed per token. OpenAI-compatible API, no infrastructure to manage. |
| Developer Tier | Pay-as-you-go | Higher rate limits for production apps. |
| Enterprise | Custom | Dedicated capacity, SLA, on-prem option. |
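Because the On-Demand tier exposes an OpenAI-compatible API, a chat completion needs nothing beyond a POST to the standard endpoint. A minimal stdlib sketch, assuming a `GROQ_API_KEY` environment variable and the OpenAI chat-completion schema:

```python
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(model: str, messages: list[dict]) -> dict:
    """Assemble an OpenAI-schema chat completion payload."""
    return {"model": model, "messages": messages}

def chat(model: str, messages: list[dict]) -> str:
    """POST the payload to Groq and return the first choice's text."""
    body = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Only hits the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    print(chat("llama-3.1-8b-instant", [{"role": "user", "content": "Say hi."}]))
```

The same request works through the official OpenAI SDKs by pointing `base_url` at `https://api.groq.com/openai/v1`.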

Limits

| Metric | Value | Notes |
| --- | --- | --- |
| Batch API discount | 50% off | Batch API |
| Cached input discount | 50% off cached input | Input caching |
| Function calling | Supported on most models | Function calling |
| gpt-oss-20b input | $0.075 / M tokens | gpt-oss 20B input |
| gpt-oss-20b output | $0.30 / M tokens | gpt-oss 20B output |
| llama-3.1-8b-instant input | $0.05 / M tokens | Llama 3.1 8B input |
| llama-3.1-8b-instant output | $0.08 / M tokens | Llama 3.1 8B output |
| llama-3.3-70b input | $0.59 / M tokens | Llama 3.3 70B input |
| llama-3.3-70b output | $0.79 / M tokens | Llama 3.3 70B output |
| OpenAI API compat | Yes — swap base_url to https://api.groq.com/openai/v1 | OpenAI SDK compatibility |
| Speed, gpt-oss-20b | 952 tokens/sec | gpt-oss 20B speed (high) |
| Speed, llama-3.1-8b | 640 tokens/sec | Llama 3.1 8B speed |
| Streaming | SSE streaming supported | Streaming responses |
| whisper-large-v3 | $0.111 / hour of audio | Whisper transcription |
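The per-million-token rates above translate directly to spend. A small estimator, using the table's published Llama 3.1 8B rates as an example (reconfirm rates on the vendor site before relying on them):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a request, given $/million-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Llama 3.1 8B from the table: $0.05/M input, $0.08/M output.
cost = estimate_cost(2_000_000, 500_000, 0.05, 0.08)
print(f"${cost:.2f}")  # → $0.14
```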

Features

Developer interfaces

| Slug | Name | Kind | Version |
| --- | --- | --- | --- |
| rest-api | Groq API (OpenAI-compat) | rest | v1 |
| sdk-python | groq-python | sdk | 1.x |
| sdk-node | groq-sdk (Node) | sdk | 0.x |
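Streaming responses from the rest-api interface arrive as server-sent events in the OpenAI chunk format: `data: {...}` lines, terminated by a `data: [DONE]` sentinel. A client-side sketch of the line handling, assuming that chunk shape (the sample lines below are illustrative, not captured output):

```python
import json
from typing import Optional

def parse_sse_line(line: str) -> Optional[str]:
    """Extract the delta text from one SSE line; returns None for
    blank lines, the [DONE] sentinel, and chunks without content."""
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Reassemble a reply from raw SSE lines:
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(t for line in lines if (t := parse_sse_line(line)))
print(text)  # → Hello
```

The official SDKs (groq-python, groq-sdk) hide this parsing behind an iterator, but the wire format is the same.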


Staxly is an independent catalog of developer platforms. Outbound links to Groq are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.