Models

GLM-5.2
Zai
Mode: Inceptron Optimized
Region:
Input tokens (per 1M): $1.20
Output tokens (per 1M): $4.20
Cache read (per 1M): $0.26
Tokens per sec: 50
Quantization: FP4
Size: 754B
1M context, text, code, tool-calling, reasoning

Kimi-K2.7 Code
Moonshotai
Mode: Inceptron Optimized
Region:
Input tokens (per 1M): $0.75
Output tokens (per 1M): $3.50
Cache read (per 1M): $0.20
Tokens per sec: 57
Quantization: INT4
Size:
262K context, multimodal, tool-calling, reasoning

Kimi-K2.6
Moonshotai
Mode: Inceptron Optimized
Region:
Input tokens (per 1M): $0.73
Output tokens (per 1M): $3.50
Cache read (per 1M): $0.25
Tokens per sec: 47
Quantization: INT4
Size: 1T
262K context, multimodal, tool-calling, reasoning

Enterprise-Ready Inference
Run and scale Llama, Qwen, Kimi, and DeepSeek models with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing, with no GPU operations to manage.

MiniMax-M2.5
Minimax
Mode: Inceptron Optimized
Region:
Input tokens (per 1M): $0.15
Output tokens (per 1M): $0.90
Cache read (per 1M): $0.05
Tokens per sec: 40
Quantization: FP8
Size: 230B
196K context, text, code, tool-calling, reasoning

Run any model on the fastest endpoints
Use our API to deploy any model on one of the most cost-efficient inference stacks available.
Scale seamlessly to a dedicated deployment at any time for optimal throughput.
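
To make the per-million-token prices in the model list easier to compare, here is a minimal Python sketch that estimates the cost of a single request. The prices are copied from the listing; the `estimate_cost` helper and its parameter names are hypothetical, and the sketch assumes that cached input tokens are billed at the "Cache read" rate instead of the regular input rate, which the page does not state explicitly.

```python
# USD per 1M tokens, copied from the model listing on this page.
PRICES = {
    "GLM-5.2":        {"input": 1.20, "output": 4.20, "cache_read": 0.26},
    "Kimi-K2.7 Code": {"input": 0.75, "output": 3.50, "cache_read": 0.20},
    "Kimi-K2.6":      {"input": 0.73, "output": 3.50, "cache_read": 0.25},
    "MiniMax-M2.5":   {"input": 0.15, "output": 0.90, "cache_read": 0.05},
}

TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    Assumption (not confirmed by the listing): `cached_tokens` is the part of
    `input_tokens` served from cache and billed at the cache-read rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cache_read"]
        + output_tokens * price["output"]
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Same hypothetical workload priced against every listed model.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=200_000,
                             output_tokens=10_000, cached_tokens=50_000)
        print(f"{name}: ${cost:.4f}")
```

The same function can be used to check how much of a bill comes from output tokens, which are priced several times higher than input tokens for every model listed.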