Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Input tokens, 1M: $0.10
Output tokens, 1M: $0.30
Tokens per sec: 100
Quantization: fp8
Size: 70B
Context: 128K
Tags: chat, trivia, marketing, reasoning

Kimi-K2-Thinking (Moonshotai)
Mode: Inceptron Optimized
Input tokens, 1M: $0.45
Output tokens, 1M: $2.20
Tokens per sec: 47
Quantization: fp8
Size: —
Context: 131K
Tags: JSON mode, reasoning, math
gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: $0.04
Output tokens, 1M: $0.24
Tokens per sec: 40
Quantization: fp8
Size: 120B
Context: 131K
Tags: JSON mode, code, math, reasoning
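To make the card fields concrete, here is a minimal sketch that turns a card's per-1M-token prices into a per-request cost estimate, and its "Tokens per sec" figure into a rough generation time. The example numbers are taken from the Llama-3.3-70B-Instruct card above; the function names are illustrative, not part of any API.

```python
def request_cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request, given the card's per-1M-token input/output prices."""
    return input_tokens / 1e6 * in_price_per_m + output_tokens / 1e6 * out_price_per_m

def gen_time_seconds(output_tokens, tokens_per_sec):
    """Rough time to stream a completion at the card's 'Tokens per sec' rate."""
    return output_tokens / tokens_per_sec

# Llama-3.3-70B-Instruct: $0.10 in / $0.30 out per 1M tokens, 100 tok/s
cost = request_cost_usd(2_000_000, 1_000_000, 0.10, 0.30)  # -> $0.50
time_s = gen_time_seconds(1_000, 100)                      # -> 10.0 s
```

The same arithmetic applies to every card: for example, the same 2M-in/1M-out workload on gpt-oss-120b ($0.04 / $0.24) would cost $0.32.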
Enterprise-Ready Inference
Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing, with no GPU ops.
gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: $0.03
Output tokens, 1M: $0.13
Tokens per sec: 65
Quantization: fp8
Size: 20B
Context: 131K
Tags: JSON mode, code, math, reasoning

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.20
Output tokens, 1M: $0.85
Tokens per sec: 30
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.27
Output tokens, 1M: $0.80
Tokens per sec: 30
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.50
Output tokens, 1M: $2.00
Tokens per sec: 20
Quantization: fp8
Size: 685B
Context: 164K
Tags: JSON mode, MoE, code, reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.07
Output tokens, 1M: $0.25
Tokens per sec: 60
Quantization: fp8
Size: 30B
Context: 262K
Tags: JSON mode, code, math

Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.071
Output tokens, 1M: $0.463
Tokens per sec: 30
Quantization: fp8
Size: 235B
Context: 262K
Tags: JSON mode, code, math, reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.20
Output tokens, 1M: $0.70
Tokens per sec: 20
Quantization: fp8
Size: 72B
Context: 128K
Tags: JSON mode, code, math

GLM-4.6 (Zai-Org)
Mode: Inceptron Optimized
Input tokens, 1M: $0.42
Output tokens, 1M: $1.75
Tokens per sec: 30
Quantization: fp8
Size: 357B
Context: 262K
Tags: JSON mode, code, math, reasoning
Run any model on the fastest endpoints
Use our API to deploy any model on one of the most cost-efficient inference stacks available.
Scale seamlessly to a dedicated deployment at any time for optimal throughput.
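As a sketch of what calling one of these models through the API might look like, the snippet below assumes an OpenAI-compatible chat-completions endpoint, which is a common convention among inference providers but is not confirmed by this page. The base URL, path, and header names are hypothetical placeholders; only the model name and the "JSON mode" capability come from the cards above.

```python
# Hypothetical sketch: OpenAI-compatible chat completions over HTTP.
# BASE_URL and the endpoint path are assumptions, not documented values.
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, not the real endpoint

def build_request(model, messages, json_mode=False):
    """Build the JSON body for a chat completions call."""
    body = {"model": model, "messages": messages}
    if json_mode:
        # Models tagged "JSON mode" above can constrain output to valid JSON;
        # response_format is the usual OpenAI-style way to request it.
        body["response_format"] = {"type": "json_object"}
    return body

def chat(model, messages, api_key, json_mode=False):
    """POST a chat completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_request(model, messages, json_mode)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `chat("gpt-oss-120b", [{"role": "user", "content": "..."}], api_key, json_mode=True)`; for a dedicated deployment, only the base URL would typically change.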