Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Input tokens, 1M: $0.10
Output tokens, 1M: $0.30
Tokens per sec: 100
Quantization: fp8
Size: 70B
Context: 128K
Tags: chat, trivia, marketing, reasoning
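The per-1M-token rates above translate directly into a cost estimate for a given request volume. A minimal sketch in Python, using the Llama-3.3-70B-Instruct rates listed above (the helper name is ours, not part of any SDK):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate cost in USD from per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Llama-3.3-70B-Instruct: $0.10 input / $0.30 output per 1M tokens.
cost = estimate_cost(500_000, 200_000, 0.10, 0.30)
print(f"${cost:.4f}")  # 500K input + 200K output
```

The same helper works for any card on this page: swap in that model's input and output rates.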

Kimi-K2-Thinking (Moonshotai)
Mode: Inceptron Optimized
Input tokens, 1M: $0.45
Output tokens, 1M: $2.20
Tokens per sec: TBA
Quantization: fp8
Context: 131K
Tags: JSON mode, reasoning, math
gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: $0.04
Output tokens, 1M: $0.24
Tokens per sec: TBA
Quantization: fp8
Size: 120B
Context: 131K
Tags: JSON mode, code, math, reasoning

Enterprise-Ready Inference
Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing—no GPU ops.

gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: $0.03
Output tokens, 1M: $0.13
Tokens per sec: TBA
Quantization: fp8
Size: 20B
Context: 131K
Tags: JSON mode, code, math, reasoning

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.20
Output tokens, 1M: $0.85
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.27
Output tokens, 1M: $0.80
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: $0.50
Output tokens, 1M: $2.00
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 164K
Tags: JSON mode, MoE, code, reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.07
Output tokens, 1M: $0.25
Tokens per sec: TBA
Quantization: fp8
Size: 30B
Context: 262K
Tags: JSON mode, code, math

Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.24
Output tokens, 1M: $2.35
Tokens per sec: TBA
Quantization: fp8
Size: 235B
Context: 262K
Tags: JSON mode, code, math, reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: $0.20
Output tokens, 1M: $0.70
Tokens per sec: TBA
Quantization: fp8
Size: 72B
Context: 128K
Tags: JSON mode, code, math

GLM-4.6 (Zai-Org)
Mode: Inceptron Optimized
Input tokens, 1M: $0.42
Output tokens, 1M: $1.75
Tokens per sec: TBA
Quantization: fp8
Size: 357B
Context: 262K
Tags: JSON mode, code, math, reasoning

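With all rates in one place, a short script can rank models by total cost for a given traffic mix. The prices below are copied from the cards above; the ranking logic itself is our sketch, not a platform feature:

```python
# Per-1M-token prices in USD (input, output), taken from the listings above.
PRICES = {
    "gpt-oss-20b": (0.03, 0.13),
    "gpt-oss-120b": (0.04, 0.24),
    "Qwen3-Coder-30B-A3B-Instruct": (0.07, 0.25),
    "Llama-3.3-70B-Instruct": (0.10, 0.30),
    "DeepSeek-V3-0324": (0.20, 0.85),
    "DeepSeek-V3.1": (0.27, 0.80),
}

def rank_by_cost(in_tok: int, out_tok: int) -> list[tuple[str, float]]:
    """Return (model, cost-in-USD) pairs, cheapest first, for a token volume."""
    costs = {m: in_tok / 1e6 * i + out_tok / 1e6 * o
             for m, (i, o) in PRICES.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])

# Example: 10M input tokens and 2M output tokens per month.
for model, usd in rank_by_cost(10_000_000, 2_000_000):
    print(f"{model:<30} ${usd:.2f}")
```

Note that input/output ratios matter: DeepSeek-V3.1 has a higher input rate than DeepSeek-V3-0324 but a lower output rate, so output-heavy workloads can flip their order.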
Run any model on the fastest endpoints
Use our API to deploy any open-source model on the fastest inference stack available, with optimal cost efficiency. Scale into a dedicated deployment at any time, with a custom number of instances for optimal throughput.