Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Input tokens, 1M: $0.10
Output tokens, 1M: $0.30
Tokens per sec: 100
Quantization: fp8
Size: 70B
Context: 128K
Tags: chat, trivia, marketing, reasoning
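
At the listed rates ($0.10 per 1M input tokens, $0.30 per 1M output tokens), per-request cost is simple arithmetic. A minimal sketch, with made-up token counts for illustration:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 0.10,    # $ per 1M input tokens (listed rate)
                 output_price: float = 0.30):  # $ per 1M output tokens (listed rate)
    """Estimate the dollar cost of one request at per-1M-token rates."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion
print(f"${request_cost(2_000, 500):.6f}")  # -> $0.000350
```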

Kimi-K2-Instruct (Moonshot AI)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: TBA
Context: 131K
Tags: JSON mode, math, reasoning

gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 120B
Context: 131K
Tags: JSON mode, code, math, reasoning

Enterprise-grade inference
Deploy and scale models like Llama, Qwen, Kimi, and DeepSeek with guaranteed uptime, zero-retention data flow, and usage-based pricing; no GPU wrangling required.
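
A minimal sketch of calling a hosted model, assuming the service exposes an OpenAI-compatible chat completions API; the base URL, the model id format, and the INCEPTRON_API_KEY variable are placeholders, not documented values.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-inference.com/v1",  # placeholder endpoint
    api_key=os.environ["INCEPTRON_API_KEY"],          # hypothetical env var
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",        # model id format assumed
    messages=[{"role": "user",
               "content": "Summarize zero-retention data flow in one sentence."}],
)
print(resp.choices[0].message.content)
```

On OpenAI-compatible APIs, `resp.usage` reports the prompt and completion token counts, which is what usage-based pricing bills against.
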
gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 20B
Context: 131K
Tags: JSON mode, code, math, reasoning
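
Several models above are tagged with JSON mode. On OpenAI-compatible APIs this is commonly requested via `response_format={"type": "json_object"}`; whether this provider uses the same parameter is an assumption in the sketch below.

```python
# A sketch of JSON mode, assuming the OpenAI-compatible response_format
# parameter; endpoint, env var, and model id format are placeholders.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-inference.com/v1",  # placeholder endpoint
    api_key=os.environ["INCEPTRON_API_KEY"],          # hypothetical env var
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",                # model id format assumed
    response_format={"type": "json_object"},   # assumed JSON-mode switch
    messages=[
        # Most JSON modes require the prompt itself to ask for JSON.
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Give title and year for one sci-fi novel."},
    ],
)
print(json.loads(resp.choices[0].message.content))
```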

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 164K
Tags: JSON mode, MoE, code, reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 30B
Context: 262K
Tags: JSON mode, code, math

Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 235B
Context: 262K
Tags: JSON mode, code, math, reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 72B
Context: 128K
Tags: JSON mode, code, math

GLM-4.6 (Z.ai)
Mode: Inceptron Optimized
Input tokens, 1M: TBA
Output tokens, 1M: TBA
Tokens per sec: TBA
Quantization: fp8
Size: 357B
Context: 262K
Tags: JSON mode, code, math, reasoning

Run any model on the fastest endpoints
Use our API to deploy any open-source model on the fastest inference stack available, with optimal cost efficiency. Scale into a dedicated deployment at any time, with a custom number of instances, to get optimal throughput.
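
One rough way to sanity-check the "Tokens per sec" figures above is to time a streamed completion. The sketch reuses the OpenAI-compatible placeholders from the earlier examples; streamed chunk counts only approximate token counts, so treat the result as an estimate, not a benchmark.

```python
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-inference.com/v1",  # placeholder endpoint
    api_key=os.environ["INCEPTRON_API_KEY"],          # hypothetical env var
)

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",        # model id format assumed
    messages=[{"role": "user", "content": "Write a 200-word product blurb."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # one content chunk is roughly one token on most servers
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.0f} tokens/sec over {elapsed:.1f}s")
```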