Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →


Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.10
Output tokens, 1M: $0.30
Tokens per sec: 100
Quantization: fp8
Size: 70B
Context: 128K
Tags: chat, trivia, marketing, reasoning
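At the listed rates, the per-request cost is simply (tokens / 1M) × price per million, summed over input and output. A minimal sketch, using the $0.10 and $0.30 rates from the card above (the 200K/50K token counts are made-up example values):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars at per-million-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Llama-3.3-70B-Instruct: $0.10 per 1M input tokens, $0.30 per 1M output tokens.
cost = request_cost(200_000, 50_000, 0.10, 0.30)
print(f"${cost:.3f}")  # 200K in + 50K out -> $0.035
```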

Kimi-K2-Thinking (Moonshotai)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.45
Output tokens, 1M: $2.20
Tokens per sec: TBA
Quantization: fp8
Size: 1T
Context: 131K
Tags: JSON mode, reasoning, math

gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.04
Output tokens, 1M: $0.24
Tokens per sec: TBA
Quantization: fp8
Size: 120B
Context: 131K
Tags: JSON mode, code, math, reasoning

Enterprise-Ready Inference

Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing, with no GPU ops to manage.

gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.03
Output tokens, 1M: $0.13
Tokens per sec: TBA
Quantization: fp8
Size: 20B
Context: 131K
Tags: JSON mode, code, math, reasoning

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.20
Output tokens, 1M: $0.85
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.27
Output tokens, 1M: $0.80
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.50
Output tokens, 1M: $2.00
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 164K
Tags: JSON mode, MoE, code, reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.07
Output tokens, 1M: $0.25
Tokens per sec: TBA
Quantization: fp8
Size: 30B
Context: 262K
Tags: JSON mode, code, math

Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.24
Output tokens, 1M: $2.35
Tokens per sec: TBA
Quantization: fp8
Size: 235B
Context: 262K
Tags: JSON mode, code, math, reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.20
Output tokens, 1M: $0.70
Tokens per sec: TBA
Quantization: fp8
Size: 72B
Context: 128K
Tags: JSON mode, code, math

GLM-4.6 (Zai-Org)
Mode: Inceptron Optimized
Region:
Input tokens, 1M: $0.42
Output tokens, 1M: $1.75
Tokens per sec: TBA
Quantization: fp8
Size: 357B
Context: 262K
Tags: JSON mode, code, math, reasoning

Run any model on the fastest endpoints

Use our API to deploy any open-source model on the fastest available inference stack, with optimal cost efficiency.

Scale into a dedicated deployment at any time, with a custom number of instances, for optimal throughput.

Curl


curl https://api.inceptron.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $INCEPTRON_API_KEY" \
-d '{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "How many moons are there in the Solar System?"
    }
  ]
}'
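The same call can be made from Python using only the standard library. This is a minimal sketch assuming the endpoint behaves as the curl example suggests (an OpenAI-compatible /v1/chat/completions route): the URL, model name, and headers are taken from the example above, and the response shape (choices[0].message.content) is an assumption based on that API convention.

```python
import json
import os
import urllib.request

API_URL = "https://api.inceptron.io/v1/chat/completions"

def build_request(model: str, user_content: str) -> urllib.request.Request:
    """Build the POST request; mirrors the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    headers = {
        "Content-Type": "application/json",
        # INCEPTRON_API_KEY is read from the environment, as in the curl example.
        "Authorization": f"Bearer {os.environ.get('INCEPTRON_API_KEY', '')}",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

if __name__ == "__main__" and os.environ.get("INCEPTRON_API_KEY"):
    req = build_request(
        "meta-llama/Llama-3.3-70B-Instruct",
        "How many moons are there in the Solar System?",
    )
    # Network call; requires a valid API key. The response path below assumes
    # the OpenAI-compatible schema.
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```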
