Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →

Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized | Input: $0.10 / 1M tokens | Output: $0.30 / 1M tokens | Speed: 100 tokens/sec | Quantization: fp8 | Size: 70B
128K context · chat · trivia · marketing · reasoning

Kimi-K2-Thinking (Moonshotai)
Mode: Inceptron Optimized | Input: $0.45 / 1M tokens | Output: $2.20 / 1M tokens | Speed: 47 tokens/sec | Quantization: fp8 | Size: 1T
131K context · JSON mode · reasoning · math

gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized | Input: $0.04 / 1M tokens | Output: $0.24 / 1M tokens | Speed: 40 tokens/sec | Quantization: fp8 | Size: 120B
131K context · JSON mode · code · math · reasoning

Enterprise-Ready Inference

Run and scale Llama, Qwen, Kimi, and DeepSeek with SLA-backed uptime, zero-retention data handling, and pay-as-you-go pricing, with no GPU ops to manage.

gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized | Input: $0.03 / 1M tokens | Output: $0.13 / 1M tokens | Speed: 65 tokens/sec | Quantization: fp8 | Size: 20B
131K context · JSON mode · code · math · reasoning

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized | Input: $0.20 / 1M tokens | Output: $0.85 / 1M tokens | Speed: 30 tokens/sec | Quantization: fp8 | Size: 685B
128K context · JSON mode · MoE · code · math · reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized | Input: $0.27 / 1M tokens | Output: $0.80 / 1M tokens | Speed: 30 tokens/sec | Quantization: fp8 | Size: 685B
128K context · JSON mode · MoE · code · math · reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized | Input: $0.50 / 1M tokens | Output: $2.00 / 1M tokens | Speed: 20 tokens/sec | Quantization: fp8 | Size: 685B
164K context · JSON mode · MoE · code · reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized | Input: $0.07 / 1M tokens | Output: $0.25 / 1M tokens | Speed: 60 tokens/sec | Quantization: fp8 | Size: 30B
262K context · JSON mode · code · math

Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized | Input: $0.071 / 1M tokens | Output: $0.463 / 1M tokens | Speed: 30 tokens/sec | Quantization: fp8 | Size: 235B
262K context · JSON mode · code · math · reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized | Input: $0.20 / 1M tokens | Output: $0.70 / 1M tokens | Speed: 20 tokens/sec | Quantization: fp8 | Size: 72B
128K context · JSON mode · code · math

GLM-4.6 (Zai-Org)
Mode: Inceptron Optimized | Input: $0.42 / 1M tokens | Output: $1.75 / 1M tokens | Speed: 30 tokens/sec | Quantization: fp8 | Size: 357B
262K context · JSON mode · code · math · reasoning
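
To make the pay-as-you-go rates above concrete, here is a quick cost sketch in Python using the Llama-3.3-70B-Instruct rates from the list ($0.10 per 1M input tokens, $0.30 per 1M output tokens). It is illustrative only, not an official billing calculator.

# Illustrative cost sketch using the listed Llama-3.3-70B-Instruct rates.
# Not an official billing calculator; actual invoices may round differently.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.10,
                 output_price_per_m: float = 0.30) -> float:
    return (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m

# A 2,000-token prompt with a 500-token reply:
# 0.002 * $0.10 + 0.0005 * $0.30 = $0.00035
print(f"${request_cost(2_000, 500):.5f}")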

Run any model on the fastest endpoints

Use our API to deploy any model on one of the most cost-efficient inference stacks available.

Scale seamlessly to a dedicated deployment at any time for optimal throughput.



Curl

curl https://api.inceptron.io/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $INCEPTRON_API_KEY" \
-d '{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "How many moons are there in the Solar System?"
    }
  ]
}'
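
Python

The same request in Python. This is a minimal sketch assuming the endpoint is OpenAI-compatible, which the /v1/chat/completions path suggests; the openai client package and the base_url override are assumptions, while the URL, environment variable, model ID, and message mirror the curl example above.

# Minimal sketch assuming an OpenAI-compatible API (suggested by the
# /v1/chat/completions path). The openai package and base_url override
# are assumptions; the URL, env var, and model ID mirror the curl example.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "How many moons are there in the Solar System?"}
    ],
)

print(response.choices[0].message.content)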

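
Several models above are tagged with JSON mode. Assuming the API follows the OpenAI convention for structured output, enabling it would look like the sketch below; the response_format field and the namespaced model ID are assumptions based on that convention, not confirmed Inceptron documentation.

# JSON mode sketch. The response_format field follows the OpenAI API
# convention; whether Inceptron accepts it is inferred from the
# "JSON mode" tag in the catalog above. The model ID is hypothetical.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # hypothetical namespaced ID for the listed gpt-oss-120b
    response_format={"type": "json_object"},
    messages=[
        {"role": "user", "content": "List the planets of the Solar System as a JSON array."}
    ],
)

print(response.choices[0].message.content)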