Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →


Models

Llama-3.3-70B-Instruct (Meta)
Mode: Inceptron Optimized
Input tokens (per 1M): $0.10
Output tokens (per 1M): $0.30
Tokens per sec: 100
Quantization: fp8
Size: 70B
Context: 128K
Tags: chat, trivia, marketing, reasoning
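At the listed rates, a request's cost works out to (input_tokens ÷ 1,000,000) × $0.10 plus (output_tokens ÷ 1,000,000) × $0.30. A quick sketch of the arithmetic:

# Cost estimate at the Llama-3.3-70B-Instruct rates listed above.
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1_000_000 * 0.10 + output_tokens / 1_000_000 * 0.30

# Example: a 2,000-token prompt with a 500-token completion.
print(request_cost_usd(2_000, 500))  # 0.00035, i.e. about $0.0004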

Kimi-K2-Instruct (Moonshot AI)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 1T
Context: 131K
Tags: JSON mode, reasoning, math

gpt-oss-120b (OpenAI)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 120B
Context: 131K
Tags: JSON mode, code, math, reasoning

Enterprise-grade inference

Deploy and scale models like Llama, Qwen, Kimi, and DeepSeek with guaranteed uptime, zero-retention data flow, and usage-based pricing; no GPU wrangling required.

gpt-oss-20b (OpenAI)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 20B
Context: 131K
Tags: JSON mode, code, math, reasoning

DeepSeek-V3-0324 (DeepSeek)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-V3.1 (DeepSeek)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 128K
Tags: JSON mode, MoE, code, math, reasoning

DeepSeek-R1-0528 (DeepSeek)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 685B
Context: 164K
Tags: JSON mode, MoE, code, reasoning

Qwen3-Coder-30B-A3B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 30B
Context: 262K
Tags: JSON mode, code, math

Qwen/Qwen3-235B-A22B-Instruct-2507 (Qwen)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 235B
Context: 262K
Tags: JSON mode, code, math, reasoning

Qwen2.5-VL-72B-Instruct (Qwen)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 72B
Context: 128K
Tags: JSON mode, code, math

GLM-4.6 (Z.ai)
Mode: Inceptron Optimized
Input tokens (per 1M): TBA
Output tokens (per 1M): TBA
Tokens per sec: TBA
Quantization: fp8
Size: 357B
Context: 262K
Tags: JSON mode, code, math, reasoning
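Several of the cards above list JSON mode. On OpenAI-compatible chat endpoints this is typically enabled via the response_format parameter; the sketch below assumes Inceptron follows that convention (the parameter and the model id here are illustrative, not confirmed by this page).

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["INCEPTRON_API_KEY"],
    base_url="https://api.inceptron.io/v1",  # assumed from the curl example below
)

# response_format is the OpenAI-style JSON-mode switch; Inceptron support
# for it is an assumption, not something this page confirms.
resp = client.chat.completions.create(
    model="gpt-oss-120b",  # illustrative model id only
    messages=[{"role": "user", "content": "Return three fun facts about New York as a JSON object."}],
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content))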

Run any model on the fastest endpoints

Use our API to deploy any open-source model on the fastest inference stack available, with optimal cost efficiency.

Scale into a dedicated deployment at any time, with a custom number of instances, for optimal throughput.

Curl

curl -X POST "https://api.inceptron.io/v1/chat/completions" \
  -H "Authorization: Bearer $INCEPTRON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-Vision-Free",
    "messages": [{"role": "user", "content": "What are some fun things to do in New York?"}]
  }'
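
Python

The Python tab's contents are not shown above; this is a minimal equivalent sketch using the openai client, assuming the endpoint is OpenAI-compatible (inferred from the /v1/chat/completions path in the curl example).

import os
from openai import OpenAI

# The base URL and OpenAI-compatibility are assumptions inferred from the curl example.
client = OpenAI(
    api_key=os.environ["INCEPTRON_API_KEY"],
    base_url="https://api.inceptron.io/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[{"role": "user", "content": "What are some fun things to do in New York?"}],
)
print(response.choices[0].message.content)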
