Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →
Pricing
Pricing that scales with you
Only pay for what you run. Simple, transparent, and built for real usage. No hidden fees.
Compute costs
Dedicated deployments
Nvidia H100 80 GB
$3 / h
Nvidia H200 141 GB
$5 / h
Nvidia B200 180 GB
$8 / h
Serverless inference
Llama 3.3 70B Instruct
$0.10 / 1M input tokens • $0.30 / 1M output tokens
Pay per token on managed endpoints.
Commit & save
1 month
5%
For pilots and short bursts. Prepay one month and get an instant rate cut on Dedicated deployments or Serverless usage.
Flexible start date • Easy to extend.
6 months
10%
Best for steady workloads. Lock pricing for half a year and reduce your effective $/token and $/hr.
Pause/resume capacity within the window.
12 months
15%
Our best rate for teams running at scale. Annual commitment with predictable spend and maximum savings.
Ideal for production rollouts & SLAs.
Frequently asked questions.
How are dedicated deployments billed?
Hourly per GPU, pro-rated to the minute. For example, an H100 80 GB at $3/hr is charged only for the time the instance is running. No hidden fees.
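As a rough illustration of per-minute pro-rating, the sketch below uses the hourly rates listed above. The function name, the GPU keys, and the choice to round partial minutes up and the total to the nearest cent are assumptions for illustration, not confirmed billing rules.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly list prices from the pricing table above (USD per GPU-hour).
# The dictionary keys are hypothetical labels, not official SKU names.
HOURLY_RATES_USD = {
    "H100-80GB": Decimal("3"),
    "H200-141GB": Decimal("5"),
    "B200-180GB": Decimal("8"),
}


def dedicated_cost(gpu: str, gpu_count: int, runtime_seconds: int) -> Decimal:
    """Estimate the cost of a dedicated deployment, pro-rated to the minute."""
    # Assumption: a partial minute is billed as a whole minute.
    billed_minutes = (runtime_seconds + 59) // 60
    cost = HOURLY_RATES_USD[gpu] * gpu_count * Decimal(billed_minutes) / Decimal(60)
    # Assumption: the total is rounded to the nearest cent.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Two H100s running for 90 minutes: 2 x $3/hr x 1.5 h
print(dedicated_cost("H100-80GB", 2, 90 * 60))
```

Using `Decimal` rather than floats keeps the cent-level arithmetic exact.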
How is serverless inference priced?
Per token. You’re billed separately for input and output tokens (e.g., Llama 3.3 70B Instruct at $0.10 per 1M input tokens and $0.30 per 1M output tokens). Token counts are rounded to the nearest 1,000 per request batch.
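To make the token arithmetic concrete, here is a minimal sketch using the Llama 3.3 70B Instruct rates quoted above. It assumes input and output counts are rounded separately for each batch and that exact halves round up; neither detail is spelled out on this page, and the function names are hypothetical.

```python
from decimal import Decimal

# Rates quoted above for Llama 3.3 70B Instruct (USD per 1M tokens).
INPUT_USD_PER_M = Decimal("0.10")
OUTPUT_USD_PER_M = Decimal("0.30")


def round_to_nearest_thousand(tokens: int) -> int:
    """Round a token count to the nearest 1,000 (assumption: halves round up)."""
    return ((tokens + 500) // 1000) * 1000


def serverless_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request batch in USD."""
    # Assumption: input and output counts are rounded independently.
    billed_in = round_to_nearest_thousand(input_tokens)
    billed_out = round_to_nearest_thousand(output_tokens)
    total = billed_in * INPUT_USD_PER_M + billed_out * OUTPUT_USD_PER_M
    return total / Decimal(1_000_000)


# 1,200,400 input tokens and 349,600 output tokens in one batch
print(serverless_cost(1_200_400, 349_600))
```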
Do you offer discounts for commitments?
Yes. Commit & save: 1 month (5%), 6 months (10%), 12 months (15%). Discounts apply to both Dedicated deployments and Serverless usage within the commitment window.
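A short sketch of how a commitment discount changes the effective rate, using the percentages shown on the Commit & save plan cards (5%, 10%, 15%). The function and table names are hypothetical, and the sketch assumes the discount is a simple percentage off the list rate.

```python
from decimal import Decimal
from typing import Optional

# Discount by commitment length in months, as shown on the plan cards.
COMMIT_DISCOUNTS = {
    1: Decimal("0.05"),
    6: Decimal("0.10"),
    12: Decimal("0.15"),
}


def effective_rate(list_rate: Decimal, commit_months: Optional[int] = None) -> Decimal:
    """Return the rate after the commitment discount (list rate if uncommitted)."""
    if commit_months is None:
        return list_rate
    # Assumption: the discount is a flat percentage off the list rate.
    return list_rate * (1 - COMMIT_DISCOUNTS[commit_months])


# H100 at $3/hr on a 12-month commitment
print(effective_rate(Decimal("3"), 12))
# Output tokens at $0.30 per 1M on a 6-month commitment
print(effective_rate(Decimal("0.30"), 6))
```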
What happens if we exceed our committed amount?
You continue at on-demand rates for overage, or you can top up the commitment to keep the discount. Unused prepaid amounts roll over within the commitment window.
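One way to picture the overage rule is sketched below. It assumes the commitment is a prepaid dollar balance that is drawn down at the discounted rate, with any remaining usage billed at on-demand (list) rates. That accounting model, and every name in the code, is an assumption made for illustration; your contract defines the actual mechanics.

```python
from decimal import Decimal


def split_usage(prepaid_usd: Decimal, discount: Decimal, usage_at_list_usd: Decimal):
    """Split usage into the part covered by the prepaid balance and the overage.

    Returns (covered_at_list, overage_at_list, unused_prepaid), all in USD.
    """
    # Assumption: each $1 of list-price usage draws (1 - discount) from the balance.
    coverable_at_list = prepaid_usd / (1 - discount)
    if usage_at_list_usd <= coverable_at_list:
        unused = prepaid_usd - usage_at_list_usd * (1 - discount)
        return usage_at_list_usd, Decimal("0"), unused
    # Usage beyond the balance is billed at on-demand rates.
    overage = usage_at_list_usd - coverable_at_list
    return coverable_at_list, overage, Decimal("0")


# $950 prepaid at a 5% discount covers $1,000 of list-price usage,
# so $1,200 of usage leaves $200 billed at on-demand rates.
covered, overage, unused = split_usage(Decimal("950"), Decimal("0.05"), Decimal("1200"))
print(covered, overage, unused)
```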
Are there any extra charges (egress, storage, support)?
No hidden fees. Standard usage includes API access, observability, and basic support. If you request enterprise add-ons (custom regions, private networking, premium support), we’ll quote them explicitly.
Can we pin workloads to the EU and still use discounts?
Yes. Discounts apply regardless of region. If you enable EU-only residency, all compute and logs remain in-region; pricing is the same unless your contract specifies custom SLAs or regions.