Inceptron compiler, now open for early access. Auto-compile models for maximum efficiency. Join early access →

Get in Touch with Us

Talk to an engineer

Looking for product support or have feedback?

Join our Discord and follow the #questions and #feature-requests channels to ask questions and get the latest updates. You can also review our AI development docs page for more information.

Frequently asked questions.

How are dedicated deployments billed?

Hourly per GPU, pro-rated to the minute. For example, an H100 80 GB at $3/hr is charged only for the time the instance is running. No hidden fees.
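As a rough sketch, pro-rated billing works like this (the $3/hr H100 rate is taken from the example above; the helper function is illustrative, not an official billing API):

```python
def dedicated_cost(rate_per_hour: float, minutes_running: int) -> float:
    """Pro-rate an hourly GPU rate to the minute: charge only for
    minutes the instance actually ran."""
    return round(rate_per_hour * minutes_running / 60, 2)

# 90 minutes on an H100 80 GB at $3/hr -> $4.50
print(dedicated_cost(3.00, 90))
```

Stopping the instance stops the meter; a 20-minute run at $3/hr would bill $1.00, not a full hour.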

How is serverless inference priced?

Per token. You’re billed separately for input and output tokens (e.g., Llama-3.3-70B Instruct at $0.10/M input, $0.30/M output). We round to the nearest 1,000 tokens per request batch.
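A worked example of the math (rates are the Llama-3.3-70B Instruct figures quoted above; the function and its rounding helper are an illustrative sketch, not an official SDK):

```python
def serverless_cost(input_tokens: int, output_tokens: int,
                    in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Bill input and output tokens separately at per-million rates,
    rounding each count to the nearest 1,000 tokens first."""
    def nearest_k(n: int) -> int:
        # round the token count to the nearest 1,000
        return round(n / 1000) * 1000

    cost = (nearest_k(input_tokens) / 1_000_000 * in_rate_per_m
            + nearest_k(output_tokens) / 1_000_000 * out_rate_per_m)
    return round(cost, 6)

# 12,600 input + 3,400 output tokens at $0.10/M in, $0.30/M out
# rounds to 13k + 3k tokens -> $0.0022
print(serverless_cost(12_600, 3_400, 0.10, 0.30))
```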

Do you offer discounts for commitments?

Yes—commit & save: 1 month (5%), 6 months (10%), 12 months (20%). Discounts apply to both Dedicated deployments and Serverless usage within the commitment window.
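The tiers above reduce to a simple lookup (the percentages come from the answer; the function name and structure are illustrative only):

```python
# Commit-and-save tiers: term length in months -> discount fraction
DISCOUNTS = {1: 0.05, 6: 0.10, 12: 0.20}

def committed_price(on_demand_total: float, months: int) -> float:
    """Apply the commitment discount for the chosen term; unknown
    terms fall back to on-demand pricing (no discount)."""
    return round(on_demand_total * (1 - DISCOUNTS.get(months, 0.0)), 2)

# $1,000 of usage under a 12-month commitment -> $800.00
print(committed_price(1000.0, 12))
```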

What happens if we exceed our committed amount?

You continue at on-demand rates for overage, or you can top up the commitment to keep the discount. Unused prepaid amounts roll over within the commitment window.

Are there any extra charges (egress, storage, support)?

No hidden fees. Standard usage includes API access, observability, and basic support. If you request enterprise add-ons (custom regions, private networking, premium support), we’ll quote them explicitly.

Can we pin workloads to the EU and still use discounts?

Yes. Discounts apply regardless of region. If you enable EU-only residency, all compute and logs remain in-region; pricing is the same unless your contract specifies custom SLAs or regions.
