Graphics Programmer (C++/CUDA)
AI Inference Platform

Lund, Stockholm, London, Remote

About Inceptron

Inceptron is building a next-generation AI inference platform powered by a deep, proprietary compiler stack. We help customers run AI workloads with lower latency, higher throughput, and better cost efficiency across GPUs, FPGAs, and edge hardware — allowing teams to scale production AI without scaling the bill.


The Role

We’re hiring intermediate graphics programmers who love getting close to the metal. You’ll join a tight, collaborative team pushing the envelope of AI performance: writing and tuning shaders, optimizing memory and data movement, and shaping features that make real-world AI faster and more affordable while reducing energy use and easing pressure on the power grid.

We strongly encourage early-career engineers with demonstrated projects (open-source, research, side projects, GitHub/portfolio) to apply. If you’re not sure you tick every box, we still want to hear from you.


What you’ll do

  • Write, optimize, and maintain GPU kernels and shaders (CUDA) used in our inference runtime and visualization tooling.

  • Apply memory optimization techniques and tune performance across modern GPU architectures.

  • Profile and debug with tools such as Nsight, Compute Sanitizer (formerly cuda-memcheck), and perf/VTune equivalents.

  • Collaborate with compiler and systems engineers to land optimizations end-to-end.

  • Contribute to our internal performance playbooks, benchmarks, and best practices.


What we’re looking for

  • Professional C++ experience (modern C++ preferred) with a focus on performance, concurrency, scalability, and correctness.

  • Hands-on CUDA programming and strong understanding of GPU architecture fundamentals.

  • Excellent debugging and performance-tuning instincts.

  • A product mindset: bias for shipping incremental wins that move customer metrics.


Nice to have

  • LLVM experience (passes, IR, codegen) or exposure to MLIR.

  • Familiarity with ROCm, Vulkan/DirectX/Metal compute, or shader toolchains.

  • Understanding of serving stacks or model runtimes (scheduling, batching, routing).

  • Background in graphics, computer vision, or high-performance compute.


What you’ll get

  • Work at the performance frontier: solve hard, meaningful performance problems that directly impact customer workloads.

  • Full-stack exposure: from compiler internals to GPU runtime integration and developer tooling.

  • Impact and autonomy: contribute in a small, senior team where your work quickly reaches customers.


How to Apply

Apply below with your résumé. Please include:

  • Links to your projects (GitHub, portfolio, demos, research).

  • An optional short story about a performance win you’ve achieved — include before/after metrics, your approach, and what you learned.


Inceptron is an equal opportunity employer. We value inclusive teams and welcome applicants from all backgrounds.

Next generation
AI compute optimization

© Inceptron 2025