Graphics Programmer (C++/CUDA)
AI Inference Platform
Lund, Stockholm, London, Remote
About Inceptron
Inceptron is building a next-generation AI inference platform powered by a deep, proprietary compiler stack. We help customers run AI workloads with lower latency, higher throughput, and better cost efficiency across GPUs, FPGAs, and edge hardware — allowing teams to scale production AI without scaling the bill.
The Role
We’re hiring intermediate graphics programmers who love getting close to the metal. You’ll join a tight, collaborative team pushing the envelope of AI performance: writing and tuning shaders, optimizing memory and data movement, and shaping features that make real-world AI faster and more affordable while reducing energy use and easing pressure on the power grid.
We strongly encourage early-career engineers with demonstrated projects (open-source, research, side projects, GitHub/portfolio) to apply. If you’re not sure you tick every box, we still want to hear from you.
What you’ll do
Write, optimize, and maintain GPU kernels and shaders (CUDA) used in our inference runtime and visualization tooling.
Apply memory optimization techniques and tune performance across modern GPU architectures.
Profile and debug with tools such as Nsight, Compute Sanitizer (formerly cuda-memcheck), and perf/VTune equivalents.
Collaborate with compiler and systems engineers to land optimizations end-to-end.
Contribute to our internal performance playbooks, benchmarks, and best practices.
What we’re looking for
Professional C++ experience (modern C++ preferred) with a focus on performance, concurrency, scalability, and correctness.
Hands-on CUDA programming and strong understanding of GPU architecture fundamentals.
Excellent debugging and performance-tuning instincts.
A product mindset: bias for shipping incremental wins that move customer metrics.
Nice to have
LLVM experience (passes, IR, codegen) or exposure to MLIR.
Familiarity with ROCm, Vulkan/DirectX/Metal compute, or shader toolchains.
Understanding of serving stacks or model runtimes (scheduling, batching, routing).
Background in graphics, computer vision, or high-performance computing.
What you’ll get
Work at the performance frontier: solve hard, meaningful performance problems that directly impact customer workloads.
Full-stack exposure: from compiler internals to GPU runtime integration and developer tooling.
Impact and autonomy: contribute in a small, senior team where your work quickly reaches customers.
How to Apply
Apply below with your résumé. Please include:
Links to your projects (GitHub, portfolio, demos, research).
An optional short story about a performance win you’ve achieved — include before/after metrics, your approach, and what you learned.
Inceptron is an equal opportunity employer. We value inclusive teams and welcome applicants from all backgrounds.