Graphics Programmer (C++/CUDA)
AI Inference Platform

Lund, Stockholm, London, Remote

About Inceptron

Inceptron is building a next-generation AI inference platform powered by a deep, proprietary compiler stack. We help customers run AI workloads with lower latency, higher throughput, and better cost efficiency across GPUs, FPGAs, and edge hardware — allowing teams to scale production AI without scaling the bill.


The Role

We’re hiring intermediate graphics programmers who love getting close to the metal. You’ll join a tight, collaborative team pushing the envelope of AI performance: writing and tuning shaders, optimizing memory and data movement, and shaping features that make real-world AI faster and more affordable while reducing energy use and easing pressure on the power grid.

We strongly encourage early-career engineers with demonstrated projects (open-source, research, side projects, GitHub/portfolio) to apply. If you’re not sure you tick every box, we still want to hear from you.


What you’ll do

  • Write, optimize, and maintain GPU kernels and shaders (CUDA) used in our inference runtime and visualization tooling.

  • Apply memory optimization techniques and tune performance across modern GPU architectures.

  • Profile and debug with tools such as Nsight, Compute Sanitizer (formerly cuda-memcheck), and perf/VTune equivalents.

  • Collaborate with compiler and systems engineers to land optimizations end-to-end.

  • Contribute to our internal performance playbooks, benchmarks, and best practices.


What we’re looking for

  • Professional C++ experience (modern C++ preferred) with a focus on performance, concurrency, scalability, and correctness.

  • Hands-on CUDA programming and strong understanding of GPU architecture fundamentals.

  • Excellent debugging and performance-tuning instincts.

  • A product mindset: bias for shipping incremental wins that move customer metrics.


Nice to have

  • LLVM experience (passes, IR, codegen) or exposure to MLIR.

  • Familiarity with ROCm, Vulkan/DirectX/Metal compute, or shader toolchains.

  • Understanding of serving stacks or model runtimes (scheduling, batching, routing).

  • Background in graphics, computer vision, or high-performance compute.


What you’ll get

  • Work at the performance frontier: solve hard, meaningful performance problems that directly impact customer workloads.

  • Full-stack exposure: from compiler internals to GPU runtime integration and developer tooling.

  • Impact and autonomy: contribute in a small, senior team where your work quickly reaches customers.


How to Apply

Apply below with your résumé. Please include:

  • Links to your projects (GitHub, portfolio, demos, research).

  • An optional short story about a performance win you’ve achieved — include before/after metrics, your approach, and what you learned.


Inceptron is an equal opportunity employer. We value inclusive teams and welcome applicants from all backgrounds.

Next generation
AI compute optimization

© Inceptron 2025