Maxime Schmitt

Staff Engineer — Qualcomm France

EuroLLVM 2026 — Dublin

SPMD Programming · Vectorization · LLVM / Clang · DSP / Accelerators

About me

I am an R&D compiler engineer at Qualcomm with a background in polyhedral optimization techniques, specifically adaptive code refinement, where programs are analyzed and recompiled at runtime to unlock deeper optimizations. I previously worked on the Glow graph compiler before moving to LLVM/Clang.

My current focus is on LLVM/Clang, where my team and I are developing Ripple, an LLVM IR intrinsics API for SPMD parallelism on heterogeneous targets. C/C++ headers expose the API by mapping directly to Ripple LLVM intrinsics. Clang is minimally extended with a ripple_parallel construct that automatically generates vectorized loops with a masked tail, along with semantic checks that catch misuse at compile time. The implementation fits into a single LLVM module pass driven by dataflow tensor shape propagation. More about Ripple can be found in the next section.
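To make "vectorized loops with a masked tail" concrete, here is a hand-written C sketch of the loop structure such a construct lowers to. The vector length VL and the function are illustrative assumptions, not the Ripple API: a main loop covers full vector-width chunks, and a single masked pass handles the leftover lanes.

```c
#include <stddef.h>
#include <assert.h>

#define VL 8  /* assumed vector length, purely illustrative */

/* Compute dst[i] = a * src[i] + b using the stripmined shape a
 * ripple_parallel-style construct would generate. */
void scale_add(float *dst, const float *src, float a, float b, size_t n) {
    size_t i = 0;
    /* Main loop: full VL-wide chunks; a vectorizer emits one vector op here. */
    for (; i + VL <= n; i += VL)
        for (size_t lane = 0; lane < VL; ++lane)
            dst[i + lane] = a * src[i + lane] + b;
    /* Masked tail: one VL-wide pass with inactive lanes predicated off. */
    for (size_t lane = 0; lane < VL; ++lane) {
        int mask = (i + lane) < n;   /* per-lane predicate */
        if (mask)
            dst[i + lane] = a * src[i + lane] + b;
    }
}
```

The point of the masked tail is that no scalar remainder loop is needed: every element, including the last n mod VL, goes through the same vector-shaped body.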

If you are working on polyhedral compilation, accelerator backends, or SPMD programming models, I would love to chat.

Ripple — project overview

Ripple is an LLVM IR intrinsics API offering multi-level parallelism via an SPMD model, targeting heterogeneous hardware such as Qualcomm Hexagon DSPs with HVX vector extensions.

The documentation below covers the full API specification, HVX optimization techniques, and a troubleshooting reference. All source is on GitHub.

Ripple is currently maintained as a fork of the LLVM project, with Ripple commits rebased regularly against LLVM mainline.

Why Ripple?

Ripple exposes multi-dimensional tensors with NumPy-like broadcast semantics: operators apply element-wise across lanes, and operands of different shapes are automatically broadcast to a common shape, with no manual index arithmetic.
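The broadcast rule itself is simple to state in code. The sketch below is not the Ripple API; it just spells out the NumPy-style rule referenced above: shapes are aligned from the trailing dimension, missing dimensions count as 1, and two dimensions are compatible when they are equal or one of them is 1.

```c
#include <stddef.h>

/* Compute the common broadcast shape of shapes a (length an) and b
 * (length bn). Writes the result to `out` (length *outn) and returns 1,
 * or returns 0 if the shapes are incompatible. Illustrative only. */
int broadcast_shapes(const size_t *a, size_t an,
                     const size_t *b, size_t bn,
                     size_t *out, size_t *outn) {
    size_t n = an > bn ? an : bn;
    for (size_t k = 0; k < n; ++k) {
        /* Walk from the trailing dimension; missing dimensions are 1. */
        size_t da = k < an ? a[an - 1 - k] : 1;
        size_t db = k < bn ? b[bn - 1 - k] : 1;
        if (da != db && da != 1 && db != 1)
            return 0;                      /* incompatible shapes */
        out[n - 1 - k] = da > db ? da : db;
    }
    *outn = n;
    return 1;
}
```

For example, a (3, 1) operand combined with a (4,) operand broadcasts to (3, 4), while (3,) and (4,) are rejected.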

Target-specific vector libraries plug in directly: Ripple dispatches element-wise calls to user-provided vector functions at the right vector size. When a scalar definition is available, Ripple can specialize it for tensor-shaped arguments. Masked variants are also supported, with an extra mask parameter passed automatically when the call site is inside a vector conditional.
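As a sketch of the scalar/masked-vector pairing described above (names and signatures are hypothetical, not the actual Ripple vector ABI): the scalar definition gives the per-element semantics, and the masked variant takes one extra per-lane mask, leaving inactive lanes untouched.

```c
#include <stddef.h>

/* Scalar definition: the per-element reference semantics. */
static float clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical masked vector variant: the extra mask parameter is the
 * one a call site inside a vector conditional would receive
 * automatically. Inactive lanes are left unmodified. */
void clampf_masked(float *x, const int *mask, size_t n,
                   float lo, float hi) {
    for (size_t lane = 0; lane < n; ++lane)
        if (mask[lane])
            x[lane] = clampf(x[lane], lo, hi);
}
```

A real target library would replace the inner loop with HVX (or other) vector instructions; the contract that matters is the mask parameter and the untouched inactive lanes.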

Vector conditionals trigger automatic if-conversion, so branches that depend on lane indices produce predicated operations rather than scalar fallbacks. The programmer writes normal C control flow and Ripple handles the lowering.
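A minimal illustration of that if-conversion, using ReLU as a stand-in example (the pairing is illustrative, not Ripple output): the first function is the branchy control flow the programmer writes, the second is the select-style predicated form such lowering produces. Both compute the same result; the second has no lane-dependent branch.

```c
#include <stddef.h>

/* What the programmer writes: a branch that depends on lane data. */
void relu_branchy(float *x, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f)
            x[i] = 0.0f;
}

/* What predicated lowering produces: the branch becomes a per-lane
 * predicate feeding a select, so all lanes execute the same code. */
void relu_predicated(float *x, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int pred = x[i] < 0.0f;        /* per-lane predicate */
        x[i] = pred ? 0.0f : x[i];     /* select, no branch */
    }
}
```

On vector hardware the predicated form maps to a compare plus a masked select, which is why branches inside vector code need no scalar fallback.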

Since Ripple operates at the LLVM IR level as a target-independent pass, it emits standard LLVM vector instructions. The vectorization is explicit and predictable, which makes it practical to prototype and test new transformations without touching intrinsics, vendor builtins, or assembly.