Understanding llvm-nanobind
An interactive guide to the transformation API
Why This Guide Exists
The Core Insight
"Human review as the bottleneck for AI work" suggests an education problem. If you understand deeply, you can review quickly and contribute better ideas.
This guide aims to take you from "I can read the diff" to "I can critique the design and suggest improvements." It's structured as a journey, not a reference.
What You'll Learn
The Problem We're Solving
LLVM is a compiler infrastructure. It represents programs in a language-agnostic form called IR (Intermediate Representation).
The Key Insight
If you can modify the IR, you can transform programs: optimize them, obfuscate them, instrument them, or translate them to different targets.
The Traditional Pain
LLVM's API is C++. To write transformations, you need:
- C++ expertise and build system complexity
- Long compile times
- Manual memory management
The solution: Python bindings let you write transformations in a high-level language. Faster iteration, easier experimentation.
🧠 Quick Check
Why might someone not want to use Python for LLVM work?
LLVM IR Basics
LLVM IR is like a typed, structured assembly language. Understanding its structure is essential.
Key invariant: Every basic block ends with exactly ONE terminator instruction.
SSA = Static Single Assignment. Every value is assigned exactly once.
; NOT valid (same variable twice):
%x = add i32 1, 2
%x = add i32 %x, 3 ; ERROR!
; Valid (different names):
%x = add i32 1, 2
%y = add i32 %x, 3 ; OK
Why SSA Matters for You
You can't "change" a value. You create a new value and replace uses of the old one. This is why replace_all_uses_with() is fundamental.
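To make the "create a new value, then redirect the uses" idea concrete, here is a minimal toy model in plain Python. It is a sketch only: the `Value` and `Instruction` classes are hypothetical stand-ins written for this illustration, not the llvm-nanobind classes, and only the method name `replace_all_uses_with` comes from the text above.

```python
class Value:
    """Toy SSA value that remembers which instructions use it (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.users = []  # instructions that have this value as an operand

    def replace_all_uses_with(self, new):
        # Point every operand slot that refers to self at new instead.
        for user in list(self.users):
            user.operands = [new if op is self else op for op in user.operands]
            if user not in new.users:
                new.users.append(user)
        self.users.clear()


class Instruction(Value):
    """Toy instruction: an opcode plus operands. It is itself a value."""

    def __init__(self, name, opcode, operands):
        super().__init__(name)
        self.opcode = opcode
        self.operands = list(operands)
        for op in self.operands:
            if self not in op.users:
                op.users.append(self)

    def __repr__(self):
        args = ", ".join(op.name for op in self.operands)
        return f"{self.name} = {self.opcode} {args}"


one, two, three = Value("1"), Value("2"), Value("3")
x = Instruction("%x", "add", [one, two])
y = Instruction("%y", "add", [x, three])

# "Changing" %x means building a replacement and redirecting its users.
x2 = Instruction("%x2", "mul", [one, two])
x.replace_all_uses_with(x2)
print(y)  # %y = add %x2, 3
```

After the call, `%x` has no users left, which is the point at which a real pass could safely erase it.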
define i32 @factorial(i32 %n) {
entry:
  %cmp = icmp sle i32 %n, 1
  br i1 %cmp, label %base, label %recurse
base:
  ret i32 1
recurse:
  %sub = sub i32 %n, 1
  %call = call i32 @factorial(i32 %sub)
  %mul = mul i32 %n, %call
  ret i32 %mul
}
Three basic blocks: entry, base, recurse. Each ends with a terminator (br or ret).
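The terminator invariant can be checked mechanically. The sketch below works on the IR text of the factorial example using plain string handling. It is a toy, not an LLVM verifier: the helper names are made up for this example, the parser ignores comments and anything outside one function body, and `TERMINATORS` lists only a subset of LLVM's terminator opcodes.

```python
TERMINATORS = {"br", "ret", "switch", "unreachable"}  # subset, for illustration

FACTORIAL_IR = """\
define i32 @factorial(i32 %n) {
entry:
  %cmp = icmp sle i32 %n, 1
  br i1 %cmp, label %base, label %recurse
base:
  ret i32 1
recurse:
  %sub = sub i32 %n, 1
  %call = call i32 @factorial(i32 %sub)
  %mul = mul i32 %n, %call
  ret i32 %mul
}
"""


def split_blocks(ir_text):
    """Group instruction lines under their block label (toy parser)."""
    blocks = {}
    current = None
    for raw in ir_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("define") or line == "}":
            continue
        if line.endswith(":"):
            current = line[:-1]
            blocks[current] = []
        else:
            blocks[current].append(line)
    return blocks


def opcode(line):
    """Return the opcode of an instruction line, skipping any '%x = ' prefix."""
    if " = " in line:
        line = line.split(" = ", 1)[1]
    return line.split()[0]


def check_terminators(blocks):
    """Return the names of blocks that break the terminator rule."""
    bad = []
    for name, insts in blocks.items():
        ops = [opcode(inst) for inst in insts]
        ends_ok = bool(ops) and ops[-1] in TERMINATORS
        middle_ok = not any(op in TERMINATORS for op in ops[:-1])
        if not (ends_ok and middle_ok):
            bad.append(name)
    return bad


blocks = split_blocks(FACTORIAL_IR)
print(list(blocks))               # ['entry', 'base', 'recurse']
print(check_terminators(blocks))  # []
```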
🧠 Quick Check
What is a PHI node used for?
%result = phi i32 [ %x, %block1 ], [ %y, %block2 ] means: "use %x if we came from block1, %y if from block2."
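If it helps to see that rule as ordinary code, a PHI node behaves like a lookup keyed by the predecessor block. The `phi` helper and the `run` function below are hypothetical names used only for this illustration.

```python
def phi(incoming, came_from):
    """Pick the incoming value that matches the block we arrived from."""
    return incoming[came_from]


def run(take_block1):
    # Each predecessor block produces its own value.
    x, y = 10, 20
    came_from = "block1" if take_block1 else "block2"
    # %result = phi i32 [ %x, %block1 ], [ %y, %block2 ]
    return phi({"block1": x, "block2": y}, came_from)


print(run(True), run(False))  # 10 20
```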
Architecture
Understanding the layers helps you know what's possible and where limitations come from.
Key Insight
When you find a missing feature, ask: "Does the LLVM C API expose this?" If yes, we can add it. If no, we'd need a C wrapper first.
API Design
The bindings try to be Pythonic while staying close to LLVM's model.
The Good Parts
# Context managers prevent leaks
with llvm.create_context() as ctx:
    with ctx.parse_ir(ir_text) as mod:
        # Module auto-disposed on exit
        ...

# Pythonic iteration
for func in mod.functions:
    for bb in func.basic_blocks:
        for inst in bb.instructions:
            ...

# Clean builder API
builder.add(a, b, "sum")  # not CreateAdd
builder.br(target_bb)     # not CreateBr
The Quirks
| What You Try | What Happens | Correct Form |
|---|---|---|
| ctx.types.ptr | Gets bound method, not type! | ctx.types.ptr() |
| inst.parent | AttributeError | inst.block |
| for op in inst.operands: | AttributeError | for i in range(inst.num_operands): |
| inst.delete_instruction() | LLVM crash! | inst.remove_from_parent() first |
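The first quirk is ordinary Python behavior and can be reproduced without LLVM. In the sketch below, `ToyTypes` and `ensure_not_method` are hypothetical stand-ins written for this example. They are not part of llvm-nanobind and only mimic the "method, not attribute" shape described in the table.

```python
class ToyTypes:
    """Stand-in for a type factory; NOT the real llvm-nanobind class."""

    def ptr(self):
        return "ptr"  # pretend this string is a pointer type object


def ensure_not_method(value):
    """Fail early if a method was passed where a value was expected."""
    if callable(value):
        raise TypeError("got a callable; did you forget the parentheses?")
    return value


types = ToyTypes()

wrong = types.ptr    # no call: this is a bound method object
right = types.ptr()  # call: this is the value the method returns

print(callable(wrong))           # True
print(ensure_not_method(right))  # ptr
```

A guard like `ensure_not_method` is one way to turn a confusing downstream failure into an immediate, readable error.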
🧠 Quick Check
To iterate over all operands of an instruction, you use:
for i in range(inst.num_operands): because there is no .operands iterator. This is listed as a missing convenience in the improvement plan.
The Transformation Pattern
Almost every transformation follows this structure:
MBA Substitution
Mixed Boolean Arithmetic (MBA) exploits identities that relate arithmetic operations (+, -) to bitwise ones (XOR, AND), so a simple expression can be rewritten as an equivalent mix of both.
The Identity
X - Y = (X ⊕ -Y) + 2×(X ∧ -Y)
Why It Works as Obfuscation
Decompilers pattern-match sub to -. But they don't recognize (x ^ neg_y) + 2*(x & neg_y) as subtraction. It looks like arbitrary bitwise soup.
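The identity follows from how binary addition works: a + b = (a ⊕ b) + 2×(a ∧ b), where XOR is the sum without carries and AND marks the carry positions. Substituting b = -Y gives the subtraction form. Here is a quick numeric check in plain Python that models i32 wraparound with a mask. The function names are made up for this sketch, and it does not show how the pass itself emits instructions.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic


def sub_plain(x, y):
    return (x - y) & MASK


def sub_mba(x, y):
    neg_y = (-y) & MASK  # two's-complement negation of y
    return ((x ^ neg_y) + 2 * (x & neg_y)) & MASK


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(10_000):
    x = rng.getrandbits(32)
    y = rng.getrandbits(32)
    assert sub_plain(x, y) == sub_mba(x, y)

print(sub_mba(10, 3))  # 7
```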
🧠 Quick Check
The MBA pass randomly chooses between multiple equivalent XOR formulas. Why?
Control Flow Flattening
CFF hides the program's control flow structure by introducing a state-machine dispatcher.
- Demote PHI nodes to stack variables (PHIs encode predecessor info, lost after flattening)
- Create state variable in entry block
- Assign random state values to each block
- Create dispatcher that branches based on state
- Rewrite terminators to update state + jump to dispatcher
PHI Demotion is Critical
PHI nodes say "if we came from block1, use %x". After flattening, we always come from the dispatcher! So we convert PHIs to explicit memory (alloca → store → load).
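The steps above can be sketched in plain Python by flattening a small loop by hand. This is an illustration of the idea only, not the pass's output: the state constants are arbitrary numbers chosen for this example, and Python locals play the role of the stack slots that demoted PHI nodes would become.

```python
def sum_to_original(n):
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


# Arbitrary state ids, one per original block (hypothetical values).
ENTRY, COND, BODY, EXIT = 0x3A, 0x91, 0x17, 0xC4


def sum_to_flattened(n):
    # In SSA form, total and i would be PHI nodes at the loop header.
    # Here they are mutable slots, like alloca + store + load.
    total = 0
    i = 0
    state = ENTRY
    while True:  # dispatcher: every block returns here
        if state == ENTRY:
            total, i = 0, 1
            state = COND
        elif state == COND:
            state = BODY if i <= n else EXIT
        elif state == BODY:
            total += i
            i += 1
            state = COND
        elif state == EXIT:
            return total


print(sum_to_original(5), sum_to_flattened(5))  # 15 15
```

Every block now has the dispatcher as its only predecessor, so "which block did we come from" no longer distinguishes the loop's first iteration from later ones. That is why the values have to live in memory rather than in PHI nodes.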
Critical Evaluation Framework
Use these lenses to critique the work:
Lens 1: Consistency
Can you predict the name of something you haven't used? If not, there's an inconsistency.
| Expected | Actual | Consistent? |
|---|---|---|
| .parent | .block | ❌ |
| .operands | (doesn't exist) | ❌ |
| .ptr | .ptr() | ❌ |
Lens 2: Pit of Success
Does the easy path lead to correct code? Or does it lead to crashes?
Example: inst.delete_instruction() looks natural but crashes. The API should make the safe path obvious.
Lens 3: Completeness
Can you do everything you need? The porting guide rates: "7/10 for code generation, 5/10 for transforms."
Transforms need RAUW (replace_all_uses_with), block splitting, and instruction movement, all of which are mostly missing.
Planned Improvements
These block use cases entirely:
These cause confusion and bugs:
These make the API more Pythonic:
Final Assessment
Test your understanding with these synthesis questions:
Question 1
Which improvement has the HIGHEST priority?
Question 2
Why does CFF demote PHI nodes before flattening?
Question 3
What happens if you access a Module after exiting its with block?