Quiz Score
0/0

Understanding llvm-nanobind

An interactive guide to the transformation API

Why This Guide Exists

The Core Insight

"Human review as the bottleneck for AI work" suggests an education problem. If you understand deeply, you can review quickly and contribute better ideas.

This guide aims to take you from "I can read the diff" to "I can critique the design and suggest improvements." It's structured as a journey, not a reference.

What You'll Learn

The Problem We're Solving

LLVM is a compiler infrastructure. It represents programs in a language-agnostic form called IR (Intermediate Representation):

Source Code LLVM IR Machine Code │ │ │ ▼ ▼ ▼ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ C/C++ │ │ │ │ x86 │ │ Rust │ ──────▶ │ IR │ ──────▶ │ ARM │ │ Swift │ │ │ │ WASM │ └─────────┘ └─────────┘ └─────────┘ Front-end Middle-end Back-end

The Key Insight

If you can modify the IR, you can transform programs: optimize them, obfuscate them, instrument them, or translate them to different targets.

The Traditional Pain

LLVM's API is C++. To write transformations, you need:

The solution: Python bindings let you write transformations in a high-level language. Faster iteration, easier experimentation.

🧠 Quick Check

Why might someone not want to use Python for LLVM work?

A Python doesn't support compiler concepts
B Python is ~100x slower; some APIs aren't exposed
C LLVM doesn't work on systems with Python
Right! Performance and completeness are the trade-offs. Python is great for prototyping, research, and one-off tools, but production compiler passes typically stay in C++.

LLVM IR Basics

LLVM IR is like a typed, structured assembly language. Understanding its structure is essential.

MODULE ├── Target Triple, Data Layout ├── GLOBALS (@global_var, @string_constant) ├── DECLARATIONS (declare @printf, @malloc) └── DEFINITIONS └── define @main() { ├── BASIC BLOCK: entry │ ├── %x = alloca i32 ← instruction │ ├── store i32 42, ptr %x ← instruction │ └── br label %loop ← TERMINATOR │ └── BASIC BLOCK: loop ├── %i = phi i32 [...] ← PHI node ├── ... └── ret i32 0 ← TERMINATOR }

Key invariant: Every basic block ends with exactly ONE terminator instruction.

SSA = Static Single Assignment. Every value is assigned exactly once.

; NOT valid (same variable twice):
%x = add i32 1, 2
%x = add i32 %x, 3   ; ERROR!

; Valid (different names):
%x = add i32 1, 2
%y = add i32 %x, 3   ; OK

Why SSA Matters for You

You can't "change" a value. You create a new value and replace uses of the old one. This is why replace_all_uses_with() is fundamental.

define i32 @factorial(i32 %n) {
entry:
  %cmp = icmp sle i32 %n, 1
  br i1 %cmp, label %base, label %recurse

base:
  ret i32 1

recurse:
  %sub = sub i32 %n, 1
  %call = call i32 @factorial(i32 %sub)
  %mul = mul i32 %n, %call
  ret i32 %mul
}

Three basic blocks: entry, base, recurse. Each ends with a terminator (br or ret).

🧠 Quick Check

What is a PHI node used for?

A Performing floating-point arithmetic
B Selecting a value based on which predecessor block we came from
C Declaring function parameters
Correct! PHI nodes handle control flow merges in SSA form. %result = phi i32 [ %x, %block1 ], [ %y, %block2 ] means: "use %x if we came from block1, %y if from block2."

Architecture

Understanding the layers helps you know what's possible and where limitations come from.

Your Python Code llvm-nanobind (This Project) nanobind (C++/Python Bridge) LLVM C API gaps here! LLVM C++ Core

Key Insight

When you find a missing feature, ask: "Does the LLVM C API expose this?" If yes, we can add it. If no, we'd need a C wrapper first.

API Design

The bindings try to be Pythonic while staying close to LLVM's model.

The Good Parts

# Context managers prevent leaks
with llvm.create_context() as ctx:
    with ctx.parse_ir(ir_text) as mod:
        # Module auto-disposed on exit

# Pythonic iteration
for func in mod.functions:
    for bb in func.basic_blocks:
        for inst in bb.instructions:
            ...

# Clean builder API
builder.add(a, b, "sum")   # not CreateAdd
builder.br(target_bb)       # not CreateBr

The Quirks

What You Try What Happens Correct Form
ctx.types.ptr Gets bound method, not type! ctx.types.ptr()
inst.parent AttributeError inst.block
for op in inst.operands: AttributeError for i in range(inst.num_operands):
inst.delete_instruction() LLVM crash! inst.remove_from_parent() first

Quirks & Gotchas

🧠 Quick Check

To iterate over all operands of an instruction, you use:

A for op in inst.operands:
B for op in inst:
C for i in range(inst.num_operands): op = inst.get_operand(i)
Yes! There's no .operands iterator. This is listed as a missing convenience in the improvement plan.

The Transformation Pattern

Almost every transformation follows this structure:

┌──────────────────────────────────────────────────────────────┐ │ 1. TRAVERSE │ │ for bb in func.basic_blocks: │ │ for inst in bb.instructions: │ └────────────────────────────┬─────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────┐ │ 2. IDENTIFY │ │ if inst.opcode == llvm.Opcode.Sub: │ │ to_transform.append(inst) │ └────────────────────────────┬─────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────┐ │ 3. BUILD │ │ builder.position_before(inst) │ │ new_val = builder.add(xor_result, mul_result) │ └────────────────────────────┬─────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────┐ │ 4. REPLACE │ │ replace_all_uses_with(inst, new_val) │ └────────────────────────────┬─────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────────┐ │ 5. CLEAN UP │ │ inst.remove_from_parent() │ │ inst.delete_instruction() │ │ assert mod.verify() │ └──────────────────────────────────────────────────────────────┘

MBA Substitution

Mixed Boolean Arithmetic uses the fact that arithmetic and bitwise operations are related.

The Identity

X - Y = (X ⊕ -Y) + 2×(X ∧ -Y)

Why It Works as Obfuscation

Decompilers pattern-match sub to -. But they don't recognize (x ^ neg_y) + 2*(x & neg_y) as subtraction. It looks like arbitrary bitwise soup.

🧠 Quick Check

The MBA pass randomly chooses between multiple equivalent XOR formulas. Why?

A To test which one is fastest
B Polymorphism defeats pattern matching
C Some formulas only work with certain inputs
Exactly! If all XORs became the same pattern, a reverse engineer could write a pattern matcher to undo them. Variety makes automated deobfuscation harder.

Control Flow Flattening

CFF hides the program's control flow structure by introducing a state-machine dispatcher.

┌─────────┐ │ entry │ └────┬────┘ │ if (cond) ┌─────┴─────┐ ▼ ▼ ┌──────────┐ ┌──────────┐ │ block_A │ │ block_B │ └────┬─────┘ └────┬─────┘ └──────┬─────┘ ▼ ┌──────────┐ │ exit │ └──────────┘ Clear if/else structure visible to analysis
┌─────────┐ │ entry │ │state = A│ └────┬────┘ ▼ ┌─────────────┐◄──────────────┐ │ dispatcher │ │ │switch(state)│ │ └──────┬──────┘ │ ┌───────────┼───────────┐ │ ▼ ▼ ▼ │ ┌──────────┐ ┌──────────┐ ┌───────┐ │ │ block_A │ │ block_B │ │ exit │ │ │state = ?│ │state = ?│ │ ret │ │ └────┬─────┘ └────┬─────┘ └───────┘ │ └────────────┴──────────────────────┘ All paths go through dispatcher
  1. Demote PHI nodes to stack variables (PHIs encode predecessor info, lost after flattening)
  2. Create state variable in entry block
  3. Assign random state values to each block
  4. Create dispatcher that branches based on state
  5. Rewrite terminators to update state + jump to dispatcher

PHI Demotion is Critical

PHI nodes say "if we came from block1, use %x". After flattening, we always come from the dispatcher! So we convert PHIs to explicit memory (alloca → store → load).

Critical Evaluation Framework

Use these lenses to critique the work:

Lens 1: Consistency

Can you predict the name of something you haven't used? If not, there's an inconsistency.

ExpectedActualConsistent?
.parent.block
.operands(doesn't exist)
.ptr.ptr()

Lens 2: Pit of Success

Does the easy path lead to correct code? Or does it lead to crashes?

Example: inst.delete_instruction() looks natural but crashes. The API should make the safe path obvious.

Lens 3: Completeness

Can you do everything you need? The porting guide rates: "7/10 for code generation, 5/10 for transforms."

Transforms need RAUW, block splitting, instruction movement—mostly missing.

Planned Improvements

These block use cases entirely:

These cause confusion and bugs:

These make the API more Pythonic:

Final Assessment

Test your understanding with these synthesis questions:

Question 1

Which improvement has the HIGHEST priority?

A Adding an .operands iterator
B Binding LLVMReplaceAllUsesWith
C Making ptr a property
Correct! RAUW is Priority 1 (Critical Blocker). The iterator is P3 (Convenience), and ptr consistency is P2 (API Consistency).

Question 2

Why does CFF demote PHI nodes before flattening?

A PHI nodes are too slow
B PHI nodes encode predecessor info that's lost after flattening
C The LLVM API doesn't support PHI nodes
Yes! PHI nodes say "if we came from block1, use %x". After flattening, we always come from the dispatcher—the predecessor info is meaningless. Converting to memory operations preserves semantics.

Question 3

What happens if you access a Module after exiting its with block?

A Clean Python exception (validity tokens)
B Segmentation fault
C Returns garbage data
Right! The bindings use validity tokens to track object lifetime. When the context manager exits, children are marked invalid, and accessing them raises a clean exception instead of undefined behavior.