Felix Breval

Cybersecurity Engineer

Cracking the Code: On the Art of Reverse Engineering Rust Enums

Rust is, without a doubt, a game-changer. Its promise of memory safety without a garbage collector is a feat of engineering, giving developers the power to build systems that are both blazing-fast and secure. But this power comes at a cost, at least for some of us.

For those in security analysis or reverse engineering, a compiled Rust binary is an opaque black box. The same compiler that rigorously eliminates memory bugs also aggressively optimizes and abstracts, leaving behind a binary that bears little resemblance to its source. Our traditional toolsets, honed on C and C++, are often left in the dust, showing us C-like gibberish where beautiful, high-level Rust types once lived.

At the very heart of this challenge lies one of Rust's most elegant features: the enum. This is not your grandfather's C-style integer list. It's a powerful, type-safe "tagged union," and its in-memory representation is a masterclass in optimization—and a nightmare to reverse engineer.
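
To get a feel for how aggressive those layout decisions are, here is a quick, standalone illustration (not from the post's own test cases): with today's rustc, a fieldless enum collapses to a single tag byte, and an Option wrapped around a non-nullable pointer needs no tag at all.

  use std::mem::size_of;

  #[allow(dead_code)]
  enum Simple { A, B, C } // a C-style enum: nothing but an integer tag

  fn main() {
      // Three variants fit comfortably in a single byte.
      assert_eq!(size_of::<Simple>(), 1);
      // The "niche" optimization: None is encoded as the null pointer, so
      // Option<Box<u8>> is exactly the size of the pointer itself.
      assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
  }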

This post chronicles a research deep-dive into this very problem. How can we look at a compiled binary and reconstruct the Rust enums hidden within? The journey was a complex one, winding deep into the compiler's internals.


Part 1: The Hunt for "Ground Truth"

Before you can reconstruct anything, you need a map. You need a "ground truth"—a 100% reliable source of what a Rust enum's memory layout actually is. Our first quest was simply to get this information from the source code.

Attempt 1: The rust-analyzer Detour

The first logical stop was rust-analyzer. If the tool powering your IDE can understand the code well enough to offer suggestions, surely it knows the layout.

This path was a dead end. While rust-analyzer is a marvel of semantic analysis, it's not a compiler. We could get high-level data like size and alignment, but the critical details—the actual numeric values of the enum discriminants—were nowhere to be found.

Attempt 2: The rustc_driver Saga

If the assistant can't help, go to the boss. The next step was to harness the Rust compiler itself. The plan: use rustc_driver, the compiler's own interface, to inject our analysis logic directly into the compilation process.

A diagram here could illustrate the rustc compilation pipeline, showing how rustc_driver allows injecting code at specific stages, like after_analysis.

By implementing a Callbacks trait, we could halt the compiler right after its semantic analysis, tap into its "brain" (the TyCtxt), and ask it directly for a type's layout.

A minimal sketch of this follows. Fair warning: these are unstable rustc_private APIs whose signatures shift between nightly releases, so treat it as the shape of the approach rather than copy-paste code.
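
  #![feature(rustc_private)]
  // These internal crates only exist when the rustc-dev component (or a
  // from-source toolchain) is available; their APIs are deliberately unstable.
  extern crate rustc_driver;
  extern crate rustc_interface;

  use rustc_driver::{Callbacks, Compilation, RunCompiler};
  use rustc_interface::{interface::Compiler, Queries};

  struct LayoutDumper;

  impl Callbacks for LayoutDumper {
      // Called once type checking and borrow checking are done. The exact
      // signature differs between compiler versions; this is the Queries-based
      // form used around the toolchains discussed here.
      fn after_analysis<'tcx>(
          &mut self,
          _compiler: &Compiler,
          queries: &'tcx Queries<'tcx>,
      ) -> Compilation {
          queries.global_ctxt().unwrap().enter(|tcx| {
              // tcx is the compiler's "brain" (TyCtxt): from here we can walk
              // the crate's items, find enum definitions, and ask for their
              // layouts (size, alignment, tag encoding) via tcx.layout_of.
              let _ = tcx;
          });
          // We only want the analysis results, not a compiled artifact.
          Compilation::Stop
      }
  }

  fn main() {
      let args: Vec<String> = std::env::args().collect();
      let _ = RunCompiler::new(&args, &mut LayoutDumper).run();
  }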

This, it turned out, was the path of pain. The technical hurdles were immense:

  • Private APIs: We had to use #![feature(rustc_private)] to link against the compiler's unstable, internal-only crates.
  • Building Rust from Source: Our tool had to link against the compiler's internal crates with an exactly matching ABI. A standard rustup install wouldn't cut it, so we had to build a specific Rust version (1.85.0) entirely from source.
  • Forcing Shared Libraries: The show-stopper was rustc's link to LLVM. The compiler links it statically, but our tool needed a dynamic (.so) library. The only fix was a custom config.toml to force link-shared = true, which meant rebuilding all of LLVM too.

The process was fragile, non-portable, and took hours per test cycle. This approach was a dead end.

Attempt 3: The "Aha!" Moment

After weeks of struggling with rustc_driver, a casual chat with a rustc developer on Zulip changed everything. "By the way," he asked, "why haven't you used -Zprint-type-sizes?"

This single, unstable compiler flag was the key.

Here is the idea on a deliberately simple example enum. The flag needs a nightly toolchain, and the output lines and numbers below are approximate; they vary with compiler version and target:
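
  // my_enum.rs
  #[allow(dead_code)]
  enum Status {
      Ok = 200,
      NotFound = 404,
  }

  fn main() {
      let _ = Status::Ok; // use the type so its layout actually gets computed
  }

  // $ rustc +nightly -Zprint-type-sizes my_enum.rs
  //
  // Trimmed to the lines for `Status`, the output looks roughly like:
  //
  //   print-type-size type: `Status`: 2 bytes, alignment: 2 bytes
  //   print-type-size     discriminant: 2 bytes
  //   print-type-size     variant `Ok`: 0 bytes
  //   print-type-size     variant `NotFound`: 0 bytes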

It turned out the flag was non-functional in our 1.85.0 build but worked perfectly in a 1.86.0 nightly. This command instructs rustc to dump exactly what we needed: size, alignment, field offsets, and the holy grail... the discriminant values for every enum variant. With this "ground truth" finally in hand, the real work of reconstruction could begin.


Part 2: Rebuilding the Blueprint (DWARF Analysis)

Our first reconstruction target was a binary compiled with debug symbols. These symbols, stored in the DWARF format, are a detailed blueprint of the original code.

A diagram illustrating the DWARF format as a tree of Debugging Information Entries (DIEs) with tags like DW_TAG_... and attributes like DW_AT_... would be very helpful here.

We built a Rust-based analyzer using the gimli crate to parse this data. The tool had to be smart, as rustc uses two different DWARF structures for enums:

  • C-Style: Simple enums get a standard DW_TAG_enumeration_type.
  • Tagged Unions: Complex enums with data (like Option<T>) are modeled as a DW_TAG_structure_type containing a DW_TAG_variant_part.

The heart of that decision is a check on each DIE's tag, plus a scan of its children for a variant part. Here is a simplified sketch using gimli (the helper and its names are illustrative, not the original tool's code):
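
  use gimli::{DebuggingInformationEntry, Reader, Unit};

  // How rustc encoded a given enum in DWARF (illustrative classification).
  enum EnumEncoding {
      CStyle,      // DW_TAG_enumeration_type: a plain list of named constants
      TaggedUnion, // DW_TAG_structure_type wrapping a DW_TAG_variant_part
      NotAnEnum,
  }

  fn classify_enum_die<R: Reader>(
      unit: &Unit<R>,
      entry: &DebuggingInformationEntry<R>,
  ) -> gimli::Result<EnumEncoding> {
      match entry.tag() {
          // Simple, C-style enums use the classic DWARF enumeration type.
          gimli::DW_TAG_enumeration_type => Ok(EnumEncoding::CStyle),
          // Data-carrying enums are structures whose children include a
          // variant part describing the tag and each variant.
          gimli::DW_TAG_structure_type => {
              let mut tree = unit.entries_tree(Some(entry.offset()))?;
              let root = tree.root()?;
              let mut children = root.children();
              while let Some(child) = children.next()? {
                  if child.entry().tag() == gimli::DW_TAG_variant_part {
                      return Ok(EnumEncoding::TaggedUnion);
                  }
              }
              Ok(EnumEncoding::NotAnEnum)
          }
          _ => Ok(EnumEncoding::NotAnEnum),
      }
  }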

The Good, The Bad, and The Buggy

When we compared our tool's output to the ground truth, the results were a mixed bag.

The Good: For simple C-style enums, it was a spectacular success. Our tool not only matched the ground truth but surpassed it, correctly reconstructing explicit discriminant values (like Ok = 200) where -Zprint-type-sizes was silent.

The Bad: On the data-carrying "tagged union" enums, the analyzer failed. It couldn't parse the variant names, outputting __UNKNOWN_VARIANT__ for all of them.

The Buggy: Worse, it consistently reported incorrect discriminant values. It would often output i64::MIN (-9223372036854775808), a classic sign of an interpretation bug—it was reading the raw binary value as a signed 64-bit integer instead of as its actual type (like u8).

This shows that DWARF isn't a silver bullet: rustc's representation of its advanced types is complex and requires a far more sophisticated parser to unravel.


Part 3: Flying Blind (Analyzing Stripped Binaries)

This is the ultimate challenge: a release binary, stripped of all symbols. All we have is raw machine code. Here, we must stop parsing data and start recognizing patterns.

Heuristic A: The Telltale Heart of match

The idiomatic way to use an enum is with a match statement. In assembly, this match leaves a distinct fingerprint: a "compare-and-jump chain."

A diagram here showing the control flow graph of a match statement, with a central node branching to multiple code blocks, would be ideal.

The assembly code tells a clear story:


  mov   eax, [rbp-0x8]  ; Load the discriminant
  cmp   eax, 0          ; Is it variant 0?
  je    L_variant_0_code
  cmp   eax, 1          ; Is it variant 1?
  je    L_variant_1_code
  cmp   eax, 2          ; Is it variant 2?
  je    L_variant_2_code
  jmp   L_end_match

We built a heuristic analyzer to hunt for this cmp/je pattern. The results were promising:

  • Structural Success: It found the match structures and correctly inferred the number of variants.
  • Partial Success: For simple enums, it even extracted the correct discriminant values (e.g., 200, 44, 50) from the cmp instructions.
  • Failure: On complex tagged unions, it failed. The compiler was clearly optimizing the discriminant value before the comparison, making our naive pattern-matching fail.
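
The core of that scan boils down to pairing each cmp immediate with the je that follows it. Here is a rough sketch of the idea, written against the iced-x86 disassembler purely for illustration (the original analyzer may well be structured differently):

  use iced_x86::{Decoder, DecoderOptions, Instruction, Mnemonic, OpKind};

  // Walk a function's bytes and record the immediate of every `cmp reg, imm`
  // that is immediately followed by a `je`: the candidate discriminants of a
  // match statement.
  fn match_discriminants(code: &[u8], ip: u64) -> Vec<u64> {
      let mut decoder = Decoder::with_ip(64, code, ip, DecoderOptions::NONE);
      let mut insn = Instruction::default();
      let mut pending_cmp_imm: Option<u64> = None;
      let mut discriminants = Vec::new();

      while decoder.can_decode() {
          decoder.decode_out(&mut insn);
          match insn.mnemonic() {
              // Remember the immediate of a `cmp something, imm`.
              Mnemonic::Cmp
                  if matches!(
                      insn.op1_kind(),
                      OpKind::Immediate8 | OpKind::Immediate8to32 | OpKind::Immediate32
                  ) =>
              {
                  pending_cmp_imm = Some(insn.immediate(1));
              }
              // A `je` right after the `cmp` closes one link of the chain.
              Mnemonic::Je => {
                  if let Some(value) = pending_cmp_imm.take() {
                      discriminants.push(value);
                  }
              }
              // Anything else breaks the cmp/je pairing.
              _ => pending_cmp_imm = None,
          }
      }
      discriminants
  }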

Heuristic B: The Semantic Goldmine (The Debug Trail)

The match analysis gives us an anonymous skeleton. How do we find the names? The answer often lies in std::fmt::Debug.

When you #[derive(Debug)], rustc generates a fmt function that embeds literal strings: the type's name (for enums, each variant's name) and its field names. If that type is ever used as a trait object (&dyn Debug), the compiler also emits a vtable (virtual method table) into .rodata: a small table of metadata and function pointers.
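
A tiny example shows exactly which strings end up baked into the binary:

  #[derive(Debug)]
  enum Command {
      Quit,
      Move { x: i32, y: i32 },
  }

  fn main() {
      // The derived fmt code writes the variant and field names verbatim, so
      // "Quit", "Move", "x" and "y" all live as literal strings in the binary.
      println!("{:?}", Command::Move { x: 3, y: 7 }); // prints: Move { x: 3, y: 7 }
      println!("{:?}", Command::Quit);                // prints: Quit
  }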

A diagram showing the &dyn Debug fat pointer (data + vtable) and the vtable structure (pointers to drop, size, align, fmt) would be extremely valuable.

The theoretical algorithm is a goldmine:

  1. Scan .rodata for vtable candidates (sequences of pointers into .text).
  2. Grab the fmt function pointer from the vtable.
  3. Disassemble the fmt function.
  4. Scan its code for strings it loads (e.g., lea rcx, [rip + ...]).
  5. Analyze those strings: PascalCase strings are the type and variant names, snake_case strings are the field names.

To make step 1 concrete, here is a sketch of what a vtable scan over .rodata can look like. Everything about it is heuristic, and it leans on Rust's current (unstable) vtable layout: a drop_in_place pointer, a size, an alignment, and then the trait's method pointers:
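
  // Scan .rodata for vtable-shaped candidates and return their addresses.
  // Section bytes and address ranges come from the caller (e.g. an ELF parser).
  fn find_vtable_candidates(
      rodata: &[u8],                      // raw bytes of .rodata
      rodata_addr: u64,                   // virtual address where .rodata is mapped
      text_range: std::ops::Range<u64>,   // virtual address range of .text
  ) -> Vec<u64> {
      // Read a little-endian u64 at a byte offset into .rodata, if in bounds.
      let word = |off: usize| -> Option<u64> {
          let bytes = rodata.get(off..off + 8)?;
          Some(u64::from_le_bytes(bytes.try_into().unwrap()))
      };

      let mut candidates = Vec::new();
      for off in (0..rodata.len().saturating_sub(32)).step_by(8) {
          let (Some(drop_ptr), Some(size), Some(align), Some(fmt_ptr)) =
              (word(off), word(off + 8), word(off + 16), word(off + 24))
          else {
              continue;
          };
          // Heuristic: drop and fmt slots must point into .text, the align
          // slot must be a power of two, and the size slot must be sane.
          let plausible = text_range.contains(&drop_ptr)
              && align.is_power_of_two()
              && size < (1 << 20) // arbitrary sanity bound on the size slot
              && text_range.contains(&fmt_ptr);
          if plausible {
              candidates.push(rodata_addr + off as u64);
          }
      }
      candidates
  }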

This heuristic is incredibly powerful, but also incredibly complex to implement reliably. Due to time constraints, this part of the research remains an exciting, but incomplete, proof of concept.


The Road Ahead

This research journey into Rust's internals was challenging, but it confirms one thing: reverse engineering Rust is hard, but not impossible.

  • Tooling is Tricky: The compiler is a complex beast. Don't try to wrestle it (rustc_driver) if a simpler tool (-Zprint-type-sizes) will do.
  • DWARF Isn't Enough: DWARF provides a map, but rustc's optimizations require a better map-reader than we have.
  • Heuristics are the Future: For stripped binaries, pattern-matching is key. Our match analysis proves we can recover the structure.

The true breakthrough will come from combining these heuristics. Imagine a tool that uses match analysis to find a skeleton, then uses vtable analysis to find the names and "dress" that skeleton. The path is difficult, but this work provides a solid foundation for the next generation of Rust analysis tools.