The nice thing about the instrumentation used by American Fuzzy Lop is that it allows us to do much more than just, well, fuzzing stuff. For example, for a while now, the fuzzer has shipped with a standalone tool called afl-tmin, which allows you to take an interesting file and automatically shrink it - all while making sure that it still exercises the same functionality in the targeted binary (or triggers the same crash). Another tool, afl-cmin, pulls off the same trick for eliminating redundant files in large fuzzing corpora.
The latest release of AFL features another nifty new addition along these lines: afl-analyze. The tool takes an input file, sequentially flips bytes, and then gives you a human-readable report explaining the structure of the file, based on the observed changes to the execution path within the target binary. It can tell apart:
- No-op blocks, such as comments.
- Checksums, magic values, and atomically compared syntax tokens.
- Blobs of checksummed or encrypted data.
- "Pure" data blocks with no encryption or checksum guards.
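To make the general idea a bit more concrete, here is a minimal sketch in Python. It is not the actual afl-analyze code: toy_target is a made-up stand-in for an instrumented binary, the two XOR masks are an arbitrary choice, and the two labels are a heavy simplification of the categories listed above. All it shows is the basic loop of flipping one byte at a time and checking whether the observed execution path changes.

```python
def toy_target(data: bytes) -> tuple:
    """Hypothetical stand-in for an instrumented binary.

    Returns a coarse "execution path": one event per input byte,
    distinguishing only delimiters, newlines, and everything else.
    """
    path = []
    for b in data:
        if b == 0x20:
            path.append("delim")
        elif b == 0x0A:
            path.append("newline")
        else:
            path.append("other")
    return tuple(path)


def classify_bytes(data: bytes, run=toy_target) -> list:
    """Flip each byte in turn and label it by its effect on the path.

    A byte is "critical" if any tested flip changes the path, and
    "no-op" otherwise. The masks below are illustrative assumptions.
    """
    baseline = run(data)
    labels = []
    for i in range(len(data)):
        changed = False
        for mask in (0xFF, 0x01):
            mutated = bytearray(data)
            mutated[i] ^= mask
            if run(bytes(mutated)) != baseline:
                changed = True
                break
        labels.append("critical" if changed else "no-op")
    return labels


if __name__ == "__main__":
    sample = b"ab cd\n"
    for offset, (byte, label) in enumerate(zip(sample, classify_bytes(sample))):
        print(f"offset {offset:2d}  0x{byte:02x}  {label}")
```

With this toy target, only the space and the newline come out as critical, which is the same flavor of result as a tokenizer that cares solely about delimiters. Telling checksums or magic values apart from plain data would need more than a changed / unchanged signal.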
Here's a quick demo, showing afl-analyze figure out that when running cut -d ' ' -f1, only the spaces and newlines really matter in any way:
Interestingly, the fact that offset #19 is flagged as a "critical byte" also tells us that cut always tokenizes the entire line, even if all we're asking for is the first field.
Of course, the program is better suited for incomprehensible binary formats than for simple text utilities; it can also work with black-box binaries, thanks to the QEMU integration that AFL has supported for quite a while. Anyway, let's try libpng instead:
This checks out: we have two four-byte signatures, followed by chunk length, four-byte chunk name, and chunk length. Neat, right? Now, be warned that it shipped just moments ago and is still a bit experimental - field testing and feedback welcome!
(The approach can likely be refined by looking at how much the execution path changes in response to input tweaks. I'm also a bit tempted to get rid of the verbose output and let the tool just generate a color-coded hex dump, based on Hamming distance to the original exec map.)
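For the curious, here is a rough Python sketch of what such a Hamming-distance view could look like. Everything in it is an assumption made for illustration: toy_exec_map is a hypothetical stand-in for the real exec map (it just sets one bit per observed state transition), the single 0xFF flip is arbitrary, and the color thresholds are picked out of thin air.

```python
def toy_exec_map(data: bytes) -> bytes:
    """Hypothetical 64-bit "exec map": one bit per (prev, cur) transition."""
    bitmap = bytearray(8)
    prev = 0
    for b in data:
        cur = 1 if b == 0x20 else 2 if b == 0x0A else 3
        edge = (prev * 4 + cur) % 64
        bitmap[edge // 8] |= 1 << (edge % 8)
        prev = cur
    return bytes(bitmap)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally sized maps."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def byte_distances(data: bytes, run=toy_exec_map) -> list:
    """Distance from the original map after flipping each byte in turn."""
    baseline = run(data)
    distances = []
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0xFF
        distances.append(hamming(baseline, run(bytes(mutated))))
    return distances


def color_hex_dump(data: bytes, distances: list) -> str:
    """Hex dump with ANSI colors: plain = 0, yellow = 1, red = 2 or more."""
    cells = []
    for byte, dist in zip(data, distances):
        if dist == 0:
            cells.append(f"{byte:02x}")
        elif dist == 1:
            cells.append(f"\x1b[33m{byte:02x}\x1b[0m")
        else:
            cells.append(f"\x1b[31m{byte:02x}\x1b[0m")
    return " ".join(cells)


if __name__ == "__main__":
    sample = b"ab cd\n"
    print(color_hex_dump(sample, byte_distances(sample)))
```

The appeal of a graded distance over a plain changed / unchanged verdict is that it hints at how much of the program's behavior hinges on a given byte, not just whether it matters at all.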