# 8086-rs

8086-rs is a Rust-based toolchain for analyzing and interpreting binaries compiled for the 16-bit Intel 8086 family. It was written with the intention of interpreting binaries compiled for MINIX 1.x.

Features:

- A parser for the `a.out` format, to parse legacy MINIX 1.x executables
- A disassembler to parse the 16-bit instructions into an IR
- Disassembly output in an `objdump(1)`-style fashion
- Interpretation of instructions
- MINIX 1.x interrupts and memory layout
- Obeying of segment register indirection (`CS`, `SS`, `DS`, `ES`)
- Full 20-bit memory bus

## Usage

To compile the tool, use Cargo:

```
cargo build --release
```

Or run it directly:

```
cargo run -- --help
```

Run with log output:

```
RUST_LOG=debug cargo run -- interpret -p ./a.out 2>&1 | less
```

The `info` log level shows register state and calls to interrupts; `debug` additionally shows disassembly and interpretation internals.

CLI Options:

```
$ cargo run -- --help
Simple program to disassemble and interpret 8086 a.out compilates, e.g. such for MINIX

Usage: i8086-rs [OPTIONS] [ARGV]...

Commands:
  disassemble  Disassemble the binary into 8086 instructions [aliases: d]
  interpret    Interpret the 8086 instructions [aliases: i]
  help         Print this message or the help of the given subcommand(s)

Arguments:
  [ARGV]...  argv passed to the program, which will be interpreted

Options:
  -p, --path     Path of the binary
  -d, --dump     Dump progress of disassembly, in case of encountering an error
  -h, --help     Print help
  -V, --version  Print version
```

## Example

```
$ cat 1.c
main() { write(1, "hello\n", 6); }
$ ./target/release/i8086-rs interpret -p ./a.out
hello
$ RUST_LOG=info ./target/release/i8086-rs interpret -p ./a.out
INFO: Initializing stack...
INFO: Initializing static data...
INFO: (0000) xor %bp, %bp    0000 0000 0000 0000 ffb4 0000 0000 0000 ---------
INFO: (0002) mov %bx, %sp    0000 0000 0000 0000 ffb4 0000 0000 0000 -----Z---
INFO: (0004) mov %ax, [%bx]  0000 ffb4 0000 0000 ffb4 0000 0000 0000 -----Z---
...
```

## Status

This project is under active development and is primarily used by me to explore Intel disassembly and learn more Rust. Expect bugs and some missing features. I mainly test with 'official' binaries from the MINIX source tree.

Currently, everything lives in the binary crate, but I want to move some parts into a library. That would make it much easier to ignore the MINIX 1.x specifics (e.g. the currently hardcoded interrupt handler) and would allow for more generic usage of this 8086 (e.g. implementing your own simple BIOS or OS). Before moving to that, I first want to implement all features correctly and add tests for all of them.

## Caveats

Code is currently not fetched from memory, but from a separate vector stored inside the Disassembler struct, which fetches and parses the next instruction at the instruction pointer. The `CS:IP` addressing scheme is still used, which allows for 20-bit access, but this setup currently does not allow for self-modifying code.

Also, the disassembler only uses an initial sweep for disassembly, which has a high probability of being inaccurate compared to what happens at runtime. For example, during interpretation there may be a jump to a memory address that the disassembler did not identify as the start of an instruction.

## Documentation

The documentation of the project itself can be generated with `cargo doc`.

```
$ cargo doc
$ firefox target/doc/i8086_rs/index.html
```

For the implementation of the disassembly, I used the Intel "8086 16-BIT HMOS MICROPROCESSOR" spec, as well as [this](http://www.mlsite.net/8086/8086_table.txt) overview of all opcode variants, used in conjunction with [this](http://www.mlsite.net/8086/) decoding matrix.

For the implementation of the interpreter, I used the Intel "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z" spec.

## TODOs

- Map instructions into actual memory for interpretation
- Implement all MINIX interrupts
- Allow execution of 'raw' instructions, not only `a.out`
- Don't hardcode MINIX
- Implement BIOS interrupts

## FAQ

#### Why hassle with interpretation and not just emulate 8086?

For one, this project stemmed from a university exercise about the 8086 instruction set and disassembly. An interpreter for these assembly instructions was the logical (?) next step. Maybe I will add raw 8086 emulation some day.

#### Why no `nom`?

There is no real reason. I just wanted to try to implement most parts myself, even if it meant more boilerplate code. I used `nom` extensively in the past and wanted to see what it would be like without that crate. In hindsight, using `nom` would have been the cleaner option, but hey, that is something I only learned by not using `nom` for once.
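
## Appendix: segment:offset addressing

The segment register indirection and the 20-bit memory bus listed under Features, as well as the `CS:IP` scheme mentioned under Caveats, all rest on the same address calculation. The following is a minimal sketch of that calculation as the 8086 defines it, not the code used in this project; the names `physical_address`, `ADDRESS_BITS` and `ADDRESS_MASK` are hypothetical and chosen only for illustration.

```rust
/// Number of address lines on the 8086 bus.
const ADDRESS_BITS: u32 = 20;
/// Mask that keeps only the low 20 bits of an address.
const ADDRESS_MASK: u32 = (1 << ADDRESS_BITS) - 1;

/// Hypothetical helper (not taken from this project): combine a 16-bit
/// segment register value and a 16-bit offset into a 20-bit physical address.
pub fn physical_address(segment: u16, offset: u16) -> u32 {
    // Segments start on 16-byte boundaries, hence the shift by 4 bits.
    let linear = ((segment as u32) << 4) + offset as u32;
    // The 8086 has no 21st address line, so the sum wraps around at 1 MiB.
    linear & ADDRESS_MASK
}
```

With this helper, `physical_address(0x1234, 0x0010)` yields `0x12350`, and `physical_address(0xffff, 0x0010)` wraps around to `0x00000`. Different segment:offset pairs can name the same byte: `0x1000:0x0000` and `0x0fff:0x0010` both resolve to `0x10000`. An instruction fetch would pass `CS` and `IP`, a stack access `SS` and `SP`, and so on.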