# 8086-rs

8086-rs is a Rust-based toolchain for analyzing and interpreting binaries compiled for the 16-bit Intel 8086 family, built primarily to interpret binaries compiled for MINIX 1.x.

Features:
- A parser for the `a.out` format, to parse legacy MINIX 1.x executables
- A disassembler to parse the 16-bit instructions into an IR
- Disassembly output in a `objdump(1)`-style fashion
- Interpretation of instructions
- MINIX 1.x interrupts and memory layout
- Obeying of segment register indirection (`CS`, `SS`, `DS`, `ES`)
- Full 20-bit memory bus

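To give an idea of what parsing the `a.out` format involves, here is a minimal sketch of reading the short-form header. It is not the parser used in this project. The 32-byte layout and the magic bytes `0x01 0x03` are assumptions taken from the `struct exec` definition in MINIX's `<a.out.h>`, and `ExecHeader`, `le_u32` and `parse_header` are hypothetical names.

```rust
/// Short-form MINIX `a.out` header (32 bytes), assuming the `struct exec`
/// layout from MINIX's `<a.out.h>`. All names here are illustrative.
#[derive(Debug, PartialEq)]
struct ExecHeader {
    flags: u8,
    cpu: u8,
    hdrlen: u8,
    text: u32,
    data: u32,
    bss: u32,
    entry: u32,
    total: u32,
    syms: u32,
}

/// Read a little-endian u32 at `off`; the caller guarantees the bounds.
fn le_u32(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

fn parse_header(bytes: &[u8]) -> Option<ExecHeader> {
    // Require the full short-form header and the assumed magic 0x01 0x03.
    if bytes.len() < 32 || bytes[0] != 0x01 || bytes[1] != 0x03 {
        return None;
    }
    Some(ExecHeader {
        flags: bytes[2],
        cpu: bytes[3],
        hdrlen: bytes[4],
        // bytes[5] is unused and bytes[6..8] hold a version stamp.
        text: le_u32(bytes, 8),
        data: le_u32(bytes, 12),
        bss: le_u32(bytes, 16),
        entry: le_u32(bytes, 20),
        total: le_u32(bytes, 24),
        syms: le_u32(bytes, 28),
    })
}

fn main() {
    // Hand-built header for demonstration; a real one comes from the file.
    let mut raw = [0u8; 32];
    raw[0] = 0x01;
    raw[1] = 0x03;
    raw[4] = 32;
    raw[8..12].copy_from_slice(&0x120u32.to_le_bytes());
    let hdr = parse_header(&raw).expect("valid header");
    println!("text size: {:#x}, header length: {}", hdr.text, hdr.hdrlen);
}
```

Under these assumptions, the text segment starts `hdrlen` bytes into the file and the data segment follows directly after it.
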
## Usage

To build the tool, use Cargo:
```
cargo build --release
```

Or run it directly:
```
cargo run -- --help
```

Run with log output:
```
RUST_LOG=debug cargo run -- interpret -p ./a.out 2>&1 | less
```

With `RUST_LOG=info`, the log shows things such as register state and interrupt calls; `debug` additionally shows disassembly and interpretation internals.

CLI Options:
```
$ cargo run -- --help
Simple program to disassemble and interpret 8086 a.out compilates, e.g. such for MINIX

Usage: i8086-rs [OPTIONS] [ARGV]... <COMMAND>

Commands:
  disassemble  Disassemble the binary into 8086 instructions [aliases: d]
  interpret    Interpret the 8086 instructions [aliases: i]
  help         Print this message or the help of the given subcommand(s)

Arguments:
  [ARGV]...  argv passed to the program, which will be interpreted

Options:
  -p, --path <PATH>  Path of the binary
  -d, --dump         Dump progress of disassembly, in case of encountering an error
  -h, --help         Print help
  -V, --version      Print version
```

## Example
```
$ cat 1.c
main() {
    write(1, "hello\n", 6);
}

$ ./target/release/i8086-rs interpret -p ./a.out
hello

$ RUST_LOG=info ./target/release/i8086-rs interpret -p ./a.out
INFO: Initializing stack...
INFO: Initializing static data...
INFO: (0000) xor %bp, %bp 0000 0000 0000 0000 ffb4 0000 0000 0000 ---------
INFO: (0002) mov %bx, %sp 0000 0000 0000 0000 ffb4 0000 0000 0000 -----Z---
INFO: (0004) mov %ax, [%bx] 0000 ffb4 0000 0000 ffb4 0000 0000 0000 -----Z---
...
```

## Status

This project is under active development and is primarily used by me to explore Intel disassembly and learn more Rust.
Expect bugs and some missing features.
I mainly test with 'official' binaries from the MINIX source tree.

Currently, everything lives in the binary, but I want to move some parts into a library. That would make it much easier to ignore the MINIX 1.x specifics (e.g. the currently hardcoded interrupt handler) and would allow more generic use of this 8086 (e.g. implementing your own simple BIOS or OS).
But first I want to implement all features correctly and add tests for all of them.

## Caveats

Code is currently not fetched from memory, but from a separate vector stored inside the Disassembler struct, which fetches and parses the next instruction at the instruction pointer.
The `CS:IP` addressing scheme is still used to allow for 20-bit access, but this design does not currently allow for self-modifying code.

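For readers unfamiliar with `CS:IP` addressing, the following sketch shows how a 16-bit segment and a 16-bit offset combine into a 20-bit physical address on the 8086. The function name `physical_address` is hypothetical, and whether this project wraps addresses at 1 MiB in the same way is an assumption.

```rust
/// Translate a `segment:offset` pair (e.g. `CS:IP`) into a 20-bit
/// physical address. Illustrative helper, not taken from this project.
fn physical_address(segment: u16, offset: u16) -> u32 {
    // Shift the segment left by 4 bits (multiply by 16), add the offset,
    // and wrap at 1 MiB, as the 20 address lines of the 8086 do.
    (((segment as u32) << 4) + offset as u32) & 0xF_FFFF
}

fn main() {
    println!("{:#07x}", physical_address(0x1234, 0x0010));
    // The highest segment plus a small offset wraps around to address 0.
    println!("{:#07x}", physical_address(0xFFFF, 0x0010));
}
```

Different `segment:offset` pairs can name the same physical byte, which is why the interpreter has to resolve every access through the segment registers.
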
Also, the disassembler just uses an initial sweep for disassembly, which has a high probability of being inaccurate when compared to the runtime.
E.g. there may be a jump to a memory address during interpretation which was not identified as an instruction by the disassembler.

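The problem can be reproduced with a toy example. The sketch below is not this project's disassembler: it assumes a plain linear sweep and only knows the lengths of four opcodes (`jmp short`, `mov ax, imm16`, `nop`, `ret`). The names `length` and `linear_sweep` are hypothetical.

```rust
/// Instruction length for a tiny subset of 8086 opcodes (illustrative only).
fn length(opcode: u8) -> Option<usize> {
    match opcode {
        0xEB => Some(2),        // jmp short rel8
        0xB8 => Some(3),        // mov ax, imm16
        0x90 | 0xC3 => Some(1), // nop, ret
        _ => None,
    }
}

/// Linear sweep: decode from offset 0 and always continue at the byte
/// after the previous instruction. Returns the offsets treated as
/// instruction starts.
fn linear_sweep(code: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        let Some(len) = length(code[pc]) else { break };
        starts.push(pc);
        pc += len;
    }
    starts
}

fn main() {
    // jmp +1 ; one data byte (0xB8) ; nop ; nop ; ret
    let code = [0xEB, 0x01, 0xB8, 0x90, 0x90, 0xC3];
    let starts = linear_sweep(&code);
    // At runtime the jump lands at: offset after the jump + rel8.
    let target = 2 + code[1] as usize;
    println!("sweep starts: {:?}, runtime jump target: {}", starts, target);
    println!("target found by sweep: {}", starts.contains(&target));
}
```

The sweep decodes the data byte `0xB8` as `mov ax, imm16` and swallows the two `nop`s as its immediate, so the real jump target never shows up as an instruction start.
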
## Documentation

The documentation of the project itself can be accessed by using `cargo doc`.
```
$ cargo doc
$ firefox target/doc/i8086_rs/index.html
```

For the implementation of the disassembly, I used the Intel "8086 16-BIT HMOS MICROPROCESSOR" spec, as well as [this](http://www.mlsite.net/8086/8086_table.txt) overview of all opcode variants, used in conjunction with [this](http://www.mlsite.net/8086/) decoding matrix.

For the implementation of the interpreter, I used the Intel "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z" spec.

## TODOs

- Map instructions into actual memory for interpretation
- Implement all MINIX interrupts
- Allow execution of 'raw' instructions, not only `a.out`
- Don't hardcode MINIX
- Implement BIOS interrupts

## FAQ

#### Why hassle with interpretation and not just emulate 8086?
For one, this project stemmed from a university exercise about the 8086 instruction set and disassembly.
An interpreter for these assembly instructions was the logical (?) next step.
Maybe I will add raw 8086 emulation some day.

#### Why no `nom`?
There is no real reason; I just wanted to try to implement most parts myself, even if it meant more boilerplate code.
I used `nom` extensively in the past and I just wanted to see what it would be like without that crate.
In hindsight, using `nom` would have been the cleaner option, but hey, that is something I only learned by not using `nom` for once.