2025-07-08 11:19:26 +09:00
# 8086-rs
8086-rs is a Rust-based toolchain for analyzing and interpreting binaries compiled for Intel's 16-bit 8086 processor family, built primarily to interpret binaries compiled for MINIX 1.x.

Features:
- A parser for the `a.out` format, to load legacy MINIX 1.x executables
- A disassembler that decodes the 16-bit instructions into an IR
- Disassembly output in the style of `objdump(1)`
- Interpretation of instructions
- MINIX 1.x interrupts and memory layout
- Support for segment register indirection (`CS`, `SS`, `DS`, `ES`)
- Full 20-bit address bus (1 MiB of addressable memory)
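
To illustrate the first feature, here is a minimal sketch of reading the fixed 32-byte header of a MINIX 1.x `a.out` file. The field layout is an assumption based on MINIX's `<a.out.h>` (`struct exec`): two magic bytes `0x01 0x03`, flags, CPU id, header length, an unused byte, a 16-bit version, then six little-endian 32-bit sizes. `AoutHeader` and `parse_header` are hypothetical names, not this crate's API.

```rust
/// Hypothetical header type; layout assumed from MINIX 1.x `<a.out.h>` (`struct exec`).
#[derive(Debug, PartialEq)]
pub struct AoutHeader {
    pub flags: u8,
    pub cpu: u8,
    pub hdrlen: u8,
    pub version: u16,
    pub text: u32,
    pub data: u32,
    pub bss: u32,
    pub entry: u32,
    pub total: u32,
    pub syms: u32,
}

/// Assumed size of the minimal header (32 bytes); only these fixed fields are read.
const MIN_HEADER_LEN: usize = 32;

pub fn parse_header(bytes: &[u8]) -> Result<AoutHeader, String> {
    if bytes.len() < MIN_HEADER_LEN {
        return Err(format!("file too short: {} bytes", bytes.len()));
    }
    // Assumed magic number: 0x01 0x03.
    if bytes[0] != 0x01 || bytes[1] != 0x03 {
        return Err("bad magic number".to_string());
    }
    // Reads a little-endian 32-bit field starting at `off`.
    let le32 = |off: usize| {
        u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
    };
    Ok(AoutHeader {
        flags: bytes[2],
        cpu: bytes[3],
        hdrlen: bytes[4],
        // bytes[5] is unused
        version: u16::from_le_bytes([bytes[6], bytes[7]]),
        text: le32(8),
        data: le32(12),
        bss: le32(16),
        entry: le32(20),
        total: le32(24),
        syms: le32(28),
    })
}
```

The sizes in `text`, `data` and `bss` are what an interpreter would need to lay out the segments in memory.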
## Usage
To build the tool, use Cargo:
```
cargo build --release
```
Or run it directly:
```
cargo run -- --help
```
To see log output while running, set `RUST_LOG`:
```
RUST_LOG=debug cargo run -- interpret -p ./a.out 2>&1 | less
```
The `info` log level shows information such as the register state and calls to interrupts; `debug` additionally shows the disassembly and the interpreter internals.

CLI options:
```
$ cargo run -- --help
Simple program to disassemble and interpret 8086 a.out compilates, e.g. such for MINIX

Usage: i8086-rs [OPTIONS] [ARGV]... <COMMAND>

Commands:
  disassemble  Disassemble the binary into 8086 instructions [aliases: d]
  interpret    Interpret the 8086 instructions [aliases: i]
  help         Print this message or the help of the given subcommand(s)

Arguments:
  [ARGV]...  argv passed to the program, which will be interpreted

Options:
  -p, --path <PATH>  Path of the binary
  -d, --dump         Dump progress of disassembly, in case of encountering an error
  -h, --help         Print help
  -V, --version      Print version
```
## Example
```
$ cat 1.c
main() {
    write(1, "hello\n", 6);
}
$ ./target/release/i8086-rs interpret -p ./a.out
hello
$ RUST_LOG=info ./target/release/i8086-rs interpret -p ./a.out
INFO: Initializing stack...
INFO: Initializing static data...
INFO: (0000) xor %bp, %bp 0000 0000 0000 0000 ffb4 0000 0000 0000 ---------
INFO: (0002) mov %bx, %sp 0000 0000 0000 0000 ffb4 0000 0000 0000 -----Z---
INFO: (0004) mov %ax, [%bx] 0000 ffb4 0000 0000 ffb4 0000 0000 0000 -----Z---
...
```
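
In the trace above, `xor %bp, %bp` (the usual idiom for zeroing a register) is followed by a set ZF, visible as the `Z` in `-----Z---`. On the 8086, XOR clears CF and OF and sets SF, ZF and PF from the result, while AF is undefined. The following sketch shows one way an interpreter could compute this; the `Flags` struct and `xor16` function are hypothetical and not taken from this crate.

```rust
/// Hypothetical subset of the 8086 FLAGS register.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Flags {
    pub cf: bool, // carry
    pub pf: bool, // parity (of the low byte)
    pub zf: bool, // zero
    pub sf: bool, // sign
    pub of: bool, // overflow
}

/// 16-bit XOR: CF and OF are cleared, SF/ZF/PF follow the result.
/// AF is architecturally undefined after XOR and is therefore not modeled.
pub fn xor16(dst: u16, src: u16, flags: &mut Flags) -> u16 {
    let result = dst ^ src;
    flags.cf = false;
    flags.of = false;
    flags.zf = result == 0;
    flags.sf = result & 0x8000 != 0;
    // PF is set when the low 8 bits contain an even number of 1 bits.
    flags.pf = (result as u8).count_ones() % 2 == 0;
    result
}
```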
## Status
This project is under active development and mainly serves as a way for me to explore Intel disassembly and learn more Rust.
Expect bugs and some missing features.
I mainly test with 'official' binaries from the MINIX source tree.

Currently, everything lives in the binary, but I want to move some parts into a library. This would make it much easier to ignore the MINIX 1.x specifics (e.g. the currently hardcoded interrupt handler) and would allow for more generic use of this 8086 implementation (e.g. implementing your own simple BIOS or OS).
Before that, though, I want to implement all features correctly and add tests for all of them.
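
One possible shape for that split, sketched purely as an assumption: the CPU core calls a user-supplied handler for every `INT n`, and the MINIX 1.x behavior becomes just one implementation of it. The `InterruptHandler` trait, `Registers` struct, `RecordingHandler` and `dispatch_int` below are all hypothetical.

```rust
/// Hypothetical subset of the general-purpose registers.
#[derive(Debug, Default)]
pub struct Registers {
    pub ax: u16,
    pub bx: u16,
    pub cx: u16,
    pub dx: u16,
}

/// Hypothetical hook, called by the core whenever `INT n` is executed.
pub trait InterruptHandler {
    /// Returns `false` if the vector is not handled.
    fn handle(&mut self, vector: u8, regs: &mut Registers) -> bool;
}

/// Example handler that records which vectors were raised,
/// e.g. as a stand-in for a custom BIOS or OS during testing.
#[derive(Default)]
pub struct RecordingHandler {
    pub seen: Vec<u8>,
}

impl InterruptHandler for RecordingHandler {
    fn handle(&mut self, vector: u8, regs: &mut Registers) -> bool {
        self.seen.push(vector);
        regs.ax = 0; // pretend the call succeeded
        true
    }
}

/// How a generic core could execute `INT n` without knowing about MINIX.
pub fn dispatch_int(
    handler: &mut dyn InterruptHandler,
    vector: u8,
    regs: &mut Registers,
) -> Result<(), String> {
    if handler.handle(vector, regs) {
        Ok(())
    } else {
        Err(format!("unhandled interrupt 0x{vector:02x}"))
    }
}
```

Using a trait object keeps the core independent of any particular operating system; a MINIX handler would simply be another `impl InterruptHandler`.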
## Caveats
Code is currently not fetched from memory but from a separate vector stored inside the `Disassembler` struct, which fetches and parses the next instruction at the instruction pointer.
The `CS:IP` addressing scheme is still used to allow 20-bit access, but self-modifying code is currently not supported.
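
For reference, the 8086 forms a 20-bit physical address from a `segment:offset` pair as `segment * 16 + offset`, and because it has only 20 address lines, the result wraps around at 1 MiB. A minimal sketch (the function name `physical_address` is hypothetical):

```rust
/// Computes the 20-bit physical address for a `segment:offset` pair,
/// e.g. `CS:IP` for instruction fetches or `DS:BX` for data accesses.
/// The 8086 has only 20 address lines, so the sum wraps at 1 MiB.
pub fn physical_address(segment: u16, offset: u16) -> u32 {
    ((u32::from(segment) << 4) + u32::from(offset)) & 0xF_FFFF
}
```

Because segments start every 16 bytes and overlap, different pairs can alias the same physical address: `1000:0010` and `1001:0000` both map to `0x10010`.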
Also, the disassembler only performs a single initial sweep, which is likely to be inaccurate compared to what is actually executed at runtime.
For example, a jump taken during interpretation may target an address that the disassembler did not identify as the start of an instruction.
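
A tiny illustration of that problem, assuming the initial sweep decodes instructions back to back from the start (a linear sweep). It uses a toy decoder for a hand-picked subset of real 8086 opcodes (`EB` short `JMP`, `B8` `MOV AX, imm16`, `90` `NOP`, `F4` `HLT`) and is not the crate's disassembler. The sweep decodes the bytes after the jump as a 3-byte `MOV`, while at runtime the jump lands in the middle of it.

```rust
/// Length in bytes of an instruction starting with `opcode`,
/// for a toy subset of 8086 opcodes only.
fn insn_len(opcode: u8) -> usize {
    match opcode {
        0xEB => 2,        // JMP rel8
        0xB8 => 3,        // MOV AX, imm16
        0x90 | 0xF4 => 1, // NOP, HLT
        _ => panic!("opcode {opcode:#04x} not in toy subset"),
    }
}

/// Linear sweep: decode back to back from offset 0, as a static disassembler would.
pub fn linear_sweep(code: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut ip = 0;
    while ip < code.len() {
        starts.push(ip);
        ip += insn_len(code[ip]);
    }
    starts
}

/// Execution trace: follow control flow until HLT, recording each IP.
pub fn runtime_trace(code: &[u8]) -> Vec<usize> {
    let mut visited = Vec::new();
    let mut ip = 0;
    loop {
        visited.push(ip);
        let op = code[ip];
        let next = ip + insn_len(op);
        ip = match op {
            // The rel8 displacement is relative to the *next* instruction.
            0xEB => (next as isize + code[ip + 1] as i8 as isize) as usize,
            0xF4 => return visited,
            _ => next,
        };
    }
}
```

For the bytes `EB 01 B8 90 90 F4`, the sweep finds instructions at offsets 0, 2 and 5, while execution visits 0, 3, 4 and 5; offsets 3 and 4 were never identified as instruction starts.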
## Documentation
The project's own documentation can be generated with `cargo doc`:
```
$ cargo doc
$ firefox target/doc/i8086_rs/index.html
```
For the disassembler, I used the Intel "8086 16-BIT HMOS MICROPROCESSOR" spec, as well as [this](http://www.mlsite.net/8086/8086_table.txt) overview of all opcode variants in conjunction with [this](http://www.mlsite.net/8086/) decoding matrix.
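
Most 8086 instructions with a register or memory operand carry a ModR/M byte, split into `mod` (bits 7-6), `reg` (bits 5-3) and `r/m` (bits 2-0). For example, `mov %ax, [%bx]` from the trace above is encoded as `8B 07`: `mod` = 00, `reg` = 000 (AX), `r/m` = 111 (`[BX]`). The sketch below decodes the byte and names the 16-bit effective-address base for memory operands; `ModRm`, `decode_modrm` and `effective_address_base` are hypothetical names, not this crate's IR.

```rust
/// The three fields of an 8086 ModR/M byte.
#[derive(Debug, PartialEq)]
pub struct ModRm {
    pub md: u8,  // bits 7-6: addressing mode (0b11 = register operand)
    pub reg: u8, // bits 5-3: register or opcode extension
    pub rm: u8,  // bits 2-0: register or memory operand
}

pub fn decode_modrm(byte: u8) -> ModRm {
    ModRm {
        md: byte >> 6,
        reg: (byte >> 3) & 0b111,
        rm: byte & 0b111,
    }
}

/// Base of the 16-bit effective address for a memory operand (`mod` != 0b11).
/// With `mod` == 0b00 and `r/m` == 0b110 there is no base register:
/// a 16-bit direct address follows instead.
pub fn effective_address_base(m: &ModRm) -> &'static str {
    match (m.md, m.rm) {
        (0b00, 0b110) => "disp16",
        (_, 0b000) => "BX+SI",
        (_, 0b001) => "BX+DI",
        (_, 0b010) => "BP+SI",
        (_, 0b011) => "BP+DI",
        (_, 0b100) => "SI",
        (_, 0b101) => "DI",
        (_, 0b110) => "BP",
        (_, _) => "BX", // r/m == 0b111
    }
}
```

With `mod` = 01 or 10, an 8-bit or 16-bit displacement follows the ModR/M byte and is added to this base.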
For the interpreter, I used the "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z".
## TODOs
- Map instructions into actual memory for interpretation
- Implement all MINIX interrupts
- Allow execution of 'raw' instructions, not only `a.out`
- Don't hardcode MINIX
- Implement BIOS Interrupts
## FAQ
#### Why bother with interpretation instead of just emulating the 8086?
For one, this project grew out of a university exercise on the 8086 instruction set and disassembly.
An interpreter for these instructions was the logical (?) next step.
Maybe I'll add raw 8086 emulation some day.
#### Why no `nom`?
There is no real reason; I just wanted to implement most parts myself, even if it meant more boilerplate code.
I have used `nom` extensively in the past and wanted to see what it would be like without that crate.
In hindsight, using `nom` would have been the cleaner option, but hey, that's something I only learned by not using `nom` for once.