OpenEnv Firmware Debug Environment
COMPLETED
A reinforcement-learning environment where AI agents debug realistic firmware faults on a simulated ARM Cortex-M: five scenarios with register-level STM32 modeling and cascading failures.
The idea
Most RL environments for AI agents are games or toy gridworlds. I wanted one built from the work I actually do: staring at STM32 registers at 2 a.m. trying to figure out why the UART is printing garbage. So I built an environment where AI agents debug embedded firmware faults the way an engineer does — by reading registers, logs, and RTOS state, forming a hypothesis, and poking the hardware.
What’s in it
The environment simulates an ARM Cortex-M (STM32) target at the register level: hardware register bitfields, clock trees, peripheral state, and RTOS task states are all modeled. The agent sees what a human debugger sees — register dumps, system logs, peripheral diagnostics — and acts by reading and writing registers.
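To make "register level" concrete, here is a minimal Python sketch of a register with named bitfields and a read/write action interface. It is not the environment's actual code: the `Register` and `ToyTarget` classes, the `DEMO_CR` register, its field layout, and the action tuple format are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """A 32-bit register with named bitfields (name -> (lsb, width))."""
    value: int = 0
    fields: dict = field(default_factory=dict)

    def get(self, name: str) -> int:
        lsb, width = self.fields[name]
        return (self.value >> lsb) & ((1 << width) - 1)

    def set(self, name: str, bits: int) -> None:
        lsb, width = self.fields[name]
        mask = ((1 << width) - 1) << lsb
        # Clear the field, then insert the new bits, truncated to the field width.
        self.value = ((self.value & ~mask) | ((bits << lsb) & mask)) & 0xFFFFFFFF


class ToyTarget:
    """Hypothetical target with one control register (layout is illustrative)."""

    def __init__(self) -> None:
        self.regs = {
            "DEMO_CR": Register(fields={"EN": (0, 1), "RE": (2, 1), "TE": (3, 1)}),
        }

    def step(self, action: tuple) -> dict:
        """Apply ("read", reg) or ("write", reg, field, value); return a register dump."""
        if action[0] == "write":
            _, reg, fld, value = action
            self.regs[reg].set(fld, value)
        elif action[0] != "read":
            raise ValueError(f"unknown action kind: {action[0]}")
        return {"registers": {name: r.value for name, r in self.regs.items()}}


if __name__ == "__main__":
    target = ToyTarget()
    obs = target.step(("write", "DEMO_CR", "TE", 1))
    print(hex(obs["registers"]["DEMO_CR"]))
```

Modeling fields by position and width, rather than as independent booleans, means a write to one field leaves its neighbours untouched, which is what a read-modify-write sequence on real hardware is meant to guarantee.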
There are five scenarios, each lifted from real embedded failure modes:
- UART baud-rate misconfiguration — the classic garbage-on-the-serial-port bug
- I2C bus fault — a hung bus that needs recognizing and recovering
- RTOS priority inversion — a low-priority task holding a mutex a high-priority task needs
- DMA cache-coherency violation — DMA and CPU disagreeing about what’s in memory
- Watchdog timeout — finding what’s starving the kick
Each one requires datasheet-level register reasoning to solve, not pattern matching.
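As an example of the arithmetic behind the first scenario, the sketch below works through a baud-rate divider. It assumes a USART with 16x oversampling, where the divider is roughly the peripheral clock divided by the baud rate; the exact register encoding differs between STM32 families, and the clock values and function names here are illustrative rather than taken from the environment.

```python
def brr_for(pclk_hz: int, baud: int) -> int:
    """Nearest integer divider for 16x oversampling: round(pclk / baud)."""
    return (pclk_hz + baud // 2) // baud


def actual_baud(pclk_hz: int, divider: int) -> float:
    """Baud rate the peripheral really produces for a given clock and divider."""
    return pclk_hz / divider


def baud_error_pct(pclk_hz: int, divider: int, target_baud: int) -> float:
    """Signed error of the produced baud rate relative to the target, in percent."""
    return 100.0 * (actual_baud(pclk_hz, divider) - target_baud) / target_baud


if __name__ == "__main__":
    target = 115_200
    divider = brr_for(16_000_000, target)  # firmware assumes a 16 MHz clock
    print(divider, round(baud_error_pct(16_000_000, divider, target), 3))
    # Same divider, but the peripheral clock is really 8 MHz: about half the baud.
    print(round(baud_error_pct(8_000_000, divider, target), 3))
```

A divider that is correct for the assumed clock gives an error well under one percent, while the same divider on a clock that is half as fast gives an error of about fifty percent, which is the kind of mismatch that shows up as garbage on the serial port.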
The simulation is dynamic, not a quiz
The part I’m proudest of: when the agent writes a register, peripheral behavior actually mutates, logs change in response, and a careless write can trigger cascading failures, exactly as on real hardware. An agent that “fixes” the baud rate by breaking the clock tree finds out the same way a junior engineer would.
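The cascading behavior can be illustrated with a toy model in which two peripherals share one clock. This is a sketch under stated assumptions, not the environment's implementation: the `ToySim` class, its register names, the 2% baud tolerance, and the timer reload value are all hypothetical.

```python
class ToySim:
    """Hypothetical target: one clock prescaler feeding a UART and a 1 ms tick timer."""

    SOURCE_HZ = 16_000_000
    TARGET_BAUD = 115_200
    TICK_RELOAD = 16_000      # timer counts per tick; equals 1 ms only at 16 MHz
    BAUD_TOLERANCE = 0.02     # assumed acceptable baud mismatch

    def __init__(self) -> None:
        # Faulty start state: the UART divider was computed for an 8 MHz clock.
        self.regs = {"CLK_PRESCALER": 1, "UART_DIV": 69}
        self.logs = []
        self._update()

    def pclk_hz(self) -> int:
        return self.SOURCE_HZ // self.regs["CLK_PRESCALER"]

    def write(self, name: str, value: int) -> None:
        if name not in self.regs:
            raise KeyError(name)
        if value < 1:
            self.logs.append(f"rejected write {name}={value}")
            return
        self.regs[name] = value
        self._update()  # every accepted write re-derives peripheral state

    def _update(self) -> None:
        pclk = self.pclk_hz()
        baud = pclk / self.regs["UART_DIV"]
        self.uart_ok = (
            abs(baud - self.TARGET_BAUD) / self.TARGET_BAUD <= self.BAUD_TOLERANCE
        )
        tick_ms = 1000 * self.TICK_RELOAD / pclk
        self.tick_ok = abs(tick_ms - 1.0) < 1e-9
        self.logs.append(
            f"uart={'ok' if self.uart_ok else 'garbled'} "
            f"tick={'ok' if self.tick_ok else 'drifting'} ({tick_ms:.1f} ms)"
        )


if __name__ == "__main__":
    careful = ToySim()
    careful.write("UART_DIV", 139)       # fix the divider, leave the clock alone
    careless = ToySim()
    careless.write("CLK_PRESCALER", 2)   # "fix" the UART by halving the clock
    print(careful.logs[-1])
    print(careless.logs[-1])
```

Both writes make the UART readable again, but halving the clock also doubles the timer tick period. Deriving every peripheral's state from the shared registers after each write is one simple way to get side effects like this without scripting them per scenario.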
What it took
Modeling the failure modes faithfully took firmware experience; making them trainable took RL environment design. It’s the kind of bridge work between embedded systems and machine learning that I want to keep doing.