AYUSH~KADALI DOC AK-2026 · REV A

OpenEnv Firmware Debug Environment

COMPLETED

A reinforcement-learning environment where AI agents debug realistic firmware faults on a simulated ARM Cortex-M — five scenarios with register-level STM32 modeling and cascading failures.

Python · Reinforcement Learning · ARM Cortex-M · STM32 Simulation · RTOS · Gym/OpenEnv

5 production-realistic debug scenarios

The idea

Most RL environments for AI agents are games or toy gridworlds. I wanted one built from the work I actually do: staring at STM32 registers at 2 a.m. trying to figure out why the UART is printing garbage. So I built an environment where AI agents debug embedded firmware faults the way an engineer does — by reading registers, logs, and RTOS state, forming a hypothesis, and poking the hardware.

What’s in it

The environment simulates an ARM Cortex-M (STM32) target at the register level: hardware register bitfields, clock trees, peripheral state, and RTOS task states are all modeled. The agent sees what a human debugger sees — register dumps, system logs, peripheral diagnostics — and acts by reading and writing registers.
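To make "register level" concrete, here is a minimal sketch of how a register with named bitfields could be modeled. The `Register` class, its method names, and the bit positions are illustrative assumptions and not the environment's actual API; real bit layouts differ between STM32 families and should come from the reference manual.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical 32-bit register with named bitfields."""
    name: str
    value: int = 0
    # field name -> (least-significant bit position, width in bits)
    fields: dict = field(default_factory=dict)

    def read_field(self, fname: str) -> int:
        lsb, width = self.fields[fname]
        return (self.value >> lsb) & ((1 << width) - 1)

    def write_field(self, fname: str, new: int) -> None:
        lsb, width = self.fields[fname]
        if new < 0 or new >> width:
            raise ValueError(f"{new} does not fit in {width}-bit field {fname}")
        mask = ((1 << width) - 1) << lsb
        # Clear the field, then OR in the new bits; keep the result to 32 bits.
        self.value = ((self.value & ~mask) | (new << lsb)) & 0xFFFFFFFF


# Assumed layout for illustration only: enable bit at 13, TX at 3, RX at 2.
cr1 = Register("USART_CR1", fields={"UE": (13, 1), "TE": (3, 1), "RE": (2, 1)})
cr1.write_field("UE", 1)
cr1.write_field("TE", 1)
print(f"{cr1.name} = {cr1.value:#010x}, RE = {cr1.read_field('RE')}")
```

An agent action such as "write field TE of USART_CR1" would then map onto a single `write_field` call, and an observation onto a dump of `value` for each register.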

There are five scenarios, each lifted from real embedded failure modes:

  • UART baud-rate misconfiguration — the classic garbage-on-the-serial-port bug
  • I2C bus fault — a hung bus that needs recognizing and recovering
  • RTOS priority inversion — a low-priority task holding a mutex a high-priority task needs
  • DMA cache-coherency violation — DMA and CPU disagreeing about what’s in memory
  • Watchdog timeout — finding what’s starving the kick

Solving each one takes datasheet-level reasoning about registers; pattern matching on the symptoms is not enough.
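As an example of the arithmetic behind the first scenario, the sketch below works through a baud-rate mismatch. It assumes 16x oversampling, where the divider written to the baud-rate register is the peripheral clock divided by the target baud rate; the function names and clock values are made up for illustration and are not taken from the environment.

```python
def brr_for(f_ck_hz: int, baud: int) -> int:
    """Divider for 16x oversampling: f_ck / baud, rounded to nearest."""
    return (f_ck_hz + baud // 2) // baud


def actual_baud(f_ck_hz: int, brr: int) -> float:
    """Line rate the UART really produces for a given clock and divider."""
    return f_ck_hz / brr


TARGET_BAUD = 115_200

# Assumed fault: firmware computes the divider for an 8 MHz peripheral
# clock, but the clock tree actually delivers 16 MHz.
brr = brr_for(8_000_000, TARGET_BAUD)
line_rate = actual_baud(16_000_000, brr)
error = line_rate / TARGET_BAUD - 1.0

print(f"BRR = {brr}, line rate = {line_rate:.0f} baud, error = {error:+.1%}")
```

The line runs at roughly twice the intended rate, which is far outside what a receiver can tolerate, so every byte arrives as garbage. Recovering the intended divider requires knowing the real clock frequency, which is why the clock tree has to be part of the agent's reasoning.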

The simulation is dynamic, not a quiz

The part I’m proudest of: when the agent writes a register, peripheral behavior actually mutates, logs change in response, and a careless write can trigger cascading failures, exactly as on real hardware. An agent that “fixes” the baud rate by breaking the clock tree finds out the same way a junior engineer would.
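A toy sketch of that behavior, assuming a drastically simplified target with one clock prescaler, one UART divider, and one tick timer sharing the same clock. The class `ToyTarget`, its register names, and its thresholds are hypothetical and exist only to show the mechanism: every write re-derives peripheral state and appends to the log.

```python
class ToyTarget:
    """Hypothetical three-register target; not the environment's real model."""

    OSC_HZ = 16_000_000
    EXPECTED_BAUD = 115_200
    EXPECTED_TICK_HZ = 1_000

    def __init__(self):
        # Initial fault: the UART divider was computed for an 8 MHz clock.
        self.regs = {"RCC_PRESCALER": 1, "USART_BRR": 69, "TICK_RELOAD": 16_000}
        self.log = []

    def pclk(self) -> int:
        return self.OSC_HZ // self.regs["RCC_PRESCALER"]

    def write(self, reg: str, value: int) -> dict:
        if reg not in self.regs:
            raise KeyError(f"unknown register {reg}")
        if value <= 0:
            raise ValueError("this toy model only accepts positive values")
        self.regs[reg] = value
        return self._propagate()

    def _propagate(self) -> dict:
        """Re-derive the state of every peripheral that shares the clock."""
        baud = self.pclk() / self.regs["USART_BRR"]
        uart_ok = abs(baud - self.EXPECTED_BAUD) / self.EXPECTED_BAUD < 0.03
        tick_hz = self.pclk() / self.regs["TICK_RELOAD"]
        tick_ok = tick_hz == self.EXPECTED_TICK_HZ
        self.log.append("uart: rx ok" if uart_ok else "uart: framing errors")
        if not tick_ok:
            self.log.append(f"tick: running at {tick_hz:.0f} Hz, tasks miss deadlines")
        return {"uart_ok": uart_ok, "tick_ok": tick_ok}


# Careless fix: halve the shared clock so the existing divider happens to fit.
careless = ToyTarget().write("RCC_PRESCALER", 2)
# Targeted fix: recompute the divider for the clock that is really there.
careful = ToyTarget().write("USART_BRR", 139)
print("careless:", careless)
print("careful: ", careful)
```

In this sketch both writes clear the UART symptom, but only the second leaves the tick timer intact. An environment built this way can score the final state of the whole system instead of checking whether the original symptom disappeared.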

What it took

Modeling the failure modes faithfully took firmware experience; making them trainable took RL environment design. It’s the kind of bridge work between embedded systems and machine learning that I want to keep doing.