Real-Time Object Tracking Gimbal

STM32 + OpenMV  |  Bare-Metal Firmware  |  2-Axis NEMA Stepper Control

Demo Videos

Three stages of the system development, captured live.

Red Object Tracking + UART Feed
Single-Axis (X) Motor Response
Full 2-Axis Pan/Tilt Tracking

Overview

A real-time vision-based tracking system that continuously centers a red target in the camera frame by driving a 2-axis motorized gimbal. The camera computes the pixel error between the target centroid and frame center, streams it over UART, and the STM32 converts that error into stepper motor commands — all running in a tight embedded control loop with no operating system.

The system is built entirely from scratch at the register level: no HAL, no CMSIS startup, no Arduino abstractions. Every peripheral — UART, GPIO, clock gating — is configured by writing directly to memory-mapped registers.

STM32F446RE OpenMV (MicroPython) Bare-Metal C USART1 / USART2 NEMA Stepper DM542 Driver LAB Color Space EMA Filter 16 MHz HSI

Problem & Engineering Pivot

The project started as a 6-DOF robotic arm following the open-source Arctos Robotic Arm design, using MKS 42D stepper controllers over CAN bus. Three blockers killed it:

Key Decision

Rather than scrap everything, I isolated what was working — two axes of stepper motion with the DM542 drivers — and re-scoped the project around a problem where that hardware was sufficient: real-time pan/tilt tracking.

Engineering is not just building things — it's knowing when to cut scope and redirect effort. Salvaging two working axes into a complete, functional system is a better outcome than an incomplete 6-DOF arm.

System Architecture

Hardware

Pin Assignment

Communication Flow

OpenMV
blob detect
USART1
115200 baud
STM32F446RE
parse + control
GPIO STEP/DIR
PA0/1, PB0/1
DM542
×2
NEMA Motors
pan + tilt

The wire protocol is a simple ASCII frame: "{dx},{dy}/\n" — for example "35,-12/\n" means the target is 35 px right and 12 px above center. The STM32 accumulates incoming bytes into a buffer and parses on the \n delimiter.

Computer Vision (OpenMV)

The OpenMV runs MicroPython at ~160×120 (HQVGA) in RGB565. On startup it performs a 5-second auto-calibration phase: the LED stays solid while it samples the center 20×20 px ROI and builds a LAB color threshold with ±15 margin on each channel. This makes the system robust to different lighting conditions — no hardcoded color values.

# 5-second calibration: sample center patch, build LAB threshold
led.on()
samples = []
cal_roi = (IMG_CENTER_X - 10, IMG_CENTER_Y - 10, 20, 20)
start_time = time.ticks_ms()

while time.ticks_diff(time.ticks_ms(), start_time) < 5000:
    img = sensor.snapshot()
    stats = img.get_statistics(roi=cal_roi)
    samples.append((stats.l_mean(), stats.a_mean(), stats.b_mean()))

l_avg = sum(s[0] for s in samples) / len(samples)
a_avg = sum(s[1] for s in samples) / len(samples)
b_avg = sum(s[2] for s in samples) / len(samples)

MARGIN = 15
red_threshold = [(
    int(l_avg - MARGIN), int(l_avg + MARGIN),
    int(a_avg - MARGIN), int(a_avg + MARGIN),
    int(b_avg - MARGIN), int(b_avg + MARGIN)
)]
led.off()

In the tracking loop, find_blobs() runs every frame. The largest blob by pixel count is selected as the target. Its centroid is compared to the frame center to produce the signed pixel error sent over UART.

# Tracking loop — find largest blob, compute signed pixel error
blobs = img.find_blobs(red_threshold, pixels_threshold=50,
                        area_threshold=50, merge=True)
if blobs:
    target = max(blobs, key=lambda b: b.pixels())
    last_dx = target.cx() - IMG_CENTER_X   # positive = target right of center
    last_dy = IMG_CENTER_Y - target.cy()   # positive = target above center

    img.draw_rectangle(target.rect(), color=(255, 0, 0))
    img.draw_cross(target.cx(), target.cy(), color=(0, 255, 0))

uart.write("{},{}/\n".format(last_dx, last_dy))

Embedded Firmware (Bare-Metal STM32)

Written entirely without HAL or CMSIS drivers. Every register address is defined manually from the STM32F446 reference manual — GPIO, RCC, and USART are all accessed as volatile uint32_t pointers at their physical memory-mapped addresses.

1. Peripheral Initialization

Clocks must be explicitly enabled before any peripheral can be used. GPIO pins are configured by writing 2-bit fields into MODER, and USART alternate functions are selected via AFRL/AFRH.

// Enable clocks for GPIOA, GPIOB, USART1 (APB2), USART2 (APB1)
RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOBEN;
RCC_APB2ENR |= RCC_APB2ENR_USART1EN;
RCC_APB1ENR |= RCC_APB1ENR_USART2EN;

// Set PA9/PA10 to Alternate Function mode (0b10), select AF7 = USART1
GPIOA_MODER |= (0x2UL << (9*2)) | (0x2UL << (10*2));
GPIOA_AFRH  |= (0x7UL << ((9-8)*4)) | (0x7UL << ((10-8)*4));

// Configure baud rate: BRR = F_CLK / baud (with rounding)
USART1_BRR = (16000000UL + 115200/2) / 115200;
USART1_CR1 = (1UL<<13) | (1UL<<3) | (1UL<<2); // UE | TE | RE

2. UART Frame Parsing

The STM32 receives one byte at a time by polling the RXNE flag (bit 5 of SR). Characters are accumulated in a 32-byte buffer. On \n, strchr() finds the , and / delimiters and atoi() converts the substrings to integers.

// Poll RXNE, buffer chars until newline, then parse "dx,dy/"
if (USART1_SR & (1UL << 5)) {          // RXNE: new byte ready
    char c = (char)USART1_DR;
    if (c == '\n' || buf_index >= 31) {
        buffer[buf_index] = '\0';
        char *comma = strchr(buffer, ',');
        char *slash = strchr(buffer, '/');
        if (comma && slash) {
            *comma = '\0'; *slash = '\0';
            dx = atoi(buffer);
            dy = atoi(comma + 1);
        }
        buf_index = 0;
        control_motors(dx, dy);         // act on new coordinate
    } else if (c != '\r') {
        buffer[buf_index++] = c;
    }
}

3. Stepper Motor Pulse Generation

Each DM542 driver advances one step on every LOW→HIGH→LOW transition of its PUL pin. The STM32 uses the BSRR register (atomic bit set/reset) to generate clean 10 µs pulses without needing to read-modify-write the ODR.

// One step pulse on Motor 1 (pan) — PA1
static void step_pulse_motor1(void) {
    GPIOA_BSRR = (1UL << PUL1_PIN);          // drive PUL high
    delay_us(10);                              // hold ≥2.5 µs (DM542 min)
    GPIOA_BSRR = (1UL << (PUL1_PIN + 16));   // drive PUL low → step taken
}

// Direction: write BSRR set bits for CW, reset bits for CCW
static void set_direction_motor1(uint8_t dir) {
    if (dir == DIR_CW) GPIOA_BSRR =  (1UL << DIR1_PIN);
    else               GPIOA_BSRR =  (1UL << (DIR1_PIN + 16));
}

4. Control Algorithm

Raw pixel error from the camera is noisy frame-to-frame. An integer EMA filter (α = 3/8) smooths the signal. A dead zone of ±10 px prevents the motors from hunting when the target is nearly centered. Step speed is linearly interpolated between 1000 µs/step (slow, small error) and 150 µs/step (fast, large error), saturating at MAX_ERROR = 80 px.

// Integer EMA: ema = (3*new + 5*old) / 8  →  α = 3/8 = 37.5%
ema_dx = (3 * dx + 5 * ema_dx) / 8;
ema_dy = (3 * dy + 5 * ema_dy) / 8;

// Speed: linearly map |error| → step delay (µs)
// error=10 → 1000 µs/step,  error≥80 → 150 µs/step
static uint32_t error_to_delay(int32_t error) {
    uint32_t e = abs(error);
    if (e <= POSITION_THRESHOLD) return MAX_STEP_DELAY;  // 1000 µs
    if (e >= MAX_ERROR)           return MIN_STEP_DELAY;  // 150 µs
    uint32_t span = MAX_ERROR - POSITION_THRESHOLD;       // 70
    return MAX_STEP_DELAY - (e - POSITION_THRESHOLD)
           * (MAX_STEP_DELAY - MIN_STEP_DELAY) / span;
}

// Hard travel limit: ±1200 steps from power-on position
if (pos_x >= STEP_LIMIT) steps_x = 0;  // block further CW motion

5. Concurrent Dual-Axis Scheduling

Naively running Motor 1 fully then Motor 2 produces staircase diagonal motion. Instead, a 50 µs software tick loop runs both axes simultaneously: each motor has an accumulator that increments by 50 µs per tick, and a step fires whenever the accumulator reaches that axis's target period. Both motors start and finish their move in the same wall-clock time, producing a smooth arc.

// Time-sliced concurrent stepping — 50 µs tick resolution
static void move_motors_concurrent(
    uint32_t steps_x, uint8_t dir_x, uint32_t delay_x,
    uint32_t steps_y, uint8_t dir_y, uint32_t delay_y)
{
    set_direction_motor1(dir_x);
    set_direction_motor2(dir_y);
    uint32_t done_x=0, done_y=0, accum_x=0, accum_y=0;

    while (done_x < steps_x || done_y < steps_y) {
        delay_us(50);
        if (done_x < steps_x) {
            accum_x += 50;
            if (accum_x >= delay_x) { step_pulse_motor1(); accum_x=0; done_x++; }
        }
        if (done_y < steps_y) {
            accum_y += 50;
            if (accum_y >= delay_y) { step_pulse_motor2(); accum_y=0; done_y++; }
        }
    }
}

Low-Level System Initialization

There is no CMSIS startup file. startup.c defines the full ARM Cortex-M4 vector table (100+ entries for all STM32F446 IRQs) and the Reset_Handler that runs before main(). All exception handlers default to an infinite loop via weak aliases, so any unhandled fault halts visibly rather than running garbage code.

// Vector table placed at 0x08000000 (.isr_vector section)
__attribute__((section(".isr_vector")))
const uint32_t vector_table[] = {
    (uint32_t)&_estack,          // initial stack pointer (from linker script)
    (uint32_t)Reset_Handler,      // first instruction after power-on
    (uint32_t)NMI_Handler,
    (uint32_t)HardFault_Handler,
    // ... all 100+ STM32F446 peripheral IRQs follow
};

// Reset_Handler: copy .data Flash→RAM, zero .bss, call main()
void Reset_Handler(void) {
    uint32_t *src = &_sidata, *dst = &_sdata;
    while (dst < &_edata) *dst++ = *src++;   // initialized globals

    dst = &_sbss;
    while (dst < &_ebss) *dst++ = 0;          // zero-init globals

    main();
    while (1);  // trap if main() ever returns
}

The build uses arm-none-eabi-gcc targeting Cortex-M4 with hardware FPU (fpv4-sp-d16, -mfloat-abi=hard), -O2, and --gc-sections to strip unused code. The final binary is flashed via OpenOCD + ST-Link.

# Makefile excerpt
CC     = arm-none-eabi-gcc
MCU    = cortex-m4
FPU    = fpv4-sp-d16

CFLAGS = -mcpu=$(MCU) -mthumb -mfpu=$(FPU) -mfloat-abi=hard
CFLAGS += -O2 -Wall -fdata-sections -ffunction-sections

LDFLAGS = -T STM32F446RE_FLASH.ld -specs=nano.specs
LDFLAGS += -Wl,--gc-sections -Wl,-Map=$(TARGET).map

flash:
    openocd -f interface/stlink.cfg -f target/stm32f4x.cfg \
        -c "program $(TARGET).elf verify reset exit"

System Flow Diagram

End-to-end data flow from camera capture through vision processing, UART communication, and motor control.

Vision Tracking System Flow Diagram