Demo Videos
Three stages of the system development, captured live.
A real-time vision-based tracking system that continuously centers a red target in the camera frame by driving a 2-axis motorized gimbal. The camera computes the pixel error between the target centroid and frame center, streams it over UART, and the STM32 converts that error into stepper motor commands — all running in a tight embedded control loop with no operating system.
The system is built entirely from scratch at the register level: no HAL, no CMSIS startup, no Arduino abstractions. Every peripheral — UART, GPIO, clock gating — is configured by writing directly to memory-mapped registers.
The project started as a 6-DOF robotic arm following the open-source Arctos Robotic Arm design, using MKS 42D stepper controllers over CAN bus. Three blockers killed it.
Rather than scrap everything, I isolated what was working — two axes of stepper motion with the DM542 drivers — and re-scoped the project around a problem where that hardware was sufficient: real-time pan/tilt tracking.
Engineering is not just building things — it's knowing when to cut scope and redirect effort. Salvaging two working axes into a complete, functional system is a better outcome than an incomplete 6-DOF arm.
PA0 — Motor 1 DIR (pan)
PA1 — Motor 1 PUL (pan)
PB0 — Motor 2 DIR (tilt)
PB1 — Motor 2 PUL (tilt)
PA9 / PA10 — USART1 TX/RX → OpenMV camera
PA2 / PA3 — USART2 TX/RX → PC debug terminal
PA5 — onboard LED (activity indicator)
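In the firmware these pin numbers appear as macros (the names below match the excerpts later on this page; the exact header layout in the project may differ):

```c
/* Pin-number macros matching the map above -- a sketch, not the
 * project's actual header. Numbers are bit positions within the port. */
#define DIR1_PIN  0U   /* PA0: Motor 1 DIR (pan)  */
#define PUL1_PIN  1U   /* PA1: Motor 1 PUL (pan)  */
#define DIR2_PIN  0U   /* PB0: Motor 2 DIR (tilt) */
#define PUL2_PIN  1U   /* PB1: Motor 2 PUL (tilt) */
#define LED_PIN   5U   /* PA5: onboard activity LED */
```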
The wire protocol is a simple ASCII frame: "{dx},{dy}/\n" — for example,
"35,-12/\n" means the target is 35 px right of center and 12 px below it (positive dy means above center).
The STM32 accumulates incoming bytes into a buffer and parses on the \n delimiter.
The OpenMV runs MicroPython at ~160×120 (HQVGA) in RGB565. On startup it performs a 5-second auto-calibration phase: the LED stays solid while it samples the center 20×20 px ROI and builds a LAB color threshold with ±15 margin on each channel. This makes the system robust to different lighting conditions — no hardcoded color values.
# 5-second calibration: sample center patch, build LAB threshold
led.on()
samples = []
cal_roi = (IMG_CENTER_X - 10, IMG_CENTER_Y - 10, 20, 20)
start_time = time.ticks_ms()
while time.ticks_diff(time.ticks_ms(), start_time) < 5000:
    img = sensor.snapshot()
    stats = img.get_statistics(roi=cal_roi)
    samples.append((stats.l_mean(), stats.a_mean(), stats.b_mean()))
l_avg = sum(s[0] for s in samples) / len(samples)
a_avg = sum(s[1] for s in samples) / len(samples)
b_avg = sum(s[2] for s in samples) / len(samples)
MARGIN = 15
red_threshold = [(
    int(l_avg - MARGIN), int(l_avg + MARGIN),
    int(a_avg - MARGIN), int(a_avg + MARGIN),
    int(b_avg - MARGIN), int(b_avg + MARGIN)
)]
led.off()
In the tracking loop, find_blobs() runs every frame. The largest blob by pixel
count is selected as the target. Its centroid is compared to the frame center to produce
the signed pixel error sent over UART.
# Tracking loop — find largest blob, compute signed pixel error
blobs = img.find_blobs(red_threshold, pixels_threshold=50,
                       area_threshold=50, merge=True)
if blobs:
    target = max(blobs, key=lambda b: b.pixels())
    last_dx = target.cx() - IMG_CENTER_X  # positive = target right of center
    last_dy = IMG_CENTER_Y - target.cy() # positive = target above center
    img.draw_rectangle(target.rect(), color=(255, 0, 0))
    img.draw_cross(target.cx(), target.cy(), color=(0, 255, 0))
uart.write("{},{}/\n".format(last_dx, last_dy))
Written entirely without HAL or CMSIS drivers. Every register address is defined manually
from the STM32F446 reference manual — GPIO, RCC, and USART are all accessed as
volatile uint32_t pointers at their physical memory-mapped addresses.
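Concretely, the definitions look like the sketch below. The addresses come from the STM32F446 reference manual (RM0390); the macro names match the excerpts on this page, though the exact set in the project may differ:

```c
#include <stdint.h>

/* Peripheral base addresses (STM32F446, RM0390 memory map) */
#define RCC_BASE      0x40023800UL
#define GPIOA_BASE    0x40020000UL
#define USART1_BASE   0x40011000UL

/* RCC clock-enable registers */
#define RCC_AHB1ENR   (*(volatile uint32_t *)(RCC_BASE + 0x30))
#define RCC_APB1ENR   (*(volatile uint32_t *)(RCC_BASE + 0x40))
#define RCC_APB2ENR   (*(volatile uint32_t *)(RCC_BASE + 0x44))

/* GPIOA registers used here */
#define GPIOA_MODER   (*(volatile uint32_t *)(GPIOA_BASE + 0x00))
#define GPIOA_BSRR    (*(volatile uint32_t *)(GPIOA_BASE + 0x18))
#define GPIOA_AFRH    (*(volatile uint32_t *)(GPIOA_BASE + 0x24))

/* USART1 registers used here */
#define USART1_SR     (*(volatile uint32_t *)(USART1_BASE + 0x00))
#define USART1_DR     (*(volatile uint32_t *)(USART1_BASE + 0x04))
#define USART1_BRR    (*(volatile uint32_t *)(USART1_BASE + 0x08))
#define USART1_CR1    (*(volatile uint32_t *)(USART1_BASE + 0x0C))
```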
Clocks must be explicitly enabled before any peripheral can be used. GPIO pins are configured by writing 2-bit fields into MODER, and USART alternate functions are selected via AFRL/AFRH.
// Enable clocks for GPIOA, GPIOB, USART1 (APB2), USART2 (APB1)
RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOBEN;
RCC_APB2ENR |= RCC_APB2ENR_USART1EN;
RCC_APB1ENR |= RCC_APB1ENR_USART2EN;
// Set PA9/PA10 to Alternate Function mode (0b10), select AF7 = USART1
GPIOA_MODER |= (0x2UL << (9*2)) | (0x2UL << (10*2));
GPIOA_AFRH |= (0x7UL << ((9-8)*4)) | (0x7UL << ((10-8)*4));
// Configure baud rate: BRR = F_CLK / baud (with rounding)
USART1_BRR = (16000000UL + 115200/2) / 115200;
USART1_CR1 = (1UL<<13) | (1UL<<3) | (1UL<<2); // UE | TE | RE
The STM32 receives one byte at a time by polling the RXNE flag (bit 5 of SR). Characters
are accumulated in a 32-byte buffer. On \n, strchr() locates the ',' and '/'
delimiters and atoi() converts the two substrings to integers.
// Poll RXNE, buffer chars until newline, then parse "dx,dy/"
if (USART1_SR & (1UL << 5)) {          // RXNE: new byte ready
    char c = (char)USART1_DR;
    if (c == '\n' || buf_index >= 31) {
        buffer[buf_index] = '\0';
        char *comma = strchr(buffer, ',');
        char *slash = strchr(buffer, '/');
        if (comma && slash) {
            *comma = '\0'; *slash = '\0';
            dx = atoi(buffer);
            dy = atoi(comma + 1);
        }
        buf_index = 0;
        control_motors(dx, dy);        // act on new coordinate
    } else if (c != '\r') {
        buffer[buf_index++] = c;
    }
}
Each DM542 driver advances one step on every LOW→HIGH→LOW transition of its PUL pin. The STM32 uses the BSRR register (atomic bit set/reset) to generate clean 10 µs pulses without needing to read-modify-write the ODR.
// One step pulse on Motor 1 (pan) — PA1
static void step_pulse_motor1(void) {
    GPIOA_BSRR = (1UL << PUL1_PIN);          // drive PUL high
    delay_us(10);                            // hold ≥2.5 µs (DM542 min)
    GPIOA_BSRR = (1UL << (PUL1_PIN + 16));   // drive PUL low → step taken
}

// Direction: write BSRR set bits for CW, reset bits for CCW
static void set_direction_motor1(uint8_t dir) {
    if (dir == DIR_CW) GPIOA_BSRR = (1UL << DIR1_PIN);
    else               GPIOA_BSRR = (1UL << (DIR1_PIN + 16));
}
Raw pixel error from the camera is noisy frame-to-frame. An integer EMA filter (α = 3/8) smooths the signal. A dead zone of ±10 px prevents the motors from hunting when the target is nearly centered. Step speed is linearly interpolated between 1000 µs/step (slow, small error) and 150 µs/step (fast, large error), saturating at MAX_ERROR = 80 px.
// Integer EMA: ema = (3*new + 5*old) / 8 → α = 3/8 = 37.5%
ema_dx = (3 * dx + 5 * ema_dx) / 8;
ema_dy = (3 * dy + 5 * ema_dy) / 8;

// Speed: linearly map |error| → step delay (µs)
// error=10 → 1000 µs/step, error≥80 → 150 µs/step
static uint32_t error_to_delay(int32_t error) {
    uint32_t e = abs(error);
    if (e <= POSITION_THRESHOLD) return MAX_STEP_DELAY;  // 1000 µs
    if (e >= MAX_ERROR)          return MIN_STEP_DELAY;  // 150 µs
    uint32_t span = MAX_ERROR - POSITION_THRESHOLD;      // 70
    return MAX_STEP_DELAY - (e - POSITION_THRESHOLD)
                            * (MAX_STEP_DELAY - MIN_STEP_DELAY) / span;
}
// Hard travel limit: ±1200 steps from power-on position
if (pos_x >= STEP_LIMIT) steps_x = 0; // block further CW motion
Naively running Motor 1 fully then Motor 2 produces staircase diagonal motion. Instead, a 50 µs software tick loop runs both axes simultaneously: each motor has an accumulator that increments by 50 µs per tick, and a step fires whenever the accumulator reaches that axis's target period. Both motors start and finish their move in the same wall-clock time, producing a smooth arc.
// Time-sliced concurrent stepping — 50 µs tick resolution
static void move_motors_concurrent(
    uint32_t steps_x, uint8_t dir_x, uint32_t delay_x,
    uint32_t steps_y, uint8_t dir_y, uint32_t delay_y)
{
    set_direction_motor1(dir_x);
    set_direction_motor2(dir_y);
    uint32_t done_x = 0, done_y = 0, accum_x = 0, accum_y = 0;
    while (done_x < steps_x || done_y < steps_y) {
        delay_us(50);
        if (done_x < steps_x) {
            accum_x += 50;
            if (accum_x >= delay_x) { step_pulse_motor1(); accum_x = 0; done_x++; }
        }
        if (done_y < steps_y) {
            accum_y += 50;
            if (accum_y >= delay_y) { step_pulse_motor2(); accum_y = 0; done_y++; }
        }
    }
}
There is no CMSIS startup file. startup.c defines the full ARM Cortex-M4
vector table (100+ entries for all STM32F446 IRQs) and the Reset_Handler
that runs before main(). All exception handlers default to an infinite loop
via weak aliases, so any unhandled fault halts visibly rather than running garbage code.
// Vector table placed at 0x08000000 (.isr_vector section)
__attribute__((section(".isr_vector")))
const uint32_t vector_table[] = {
    (uint32_t)&_estack,        // initial stack pointer (from linker script)
    (uint32_t)Reset_Handler,   // first instruction after power-on
    (uint32_t)NMI_Handler,
    (uint32_t)HardFault_Handler,
    // ... all 100+ STM32F446 peripheral IRQs follow
};

// Reset_Handler: copy .data Flash→RAM, zero .bss, call main()
void Reset_Handler(void) {
    uint32_t *src = &_sidata, *dst = &_sdata;
    while (dst < &_edata) *dst++ = *src++;   // initialized globals
    dst = &_sbss;
    while (dst < &_ebss) *dst++ = 0;         // zero-init globals
    main();
    while (1);                               // trap if main() ever returns
}
The build uses arm-none-eabi-gcc targeting Cortex-M4 with hardware FPU
(fpv4-sp-d16, -mfloat-abi=hard), -O2, and
--gc-sections to strip unused code. The final binary is flashed via OpenOCD
+ ST-Link.
# Makefile excerpt
CC  = arm-none-eabi-gcc
MCU = cortex-m4
FPU = fpv4-sp-d16

CFLAGS  = -mcpu=$(MCU) -mthumb -mfpu=$(FPU) -mfloat-abi=hard
CFLAGS += -O2 -Wall -fdata-sections -ffunction-sections
LDFLAGS  = -T STM32F446RE_FLASH.ld -specs=nano.specs
LDFLAGS += -Wl,--gc-sections -Wl,-Map=$(TARGET).map

flash:
	openocd -f interface/stlink.cfg -f target/stm32f4x.cfg \
		-c "program $(TARGET).elf verify reset exit"
End-to-end data flow from camera capture through vision processing, UART communication, and motor control.