Building U-Boot and Running Custom C++ Code on Orange Pi Zero 2W
For one of my recent experiments, I needed to investigate how much the Linux operating system actually impacts hardware performance. To do this, I had to test the same algorithm in two different environments: under Linux control and in bare-metal mode (without an OS). Beyond the data, I was driven by pure curiosity—it was fascinating to compare programming for microcontrollers versus “grown-up” 64-bit multi-core processors.
For these tests, I chose the Allwinner H618 SoC:
Ubiquity and Accessibility: It’s incredibly easy to find. My home embedded lab has about a dozen devices powered by this chip, ranging from TV boxes to various SBCs (Orange Pi, Walnut Pi, etc.). Back in the summer of 2025, H618-based boards could be snatched up for $15-$20, though prices have noticeably climbed since then.
The Power-Periphery Balance: With four Cortex-A53 cores, you can hardly call this processor “sluggish.” It’s snappy enough for serious tasks. Working with UART, SPI, I2C, or timers here is straightforward and well-documented - if not in official manuals, then certainly within the linux-sunxi community.
To run custom code on such hardware, you first need to understand its “wake-up” routine. Unlike microcontrollers, where code execution usually starts directly from Flash memory, the boot process of a complex SoC is a multi-stage quest.
The Boot Process
When power is applied, the processor begins executing a program hardcoded into the BootROM (also known as MaskROM). This code is baked into the chip’s silicon during manufacturing and cannot be changed. The primary goal of the BootROM is to find the SPL (Secondary Program Loader) on external media (SD card, eMMC, or NAND), load it into the SoC’s small internal SRAM, and hand over control.
The SPL starts its work while the RAM is not yet initialized (the DRAM controller is unconfigured). Its main responsibilities are:
Initializing critical peripherals (DRAM controller, clock system/PLL, and UART console).
Loading the subsequent components — TF-A (Trusted Firmware-A) and U-Boot — into the now-initialized DRAM.
Handing over control to the TF-A layer.
TF-A (formerly known as ATF - ARM Trusted Firmware) is responsible for setting up processor security, TrustZone, power management (PSCI), and more. This is a fairly broad and complex topic that deserves its own deep dive, so we won’t go into the weeds here. Once the secure environment is initialized, TF-A passes control to U-Boot.
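As a compact summary, the hand-off order described above can be written down as a small table. This is only an illustrative sketch: the type and variable names (Stage, Memory, kBootChain, next_stage) are made up for this example and do not come from U-Boot or TF-A.

```cpp
#include <cstddef>

// Boot stages in the order control is handed over.
enum class Stage { BootRom, Spl, TfA, UBoot };

// Where each stage executes from, as described in the text.
enum class Memory { MaskRom, Sram, Dram };

struct StageInfo {
    Stage stage;
    const char* name;
    Memory runs_from;
};

// Hypothetical summary table; the names are for illustration only.
constexpr StageInfo kBootChain[] = {
    {Stage::BootRom, "BootROM", Memory::MaskRom},
    {Stage::Spl, "SPL", Memory::Sram},        // DRAM is not initialized yet
    {Stage::TfA, "TF-A (BL31)", Memory::Dram},
    {Stage::UBoot, "U-Boot", Memory::Dram},
};

constexpr std::size_t kStageCount = sizeof(kBootChain) / sizeof(kBootChain[0]);

// Index of the stage that receives control next.
// The last stage has no successor here, so it returns its own index.
constexpr std::size_t next_stage(std::size_t i) {
    return (i + 1 < kStageCount) ? i + 1 : i;
}
```

The point of the table is that only the SPL has to fit into SRAM; everything after it runs from DRAM that the SPL itself brought up.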
Why U-Boot?
U-Boot is an open-source project that has become the de facto standard for embedded systems. Originally developed for the PowerPC architecture, it has evolved into a universal bootloader for Arm, RISC-V, MIPS, and other platforms.
U-Boot significantly expands how we interact with hardware:
Interactive Command Line (CLI): It allows for flexible boot scenario configuration, memory state inspection, and real-time peripheral management.
Extended Peripheral Support: U-Boot can interact with devices typically not supported at the BootROM level. For example, it can load code via USB, Ethernet (NFS/PXE), or even from an NVMe drive.
Filesystem Awareness: It understands FAT, ext4, and other systems. This means we can simply copy our binary onto a card rather than writing it to raw sectors.
Preparing to Build U-Boot
To build U-Boot, you’ll need Linux in some form. This can be your primary OS, a virtual machine, or a remote VPS. As for WSL2 on Windows—theoretically, it should work flawlessly, though I haven’t personally tested that environment yet.
Once the image is built, you can flash it to a MicroSD card on any
system that has the dd utility available. For example, I
wrote part of this guide on an Arch Linux desktop and the other part on
a MacBook Pro, using an Ubuntu VPS for the actual compilation and then
transferring the final binary for flashing.
For Debian/Ubuntu, install the following packages:
sudo apt update
sudo apt install build-essential git bison flex libssl-dev \
gcc-aarch64-linux-gnu swig python3-dev bc device-tree-compiler \
libgnutls28-dev
For Arch Linux:
sudo pacman -S base-devel git openssl aarch64-linux-gnu-gcc \
swig python-setuptools dtc bc gnutls
Note: The gcc-aarch64-linux-gnu package is the cross-compiler for the ARM64 platform.
Building Trusted Firmware-A
While the SPL and U-Boot code reside in the same repository, TF-A (Trusted Firmware-A) is a separate project that must be cloned and built independently. The official source code is available on GitHub:
https://github.com/TrustedFirmware-A/trusted-firmware-a.git
Occasionally, chip manufacturers (like Rockchip) maintain their own TF-A or U-Boot forks containing patches that haven’t been merged into the upstream yet. However, for the H618, the standard TF-A repository is perfectly sufficient.
So, let’s clone the repository and proceed with the build:
# Clone and enter the directory
git clone https://github.com/TrustedFirmware-A/trusted-firmware-a.git
cd trusted-firmware-a/
# Set the cross-compiler prefix
export CROSS_COMPILE=aarch64-linux-gnu-
# Start the build. For the H618, we use the sun50i_h616 platform
make PLAT=sun50i_h616 DEBUG=1 bl31
The compilation is quite fast. Once it’s finished, you will find the
necessary file at: build/sun50i_h616/debug/bl31.bin.
Important Note: I once made a mistake with the PLAT
setting—either following a flawed guide or just being careless—and
specified sun50i_a64. Everything compiled without errors,
and U-Boot built fine too, but during boot, the system simply “vanished
into the void” immediately after the SPL phase. I spent a significant
amount of time troubleshooting before I finally spotted that annoying
error. After rebuilding with the correct parameter, I finally gained
access to the U-Boot console.
Building SPL and U-Boot
First, clone the official repository from GitHub: https://github.com/u-boot/u-boot
# Clone and enter the directory
git clone https://github.com/u-boot/u-boot
cd u-boot/
# Set up environment variables for cross-compilation
export ARCH=arm64
export CROSS_COMPILE=aarch64-linux-gnu-
U-Boot comes with predefined configurations for various single-board
computers, which you can find in the configs/ directory. We
are specifically looking for Orange Pi
configurations:
$ ls -1 configs/orangepi*
configs/orangepi-3b-rk3566_defconfig
configs/orangepi-5-max-rk3588_defconfig
...
configs/orangepi_zero2_defconfig
configs/orangepi_zero2w_defconfig
configs/orangepi_zero3_defconfig
configs/orangepi_zero_defconfig
configs/orangepi_zero_plus2_defconfig
configs/orangepi_zero_plus2_h3_defconfig
configs/orangepi_zero_plus_defconfig
In this example, I’ll be using the Orange Pi Zero 2W, so I initialize the configuration with the following command:
make orangepi_zero2w_defconfig
This generates a .config file containing the parameters
for the upcoming build. Now, we can start the compilation process.
Crucially, you must specify the path to the TF-A (Trusted Firmware-A)
binary using the BL31 parameter:
make BL31=../trusted-firmware-a/build/sun50i_h616/debug/bl31.bin -j$(nproc)
Once the operation completes successfully, the final artifact,
u-boot-sunxi-with-spl.bin, will appear in the U-Boot root
directory:
$ ls -l u-boot-sunxi-with-spl.bin
-rw-r--r-- 1 mrco mrco 890309 Mar 18 19:22 u-boot-sunxi-with-spl.bin
Potential Issues & Troubleshooting
“Image ‘u-boot-sunxi-with-spl’ is missing external blobs and is non-functional: atf-bl31”: This means you either forgot to provide the path to the bl31.bin (TF-A) file or provided an incorrect one.
Missing Utilities: If the build terminates with a “command not found” error, double-check that all dependencies (such as swig, python3-dev, bison, and flex) are properly installed.
OS Version Dependencies: Unofficial forks (as I experienced with Luckfox) often require a specific, sometimes legacy, environment. If compilation fails on a cutting-edge system, the simplest solution is to use Docker or a virtual machine running an older version of Ubuntu.
Flashing the Bootloader to an SD Card
The Allwinner BROM searches for the bootloader (SPL) on the card at a fixed offset of 8 KB (sector 16).
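The relationship between the 8 KB offset and “sector 16” is plain integer arithmetic, and the same calculation gives the seek value for dd, which counts in blocks of bs bytes. A minimal sketch, assuming 512-byte sectors; the helper name offset_to_blocks is made up for this example.

```cpp
#include <cstdint>

constexpr std::uint64_t kSectorSize = 512;           // assumed SD card sector size
constexpr std::uint64_t kSplOffsetBytes = 8 * 1024;  // fixed offset checked by the BROM

// Converts a byte offset into a count of blocks of the given size.
// Hypothetical helper; the offset is expected to be a multiple of block_size.
constexpr std::uint64_t offset_to_blocks(std::uint64_t offset_bytes,
                                         std::uint64_t block_size) {
    return offset_bytes / block_size;
}

// The offset must land on a sector boundary for "sector 16" to be meaningful.
static_assert(kSplOffsetBytes % kSectorSize == 0, "offset is not sector-aligned");
```

With 512-byte blocks the result is 16 (the sector number); with 1024-byte blocks it is 8, which is the value passed as seek when dd runs with bs=1024.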
Warning! Be extremely careful when selecting the
target disk. A mistake in the device name (/dev/sdX or
rdiskN) can lead to total data loss on an important
drive.
First, identify the path to your SD card:
## macOS
diskutil list
## Linux
lsblk
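To avoid conflicts with existing partition tables or stale data, zero out the first 10 MB of the card. On macOS, if the card has mounted volumes, unmount them first (diskutil unmountDisk /dev/diskX); otherwise dd fails with a “Resource busy” error:
## Linux
sudo dd if=/dev/zero of=/dev/sdX bs=1M count=10
## macOS
sudo dd if=/dev/zero of=/dev/rdiskX bs=1m count=10 status=progress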
Now, write the u-boot-sunxi-with-spl.bin file with an 8
KB offset (using seek=8 with a 1024-byte block size):
## macOS
sudo dd if=u-boot-sunxi-with-spl.bin of=/dev/rdiskX bs=1024 seek=8 status=progress
## Linux
sudo dd if=u-boot-sunxi-with-spl.bin of=/dev/sdX bs=1024 seek=8 conv=fsync
Once the write is complete, you can remove the card. Since we didn’t create or mount a filesystem, no additional unmounting is required.
Connect your UART adapter to the GND, TX, and RX pins (UART0) of the Orange Pi Zero 2W and open a serial console using screen at a baud rate of 115200:
screen /dev/ttyXXX 115200
If everything was done correctly, upon powering up the device, you should see a log confirming the successful loading of the SPL, TF-A, and U-Boot:
U-Boot SPL 2026.04-rc4-00006-geefb822fb574 (Mar 18 2026 - 16:21:53 +0200)
DRAM: 2048 MiB
Trying to boot from MMC1
NOTICE: BL31: v2.14.0(debug):sandbox/v2.14-755-g2adf0f434
NOTICE: BL31: Built : 17:13:38, Mar 15 2026
NOTICE: BL31: Detected Allwinner H616 SoC (1823)
NOTICE: BL31: Found U-Boot DTB at 0x4a0b8d68, model: OrangePi Zero 2W
INFO: ARM GICv2 driver initialized
INFO: Configuring SPC Controller
INFO: Probing for PMIC on I2C:
INFO: PMIC: found AXP313
INFO: BL31: Platform setup done
INFO: BL31: Initializing runtime services
INFO: BL31: cortex_a53: CPU workaround for erratum 855873 was applied
INFO: BL31: cortex_a53: CPU workaround for erratum 1530924 was applied
INFO: PSCI: Suspend is unavailable
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x4a000000
INFO: SPSR = 0x3c9
INFO: Changed devicetree.
U-Boot 2026.04-rc4-00006-geefb822fb574 (Mar 18 2026 - 16:21:53 +0200) Allwinner Technology
CPU: Allwinner H616 (SUN50I)
Model: OrangePi Zero 2W
DRAM: 2 GiB
Core: 63 devices, 24 uclasses, devicetree: separate
WDT: Not starting watchdog@30090a0
MMC: mmc@4020000: 0
Loading Environment from FAT... Unable to use mmc 0:0...
In: serial@5000000
Out: serial@5000000
Err: serial@5000000
Allwinner mUSB OTG (Peripheral)
Net: using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
eth0: usb_ether
starting USB...
USB EHCI 1.00
USB OHCI 1.0
Bus usb@5200000: 1 USB Device(s) found
Bus usb@5200400: 1 USB Device(s) found
scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot: 0
=>
C++ Hello World: The Bare-Metal Way
Time for the final touch: running a C++ program in bare-metal mode—directly on the hardware without any underlying operating system.
Since U-Boot is primarily written in C and Assembly, we only needed a C compiler until now. To build C++ code for the Arm64 architecture, you’ll need to install an additional package:
sudo apt install g++-aarch64-linux-gnu
The H618 processor utilizes Memory-Mapped Input/Output (MMIO). This means that to output data to the UART, we simply need to write bytes to a specific physical address in memory.
To find this address, we need official documentation. While a dedicated manual for the H618 isn’t easily found in the public domain, the manual for the nearly identical H616 works perfectly: H616 User Manual V1.0.
According to the manual (Section 9.2.5, Register List), the UART0 peripheral is mapped to the address range 0x05000000 - 0x050003FF.
For basic text output, we only need two registers:
Transmit Holding Register (THR) (Offset 0x00): Anything written here is sent out over the UART line.
Line Status Register (LSR) (Offset 0x14): We are interested in bit 5 (counting from zero), TX Holding Register Empty (THRE), which corresponds to the mask 1 << 5. Since the CPU is much faster than the UART interface, we must loop and check this bit before writing each byte to ensure the transmitter buffer is ready.
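One way to double-check the two offsets is to describe the register block as a struct and let the compiler verify the layout. This is a sketch, not the approach used in the rest of this article: UartRegs, tx_ready, and put_char are hypothetical names, and the registers between THR and LSR are left as unnamed padding because they are not needed here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical overlay of the UART register block.
// Only THR and LSR are taken from the register list above.
struct UartRegs {
    volatile std::uint32_t thr;          // 0x00: Transmit Holding Register
    volatile std::uint32_t reserved[4];  // 0x04..0x13: unused in this sketch
    volatile std::uint32_t lsr;          // 0x14: Line Status Register
};

static_assert(offsetof(UartRegs, thr) == 0x00, "THR must sit at offset 0x00");
static_assert(offsetof(UartRegs, lsr) == 0x14, "LSR must sit at offset 0x14");

constexpr std::uint32_t kLsrThre = 1u << 5;  // bit 5: TX Holding Register Empty

// True when the transmitter can accept another byte.
inline bool tx_ready(const UartRegs& uart) {
    return (uart.lsr & kLsrThre) != 0;
}

// Busy-waits for THRE, then writes one byte to THR.
inline void put_char(UartRegs& uart, char c) {
    while (!tx_ready(uart)) {
    }
    uart.thr = static_cast<std::uint8_t>(c);
}

// On the target, the overlay would be placed on the physical address, e.g.:
//   auto& uart0 = *reinterpret_cast<UartRegs*>(0x05000000);
```

Because the functions take a reference, the same logic can be exercised on a desktop machine against an ordinary UartRegs variable before it ever touches hardware.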
The Code: helloworld.cpp
#include <cstdint>

// UART0 registers, accessed through MMIO
#define UART0_BASE 0x05000000
#define UART0_THR (*(volatile uint32_t*)(UART0_BASE + 0x00))
#define UART0_LSR (*(volatile uint32_t*)(UART0_BASE + 0x14))
#define LSR_THRE (1 << 5)  // bit 5: TX Holding Register Empty

void uart_putc(char c) {
    // Busy-wait until the transmitter can accept another byte
    while ((UART0_LSR & LSR_THRE) == 0);
    UART0_THR = c;
}

void uart_print(const char* str) {
    while (*str) {
        // Terminals expect CR+LF, so send '\r' before every '\n'
        if (*str == '\n') uart_putc('\r');
        uart_putc(*str++);
    }
}

// Placed in .text.boot so the linker script can put it first in the image
extern "C" __attribute__((section(".text.boot"))) void _start() {
    uart_print("\n\n");
    uart_print("================================\n");
    uart_print(" Hello from C++ \n");
    uart_print("================================\n");
    // Let's wait a bit
    for (volatile int i = 0; i < 10000000; i++) {}
    uart_print("Returning control back to U-Boot...\n\n");
    // Return to U-Boot
    return;
}
In Allwinner H616/H618 processors, the system RAM (DRAM) starts at the physical address 0x40000000 (as per Section 3.1 Memory Mapping).
Any address lower than this (like our UART at 0x05000000) points to peripheral registers or internal SRAM, not DRAM. We shouldn’t load our code exactly at 0x40000000 because the bottom of DRAM is not free: on the H616 family, TF-A’s BL31 is normally placed at the very start of DRAM and stays resident after boot, so overwriting it would break the services it provides (such as PSCI).
To play it safe, we follow the ARM64 convention of loading the kernel with a 2 MB offset—at address 0x40200000. Here is our linker script (linker.ld):
ENTRY(_start)
SECTIONS
{
    . = 0x40200000;
    .text : {
        *(.text.boot)
        *(.text*)
    }
    .rodata : { *(.rodata*) }
    .data : { *(.data*) }
    .bss : {
        __bss_start = .;
        *(.bss*)
        __bss_end = .;
    }
}
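The script exports __bss_start and __bss_end, but the Hello World program never uses them because it has no zero-initialized globals. A program that does have them would need to clear that range itself at startup: objcopy -O binary does not store .bss in the image, and nothing else zeroes that memory before our code runs. Below is a minimal sketch of such a routine; zero_range is a hypothetical helper name, not part of the code in this article.

```cpp
// Zeroes every byte in [begin, end).
// The writes go through a volatile pointer so that an optimizing build keeps
// the loop as written instead of replacing it with a call to memset, which a
// libc-free link would then have to provide.
inline void zero_range(unsigned char* begin, unsigned char* end) {
    for (volatile unsigned char* p = begin; p != end; ++p) {
        *p = 0;
    }
}

// On the target, the bounds would come from linker.ld (assumed usage):
//   extern "C" unsigned char __bss_start[], __bss_end[];
//   zero_range(__bss_start, __bss_end);  // first thing in _start()
```

Taking the bounds as parameters keeps the routine testable on a host machine with an ordinary array.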
We use the cross-compiler with specific flags. Since we are running on “bare metal” we must disable standard libraries, exception handling, and RTTI. It’s not that the CPU can’t handle them; it’s because our tiny environment lacks the underlying runtime (memory management and error handling) that these C++ features depend on.
# Compile the object file
aarch64-linux-gnu-g++ -ffreestanding -fno-exceptions -fno-rtti -c helloworld.cpp -o helloworld.o
# Link according to our memory map
aarch64-linux-gnu-ld -T linker.ld helloworld.o -o helloworld.elf
# Create a raw binary image (stripping ELF headers)
aarch64-linux-gnu-objcopy -O binary helloworld.elf helloworld.bin
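The flags above switch off exceptions and RTTI, but C++ features that need no runtime support still work in this environment. As an illustration, here is a sketch of a hypothetical helper, format_hex32, that uses an array reference and a constexpr lookup table to format a 32-bit value for printing; it is not part of the program in this article.

```cpp
#include <cstdint>

// Writes `value` as 8 uppercase hex digits followed by a terminating NUL.
// The array reference lets the compiler reject buffers of the wrong size.
inline void format_hex32(std::uint32_t value, char (&out)[9]) {
    static constexpr char kDigits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; ++i) {
        // Most significant nibble first
        const unsigned shift = static_cast<unsigned>(28 - 4 * i);
        out[i] = kDigits[(value >> shift) & 0xFu];
    }
    out[8] = '\0';
}
```

The resulting buffer can be passed to a print routine that accepts a const char*, which is handy for dumping register values or addresses over the UART.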
Since the U-Boot image ends roughly 878 KB from the start of the card (the 8 KB offset plus about 870 KB of image), we will write our binary at a 1 MB offset. This equals exactly 2048 (0x0800) sectors (assuming 512-byte sectors).
sudo dd if=helloworld.bin of=/dev/XXX bs=512 seek=2048 conv=notrunc
Power up the board, interrupt the autoboot by pressing any key, and enter the following U-Boot commands:
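mmc dev 0 - Select the SD card.
mmc read 0x40200000 0x0800 0x1 - Read 1 sector from the 1 MB offset (0x800) into RAM at our base address.
go 0x40200000 - Jump to the code.
Note that one sector holds only 512 bytes. If helloworld.bin grows beyond that, increase the last argument of mmc read accordingly.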
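The sector numbers used above can be verified with a few lines of arithmetic. The sketch below assumes 512-byte sectors and takes the U-Boot image size from the ls -l output shown earlier (890309 bytes); the helper name sectors_for is made up for this example.

```cpp
#include <cstdint>

constexpr std::uint64_t kSectorSize = 512;  // assumed SD card sector size

// Number of whole sectors needed to hold size_bytes (rounds up).
constexpr std::uint64_t sectors_for(std::uint64_t size_bytes) {
    return (size_bytes + kSectorSize - 1) / kSectorSize;
}

constexpr std::uint64_t kUBootOffset = 8 * 1024;       // where the SPL image starts
constexpr std::uint64_t kUBootSize = 890309;           // size reported by ls -l
constexpr std::uint64_t kPayloadOffset = 1024 * 1024;  // where helloworld.bin goes

// The payload must not overlap the bootloader image.
static_assert(kUBootOffset + kUBootSize <= kPayloadOffset,
              "U-Boot image would overlap the payload");
```

For the mmc read command, kPayloadOffset / kSectorSize gives the start sector (0x800), and sectors_for(size of helloworld.bin) gives the count to pass as the last argument.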
Console Output:
Mission accomplished! We’ve successfully written and executed a C++ “Hello World” on bare metal. This was made significantly easier by U-Boot, which handled the “heavy lifting” of initializing the DRAM controller and CPU clocks, allowing us to focus purely on our program’s logic.

