STM32 GPIOs and Template Metaprogramming


The STM32 is a (relatively) new microcontroller made by ST, based on the Cortex-M3 ARM CPU core.
I've recently started coding for this microcontroller, and I'm positively impressed by its speed and peripheral set. However, the software side isn't as nice as the hardware side. Its peripherals are very flexible, but this also means that they are somewhat difficult to configure properly. This is true even for core peripherals such as GPIOs.
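For example, here is the code to configure GPIO 6 of port F as an output:

GPIOF->CRL &= ~(0xf<<(6*4));//Clear the 4 configuration bits of pin F.6
GPIOF->CRL |= 0x3<<(6*4);//Set them to push-pull output, 50MHz max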

This is because each GPIO has 4 bits defining its state, which can be one of the following: input, input with pull-up, input with pull-down, output push-pull high speed (50MHz max), output push-pull medium speed (10MHz max), output push-pull low speed (2MHz max), output open-drain high speed, and several others.
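To make the 4-bit encoding concrete, here is a minimal sketch, assuming the STM32F1 register layout where each pin's field holds two CNF bits above two MODE bits. The names PinConfig and withPinConfig are hypothetical and do not come from any library. The function works on a plain integer rather than on a real register, so it can be tried on a PC.

```cpp
#include <cstdint>

// Hypothetical names for some of the 4-bit CNF:MODE encodings
// (CNF in bits 3:2 of the field, MODE in bits 1:0).
enum PinConfig : std::uint32_t
{
    INPUT_ANALOG       = 0x0, // CNF=00, MODE=00
    INPUT_FLOATING     = 0x4, // CNF=01, MODE=00
    INPUT_PULL_UP_DOWN = 0x8, // CNF=10, MODE=00
    OUTPUT_PP_10MHZ    = 0x1, // CNF=00, MODE=01
    OUTPUT_PP_2MHZ     = 0x2, // CNF=00, MODE=10
    OUTPUT_PP_50MHZ    = 0x3, // CNF=00, MODE=11
    OUTPUT_OD_50MHZ    = 0x7  // CNF=01, MODE=11
};

// Returns 'reg' with the 4-bit field of one pin replaced by 'cfg'.
// 'pin' is the index within the register (0..7): N for CRL, N-8 for CRH.
constexpr std::uint32_t withPinConfig(std::uint32_t reg, unsigned pin,
                                      PinConfig cfg)
{
    return (reg & ~(0xfu << (pin * 4)))
         | (static_cast<std::uint32_t>(cfg) << (pin * 4));
}
```

For instance, withPinConfig(0x44444444, 6, OUTPUT_PP_50MHZ) evaluates to 0x43444444: only the field of pin 6 changes.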
Luckily, the code to set an output pin to 0 or 1 is more straightforward:

GPIOA->BSRR= 1<<15;//Set pin A.15 to 1
GPIOA->BRR= 1<<15;//Set pin A.15 to 0

The problem

Now, thinking carefully, the problem is not with features, since having more features is good. The problem is that the code above, both to configure and to set a pin, is not self documenting. When you write it you (hopefully) know what it means, but if you need to read the code again 6 months later, will you still remember what it does?
Having sensed that, ST decided to provide a peripheral library to ease the configuration of peripherals. Here is what the code to configure and set a GPIO pin looks like using the ST peripheral library:

GPIO_InitTypeDef GPIO_InitStructure;
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_15;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP;
GPIO_Init(GPIOA, &GPIO_InitStructure);
GPIO_WriteBit(GPIOA, GPIO_Pin_15, (BitAction)1);

No doubt this code is more self documenting, but it has its weaknesses too. First of all, it is verbose. To configure a pin you need 5 lines of code that instantiate a struct, set its fields and pass it to a function (probably because otherwise the function would have had too many parameters to remember). Since some STM32 microcontrollers have more than 100 GPIOs, the problem is evident. Even the GPIO_WriteBit() syntax is verbose, and the code refers to the implementation domain rather than the application domain. For example, if there is an LED connected to pin 15 of port A, nothing in the code states that.
A more insidious weakness is performance. Setting and clearing bits happens many times in embedded development, often in performance critical inner loops. Having to call GPIO_WriteBit() to do that means that every GPIO operation incurs the cost of a function call.

A solution

A better solution can start from inline functions, which remove the function call overhead, with short names to avoid verbosity. Like this:

inline void portA0_high()
{
    GPIOA->BSRR= 1<<0;
}

inline void portA0_low()
{
    GPIOA->BRR= 1<<0;
}

This would provide optimal performance, but it's still far from perfect. The problem is that at least four functions are needed per GPIO pin: one to configure its mode (input, output, ...), one to read its value if it is an input, one to write 1 and one to write 0. Multiplied by the number of GPIOs, that is too many functions to write by hand.
Ideally what's needed here is a way to tell the compiler "Do it for me, instantiate the functions automatically as needed". This is exactly what C++'s template metaprogramming is about.
Having already decided to program the STM32 in C++ (as opposed to C), I wrote a template class to handle GPIOs, which is included at the end of this article.

The class' interface is this:

template<unsigned int P, unsigned char N>
class Gpio: private GpioBase
{
public:
    static void mode(Mode::Mode_ m);
    static void high();
    static void low();
    static int value();
    static void pullup();
    static void pulldown();
};

The class accepts two template parameters: P, which is the port (use the constants GPIOA_BASE, GPIOB_BASE, ... defined in stm32f10x.h), and N, which is the pin number, 0 to 15. The mode() member function sets the pin mode (INPUT, OUTPUT, ...; for a complete list see the enum Mode_ in the same header file where the class is defined). The high() and low() member functions set the pin to 1 or 0, value() reads the pin if it is an input, and pullup() and pulldown() select the pull-up or pull-down resistor if the pin is configured as Mode::INPUT_PULL_UP_DOWN.
Since all member functions are static you don't need to instantiate the class, so it uses no RAM at all.
The intended use is to create a typedef that specifies the template parameters with a name meaningful for your application, so that the code is really self documenting. For example, assuming your application has an LED connected to pin 0 of port A and a button connected to pin 10 of port B, the code could look like this:

#include "gpio.h"

typedef Gpio<GPIOA_BASE,0>  led;
typedef Gpio<GPIOB_BASE,10> button;

int main()
{
    led::mode(Mode::OUTPUT);
    button::mode(Mode::INPUT);
    for(;;)
    {
        if(button::value()==1) led::high(); else led::low();
    }
}

Of course, if you need to access the typedefs from more than one source file, they can be defined in a header file and #included.
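To show how a class with this interface can work without any instance, here is a minimal sketch. It is not the gpio.h described in this article: FakeGpioRegs, fakePorts, GpioSketch and selfCheck are hypothetical names, and the port parameter is an index into a simulated array instead of a base address such as GPIOA_BASE, so that the code runs on a PC. On real hardware the register block would be reached by casting the address to a pointer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the memory-mapped registers of one GPIO port.
struct FakeGpioRegs
{
    std::uint32_t IDR;  // input data register
    std::uint32_t BSRR; // writing 1 to bit N sets pin N
    std::uint32_t BRR;  // writing 1 to bit N clears pin N
};

// Two simulated ports, zero-initialized.
static FakeGpioRegs fakePorts[2];

// P selects the simulated port, N is the pin number (0..15).
template<unsigned int P, unsigned char N>
class GpioSketch
{
public:
    static void high() { regs().BSRR = 1u << N; }
    static void low()  { regs().BRR  = 1u << N; }
    static int value() { return (regs().IDR >> N) & 1u; }
private:
    static FakeGpioRegs& regs() { return fakePorts[P]; }
};

typedef GpioSketch<0, 0>  led;    // hypothetical LED on port 0, pin 0
typedef GpioSketch<1, 10> button; // hypothetical button on port 1, pin 10

// Host-side check: pretend the button reads 1 and mirror it on the LED.
void selfCheck()
{
    fakePorts[1].IDR = 1u << 10;
    if(button::value() == 1) led::high(); else led::low();
    assert(fakePorts[0].BSRR == 1u); // led::high() wrote bit 0 of BSRR
}
```

Because P and N are template parameters, each typedef names a distinct set of functions in which the port and the bit mask are compile-time constants, which is what gives the compiler the chance to reduce each call to a single store.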


Ok, now the syntax problem is solved, but how about performance? The member functions high(), low() and value() have been optimized so that they don't incur any function call overhead. To prove it, the following code was compiled with g++ 4.4.2 at optimization level -O2:

void set_pin_a()
{
    GPIOA->BSRR= 1<<15;
}

void set_pin_b()
{
    GPIO_WriteBit(GPIOA, GPIO_Pin_15, (BitAction)1);
}

typedef Gpio<GPIOA_BASE,15> portA15;

void set_pin_c()
{
    portA15::high();
}

And this is the disassembly:

00000000 <_Z9set_pin_av>:
   0:    f640 0300     movw    r3, #2048    ; 0x800
   4:    f2c4 0301     movt    r3, #16385    ; 0x4001
   8:    f44f 4200     mov.w    r2, #32768    ; 0x8000
   c:    611a          str    r2, [r3, #16]
   e:    4770          bx    lr

00000000 <_Z9set_pin_bv>:
   0:    f640 0000     movw    r0, #2048    ; 0x800
   4:    b500          push    {lr}
   6:    f2c4 0001     movt    r0, #16385    ; 0x4001
   a:    f44f 4100     mov.w    r1, #32768    ; 0x8000
   e:    2201          movs    r2, #1
  10:    f7ff fffe     bl    0 <GPIO_WriteBit>
  14:    bd00          pop    {pc}
  16:    46c0          nop            (mov r8, r8)

00000000 <_Z9set_pin_cv>:
   0:    f640 0300     movw    r3, #2048    ; 0x800
   4:    f2c4 0301     movt    r3, #16385    ; 0x4001
   8:    f44f 4200     mov.w    r2, #32768    ; 0x8000
   c:    611a          str    r2, [r3, #16]
   e:    4770          bx    lr

As can be seen, the template implementation generates the same assembly code as the hardcoded write to GPIOA->BSRR, and does not incur the function call overhead of the code that uses the ST peripheral library.

The code

Here is the code of the Gpio template class: gpio.h (Version 1.3)

Update Dec 26, 2009

I received a comment on the blog post associated with this article regarding a possible improvement. The patch allows the call to Gpio::mode() to be inlined and a branch to be resolved at compile time, increasing code speed. The patch has been applied.
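As an illustration of resolving a branch at compile time, here is a minimal sketch. It assumes the branch in question is the choice between the CRL register (pins 0 to 7) and the CRH register (pins 8 to 15); that is an assumption, not something stated in the patch description. FakeConfigRegs, fakePort, ModeSetter and selfCheck are hypothetical names, and the registers are simulated so that the code runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Simulated configuration registers, starting from 0x44444444
// (every pin configured as floating input).
struct FakeConfigRegs { std::uint32_t CRL; std::uint32_t CRH; };
static FakeConfigRegs fakePort = { 0x44444444u, 0x44444444u };

// Primary template: chosen when N < 8, writes the field in CRL.
template<unsigned char N, bool High = (N >= 8)>
struct ModeSetter
{
    static void set(std::uint32_t cfg)
    {
        fakePort.CRL = (fakePort.CRL & ~(0xfu << (N * 4))) | (cfg << (N * 4));
    }
};

// Partial specialization: chosen when N >= 8, writes the field in CRH.
template<unsigned char N>
struct ModeSetter<N, true>
{
    static void set(std::uint32_t cfg)
    {
        fakePort.CRH = (fakePort.CRH & ~(0xfu << ((N - 8) * 4)))
                     | (cfg << ((N - 8) * 4));
    }
};

// Host-side check: configure pin 6 and pin 15 with the value 0x3.
void selfCheck()
{
    ModeSetter<6>::set(0x3);
    assert(fakePort.CRL == 0x43444444u);
    ModeSetter<15>::set(0x3);
    assert(fakePort.CRH == 0x34444444u);
}
```

The compiler picks one of the two bodies from the value of N alone, so no comparison on the pin number is left in the generated code.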

Update Dec 24, 2009

The code has been updated with both bugfixes and usability improvements. As already said, there is no need to create instances of the Gpio class since it only has static member functions, but there was nothing to prevent this. Since it is always better to state how a class should be used in code rather than in comments, I made the class' constructor private, so that an attempt to create an instance of the Gpio class results in a compiler error.
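The private constructor idiom can be sketched in a few lines. StaticOnly is a hypothetical class used only for illustration, not the Gpio class itself, and the check relies on the C++11 header <type_traits>, which is newer than the compiler used in this article.

```cpp
#include <type_traits>

// A class meant to be used only through its static member functions.
class StaticOnly
{
public:
    static int answer() { return 42; }
private:
    StaticOnly(); // private and never defined: instances cannot be created
};

// StaticOnly s; // uncommenting this line is a compile error

// The same property, verified by the compiler.
static_assert(!std::is_default_constructible<StaticOnly>::value,
              "StaticOnly must not be instantiable");
```

The static member function remains callable as StaticOnly::answer(), while any attempt to declare a variable of the class is rejected at compile time.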