Thingsquare firmware good to know, and dos and don'ts

The Thingsquare firmware is highly integrated, and many things happen that the application writer doesn't need to handle, but may still want to know are handled, and how. There are also a number of pitfalls that are easy to fall into. This post aims to highlight some of these things, and give advice on how to write an efficient application and drivers without disturbing components such as the network stack.

Watchdog timer and long-running operations

About watchdog timers

A watchdog timer (WDT) is a component typically implemented as a separate hardware peripheral in the CPU. The purpose is to ensure the device will always recover from certain types of software bugs, such as lock-ups, deadlocks/livelocks and so on.

The WDT is a free-running timer that when it reaches zero will trigger a hard reset of the CPU. Once enabled, the software running on the CPU must from time to time reset the WDT timer value, so that the software is given more time to run. If it fails, such as in the case of the software blocking waiting for something that will never happen, the WDT will reach zero and reset. This is beneficial as it is always better to start over, than to stop and wait forever for something that will never happen.
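The countdown-and-kick behaviour can be illustrated with a small host-side model. This is only a sketch to show the idea: the names wdt_model, wdt_kick and wdt_tick are made up for this example and are not part of the Thingsquare or Contiki API, and a real WDT is a hardware peripheral rather than a struct.

```c
#include <stdint.h>

/* Hypothetical software model of a watchdog; real WDTs are hardware. */
struct wdt_model {
  uint32_t reload;   /* value loaded on every kick */
  uint32_t counter;  /* counts down towards zero */
};

/* Called by healthy software from time to time: restart the countdown. */
static void
wdt_kick(struct wdt_model *w)
{
  w->counter = w->reload;
}

/* Called once per timer tick. Returns 1 when the CPU would be reset. */
static int
wdt_tick(struct wdt_model *w)
{
  if(w->counter > 0) {
    w->counter--;
  }
  return w->counter == 0;
}
```

As long as wdt_kick() is called more often than every reload ticks, wdt_tick() never reports a reset. If the software hangs and the kicks stop, the counter runs out.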

About the operating system

The underlying firmware operating system (Contiki) is not pre-emptive, so it is easy to starve the rest of the system by not handing back control once in a while. Handing back control is mainly a matter of good code discipline. The main things to do to prevent this from becoming an issue are:

  • Keep Interrupt Service Routines (ISR) short. Do no more than necessary in an ISR. Use process_poll(&process_to_poll); to tell other parts of the code that there is more to be done, eg reading out a sensor.
  • Divide long-running processes into smaller parts, and give back control using the PROCESS constructs (PROCESS_YIELD(), PROCESS_WAIT_EVENT_UNTIL() etc). Some functionality is better suited as a process, since this makes it easier to use the constructs and to follow the logic flow compared with eg callbacks.
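The "short ISR, work done later" pattern from the first point can be sketched as a host-side model. All names here (sensor_isr, sensor_handle_poll and the variables) are hypothetical. In the firmware, the flag roughly plays the role of process_poll(), and the handler is the code the polled process runs.

```c
#include <stdint.h>

/* Hypothetical names; a host-side model of "flag in ISR, work in process". */
static volatile uint8_t sensor_poll_requested;
static volatile uint32_t sensor_irq_timestamp;
static uint32_t sensor_reads_done;

/* ISR: only note what happened and request a poll, then return quickly. */
static void
sensor_isr(uint32_t now)
{
  sensor_irq_timestamp = now;
  sensor_poll_requested = 1;   /* in the firmware: process_poll(...) */
}

/* Runs later, outside interrupt context, where slower work is acceptable.
   Returns 1 if there was pending work, 0 otherwise. */
static int
sensor_handle_poll(void)
{
  if(!sensor_poll_requested) {
    return 0;
  }
  sensor_poll_requested = 0;
  sensor_reads_done++;         /* stands in for the slow sensor read-out */
  return 1;
}
```

Note that several interrupts arriving before the handler runs collapse into a single request, so the handler should read out whatever is pending rather than assume one interrupt per call.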

WDT in Thingsquare firmware

The watchdog is always enabled. The underlying OS and some drivers handle resetting the WDT during long-running operations. The application should never need to do this, unless the application itself needs long-running operations.

The time from a reset of the WDT to when it triggers is 1-2 seconds, depending on which CPU is used. However, an application should never occupy the CPU that long. Instead, take care to split the operation into parts and use the PROCESS_ constructs and timers to wait. The reason is not just the risk of a WDT reboot, but also to conserve power and to avoid blocking the radio and network components from working correctly.

As a rough guide, if your driver or application blocks for more than about 25-50 milliseconds, you should look into dividing the work into parts with non-blocking wait constructs in between, eg using a PROCESS.
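When the long-running work is pure computation rather than waiting, the same idea applies: do a bounded amount of work per invocation and keep the progress in a struct. The following is a minimal sketch with made-up names (sum_job, sum_job_step), not firmware API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job: sum a large buffer a few bytes per invocation. */
struct sum_job {
  const uint8_t *buf;
  size_t len;
  size_t pos;      /* progress kept between invocations */
  uint32_t sum;
};

/* Do at most max_bytes of work, then return so other code can run.
   Returns 1 when the whole buffer has been processed, 0 otherwise. */
static int
sum_job_step(struct sum_job *job, size_t max_bytes)
{
  size_t left = job->len - job->pos;
  size_t n = max_bytes < left ? max_bytes : left;

  while(n-- > 0) {
    job->sum += job->buf[job->pos++];
  }
  return job->pos == job->len;
}
```

In a process, you would call the step function in a loop and use one of the PROCESS_ wait constructs between steps, so that the rest of the system gets to run in between.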

Example

If your application or driver blocks for a long time, it is typically waiting for some external event to happen, for example the time from starting a sensor until data is available to read out.

Here follows a quick mocked up example on how such a division could be made. First, we have the blocking wait version that in this case waits for a sensor to be ready.

static uint32_t
get_sensor_reading(void)
{
  /* mock example, we should write the driver to check for and handle errors */
  /* boot sensor and wait for it to be booted up. Wait is 100us-1ms */
  boot_sensor();
  while(!sensor_booted());

  /* start and wait for conversion to finish. Takes 70-120 ms. */
  start_sensor_conversion();
  while(sensor_is_converting());  /* BAD, blocks for too long */

  return sensor_read_converted_data();
}

As we can see, the simple example is not unreasonable. Many sensors need on the order of hundreds of milliseconds to prepare data, especially when higher accuracy or sample averaging is required.

Now, to convert this to a non-blocking wait, we also need to tweak how it is invoked since we have no data to immediately return after starting this. We can choose to either send events through the use of int process_post(struct process *p, process_event_t ev, process_data_t data); (see process.h header file), or using callbacks.

A better example is as follows. The actual driver uses a process to handle the waiting, and posts an event when it is done. Processes interested in this sensor data may start the sensor up, wait for this event and then read out the data. The example is kept short for clarity; a real driver should include safeguards against the sensor failing, timeouts, and so on.

/*---------------------------------------------------------------------------*/
static process_event_t air_sensor_ready_event = 0;
static int processing = 0;
/*---------------------------------------------------------------------------*/
PROCESS_THREAD(sensor_read_process, ev, data)
{
  PROCESS_BEGIN();
  static struct etimer et;

  /* boot sensor and wait for it to be booted up. Wait is 100us-1ms */
  processing = 1;
  boot_sensor();
  while(!sensor_booted());

  /* start and wait for conversion to finish. Takes 70-120 ms. */
  start_sensor_conversion();
  while(sensor_is_converting()) {
    etimer_set(&et, CLOCK_SECOND / 20);
    PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));
  }

  /* conversion done: clear the busy flag, then tell all processes that the
     sensor data is ready */
  processing = 0;
  process_post(PROCESS_BROADCAST, air_sensor_ready_event, NULL);
  PROCESS_END();
}
/*---------------------------------------------------------------------------*/
void
sensor_start_read(void)
{
  if(air_sensor_ready_event == 0) {
    /* get an event number, only run this once */
    air_sensor_ready_event = process_alloc_event();
  }
  process_start(&sensor_read_process, NULL);
}
/*---------------------------------------------------------------------------*/

Then, the caller could look something like this:

/*---------------------------------------------------------------------------*/
PROCESS_THREAD(app_process, ev, data)
{
  PROCESS_BEGIN();
  static struct etimer et;

  /* commence sensor boot up etc, and wait until the data is available */
  sensor_start_read();
  PROCESS_WAIT_EVENT_UNTIL(ev == air_sensor_ready_event);

  /* retrieve the sensor data, then send it to the server */
  int latest_air_reading = sensor_read();
  thsq_sset("air", latest_air_reading);
  thsq_push();

  PROCESS_END();
}
/*---------------------------------------------------------------------------*/

Clocks and time

The firmware OS keeps several notions of time, on varying time scales and for different purposes. These are driven by hardware peripherals in the CPU that the application or application drivers must not affect. However, many applications need to know the time, and many drivers need a hardware clock peripheral of their own. Below we will go through this in detail.

OS Time

Generally speaking

The OS offers at least two notions of time, from the clock module and from the rtimer module respectively. Most application code uses the clock module, since it offers a sufficiently fine-grained resolution and works with the PROCESS constructs for non-blocking waits. Many drivers may need the rtimer module since it has higher resolution.

For both rtimer and clock, the clocks can wrap (go from a very high number to zero) without warning.

Time comparisons, such as "is this timestamp lower than this other timestamp?" shall use the corresponding macros, as described below.
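To see why a plain < is not enough once a clock has wrapped, here is a small model of a wrap-safe comparison. It assumes a 16-bit tick counter, and the names tick_t, tick_lt and tick_elapsed are invented for this sketch. In firmware code, use the macros provided by the OS and not these functions.

```c
#include <stdint.h>

/* Hypothetical 16-bit tick type; the real types are clock_time_t and
   rtimer_clock_t, whose widths depend on the platform. */
typedef uint16_t tick_t;

/* Wrap-safe "a is before b": look at the sign of the wrapped difference.
   Only valid while the two timestamps are less than half a wrap apart.
   Relies on two's complement conversion, as on all common compilers. */
static int
tick_lt(tick_t a, tick_t b)
{
  return (int16_t)(tick_t)(a - b) < 0;
}

/* Wrap-safe elapsed ticks: unsigned subtraction wraps the same way. */
static tick_t
tick_elapsed(tick_t start, tick_t now)
{
  return (tick_t)(now - start);
}
```

With a timestamp taken just before the wrap (say 65530) and one taken just after (say 5), a plain 65530 < 5 is false even though the first timestamp is the older one, while tick_lt(65530, 5) gives the intended answer.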

Don't busy-wait using these modules (unless for really short timespans, eg < 2 ms), instead use processes and callbacks. For example,

/* don't do this - long blocking wait */
clock_time_t start = clock_time();
while(clock_time() <= (start + CLOCK_SECOND));

/* do this instead - allows other things to run meanwhile */
static struct etimer et;
etimer_set(&et, CLOCK_SECOND);
PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));

clock module

The clock module implements the most central time and clock functionality of the OS. One second in clock-module time is given by the macro CLOCK_SECOND, which on most platforms is defined as 512 ticks per second. This means you cannot simply divide CLOCK_SECOND by 1000 to get one millisecond: the integer division yields zero. If you need that kind of resolution, look at rtimer below.
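If you do need to express a wait in milliseconds with the clock module, multiply before dividing and round up. Below is a minimal sketch that assumes 512 ticks per second as mentioned above. MODEL_CLOCK_SECOND and ms_to_ticks are names made up for this example.

```c
/* Assumption for this sketch: 512 ticks per second, as stated in the text.
   In firmware code, use the platform's CLOCK_SECOND instead. */
#define MODEL_CLOCK_SECOND 512UL

/* Convert milliseconds to clock ticks, rounding up so that short waits
   never collapse to zero ticks. Multiply first, divide last.
   Intended for small values; very large ms values can overflow. */
static unsigned long
ms_to_ticks(unsigned long ms)
{
  return (ms * MODEL_CLOCK_SECOND + 999UL) / 1000UL;
}
```

The result is only as precise as the tick length, which is roughly 2 ms at 512 ticks per second, so a requested 1 ms wait becomes one full tick.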

Typical use cases for the clock module include non-blocking waits in applications and drivers, and taking timestamps over long timespans.

The accumulated number of ticks and seconds since bootup are retrieved like this:

clock_time_t time_now = clock_time();
unsigned long seconds_since_boot = clock_seconds();

As shown above, waiting for half a second in a process can easily be achieved like this:

static struct etimer et;
etimer_set(&et, CLOCK_SECOND / 2);
PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));

/* do something here, then wait again */
etimer_set(&et, CLOCK_SECOND / 2);
PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));

rtimer module

The rtimer module has a higher resolution, typically 32768 or 65536 ticks per second. As with the clock module, the number of ticks per second is given by a macro, RTIMER_SECOND. The rtimer is clocked by a hardware peripheral timer, and the current value of the rtimer is found like this:

rtimer_clock_t rtime_now = RTIMER_NOW();

Since the rtimer is much faster, it often wraps. Thus, to get comparisons right, use the macro as below.

/* wait a millisecond */
rtimer_clock_t rnow = RTIMER_NOW();
while(RTIMER_CLOCK_LT(RTIMER_NOW(), rnow + RTIMER_SECOND / 1000));

In fact, this is a very common operation in low-level drivers so a very helpful shortcut definition is the following,

#define BUSYWAIT_UNTIL(cond, max_time) \
  do { \
    rtimer_clock_t t0; \
    t0 = RTIMER_NOW(); \
    while(!(cond) && RTIMER_CLOCK_LT(RTIMER_NOW(), t0 + (max_time))); \
  } while(0)

The macro busy-waits until a condition is met, or until wait time exceeds the max_time argument. Use it like so,

/* wait a ms */
BUSYWAIT_UNTIL(0, RTIMER_SECOND / 1000);

/* wait until SPI is done, max 2 ms */
BUSYWAIT_UNTIL(spi_busy() == 0, RTIMER_SECOND / 500);
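The two exit paths of such a bounded busy-wait (condition met, or timeout) can be tried out on a host with a fake clock. Everything below is a model with hypothetical names (fake_now, FAKE_CLOCK_LT, FAKE_BUSYWAIT_UNTIL, wait_for_flag). It mirrors the shape of the macro above but is not the firmware implementation.

```c
#include <stdint.h>

/* Hypothetical fake clock: advances one tick every time it is read. */
static uint16_t fake_ticks;

static uint16_t
fake_now(void)
{
  return fake_ticks++;
}

/* Wrap-safe "a is before b" for 16-bit ticks (a model of the idea only). */
#define FAKE_CLOCK_LT(a, b) ((int16_t)(uint16_t)((a) - (b)) < 0)

/* Same shape as the macro above, but driven by the fake clock. */
#define FAKE_BUSYWAIT_UNTIL(cond, max_time) \
  do { \
    uint16_t t0 = fake_now(); \
    while(!(cond) && \
          FAKE_CLOCK_LT(fake_now(), (uint16_t)(t0 + (max_time)))); \
  } while(0)

/* Wait for *flag to become non-zero, giving up after max_time ticks.
   Returns how many fake ticks passed while waiting. */
static uint16_t
wait_for_flag(const volatile int *flag, uint16_t max_time)
{
  uint16_t start = fake_ticks;
  FAKE_BUSYWAIT_UNTIL(*flag != 0, max_time);
  return (uint16_t)(fake_ticks - start);
}
```

If the flag is already set, the wait returns almost immediately. If it never gets set, the loop still ends once max_time has passed, even when the fake clock wraps around in the middle of the wait.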

Hardware peripherals in use

The underlying OS makes use of some hardware peripherals that should not be affected by an application or drivers running on top of the OS. For example, CPU hardware timers are used to implement clocks and other functionality.

GPIOs

On targets with an external flash chip, primarily CC13XX and CC26XX-based targets, the flash chip must not be used for other purposes. Nor may the SPI peripheral, or the GPIOs used for the SPI, be used at any time. Doing so may cause serious undefined behavior. The same goes for targets with an external radio chip, such as the Weptech gateway with an external CC1200.

Interrupts

Interrupts are used throughout the firmware in many ways. Timers, radio, and GPIO are common users. These are the most important things to keep in mind when using interrupts in your application or driver:

  • Minimize the time spent in the interrupt service routine (ISR) handling the interrupt. Otherwise you may disturb time-sensitive radio operations and cause hard-to-debug network issues. If an interrupt needs to trigger a long-running operation, perform only the most important tasks in the ISR (eg noting a timestamp) and then exit. You can notify a process through process_poll();. Don't post events from an ISR.

  • Don't globally disable interrupts unless absolutely necessary, eg across extremely short timing-sensitive operations.

  • Be watchful for deadlocks, changing variables from an ISR and all the common ISR-related pitfalls with embedded programming.

  • Follow the driver example guide when writing a driver. It shows best practices on how an interrupt can be used without affecting the OS too much.

Radio operations

The firmware makes exclusive use of the radio, whether it is a system on chip with an embedded radio or an external radio. Do not use the radio for anything, since this is very likely to disturb the network communications, possibly even in a non-recoverable way.

The radio frequencies or channels cannot be changed from the SDK. Any kind of use of, or calls to, the radio may cause serious undefined behavior and lockups.

Network operations

The Thingsquare system is built to allow product creators to easily make connected products. Wireless networks are very hard, and debugging on a network scale is harder. Hence, the system aims to shield you, the product creator, as much as possible from this. The system is designed to handle the network and network load gracefully if used correctly. Thus, for communication use the Thingsquare data primitives and modules only.

Pitfalls with Contiki

Pre-emption and control

The Thingsquare firmware uses the Contiki OS, and thus comes with the same benefits and deficiencies as Contiki. Briefly, Contiki is an event-driven operating system that is not pre-emptive. It comes with a set of software modules, libraries, and abstractions. One of these abstractions is the process. Processes are very light-weight but simplistic, and do not have eg priorities.

This means that one process in the OS cannot take control from another. For example, while radio communication is very important and a sensor might be considered unimportant (depending on the application of course), the radio cannot take back control of the CPU from the sensor driver, to be able to execute. The sensor driver must give back control voluntarily. It should do this through the use of processes and eg etimer instead of busy-waiting.

Static variables and switch-case

Due to the light-weight nature of processes and their implementation in the OS, there are two common features of the C language that must be handled specifically: switch-case constructs and automatic variables.

switch-case

This rule is simple: you cannot use switch-case in a process or you will get undefined behavior. The reason is that the PROCESS_ constructs are typically implemented with a switch statement internally, which a switch of your own would interfere with. This is not a big problem though, since the same logic flow can easily be constructed with a set of if(){}-else if(){}-else if(){}-else{}. Note that this limitation does not apply to ordinary functions - there you can use switch-case.

automatic variables

Automatic variables in C are variables that are created on scope entry, and lost on scope exit. Consider,

void
app_test_function(void)
{
  int i;
  printf("%d\n", i); /* <--- access of an uninitialized variable */
  for(i = 0; i < 10; i++) {
    printf("%d\n", i);
  }
}

Here, the variable i is an automatic variable. It doesn't exist before we enter the function, and when we exit the function it ceases to exist. Since we don't initialize it on entry, the first printf() will print whatever value happened to be on the stack where the variable is now allocated, and then continue with 0, 1, 2, and so on.

With processes, automatic variables come with a limitation: across any PROCESS_ construct, they will lose their value. Consider,

/*---------------------------------------------------------------------------*/
PROCESS_THREAD(test_process, ev, data)
{
  PROCESS_BEGIN();
  int i; /* <--- should be static */
  static struct etimer et;
  while(1) {
    printf("%d\n", i);
    i++;
    etimer_set(&et, CLOCK_SECOND);
    PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&et));
  }
  PROCESS_END();
}
/*---------------------------------------------------------------------------*/

This will print random values since the value of i will not be retained across PROCESS_WAIT_EVENT_UNTIL(). The solution is to make the variable static: static int i;. This will make the variable retain its value and we get the expected results.

Technical details: the process is actually an ordinary function. Each time we enter a PROCESS_ construct, we exit and later re-enter the process/function. In the first example, had i been static, it would from the second time on always print 10, 0, 1, 2, and so on.
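The exit-and-re-enter mechanism can be modelled in a few lines of plain C. The MINI_ macros below are a heavily simplified, hypothetical stand-in for the PROCESS_ constructs, and the real macros are more elaborate. They show that a yield is really a return, and that the next call jumps back to the same spot through a switch, which is why only static variables keep their value.

```c
/* Hypothetical, heavily simplified stand-in for the PROCESS_ macros. */
struct mini_pt {
  unsigned short lc;   /* line to resume at; 0 means "start from the top" */
};

#define MINI_BEGIN(pt) switch((pt)->lc) { case 0:
#define MINI_YIELD(pt) \
  do { (pt)->lc = __LINE__; return 0; case __LINE__:; } while(0)
#define MINI_END(pt)   } (pt)->lc = 0; return 1

/* Counts 1, 2, 3, yielding after each step. Returns 1 when finished. */
static int
count_to_three(struct mini_pt *pt, int *out)
{
  static int i;   /* static: not on the stack, so it survives each return */

  MINI_BEGIN(pt);
  for(i = 1; i <= 3; i++) {
    *out = i;
    MINI_YIELD(pt);   /* really a return; the next call jumps back here */
  }
  MINI_END(pt);
}
```

Calling count_to_three() repeatedly with the same struct mini_pt steps through the loop one iteration per call. Had i been an automatic variable, each call would have started with a fresh, uninitialized i when jumping back into the loop.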