The Mesh

Thingsquare devices automatically form a so-called wireless mesh network between them. This mesh network is used by each device to send and receive data. Devices also use the mesh to connect to the backend server stack for authentication and data delivery.

Why a mesh?

The purpose of the mesh is to let the wireless devices reach the user's smartphone app, even if the closest Internet connection point is so far away that it cannot be reached with a single radio transmission.

The mesh routes around tough spots

Wireless communication can be flaky, as anyone who has ever used WiFi or a 3G phone knows. There are places where the wireless signal is weak because something is in the way, or because of radio interference.

The mesh automatically finds its way around problematic areas by sending messages along paths that are not affected.

The mesh extends the wireless range

Low-power radios have a limited range. While the range may be hundreds of meters, it still has an upper limit.

The mesh extends the range by letting devices relay each other's messages.

The mesh forms automatically

The wireless mesh is automatically created by the devices themselves. This means that most users won't even know it exists: it just works.

The mesh self-heals

The wireless communication may suddenly break, perhaps because something started interfering with the signals, or because nodes were moved around. The mesh identifies such problems and automatically re-routes traffic: it is self-healing.

The protocols that make the mesh work

The wireless mesh uses the Internet Protocol, version 6. To form the mesh, the devices use a protocol called RPL (the IPv6 Routing Protocol for Low-Power and Lossy Networks), pronounced "ripple".

RPL sends and receives several different messages as it forms the network. These messages are called DIO (DODAG Information Object, where DODAG stands for Destination-Oriented Directed Acyclic Graph) and DAO (Destination Advertisement Object).

Wireless mesh formation

All devices have a wireless radio so that they can talk to each other. One or more devices have an Internet connection, via WiFi, Ethernet, or 3G.

One Internet-connected device becomes responsible for initiating the wireless mesh formation. This node is called the root node.

Creating the mesh

The root node first sends out a DIO message that lets the other nodes know that this node is the root node.

The other devices in the mesh hear the DIO from the root and join the newly created mesh. Once a node has joined the mesh, it too begins sending out DIO messages to the other nodes. All messages are encrypted, so only devices that belong to the same network can read them.

After sending a few DIO messages, all nodes in the mesh will know how to reach the root node: by going via the neighbor closest to the root.

DIO messages contain important information about the mesh network, such as the address of the root node, the version number used by the network, and the routing metric used by the mesh.
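To make the joining step concrete, here is a minimal Python sketch, assuming a DIO that carries only the three fields named above. The `Dio` and `Node` classes, the example address `fd00::1`, and the per-hop rank step of 256 are hypothetical; real RPL DIOs (RFC 6550) carry more fields, and this is not Thingsquare's implementation.

```python
from dataclasses import dataclass
from typing import Optional

RANK_STEP = 256  # hypothetical rank increase per hop


@dataclass(frozen=True)
class Dio:
    """Only the DIO fields named in the text; real DIOs carry more."""
    root_address: str  # address of the root node
    version: int       # version number used by the network
    rank: int          # routing metric: the sender's distance from the root


class Node:
    def __init__(self) -> None:
        self.advertised: Optional[Dio] = None  # the DIO this node sends once joined

    def on_dio(self, heard: Dio) -> Optional[Dio]:
        """Join the first mesh heard; return the DIO to start sending, or None."""
        if self.advertised is not None:
            return None  # already joined; later DIOs are left to the routing logic
        # Copy the network-wide fields and advertise a rank one hop further out.
        self.advertised = Dio(heard.root_address, heard.version, heard.rank + RANK_STEP)
        return self.advertised


root_dio = Dio(root_address="fd00::1", version=1, rank=256)
print(Node().on_dio(root_dio))  # Dio(root_address='fd00::1', version=1, rank=512)
```

A node that has joined hands the returned DIO to its radio, so the root address and version number spread hop by hop while the advertised rank grows with the distance from the root.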

Setting up the routes

Devices that hear the DIO messages from the root attach to an RPL network and begin sending their own DIO messages. The DIO messages contain a routing metric, the rank, that indicates how far away from the root a device is. Devices closer to the root have lower metrics. When deciding how to send packets in the network, devices prefer routes with lower routing metrics.
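How ranks settle across the whole mesh can be sketched as repeated rounds in which every device processes the DIOs it has heard and keeps the lowest rank on offer (a Bellman-Ford style relaxation). This is a simplified Python model: the link costs are hypothetical, and the root's rank of 256 follows RFC 6550's default `MinHopRankIncrease`, not a documented Thingsquare value.

```python
from typing import Dict, Tuple

ROOT_RANK = 256  # RFC 6550 default MinHopRankIncrease; an assumption here


def compute_ranks(links: Dict[Tuple[str, str], int], root: str) -> Dict[str, float]:
    """links maps an undirected (a, b) pair to a hypothetical link cost."""
    nodes = {n for pair in links for n in pair}
    rank: Dict[str, float] = {n: float("inf") for n in nodes}
    rank[root] = ROOT_RANK
    # Each round models every device processing the DIOs it heard from neighbors.
    for _ in range(len(nodes)):
        changed = False
        for (a, b), cost in links.items():
            for u, v in ((a, b), (b, a)):
                # v hears u's DIO: adopt u's rank plus the link cost if that is lower.
                if rank[u] + cost < rank[v]:
                    rank[v] = rank[u] + cost
                    changed = True
        if not changed:  # no device changed its rank: the network has stabilized
            break
    return rank


links = {("root", "a"): 256, ("a", "b"): 256, ("root", "b"): 768}
ranks = compute_ranks(links, "root")
print(ranks["b"])  # 768
```

Here `b` reaches the root through `a` (rank 768) rather than over its expensive direct link (which would give 1024), so `a` becomes `b`'s preferred route towards the root.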

After a few minutes, all devices have exchanged DIOs and the network has stabilized. The devices keep sending DIO messages, but at increasingly long intervals. To avoid overloading the network, a device also refrains from sending a DIO if it has already heard DIOs from other devices within its current send interval.
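The behavior described above, with DIOs that grow rarer over time and are suppressed when others have already spoken, matches the Trickle algorithm from RFC 6206, which RPL uses to schedule DIOs. A minimal Python sketch follows; whether Thingsquare uses exactly this variant is an assumption, and the parameter values (`imin`, `doublings`, `k`, in arbitrary time units) are illustrative only.

```python
import random
from typing import Optional


class TrickleTimer:
    """Trickle (RFC 6206) state for one device's DIO transmissions."""

    def __init__(self, imin: float = 1.0, doublings: int = 8, k: int = 3,
                 rng: Optional[random.Random] = None) -> None:
        self.imin = imin                   # shortest interval
        self.imax = imin * 2 ** doublings  # longest interval
        self.k = k                         # redundancy constant
        self.rng = rng or random.Random()
        self.interval = imin
        self._start_interval()

    def _start_interval(self) -> None:
        self.counter = 0  # consistent DIOs heard in this interval
        # The send time t is picked at random in the second half of the interval.
        self.t = self.rng.uniform(self.interval / 2, self.interval)

    def heard_consistent_dio(self) -> None:
        self.counter += 1

    def should_send_at_t(self) -> bool:
        """At time t: stay quiet if k consistent DIOs were already heard."""
        return self.counter < self.k

    def interval_expired(self) -> None:
        """The network looks stable: double the interval, up to imax."""
        self.interval = min(self.interval * 2, self.imax)
        self._start_interval()

    def inconsistency(self) -> None:
        """Something changed (e.g. a new version number): go back to short intervals."""
        if self.interval > self.imin:
            self.interval = self.imin
            self._start_interval()
```

An inconsistency, such as hearing a new version number, resets the interval so that changes propagate quickly, while a stable network settles at the longest interval and sends very few DIOs.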

Message routing in the mesh

The devices in the mesh have multiple ways to reach the root of the network. To choose among them, the devices continuously measure the quality of the paths and use the one that needs the fewest transmissions to reach the root. If a path goes bad, it needs more retransmissions, and the device switches to a better path.
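"Fewest transmissions" is commonly measured as the expected transmission count (ETX). The Python sketch below assumes an ETX-style metric smoothed with an exponentially weighted moving average; the `Neighbor` class, the `best_parent` function, and the smoothing factor `ALPHA` are hypothetical, not Thingsquare's API.

```python
from dataclasses import dataclass
from typing import List

ALPHA = 0.1  # hypothetical smoothing factor for the moving average


@dataclass
class Neighbor:
    name: str
    path_etx: float        # neighbor's advertised expected transmissions to the root
    link_etx: float = 1.0  # our estimate of transmissions needed to reach the neighbor

    def record_delivery(self, attempts: int) -> None:
        """Update the link estimate after a packet needed `attempts` transmissions."""
        self.link_etx = (1 - ALPHA) * self.link_etx + ALPHA * attempts

    @property
    def total_etx(self) -> float:
        return self.link_etx + self.path_etx


def best_parent(neighbors: List[Neighbor]) -> Neighbor:
    """Prefer the path that needs the fewest transmissions in total."""
    return min(neighbors, key=lambda n: n.total_etx)


a = Neighbor("a", path_etx=1.0)
b = Neighbor("b", path_etx=1.5)
print(best_parent([a, b]).name)  # a (2.0 versus 2.5)
a.record_delivery(8)             # the link to a suddenly needed 8 attempts
print(best_parent([a, b]).name)  # b (2.7 versus 2.5)
```

RPL's MRHOF objective function (RFC 6719) also adds hysteresis, so that a device does not flap between two paths of almost equal quality; that is omitted here for brevity.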

To set up routes in the downward direction, from the root to the devices in the network, the devices send DAO messages towards the root. The root knows the route to all other devices, and every other device knows the route to all devices below it in the routing graph. A packet between two devices in the network travels up to a common ancestor of the two and then down to its destination.
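The downward routing tables and the up-then-down forwarding can be sketched as follows, in the style of RPL's storing mode. `StoringNode`, `send_dao`, and `route` are hypothetical names, and this simplified Python model ignores DAO acknowledgments, timeouts, and route lifetimes.

```python
from typing import Dict, List, Optional


class StoringNode:
    """A device that stores downward routes learned from DAOs."""

    def __init__(self, name: str, parent: Optional["StoringNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.routes: Dict[str, str] = {}  # destination -> next hop (a child)

    def send_dao(self) -> None:
        """Advertise this device's own address to its parent."""
        if self.parent is not None:
            self.parent.receive_dao(self.name, self.name)

    def receive_dao(self, target: str, from_child: str) -> None:
        self.routes[target] = from_child
        if self.parent is not None:  # pass the advertisement on towards the root
            self.parent.receive_dao(target, self.name)


def route(nodes: Dict[str, StoringNode], src: str, dst: str) -> List[str]:
    """Go up until a device knows a downward route to dst, then follow it down."""
    path = [src]
    node = nodes[src]
    while node.name != dst and dst not in node.routes:
        if node.parent is None:
            raise KeyError(f"no route to {dst}")
        node = node.parent
        path.append(node.name)
    while node.name != dst:
        node = nodes[node.routes[dst]]
        path.append(node.name)
    return path


root = StoringNode("root")
a, d = StoringNode("a", root), StoringNode("d", root)
b, c = StoringNode("b", a), StoringNode("c", a)
nodes = {n.name: n for n in (root, a, b, c, d)}
for n in (a, b, c, d):
    n.send_dao()
print(route(nodes, "b", "c"))  # ['b', 'a', 'c']
print(route(nodes, "b", "d"))  # ['b', 'a', 'root', 'd']
```

The first device on the way up that has a route to the destination is the lowest common ancestor of the two devices, so packets only climb as far as they need to.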

Connecting to the backend server stack

The purpose of the mesh is to have a stable and secure way of communicating with the users' smartphones, via the backend server stack.

Communication with the server stack goes over the Internet, so it must be protected by strong encryption. The Thingsquare system uses TLS with 2048-bit keys, the same strength used for bank transfers and other secure Internet communication.

Much of the Internet still uses Internet Protocol version 4 (IPv4), whereas the mesh uses Internet Protocol version 6 (IPv6). The root node, which also has the Internet connection, translates between the two versions of the Internet Protocol.

Any device may connect to the backend server stack through the mesh. To find out which address to connect to, a device performs a Domain Name System (DNS) query for the server stack's host name. The query is sent via the mesh to a DNS server on the Internet.

The DNS server responds with the IPv4 address that the DNS name points to. The root node intercepts the response and rewrites this IPv4 address into an IPv6 address made up of a special prefix and the IPv4 address of the Internet server.
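The address rewriting resembles NAT64/DNS64 address synthesis (RFC 6052 and RFC 6147). The Python sketch below assumes the well-known prefix `64:ff9b::/96`; the text does not say which prefix Thingsquare uses, and the function names are hypothetical. It shows both directions: synthesizing the IPv6 address placed in the DNS response, and recovering the IPv4 address when the root forwards a device's packets to the Internet.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix; Thingsquare's actual prefix is not stated.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def ipv4_to_mesh(addr: str) -> ipaddress.IPv6Address:
    """Root rewriting an IPv4 address from a DNS response for use inside the mesh."""
    v4 = ipaddress.IPv4Address(addr)
    # The 32-bit IPv4 address fills the last 32 bits after the 96-bit prefix.
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def mesh_to_ipv4(addr: str) -> ipaddress.IPv4Address:
    """Root recovering the Internet server's IPv4 address from a mesh destination."""
    v6 = ipaddress.IPv6Address(addr)
    if v6 not in NAT64_PREFIX:
        raise ValueError(f"{v6} is not a translated address")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


mesh_addr = ipv4_to_mesh("192.0.2.33")  # equals IPv6Address("64:ff9b::c000:221")
server = mesh_to_ipv4(str(mesh_addr))   # equals IPv4Address("192.0.2.33")
```

Because the IPv4 address is embedded in the IPv6 address itself, the root does not need to keep a table of which IPv6 address stands for which server.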

After the DNS query, the device sets up a secure TLS connection to the backend server, using the IPv6 address from the DNS response. The connection is strongly encrypted to prevent eavesdropping.

Sleepy meshing

Radio communication drains batteries, and not only when transmitting: idle listening for transmissions from others consumes even more energy. To save energy, devices must keep their radios completely switched off most of the time, but not for too long: they must switch them on often enough to receive and relay messages from others.

Devices in a Thingsquare mesh sleep most of the time, but wake up briefly 2 to 16 times per second to check for radio activity. If a signal is detected, a device keeps its radio on a little longer to see if a message is being transmitted. When the message has been received, the receiver sends an acknowledgment. To send a message, the sender repeats the message until it hears the acknowledgment. This lets devices keep their radios off more than 99% of the time, extending battery lifetime from days to years.
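A back-of-the-envelope Python model shows why brief channel checks keep the radio off most of the time, and what the repeated sending costs the sender. The 0.5 ms check duration and 4 ms frame time are hypothetical numbers chosen for illustration, and the sender model is simplified (no gaps between copies, no collisions).

```python
import math


def radio_on_fraction(checks_per_second: float, check_ms: float) -> float:
    """Fraction of time an idle device's radio is on: only its periodic checks."""
    return checks_per_second * check_ms / 1000.0


def copies_until_caught(receiver_wakes_at_ms: float, frame_ms: float) -> int:
    """Copies start at 0, frame_ms, 2 * frame_ms, ...; the receiver catches the
    first copy that starts at or after it wakes, then acknowledges it."""
    return math.ceil(receiver_wakes_at_ms / frame_ms) + 1


print(radio_on_fraction(16, 0.5))      # 0.008, i.e. radio off 99.2% of the time
print(copies_until_caught(10.0, 4.0))  # 4
```

Even at the highest check rate of 16 times per second, the hypothetical 0.5 ms check keeps the radio off more than 99% of the time; the cost of the scheme is shifted to the sender, which may have to repeat its message for up to one full wake-up period.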

Self-healing through re-routing

The mesh is designed to withstand failure. This is because wireless communication by its nature is unreliable: the wireless signals may not always make it as they should.

The mesh may break in two main ways. The first is when two devices that previously could communicate with each other no longer can, perhaps because they moved or because something blocked their wireless signals. The second is when the mesh root goes away. This can happen if the wireless communication near the root goes down, or if the root node loses power or physically breaks.

If one link in the routing graph breaks, the network must re-route around the problem. A broken link is detected when the two nodes that share it try to communicate with each other, either because data is being sent to or from a device, or because of the devices' periodic link probing.

When a device detects a broken link, its reaction depends on whether the link is an upward or a downward link. If an upward link is broken, the device simply picks a new upward route: each device maintains a list of potential upward routes and picks the one with the best quality score. A new parent typically gives the device a new RPL rank, so the device quickly begins distributing information about its new rank to the network.
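The upward repair can be sketched as a candidate-parent list from which the device re-selects when its parent's link breaks. This Python sketch is hypothetical: the class name, the `(score, rank)` tuples, and computing the new rank as the parent's rank plus a fixed 256 are simplifying assumptions rather than Thingsquare's actual rank calculation.

```python
from typing import Dict, Optional, Tuple

RANK_INCREASE = 256  # hypothetical per-hop rank increase


class UpwardRouting:
    def __init__(self, candidates: Dict[str, Tuple[float, int]]) -> None:
        # neighbor name -> (quality score, neighbor's advertised rank); lower score is better
        self.candidates = dict(candidates)
        self.parent: Optional[str] = None
        self.rank: Optional[int] = None
        self.select_parent()

    def select_parent(self) -> bool:
        """Pick the best candidate; return True if this device's rank changed."""
        old_rank = self.rank
        if not self.candidates:
            self.parent, self.rank = None, None  # no upward route left
        else:
            self.parent = min(self.candidates, key=lambda n: self.candidates[n][0])
            self.rank = self.candidates[self.parent][1] + RANK_INCREASE
        return self.rank != old_rank

    def link_broken(self, neighbor: str) -> bool:
        """Drop the neighbor; re-select if it was the parent. True: announce new rank."""
        self.candidates.pop(neighbor, None)
        if neighbor == self.parent:
            return self.select_parent()
        return False


r = UpwardRouting({"a": (2.0, 256), "b": (2.5, 512)})
print(r.parent, r.rank)    # a 512
print(r.link_broken("a"))  # True: the parent link broke and the rank changed
print(r.parent, r.rank)    # b 768
```

A `True` return value marks the point where a real implementation would, for example, reset its DIO timer so that the new rank spreads quickly through the mesh.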

If a downward route is found to be broken, a more complex procedure takes place. Because each device maintains only a single routing table entry for each downward route, it cannot pick a new next-hop device for the route by itself, but must defer the decision to the root node. It does so by sending a DAO NOACK message towards the root. Because a broken downward route also means that at least one upward link is broken, the root initiates a global repair of the routing tree by increasing the RPL version number of the network. The network then quickly rebuilds itself around the broken link.
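The global repair can be sketched as the root bumping the version number and every device that hears the newer version discarding its old routing state. This is a simplified Python model: `NodeState`, `global_repair`, and `on_new_version` are hypothetical names, and real RPL compares version numbers with wrap-around (lollipop) counters rather than plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeState:
    version: int = 0
    parent: Optional[str] = None
    downward_routes: Dict[str, str] = field(default_factory=dict)


def global_repair(root: NodeState) -> int:
    """The root's reaction to a DAO NOACK: start a new version of the network."""
    root.version += 1
    return root.version


def on_new_version(node: NodeState, version: int) -> bool:
    """Called when a DIO with `version` is heard; True if the node must rejoin."""
    if version <= node.version:
        return False
    node.version = version
    node.parent = None            # choose a parent again from the new DIOs
    node.downward_routes.clear()  # relearned from fresh DAOs
    return True


root = NodeState(version=3)
node = NodeState(version=3, parent="a", downward_routes={"c": "c"})
new_version = global_repair(root)          # the root received a DAO NOACK
print(on_new_version(node, new_version))   # True: routing state discarded
print(node.parent, node.downward_routes)   # None {}
```

Since the new version number travels in the DIOs, every device eventually rejoins and picks fresh routes, which is how the broken link is routed around.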

Fallback root nodes

If the root node is gone, the network cannot repair itself or establish new routes. The nodes within the network can keep functioning on their own for a while, but if re-routing is needed, the network may stop working. It is therefore important to establish a new root node as quickly as possible.

The mesh allows multiple dormant root nodes to be part of the network. As long as the active root node works correctly, the dormant root nodes act as ordinary mesh nodes that route packets like any other node. But if a dormant root node detects that the active root node has broken down, it takes over control of the network. A network may have several dormant root nodes, and how quickly a repair is triggered is configurable.
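The text does not say how a dormant root node detects that the active root has broken down. One plausible approach, sketched below in Python purely as an assumption, is a timeout: if nothing has been heard from the active root for a configurable time, the dormant root takes over. `DormantRoot`, `tick`, and `takeover_after` are hypothetical names.

```python
class DormantRoot:
    """A node that can act as root; takeover_after is the configurable repair delay."""

    def __init__(self, takeover_after: float) -> None:
        self.takeover_after = takeover_after  # seconds of silence before taking over
        self.last_heard_root = 0.0
        self.active = False

    def heard_from_active_root(self, now: float) -> None:
        self.last_heard_root = now

    def tick(self, now: float) -> bool:
        """Call periodically; returns True at the moment this node takes over as root."""
        if not self.active and now - self.last_heard_root >= self.takeover_after:
            self.active = True  # from here on, send DIOs as the root
            return True
        return False


d = DormantRoot(takeover_after=30.0)
d.heard_from_active_root(now=100.0)
print(d.tick(now=120.0))  # False: the root was heard 20 s ago
print(d.tick(now=130.0))  # True: 30 s of silence, take over
```

Giving each dormant root a different `takeover_after` value is one way to keep several of them from taking over at the same moment.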

Conclusion

The mesh is an efficient way to create a resilient and secure wireless network. The Thingsquare system automatically forms a wireless mesh between the devices, but most users won't even know it is there. It just works.