Python: Effective Serial Data Processing

By Sebastian Günther

—

1st November, 2021

—

Posted in Microcontrollers, Python, Raspberry_pi_sbc, Raspberry_pico

Sending serial data between microcontrollers and single-board computers is an easy way to exchange information. In the last article, I showed three options for connecting the Raspberry Pi to the Raspberry Pico. At the time of writing, the best way is to use a USB-to-TTL serial adapter on the Pi and connect it directly to the Pico's TX/RX pins. This way, you do not need to configure anything special on your Pi, and you get a reliable connection.

Setup is one thing. But how do you use a serial connection effectively? What are the best practices? In this article, I describe how to work with bit fields, text, Python objects, and interchangeable data formats like JSON. I also run performance tests across all of these methods to find out which one is best for timing-critical applications.

Data Formats for Serialization

The choice of a suitable data structure to send between multiple Python programs is simple: any data transmitted over a serial line is ultimately a string of bytes, and what this string represents is up to you. You can transmit just numbers, e.g. a bitmask that is mapped to data or state on the receiver side. You can invent your own mini language to represent text commands. Or you can take concrete Python mutable or immutable objects, convert them to their string representation, and transmit them. And finally, you can use a well-defined, interchangeable data format like JSON or YAML.

These are plenty of options. Which should you use, and why? What are the computational costs of serializing and deserializing the data? The following sections briefly explain each of these formats and give a small example: sending movement commands to a robot, like moving forward or backward with a certain speed, and turning left or right. Finally, I measure the performance of each method and discuss the results.

Texts

When you use plain text, the format is up to you. You need to define your own language to represent the data that you are transmitting.

In the context of our robot example, let's define these statements to represent the movements of a robot:

MOVE_FORWARD=1
MOVE_BACKWARD=2
TURN_LEFT=45
TURN_RIGHT=90
STOP

The speed values are absolute values from 1 to 10. The turn values represent degrees.

Commands in this form are transmitted one at a time. The receiver needs to parse the text, extract the command and its value, and instruct the robot.
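A minimal receiver-side parser for this mini language could look like the following sketch. The names `COMMAND_PATTERN` and `parse_command` are hypothetical; the optional group in the regular expression lets the same pattern accept `STOP`, which carries no value.

```python
import re

# Hypothetical pattern: an upper-case command name, optionally followed by "=<integer>"
COMMAND_PATTERN = re.compile(r'^([A-Z_]+)(?:=(\d+))?$')

def parse_command(line):
    """Parse 'MOVE_FORWARD=1' or 'STOP' into a (command, value) tuple."""
    match = COMMAND_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Malformed command: {line!r}")
    command, value = match.groups()
    # STOP has no value, so the second element is None in that case
    return command, int(value) if value is not None else None

for line in ["MOVE_FORWARD=1", "TURN_LEFT=45", "STOP"]:
    print(parse_command(line))
```

Raising `ValueError` for malformed lines keeps a garbled serial message from silently turning into a robot command.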

Bit field

Bit fields encode data as binary, so all you need to do is to encode your commands in a suitable binary representation. The sender encodes, the receiver decodes the data.

Continuing with our example, we need to create a bitmask with these steps:

Determine command encoding: How is each command represented in binary?
Determine value encoding: How are the values represented in binary?
Determine field length: What is the highest integer that will be transmitted?

Let's answer these questions step-by-step.

# Command Encoding
MOVE_FORWARD  = 1
MOVE_BACKWARD = 2
TURN_LEFT     = 3
TURN_RIGHT    = 4
STOP          = 5

# Command Encoding Field Length
BIGGEST_INTEGER_VALUE           = 5
BIGGEST_INTEGER_VALUE_IN_BINARY = 101
BITFIELD_LENGTH                 = 3

# Value Encoding
MOVEMENTS     = Integer
TURNS         = Integer

# Value Encoding Field Length
BIGGEST_INTEGER_VALUE           = 360
BIGGEST_INTEGER_VALUE_IN_BINARY = 101101000
BITFIELD_LENGTH                 = 9

Therefore, our bit field format consists of a 3-bit command followed by a 9-bit value, 12 bits in total. Example commands:

# Example Commands
MOVE_FORWARD_SPEED_7  = 0b001000000111
TURN_RIGHT_242        = 0b100011110010
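The packing described above can be sketched with shifts and masks: the command code is shifted left by 9 bits, and the value occupies the lower 9 bits. The functions `pack` and `unpack` and the constant names are hypothetical, and the sketch assumes values are non-negative integers below 512.

```python
# 3-bit command codes from the command encoding above
COMMANDS = {'MOVE_FORWARD': 1, 'MOVE_BACKWARD': 2, 'TURN_LEFT': 3,
            'TURN_RIGHT': 4, 'STOP': 5}
CODES = {code: name for name, code in COMMANDS.items()}

VALUE_BITS = 9
VALUE_MASK = (1 << VALUE_BITS) - 1  # 0b111111111 == 511

def pack(command, value=0):
    """Put the command into the upper 3 bits and the value into the lower 9 bits."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit into 9 bits")
    return (COMMANDS[command] << VALUE_BITS) | value

def unpack(word):
    """Reverse pack(): shift the command down and mask out the value."""
    return CODES[word >> VALUE_BITS], word & VALUE_MASK

word = pack('TURN_RIGHT', 242)
print(bin(word))     # 0b100011110010
print(unpack(word))  # ('TURN_RIGHT', 242)
```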

Python data structures

In Python, any built-in immutable data type (integers, floats, tuples) or mutable data type (lists, dictionaries, sets) can be converted to a string with the repr function. See the following examples:

>>> f = 3.12345245
>>> repr(f)
'3.12345245'

>>> lst = ["hello", "from", "Pi", 4]
>>> repr(lst)
"['hello', 'from', 'Pi', 4]"

>>> components = set()
>>> components.add("Pi4")
>>> components.add("Pico")
>>> components.add("D435")
>>> repr(components)
"{'Pico', 'Pi4', 'D435'}"

It is also possible to define the __repr__ method on classes to give their instances a string representation. However, this is only meaningful if the same class definition is available on both the sender and the receiver side. For exchanging small messages, this is unnecessary overhead.

Considering our robot example, a suitable data structure for submitting commands would be a list in which keys and values alternate, or even a dictionary with fixed key-value pairs.

>>> command_list = ['MOVE_FORWARD', '7', 'TURN_RIGHT', '242']
>>> repr(command_list)
"['MOVE_FORWARD', '7', 'TURN_RIGHT', '242']"

>>> command_dict = {'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}
>>> repr(command_dict)
"{'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}"

Using these data structures has additional programming benefits: modifications, like renaming commands or changing value types, and future additions are simpler to implement. Also, messages can include any number of commands and metadata. Dictionaries are especially powerful in this regard, as they can be traversed, manipulated, and iterated over. This makes them versatile in serial communication, where you work with string data anyway.
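As an illustration of iterating over a dictionary message, the following sketch dispatches each command to a handler function. The message layout (an `id` field plus a nested `commands` dictionary) and all function names are hypothetical.

```python
# Hypothetical message: several commands plus metadata in one dictionary
message = {'id': 17, 'commands': {'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}}

# Hypothetical handlers; on a real robot these would drive the motors
def move_forward(speed):
    return f"forward, speed {speed}"

def turn_right(amount):
    return f"turn right by {amount}"

HANDLERS = {'MOVE_FORWARD': move_forward, 'TURN_RIGHT': turn_right}

def dispatch(message):
    """Iterate over the commands and call the matching handler for each one."""
    return [HANDLERS[command](value)
            for command, value in message['commands'].items()]

print(dispatch(message))  # ['forward, speed 7', 'turn right by 242']
```

Supporting a new command only requires another entry in `HANDLERS`, which is the extensibility benefit of dictionaries mentioned above.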

Interchangeable Data Formats

The final option is to choose an interchangeable data format. A very common format is JSON, short for JavaScript Object Notation. JSON can serialize literals (numbers, strings, booleans, null) and structures (arrays and objects, which correspond to Python lists and dictionaries). YAML is a superset of JSON and is designed to be human-readable.

Following our example, a simple YAML data structure to transfer robot commands looks like this:

MOVE_FORWARD: 7
TURN_RIGHT: 242

JSON and YAML offer the same benefits as Python data structures: modifications and extensions are simpler. In addition, these data structures can be read by other programming languages as well. In the context of our example, however, we will stick to Python for serialization and deserialization.
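The same commands in JSON can be handled with Python's standard `json` module alone. This is a minimal round-trip sketch; JSON decoding is not part of the performance measurements below.

```python
import json

commands = {'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}

# Sender: dict -> JSON text -> UTF-8 bytes for the serial line
payload = json.dumps(commands).encode('utf-8')
print(payload)  # b'{"MOVE_FORWARD": 7, "TURN_RIGHT": 242}'

# Receiver: bytes -> text -> dict
received = json.loads(payload.decode('utf-8'))
print(received == commands)  # True
```

Passing `separators=(',', ':')` to `json.dumps` drops the spaces and produces a slightly shorter message.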

Performance Comparisons

In order to compare the performance of these different approaches, we can use Python's built-in timeit module. Its timeit.repeat function receives the statement to execute as a string, an optional setup string that runs before timing, the number of times the statement is executed per measurement, and the number of repetitions. It returns a list with the total time of each repetition in seconds. Simple example:

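import timeit

# Setup code: runs before each repetition and defines the function under test
context = """
def plus(a, b):
  return a + b
"""

# 5 repetitions, each timing 1,000,000 calls of plus(6, 36)
results = timeit.repeat(stmt='plus(6, 36)', setup=context, repeat=5, number=1_000_000)
print("Average Time:", sum(results) / len(results), "\nMeasurements:", results)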

Running this code yields the following output. As we see, one million calls take about 0.25 seconds on average, i.e. roughly 0.25 microseconds per call.

Average Time: 0.2519046256085858
Measurements: [0.25533306901343167, 0.25123326701577753, 0.2510213520145044, 0.2509820369887166, 0.25095340301049873]
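To turn the totals returned by timeit.repeat into per-call figures, a small helper can divide by the number of repetitions and by the number of executions. The helper name `measure` is hypothetical. It also reports the fastest repetition, because the timeit documentation notes that higher values are usually caused by other processes interfering, which makes the minimum the more informative figure.

```python
import timeit

def measure(stmt, setup='pass', repeat=5, number=1_000_000):
    """Run timeit.repeat and summarize the results."""
    results = timeit.repeat(stmt=stmt, setup=setup, repeat=repeat, number=number)
    # Each entry in results is the total time for `number` executions
    mean_total = sum(results) / len(results)
    per_call_us = mean_total / number * 1_000_000  # seconds -> microseconds
    return mean_total, per_call_us, min(results)

context = """
def plus(a, b):
    return a + b
"""

NUMBER = 1_000_000
mean_total, per_call_us, fastest = measure('plus(6, 36)', setup=context, number=NUMBER)
print(f"Mean time per {NUMBER} calls: {mean_total:.4f} s")
print(f"Per call: {per_call_us:.3f} microseconds, fastest repetition: {fastest:.4f} s")
```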

With this tool, we can measure the encoding and decoding of text, bit fields, Python objects, and YAML.

Full Example: Measuring Text Decoding Performance

Let's work through a concrete example for one specific format: text.

Encoding text is simple: it's a string that needs to be converted to bytes.

def encode(text):
  return text.encode('utf-8')

Decoding involves several steps. First, we decode the bytes back to a string. Second, we apply a regular expression to the string. Third, we convert the matches into a tuple.

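from re import match

def decode_text(text):
  # bytes -> str
  decoded_text = text.decode('utf-8')

  # Capture the command name and its numeric value, e.g. "MOVE_FORWARD=1"
  reg_exp = r'(\w+)=(\d+)'
  matches = match(reg_exp, decoded_text)
  # Note: match() returns None for input without "=<number>", such as "STOP"
  result = matches.groups()

  return result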

To measure this code, we apply the following steps:

Define the statement to be executed: t = encode("MOVE_FORWARD=1"); decode_text(t)
Define the complete encoding and decoding code as a multiline string called text_context
Call the timeit function with the statement, the context, the number of executions, and the repeat cycles

text_decoding_measurements = timeit.repeat(
    stmt='t = encode("MOVE_FORWARD=1"); decode_text(t)',
    setup=text_context,
    repeat=20,
    number=10_000)

In the same manner, we can perform tests for all the other formats.
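Putting the pieces together, a self-contained version of the text measurement could look like this sketch. It assumes that text_context simply contains the encode and decode_text functions shown above; note that timeit.repeat names the statement parameter `stmt`.

```python
import timeit

# Setup code executed before each repetition; defines the functions under test
text_context = r"""
from re import match

def encode(text):
    return text.encode('utf-8')

def decode_text(text):
    decoded_text = text.decode('utf-8')
    matches = match(r'(\w+)=(\d+)', decoded_text)
    return matches.groups()
"""

text_decoding_measurements = timeit.repeat(
    stmt='t = encode("MOVE_FORWARD=1"); decode_text(t)',
    setup=text_context,
    repeat=20,
    number=10_000)

# Divide by the number of measurements to get the mean per repetition
average = sum(text_decoding_measurements) / len(text_decoding_measurements)
print("Average Time:", average)
print("Measurements:", text_decoding_measurements)
```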

Performance Measurements

Decoding Bit Fields

Source Code

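def encode(bitfield):
  return bitfield.encode('utf-8')

def decode_bitfield(msg):
  # The bit field arrives as text, e.g. "0b001000000111"
  bitfield = msg.decode('utf-8')

  # Skip the "0b" prefix: 3 command bits, then 9 value bits
  cmd_bit = '0b' + bitfield[2:5]
  value_bit = '0b' + bitfield[5:14]

  mapping = {1: 'MOVE_FORWARD', 2: 'MOVE_BACKWARD', 3: 'TURN_LEFT', 4: 'TURN_RIGHT', 5: 'STOP'}

  # int(..., 2) accepts the "0b" prefix and parses the binary digits
  result = (mapping.get(int(cmd_bit,2)), int(value_bit,2))
  return result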
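The decoder above receives the bit field as text: "0b" plus 12 digit characters, 14 bytes in total. A possible alternative is to send the 12-bit word as 2 raw bytes with int.to_bytes and int.from_bytes. This sketch is not what was measured in this article, and the function names are hypothetical.

```python
# Same command codes as the mapping above
MAPPING = {1: 'MOVE_FORWARD', 2: 'MOVE_BACKWARD', 3: 'TURN_LEFT',
           4: 'TURN_RIGHT', 5: 'STOP'}

def encode_word(command_code, value):
    """Pack a 3-bit command and a 9-bit value into two big-endian bytes."""
    word = (command_code << 9) | value
    return word.to_bytes(2, 'big')

def decode_word(msg):
    """Unpack two bytes back into (command name, value)."""
    word = int.from_bytes(msg, 'big')
    return MAPPING.get(word >> 9), word & 0x1FF  # 0x1FF masks the lower 9 bits

msg = encode_word(1, 7)
print(msg)               # b'\x02\x07'
print(decode_word(msg))  # ('MOVE_FORWARD', 7)
```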

Performance Measurements

Bitfield Decoding of 'encode("0b00100000011")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Average Time: 0.05363158800173551
Measurements:  [0.027031786972656846, 0.02717018499970436, 0.026509653020184487, 0.026926272024866194, 0.0266216280288063, 0.027101150946691632, 0.026846155989915133, 0.02704080700641498, 0.026566963992081583, 0.02656059304717928, 0.026734582032077014, 0.026688970043323934, 0.02639739098958671, 0.0269411489716731, 0.026987641002051532, 0.026755536964628845, 0.02668587298830971, 0.026785144000314176, 0.026891918969340622, 0.02707247802754864]

Decoding Texts

Source

from re import match

def encode(text):
  return text.encode('utf-8')

def decode_text(text):
  decoded_text = text.decode('utf-8')

  reg_exp = r'(\w+)=(\d+)'
  matches = match(reg_exp, decoded_text)

  result = matches.groups()
  return(result)

Performance Measurements

Text Decoding of 'encode("MOVE_FORWARD=1")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Average Time: 0.08896405659033917
Measurements:  [0.04545394500019029, 0.044554901018273085, 0.04442428599577397, 0.0443288889946416, 0.04432642401661724, 0.043912817956879735, 0.04431018000468612, 0.04420615697745234, 0.044510716979857534, 0.044194838963449, 0.04418196598999202, 0.0441048729699105, 0.044894682010635734, 0.044447703985497355, 0.0445377750438638, 0.044568741985131055, 0.0446226799977012, 0.045044841011986136, 0.04473581199999899, 0.044278335000853986]

Decoding Python Objects

Source

def encode(text):
  return text.encode('utf-8')

def decode_python_objects(text):
  decoded_text = text.decode('utf-8')

  # eval() runs arbitrary Python code: only use it with trusted senders
  result = eval(decoded_text)
  return result
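Because eval executes whatever arrives on the serial line as Python code, a safer option for payloads that only contain literals is the standard library's ast.literal_eval. This is a swapped-in technique that was not measured here, so its performance may differ from eval; the function name `decode_python_literal` is hypothetical.

```python
import ast

def encode(text):
    return text.encode('utf-8')

def decode_python_literal(msg):
    """Like eval(), but only accepts literals such as strings, numbers, tuples, lists, dicts."""
    return ast.literal_eval(msg.decode('utf-8'))

print(decode_python_literal(encode("('MOVE_FORWARD', 1)")))  # ('MOVE_FORWARD', 1)

# Function calls are not literals, so this raises ValueError instead of running code
try:
    decode_python_literal(encode("__import__('os').getcwd()"))
except ValueError as error:
    print("Rejected:", error)
```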

Performance Measurements

Python Objects Decoding of 'encode("('MOVE_FORWARD', 1)")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Average Time: 0.5119784157956019
Measurements:  [0.25575865199789405, 0.2549030850059353, 0.2579314360045828, 0.25870296999346465, 0.2604327359586023, 0.2600087499595247, 0.2584023640374653, 0.2558257740456611, 0.25510250200750306, 0.2583176919724792, 0.2549058750155382, 0.255189977993723, 0.25448778801364824, 0.2551257850136608, 0.2546310239704326, 0.25484405900351703, 0.25408411998068914, 0.25427312596002594, 0.2534371940419078, 0.2534192479797639]

Decoding YAML Objects

Source

from yaml import safe_load

def encode(yml):
  return yml.encode('utf-8')

def decode_yml(msg):
  yml = safe_load(msg)

  result = tuple(yml)
  return result

Performance Measurements

YAML Decoding of 'encode("['MOVE_FORWARD', 7]")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Average Time: 9.058417272800579
Measurements:  [4.193098113988526, 4.122215965995565, 4.285432767006569, 4.330240387003869, 4.076146796985995, 4.159796627995092, 4.560074345965404, 4.498862796986941, 4.709009921003599, 4.322364090010524, 4.69364814698929, 4.772696384985466, 4.7846884350292385, 4.738021294993814, 4.648917622980662, 4.780966740043368, 4.605022739968263, 4.846610993030481, 4.693909451016225, 4.7624491060269065]

Comparison

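In short, here is the runtime for each type of decoding. Each "Average Time" line above equals the sum of the 20 measurements divided by 10, which is twice the actual mean; the table therefore lists the actual mean of the 20 measurements. The relative values are unaffected.

Method          Mean time (s)   Relative runtime
Bitfield        0.0268          100%
Text            0.0445          166%
Python objects  0.2560          955%
YAML            4.529           16890%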

Passing and interpreting bits is the most efficient method, but you need to write more code, and the command set is harder to extend without refactoring. Working with plain text is about 66% slower, and you should be comfortable with regular expressions, but this approach is versatile and interchangeable, and your command language remains extensible. The third method, sending stringified Python objects and evaluating them, has a major impact on performance: it is roughly 9.5 times slower than bit fields. Although it is the easiest to implement and highly extensible, you should not use this approach in applications with microsecond timing requirements or in applications that process large amounts of data. The final method, YAML, is about 169 times slower than bit fields, so, well, don't use it for timing-critical applications.

Conclusion

When working with serial data, you can choose between different methods for designing the data format, i.e. the language that you transmit over the wire: a) encode everything as bit fields, b) send text that represents commands, c) serialize and evaluate complete Python objects, or d) work with a data exchange format like YAML or JSON. But how do these methods perform? To find out, this article showed how the built-in timeit module can be used for simple and effective measurements. The comparison has two winners: bit fields are by far the most performant but require more code and are less extensible, followed by text, which is easy to program and extensible.
