Saturday, October 16, 2010

X10 library performance optimization

When creating the X10 libraries I spent a lot of time on optimizing code performance and figuring out how to make all the code "non-blocking". The solution is a combination of using external interrupts, pin change interrupts and a timer overflow interrupt.

The ATmega168/328 chip has three timers, one of which is used to update the counters for the millis() and micros() functions. You can find a great article about the timers here:
http://www.arduino.cc/playground/Code/Timer1/
And one about pin change interrupts here:
http://www.arduino.cc/playground/Main/PcInt/

What is blocking code, and why avoid it?

The Arduino functions delay(), delayMicroseconds() and pulseIn() all rely on some sort of loop that blocks other code from executing while they're running. There is really no other simple way of implementing these functions. In general, any loop waiting for something to happen, for example a pin going high, will block for as long as it's running. Another example of code that might block for some time is heavy calculations, like floating point math, which the ATmega has to do in software.
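To illustrate the alternative to a blocking wait, here is a minimal sketch of a timestamp comparison that never loops. IntervalTimer and its members are hypothetical names, not part of the X10 libraries, and the code is written as plain C++ so it can be tried off the board. On an Arduino you would pass in the value returned by millis().

```cpp
#include <cstdint>

// Hypothetical helper: reports when a fixed interval has elapsed
// without ever waiting in a loop.
struct IntervalTimer {
    uint32_t intervalMs;  // how often the task should run
    uint32_t lastRunMs;   // timestamp of the previous run

    // Call on every pass through the main loop with the current time.
    bool due(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (nowMs - lastRunMs >= intervalMs) {
            lastRunMs = nowMs;
            return true;
        }
        return false;
    }
};
```

The main loop keeps running at full speed and only does the periodic work on the passes where due() returns true.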

An interrupt can put any piece of code running in the main loop on hold, but by default an interrupt cannot interrupt another interrupt. Only one interrupt handler can execute at any given time. When the CPU finishes executing one handler, it runs the next interrupt whose flag was set in the meantime. Each interrupt has only a single flag, so if the execution time of one interrupt handler is longer than the trigger interval of another interrupt, the latter will fire only once when you might expect it to fire several times.
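The lost triggers can be shown with a small model. This is a simplified simulation under my own assumptions, not real hardware behaviour in every detail: one handler keeps the CPU busy for busyUs microseconds while another source requests service every periodUs microseconds. The names Result and simulateBusyWindow are made up for this example.

```cpp
#include <cstdint>

struct Result {
    uint32_t requested;  // times the source asked for service
    uint32_t serviced;   // times its handler actually gets to run
};

// Models a single pending flag: requests that arrive while the flag is
// already set cannot be counted and are lost.
Result simulateBusyWindow(uint32_t busyUs, uint32_t periodUs) {
    Result r{0, 0};
    if (periodUs == 0) return r;  // avoid an endless loop on bad input
    bool pending = false;
    for (uint32_t t = periodUs; t <= busyUs; t += periodUs) {
        ++r.requested;
        pending = true;  // setting an already-set flag changes nothing
    }
    if (pending) {
        ++r.serviced;    // handler runs once when the CPU is free again
    }
    return r;
}
```

With a 100 us busy window and a 30 us trigger period, the source requests service three times but its handler runs only once.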

In code triggered by interrupts you should always avoid using blocking code. When using blocking code in an interrupt you not only stop execution of the main loop, but you might also prevent other interrupts from firing reliably. Even the timer overflow interrupt that updates the millis() and micros() counters is affected by this. If you rely on more than one interrupt to trigger consistently, like I do when sending or receiving power line messages at the same time as receiving RF and IR commands, you need to make sure that your interrupt-triggered code runs as fast as possible. If you don't, expect execution and timing to become unreliable.
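One common way to keep interrupt code short is to only record that something happened and leave the slow work to the main loop. The sketch below models that pattern in plain C++. All names are hypothetical and this is not necessarily how the X10 libraries are structured.

```cpp
#include <cstdint>

volatile bool zeroCrossSeen = false;   // set by the handler, cleared by the loop
volatile uint32_t zeroCrossCount = 0;  // cheap bookkeeping done in the handler
uint32_t slowWorkDone = 0;             // counts runs of the expensive part

// Stand-in for an interrupt handler: a few instructions, no loops, no waits.
void onZeroCross() {
    zeroCrossCount = zeroCrossCount + 1;
    zeroCrossSeen = true;
}

// Called from the main loop. The expensive part runs here, where other
// interrupts are still free to fire.
bool processPending() {
    if (!zeroCrossSeen) return false;
    zeroCrossSeen = false;
    ++slowWorkDone;  // placeholder for parsing, printing and so on
    return true;
}
```

On an 8-bit AVR, a multi-byte variable shared with a handler, such as the counter above, should be read in the main loop with interrupts briefly disabled, since the read takes more than one instruction.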

How to verify that code executes the way you planned:

A crude but quite effective way is using Serial.print(). You can even measure the performance of a piece of code using a combination of micros() and print(). There's an obvious problem though: calling print() and other functions affects the performance of the code being measured. The only way to really see what's happening on the inputs and outputs of the Arduino is to use an oscilloscope.

I just borrowed an oscilloscope from a friend of mine and I'll show you the differences in performance between versions 1.0, 1.1 and 1.2 of the X10ex library. One of the reasons the performance of the X10ex library is critical is described in the PLC interface manual. X10 messages are sent, one bit at a time, by applying a voltage to an input on the PLC interface in synchronization with a power line zero cross detection output. Basically: when the zero cross detect output goes high, you have about 50 microseconds to figure out whether to leave the input low or set it high by enabling an output pin on the Arduino. Another reason is to make sure the X10ex library doesn't mess up the timings of the RF and IR libraries, since it uses interrupts to do most of the work.
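To put the 50 microsecond window in perspective, it helps to convert it to CPU clock cycles. The calculation below assumes a 16 MHz clock, which is typical for ATmega168/328 based Arduino boards but is my assumption here. The names kClockHz and cyclesInWindow are made up for this example.

```cpp
#include <cstdint>

// Assumed clock frequency: 16 MHz.
constexpr uint32_t kClockHz = 16000000UL;

// Number of CPU clock cycles that fit in a window given in microseconds.
constexpr uint32_t cyclesInWindow(uint32_t windowUs) {
    return (kClockHz / 1000000UL) * windowUs;
}

// Checked at compile time: 50 us at 16 MHz is 800 cycles.
static_assert(cyclesInWindow(50) == 800, "unexpected cycle budget");
```

Eight hundred cycles sounds like plenty, but entering an interrupt handler, saving registers and any other handler that happens to be running all eat into that budget.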

The following images are screen shots of oscilloscope output. The first set of images shows the delay from when the zero cross detect pin on the PLC interface goes high until the output pin on the Arduino is set high by the X10ex library when transmitting.

v1.0. Zero Cross detect in red, Output in blue.
70 us delay after Zero Cross detect reaches 2V.
v1.1. Zero Cross detect in red, Output in blue.
50 us delay after Zero Cross detect reaches 2V.
v1.2. Zero Cross detect in red, Output in blue.
10 us delay after Zero Cross detect reaches 2V.

The following are the same measurements done when flooding the Arduino with serial, RF and IR commands.

v1.0. Zero Cross detect in red, Output in blue.
Under stress the delay is up to 85 us after Zero Cross.
v1.1. Zero Cross detect in red, Output in blue.
Under stress the delay is up to 60 us after Zero Cross.
v1.2. Zero Cross detect in red, Output in blue.
Under stress the delay is no more than 25 us after Zero Cross.

The last two images are measurements done with the latest version of the X10ex library, showing signal length and three phase coupling at 50Hz.

v1.2. Output in blue. This image verifies that the output timer is
working correctly, disabling the output after exactly 1 millisecond.
v1.2. Output in blue. This image verifies that the output timer is working
correctly, repeating the 1ms output to align with zero cross of all phases.
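The timing of the repeated output follows from the mains frequency. In a three phase system the zero crossings of the phases are one third of a half-cycle apart, so the 1 ms output is repeated at those offsets after each zero cross. The helper names below are hypothetical, and the results are rounded down to whole microseconds.

```cpp
#include <cstdint>

// Length of one mains half-cycle in microseconds.
constexpr uint32_t halfCycleUs(uint32_t mainsHz) {
    return 1000000UL / (2 * mainsHz);
}

// Start of output number `index` (0, 1 or 2), in microseconds after the
// zero cross of the phase the interface is connected to.
constexpr uint32_t burstOffsetUs(uint32_t mainsHz, uint32_t index) {
    return halfCycleUs(mainsHz) * index / 3;
}
```

At 50Hz the half-cycle is 10 ms, so the three outputs start at roughly 0, 3.33 and 6.67 ms after the zero cross. At 60Hz the spacing shrinks to about 2.78 ms.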
