Overview
- source beta: complete but may need additional testing or work
- source: https://bitbucket.org/arrizza-public/arduinoaccuratepulser
- repo:
git clone git@bitbucket.org:arrizza-public/arduinoaccuratepulser.git
Summary
This project is a refinement to Arduino Pulser. It has the following additional features:
- It is more C++, there are classes now and I show the use of a template
- I separated out the unit tests, one unit test per C++ class
- there is a cmake library, showing how to have additional source files
- I simplified the parser. It now uses a simple buffer to hold a command and then runs the command on a carriage return
- I use a much more accurate way to distribute the requested pulses into the array.
Most of the these points are straightforward refactorings of the existing code.
Modified Bresenham's Line Algorithm
The last point however is big. The pulser now uses Bresenham's algorithm (modified) to evenly distribute the requested pulses across the array. Bresenham's algorithm is described here https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm. You'll notice that I used the same basic mechanism i.e. iterating across the array (think of it as the x-axis in graphics terms), keeping track of an error amount and then incrementing the slot (y-axis) if the error exceeds a +/- range.
On a graphic screen Bresenham's causes the correct pixels to be lit depending on the slope of the line being drawn while ensuring only the minimum number of pixels are turned on. At a more abstract level, Bresenham's takes a formula meant for Real numbers (y = mx + b) and converts it to one that works in a discrete domain, the pixels on your monitor. Each pixel represents the minimum distance and error from the true, precise value of the line at that X coordinate.
I've modified it to evenly distribute a number of pulses into a set of discrete slots. The formula is num_pulses_per_slot = total_pulses / num_slots. This works very well with Real numbers, the value you get for num_pulses_per_slot is precisely accurate. However, I can't have a partial pulse, which means I need a way to transform the equation into the discrete domain - the pulse is either in the slot or not. I could simply take the remainder of the total_pulses / num_slots and add that value to the first slot or some other arbitrary slot. However, one of the requirements here is to distribute the pulses as evenly as possible. It's especially important to do this with sparsely populated slots, e.g. 1 or 2 pulses in an array with 1000 slots. This is needed for running a stepper so the motor movement is not "jerky", it is nice and smooth.
The modified Bresenham's algorithm I used does exactly that, spreading the total number of pulses evenly throughout the slots. It first does the division and allocates the integer result across all the slots. It then takes the remainder, if any, and iterates through the slots. It keeps track of an error amount as it goes. In effect the error is the fractional part of the total_pulses / num_slots accumulating as it goes on to each slot. It then adds a pulse to the current slot, if that error amount is over 1. It subtracts that one from the error and continues on iterating through the rest of the slots.
The final result is that all the pulses (total_pulses) are present throughout the slot array and the remaining remainder pulses, the extra 1s, are as "spread apart" as far as possible. When the remainder is greater than 50% of the number of slots in the array, it easier to think of it distributing the gaps, i.e. the extra 0's are as spread apart as far as possible.
To see this in action, look at the unit test ut_slots.cpp, TEST(slots, distribution) (around line 64). There I take an array with 7 slots (a prime number) and show what happens in each case for the distribution of 1, 2, ... 7 and 8 requested pulses. You can easily see that the pulses (the 1's) and gaps (0's) are evenly distributed at all times.
Setup
ArduinoAccuratePulser.ino
The mainline setup is identical to Arduino Pulser, it initializes the parser, pulser and the serial port.
Parser.cpp
The ctor and init() function basically take the incoming buffer and set it to zeros. The mIndex variable keeps track of which character just came in.
After the refactoring, this file has shrunk substantially and most importantly, only those bits of code related to the mainline behaviors are present.
The refactoring for the Parser has decoupled it's behavior from the rest of the system. It is much more independent and clearly shows only what is related to parsing behavior. Another indication is the Parser Class has only 2 public functions + ctor. Those functions are used in the mainline in ArduinoAccuratePulser.ino. This kind of minimal interface is evidence that the refactoring was done correctly.
Pulser.cpp
This is similar to the Pulser init() function in Arduino Pulser:
- initializes GPIO pin 10
- initializes all statistics to 0s
- sets up Timer2 to run every 2ms, and sets up a separate pre-scaler to cause the overall timer interval to be 100ms.
Note: I have tried some tests with the pre-scaler set at oddball and prime numbers (e.g. 37) to ensure it works ok in those cases. It does.
ISR
The ISR is in Pulser.cpp. It hasn't changed much except that the Slots class is used instead of a simple integer array
Slots
The Slots class is in slots.cpp. I used a C++ template here, so I can perform unit tests with various sizes of the Slot array. There is a quirk of C++ that, if you split a template into a .h and a .cpp, you must declare all the uses of them somewhere otherwise the Linker can't resolve them (see the very last line of slots.cpp).
I have used uint_fast32_t instead of "int". This, according to the C++11 standard should be set up for each platform (the Arduino in this case) to be the size of integer that generates the fastest code. I haven't checked this out using avr-objdump see Arduino Fast Square Wave for more but let's say it works as advertised.
Main Loop
The main loop() is similar to Arduino Pulser.
ArduinoAccuratePulser.ino
This looks for an incoming character on the Serial port and hands that off to the Parser.
Parser.cpp
The main routine here is Parser::handle_ch(ch). It does the following:
- skips empty lines (i.e. a cr or lf at the index 0)
- skips blanks
- checks there is room in the buffer and re-initializes the buffer if necessary
- if it is an incoming cr or lf and the buffer has something in it, process it
The processing of the buffer is done by handle_eol(). It does the following:
- figures out what the command is ('s' for Stop!, 'g' for Go! and 'n' for Numbers!)
- for Go! it also gathers the remainder of the buffer looking for a decimal number. This is the number of pulses to distribute. It calls the Pulser::go() function to do that distribution.
- for Stop! it calls the Pulser::stop() function to stop the pulsing
- for Number! it gathers the current statistics and sends them back over the serial port to the caller
Pulser.cpp
The ISR handles the pulsing behaviors. The Pulser::go handles changes to the given number of pulses. The Pulser::stop handles setting the Slots to 0s which effectively stops the pulsing. The remainder of the functions are used to provide the caller with current statistics.
ISR
The ISR is called every 2ms and has a pre-scaler to allow longer time bases. It increments through the Slots array, pulsing the output pin with the given number of pulses for the current slot.
Slots
The Slots class handles the setting up of the internal array is done via the set_to() and distribute_remainder() functions. Iterating over the internal array is done by the next() function.
Notes
auto_test output
If you run auto_test.rb, you can see the expected pulse rate is 39 pulses per second (pps).
At the end of the run, the actual pps is 39.008264. This is roughly a 0.8% error.
---- get stats at 0.100s
rx : nACK 0 2360 1 605 100 5000
stats: req pulses : 19500 full time : 500.000000 s
stats: total slots: 5000 slot time : 100 ms
stats: curr pulses: 0 curr ticks: 1
stats: last pulses: 2360 last ticks: 605
stats: curr time : 0.100000s curr pps : 0.000000
stats: last time : 60.500000s last pps : 39.008264
stats: expected pps: 39.000000
If you run auto_test.rb for longer than 60 seconds, say at least 500 seconds, then the error rate will drop. This is so, because the distribution of the pulses within the 5000 slots is not perfectly even. You can try this out by setting the number of pulses to a value evenly divisible by 5000, e.g. do_go(20000). In this case, the expected pps is 40 and the actual is 40 (to at least 6 decimal digits).
---- get stats at 0.100s
rx : nACK 0 2420 1 605 100 5000
stats: req pulses : 20000 full time : 500.000000 s
stats: total slots: 5000 slot time : 100 ms
stats: curr pulses: 0 curr ticks: 1
stats: last pulses: 2420 last ticks: 605
stats: curr time : 0.100000s curr pps : 0.000000
stats: last time : 60.500000s last pps : 40.000000
stats: expected pps: 40.000000
Potential Errors
The goal is to achieve an accurate pps value, but there are two main sources of errors:
- the discrete nature of the parameters, i.e. you can't have a 0.1 pulse
- the time base of the ISR, currently interrupted every 2ms
For example say you had a stepper that needs 200 steps per revolution, and you wanted to have exactly 1 revolution for each sidereal day. That is one revolution per 86,164.1 seconds https://en.wikipedia.org/wiki/Sidereal_time#Sidereal_time_and_solar_time This means 200 steps in 86,164.1 seconds which is 0.002321 pps (to 6 digits).
The slowest pps the current code can do is 1 pulse in all slots. With 5000 slots and 100ms per slot (which is a 500s width), then the best we can do is 1 pulse in 500s which is 0.002 pps. To achieve an accurate sidereal revolution, you'd need to:
- allocate more slots
- get a longer time base in the ISR pre-scaler (longer than 100ms)
- use a stepper that needs more than 200 steps per revolution
- use gearing
But even if you could do all that and changed the parameters just so to get 0.002321 pps, there are still some additional sources of errors. The primary one is that the ISR time base of 2ms may not be 2.000000ms (to 6 digits). That time is based off of the Arduino clock frequency which is generated via a crystal on the Arduino board. These are quite accurate but can be 20 - 50 ppm off, roughly speaking that's 4 or maybe 5 digits of accuracy. see https://electronics.stackexchange.com/questions/15851/what-is-the-ppm-in-the-crystal-oscillator.
Also, crystals are susceptible to changing because of ambient temperature. For example, to achieve extreme accuracy requires encasing the Arduino in a box that is heated to a stable temperature.
Maximum Number of Slots
The Arduinos Mega2560 has 8K RAM allocated for data. Some of that RAM is used to hold various temporary and other variable data, but the majority is used to hold the Slots array, currently at 5000 bytes.
When you compile there is some additional information about the Firmware size:
Calculating image size
Firmware Size: [Program: 8222 bytes (3.1%)] [Data: 5899 bytes (72.0%)] on atmega2560
EEPROM Size: [Program: 0 bytes (0.0%)] [Data: 0 bytes (0.0%)] on atmega2560
The one that sets the maximum size of the Slots array is the Firmware Data, which currently is using 5899 bytes (72% of max). The Slots array uses 5000 bytes and the rest of this code and Arduino Library code uses the remaining 899 bytes.
Unit Testing Using gtest
There are many unit test frameworks out there. I chose googletest aka gtest.
Pros
It is popular so there is a lot of documentation for it on the net. The output is easy to read and when there is a failure, there is enough information to easily track down the source of the problem and why. There are a lot of calls for testing various C/C++ constructs.
Cons
It heavily uses macros and so it confuses the formatter in Clion. Each test case function is set up like this:
TEST(parser, some_command)
{
}
But the Clion formatter expects to have a return type for "TEST". Formatting the file causes a bunch of errors and the formatting to be incorrect.
To try to fix this, I created "bob":
#define bob
bob TEST(parser, some_command)
{
mock_init();
}
Unfortunately, "bob" didn't work. The CLion formatter was still confused by gtest:
#define bob
bob TEST(parser, some_command
)
{
mock_init();
Another con is that Gtest runs all of it's test cases in the current process. This means that if there is a global variable for example in the code under test, then it will maintain it's state across multiple test cases or test suites. Some frameworks run each test suite in a separate process (e.g. Check forks each test suite).
That means that test suites are somewhat isolated from each other. I say somewhat because there can still be files and other artifacts that could affect the execution of other test suites. Also test cases are not isolated from each other as well. It is up to the developer to clearly and accurately set up the initial environment for each test case.
I did this via the "mock_init()" function. That function should clear all the important parts of the global state that the test case runs in.
Using lcov
Make sure you installed lcov see Arduino Setup. This is a front end to the gcov tool which gives you coverage information of the gtest unit tests.
To run it, use make
# make sure you are in the unit test directory
cd ut
make
make lcov
A web page should show up in your default browser (mine is chrome).
Click on the "ArduinoAccuratePulser" hyperlink (in the "Directory" column)
From this page you can see that information was gathered for three files and that all of them have near 100% coverage.
This version of parser.cpp had 97.3% coverage with 73 out of 75 lines. Note parser.cpp may have changed since I took these screenshots and so the numbers may be different. Also check out the Branches column which indicated that 95.2% (20 out of 21) branches were tested. In other words I missed one path through the code.
A line is covered when at least one test case executes that statement. You can see the actual number of times that a statement was run by looking in the "Line data" column. For example the function "print_long()" was called 12 times.
Some statements can cause the flow of control to go one way or another. For example an if statement can either be true or false and so there are two branches. A switch can have 1 branch per case statement plus one more for the default case.
To find out which branch is not tested in this code, click on parser.cpp in the "Filename" column. Scroll down until you see some lines highlighted in red. Around line 14 you'll see the "switch( mState)" line shows that one of the case statements was not taken. And if you scroll a little farther, around line 105, there is a default case in the switch that doesn't have a test case.
This default in the switch is only taken if there is an unrecognized parser state. This can't happen in normal operations. So there are two, maybe three, choices, either delete the code or put in a "test hook" that would allow me to simulate that condition.
I added the default switch in case, sometime in the future, I add a new state in the parser's state machine and I forget to add a case to handle it in this switch. If I don't have this message here, the parser would fail to parse correctly, but it would fail silently. Having the default case makes it fail nice and "loud". Therefore, I would prefer not to delete the default.
I could a test hook, say add a method in the class to force the parser's state variable to some arbitrary value. But the intent of the code is to be used only during development, not in the final version of the code. Doing that extra work doesn't give me much benefit. Therefore, I prefer not to add any more test code.
And that leaves me with my choice - do nothing. I will have to ignore the fact that lcov reports "97.3%" instead of a "100%". For now, I'm ok with having to double-check why 100% isn't covered. But I will have to deal with that every time I make changes to the code or to the unit tests. If that effort becomes higher than adding the "test hook", I'll add the test hook and cause the report to go to 100%.
Makefile for lcov
To run the lcov utility, look around line 36 in ut/Makefile
run: $(BDIR)/ut
rm -f $(BDIR)/*.gcda
$(BDIR)/ut
LCOV_DIR := $(BDIR)/coverage.info
lcov:
lcov --capture --directory .. --rc lcov_branch_coverage=1 --output-file $(LCOV_DIR)
lcov --remove $(LCOV_DIR) "/usr*" "ut_*" "mock*" --rc lcov_branch_coverage=1 -o $(LCOV_DIR)
genhtml --branch-coverage --highlight --legend --num-spaces 4 $(LCOV_DIR) --output-directory $(BDIR)
xdg-open $(BDIR)/index.html
Note in the run target, that there is line to remove "*.gcda" files. These are the gcov/lcov data files. If these are not removed, then the line and branch execution counts are cumulative, i.e. they get bumped up every time you run the unit tests. But since there are removed, the line and branch statistics will show the number of times a statement was executed in the last unit test run.
The first line in the lcov target tells lcov to capture the data for the files named in the ".." directory. The switch lcov_branch_coverage tells lcov to gather branch information. All of this is then put into the build/coverage.info file.
The next line tells lcov to remove any information from files invoked from /usr/ (i.e. the gcc system libraries), from "ut_" (i.e. the unit test files) and "mock* (i.e. the arduino mock files). It can be interesting to see the coverage of these files, so temporarily comment out the line (put a "#lcov..."). For example, you can see that code from iostream and bits in the C++ standard library were used. Parts of gtest were used, and that most of mock_arduino.cpp was used (some of the code in there is just to ensure compiler and linker happiness, it is not run).
The next line generates the HTML files via the genhtml command. The index.html file is in the build directory
The last line uses xdg-open to create a browser window with the index.html file.