Running Average

Overview

PyPi module N/A
git repository https://bitbucket.org/arrizza-public/running-average
git command git clone git@bitbucket.org:arrizza-public/running-average.git
Verification Report https://arrizza.com/web-ver/running-average-report.html
Version Info
  • macOS 14.5, Python 3.10
  • Ubuntu 20.04 focal, Python 3.10
  • Ubuntu 22.04 jammy, Python 3.10

Summary

It is sometimes necessary to keep track of an average for a list of numbers that are input only one at a time.

Scripts

Using the full list

The average of a set of numbers is the sum of the numbers divided by the number of items.

$$Avg = \sum_{}^{N} {x_i}$$

In code this looks like:

#include <iostream>

#define N 5
int list[N];

void use_full_list(int* list)
  {
  int      sum = 0;
  for (int i   = 0; i < N; ++i)
    {
    sum += list[i];
    }

  double average = (double) sum / N;

  printf("%-15s: sum: %3d N: %3d average: %.2f\n", "full list", sum, N, average);
  }

int main(void)
  {
  list[0] = 1;
  list[1] = 2;
  list[2] = 4;
  list[3] = 3;
  list[4] = 2;

  use_full_list(list);

  return 0;
  }

which generates this output:

full list      : sum:  12 N:   5 average: 2.40

Running Average

But this requires the full set of numbers to calculate. What if we didn't want to store the full set of numbers?

void one_at_a_time(int val, int& num_elems, int& sum, double& average)
  {
  num_elems += 1;
  sum += val;
  average = (double) sum / num_elems;
  printf("%-15s: sum: %3d N: %3d average: %.2f\n", "one at a time", sum, num_elems, average);
  }

  // and in main()
  int      num_elems = 0;
  int      sum       = 0;
  double   average   = 0.0;
  for (int i         = 0; i < N; ++i)
    {
    one_at_a_time(list[i], num_elems, sum, average);
    }

gives output:

one at a time  : sum:   1 N:   1 average: 1.00
one at a time  : sum:   3 N:   2 average: 1.50
one at a time  : sum:   7 N:   3 average: 2.33
one at a time  : sum:  10 N:   4 average: 2.50
one at a time  : sum:  12 N:   5 average: 2.40

In other words, we get the same final result, but we can calculate the average on the fly.

$$Avg_{n+1} = \frac{ (Sum_n + X_{n+1})} { n + 1}$$

Storing the Average

But notice the sum in the output above, it keeps going up 1, 3, 7 etc. It may be possible that we overflow the variable sum.

void only_average(int val, int& num_elems, double& average)
  {
  double sum = average * num_elems;
  sum += val;
  num_elems += 1;
  average    = sum / num_elems;
  printf("%-15s: sum: %5.2f N: %3d average: %5.2f\n", "only average", sum, num_elems, average);
  }

// and in main()
  int      num_elems2 = 0;
  double   average2   = 0.0;
  for (int i          = 0; i < N; ++i)
    {
    only_average(list[i], num_elems2, average2);
    }

The output is:

only average   : sum:  1.00 N:   1 average:  1.00
only average   : sum:  3.00 N:   2 average:  1.50
only average   : sum:  7.00 N:   3 average:  2.33
only average   : sum: 10.00 N:   4 average:  2.50
only average   : sum: 12.00 N:   5 average:  2.40

Again, we get the same final average but this time we only store a couple of variables.

$$Avg_{n+1} = \frac{ (n * Avg_n) + X_{n+1} } { n + 1 }$$

Caveats

Note that the line

 double sum = average * num_elems;

can still overflow.

And also note that this may not be exactly equal to

$$average * num$$

since it is finite arithmetic.

- John Arrizza