### 1. Averaging Theory

Averaging multiple samples to arrive at a single measurement (and an error estimate) is a good way to improve the accuracy of your measurements. The premise of averaging is that noise and measurement errors are random and independent from sample to sample. By the Central Limit Theorem, the average of many such samples then has an approximately normal (Gaussian) distribution centered on the actual value, so the mean you calculate is statistically close to the actual value. Furthermore, the standard deviation that you derive from the measurements tells you the width of that normal distribution around the mean, which describes the probability density for the location of the actual value.

The standard deviation of the averaged value (the standard error of the mean) is proportional to 1/√*N*, where *N* is the number of samples in the average. Therefore, the more points that are taken in the average, the smaller the standard deviation of the average. In other words, the more points averaged, the more precisely you know the actual value.
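The 1/√*N* behavior can be checked with a small simulation. The following is a minimal sketch, assuming Gaussian noise with a known standard deviation; the function names, the 5.0 V true value, and the 0.1 V noise level are illustrative choices, not values from any particular DAQ setup.

```python
import math
import random
import statistics


def standard_error_of_mean(samples):
    """Estimate the standard error: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def simulate_spread_of_averages(true_value, noise_sigma, n_samples, n_trials, rng):
    """Return the observed standard deviation of n_trials averages,
    each average taken over n_samples simulated noisy readings."""
    averages = []
    for _ in range(n_trials):
        readings = [rng.gauss(true_value, noise_sigma) for _ in range(n_samples)]
        averages.append(statistics.fmean(readings))
    return statistics.stdev(averages)


# Hypothetical measurement: 5.0 V true value, 0.1 V RMS Gaussian noise.
rng = random.Random(0)
for n in (1, 4, 16, 64):
    observed = simulate_spread_of_averages(5.0, 0.1, n, 2000, rng)
    predicted = 0.1 / math.sqrt(n)
    print(f"N={n:3d}  observed spread={observed:.4f} V  predicted={predicted:.4f} V")
```

Each fourfold increase in *N* should roughly halve the observed spread. In practice the noise level is not known in advance, which is where `standard_error_of_mean` is useful: it estimates the same quantity from a single batch of readings.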

### 2. Practical Considerations

Because the standard deviation decreases when the sample number increases, you should take as many samples as possible in the average. However, at some point, you will be limited by the hardware's sampling rate, the memory buffer size, processing speed, or something similar. Knowing the maximum sampling speed of the operation is important. Most of the time, the maximum sampling speed is the same as the limits imposed by the DAQ hardware. However, processing speed or buffer sizes may impose stricter limits. When this happens, those limits usually need to be determined experimentally.

Furthermore, you should take all samples from a range in time where the real signal is invariant. That is, the only variation in the samples should be due to noise. Of course, most real-world signals, such as a sine wave, vary continuously. In practice, then, you should take all samples from a range in time over which the real signal changes little in amplitude compared to the error that you are trying to eliminate.

For instance, if you are measuring a real signal that has the form of a sine wave with an amplitude of 1 V and a frequency of 1 kHz, the signal changes most rapidly (that is, its derivative has the highest magnitude) at its zero crossings, where the phase is 0 or π. At these points, the wave changes at a rate of 2000π V/s, or about 6283 V/s. If you average over a period of 10 nanoseconds, the real signal changes by approximately 63 microvolts. If your noise is greater than this value, averaging gives you a more accurate result. If, however, the noise is less, you will have succeeded only in obliterating real information.
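The arithmetic in this example can be reproduced in a few lines. This is a sketch only: the function names are illustrative, and the change over the window is approximated as the maximum slope multiplied by the window length, which is an upper bound that holds well when the window is much shorter than the signal period.

```python
import math


def max_slew_rate(amplitude_v, frequency_hz):
    """Largest rate of change of A*sin(2*pi*f*t), reached at the zero crossings."""
    return 2.0 * math.pi * frequency_hz * amplitude_v


def max_change_over_window(amplitude_v, frequency_hz, window_s):
    """Upper bound on how far the signal moves during an averaging window."""
    return max_slew_rate(amplitude_v, frequency_hz) * window_s


slew = max_slew_rate(1.0, 1e3)                     # 2000*pi, about 6283 V/s
delta = max_change_over_window(1.0, 1e3, 10e-9)    # about 62.8 microvolts
print(f"max slew rate: {slew:.1f} V/s")
print(f"change over 10 ns: {delta * 1e6:.1f} uV")
```

Comparing `delta` against the expected noise level tells you whether averaging over that window removes mostly noise or mostly signal.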

Of course, for many signals, the exact mathematical form is not known (or even interesting). However, you often still have a good idea of the maximum rate of change of the signal. Note that, for a signal of known amplitude, this is closely tied to knowing the maximum frequency component of the signal.

Knowing the length of time over which useful averaging can take place and the maximum sampling speed of the operation, you can determine the maximum number of samples that can go into each averaged data point.
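As a rough illustration of this calculation, the sketch below multiplies the usable averaging window by the maximum sampling rate. The function name and the example figures (a 1 ms window and a 250 kS/s rate) are hypothetical, chosen only to show the arithmetic.

```python
import math


def max_samples_per_average(window_s, max_sample_rate_hz):
    """Number of samples that fit in the averaging window at the given rate.

    The small tolerance guards against floating-point round-off in the
    product; the result is never less than one sample.
    """
    return max(1, math.floor(window_s * max_sample_rate_hz + 1e-9))


# Hypothetical case: signal is effectively constant for 1 ms,
# and the whole acquisition chain sustains 250 kS/s.
n = max_samples_per_average(1e-3, 250e3)
print(f"samples per averaged point: {n}")
print(f"expected noise reduction factor: {math.sqrt(n):.1f}")
```

The rate passed in should be the slowest limit in the chain (hardware, buffer, or processing), not simply the rate on the hardware data sheet.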

### 3. When Not to Average

Because high-frequency signals (relative to the maximum sampling rate) have greater rates of change, averaging is often not useful when you want to retrieve information about that frequency component. Averaging will either damage or destroy the frequency information. (For instance, the average of a sine wave over any whole number of periods is zero.)
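The effect is easy to demonstrate numerically. The sketch below averages samples of a noiseless 1 kHz sine wave, assuming a hypothetical 100 kS/s sampling rate so that 100 samples span exactly one period; the function name is illustrative.

```python
import math


def average_sine(amplitude_v, frequency_hz, sample_rate_hz, n_samples):
    """Average n_samples of A*sin(2*pi*f*t), sampled from t = 0."""
    total = 0.0
    for k in range(n_samples):
        t = k / sample_rate_hz
        total += amplitude_v * math.sin(2.0 * math.pi * frequency_hz * t)
    return total / n_samples


# 100 samples at 100 kS/s cover exactly one period of a 1 kHz sine.
one_period = average_sine(1.0, 1e3, 100e3, 100)
# 25 samples cover only the first quarter period.
quarter_period = average_sine(1.0, 1e3, 100e3, 25)

print(f"average over one full period: {one_period:.2e} V")
print(f"average over a quarter period: {quarter_period:.3f} V")
```

The full-period average is zero to within round-off, so the 1 kHz component is lost entirely. The quarter-period average is nonzero, but it no longer represents the signal value at any single instant.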

Also, bear in mind the initial premise of random noise. If it turns out that the source of error in the measurement is not random (for instance, it is always off in the positive direction), then the error is not centered on zero, and direct averaging will converge on a biased value rather than the actual value. Take care to remove such systematic measurement errors before averaging in order to avoid this situation.
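A short simulation makes the point concrete. This is a sketch under stated assumptions: Gaussian noise plus a constant positive offset, with an illustrative function name and made-up values (5.0 V true value, 0.1 V noise, 0.05 V offset).

```python
import random
import statistics


def averaged_reading(true_value, noise_sigma, offset, n_samples, rng):
    """Average n_samples readings that carry random noise plus a fixed offset."""
    readings = [
        true_value + offset + rng.gauss(0.0, noise_sigma)
        for _ in range(n_samples)
    ]
    return statistics.fmean(readings)


# Hypothetical instrument that always reads 0.05 V high.
rng = random.Random(0)
for n in (10, 1000, 10000):
    avg = averaged_reading(5.0, 0.1, 0.05, n, rng)
    print(f"N={n:5d}  average={avg:.4f} V  error vs. true value={avg - 5.0:+.4f} V")
```

As *N* grows, the average settles near 5.05 V rather than 5.0 V: the random part of the error shrinks, but the offset remains untouched.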

Finally, because averaging requires taking many samples, it takes more time than simply reading one data point. Furthermore, the additional processing necessary also takes time. This slower execution may be detrimental to applications that require tight execution loops, such as real-time process control applications.