LabVIEW


exponential moving average step response fpga

Solved!

Hey guys,

 

I have a problem with my filter, an exponentially weighted moving average (first-order IIR). From the book Understanding Digital Signal Processing (Richard Lyons) I have the following formula for calculating the 3 dB frequency (fc) from alpha. Alpha is the parameter that controls the filter.

 

Difference equation of the filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]

 

Relation between fc and alpha: alpha = cos(2fc/fs) - 1 + sqrt[cos²(2fc/fs) - 4*cos(2fc/fs) + 3]

 

If I now choose a 3 dB frequency of 0.0794 Hz (time constant TC = 2 s), I get alpha = 0.00169621 (fs = 94 Hz).

For a first-order IIR filter, the rise time (ta) of the step response (from 10% to 90%) is ta = 2.2*TC, which gives ta = 4.4 s.

But if I simulate the step response, my rise time is about three times that value, roughly 14 s.
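
To make the numbers concrete, here is a minimal Python sketch of the difference equation above (just the plain recursion, not my FPGA code) that reproduces the rise time I see:

# Step response of y[n] = alpha*x[n] + (1 - alpha)*y[n-1] for a unit step
fs = 94.0            # sampling rate in Hz
alpha = 0.00169621   # alpha from the formula above for fc = 0.0794 Hz
y = 0.0
n10 = n90 = None
for n in range(20000):
    y = alpha * 1.0 + (1.0 - alpha) * y   # x[n] = 1 for all n >= 0
    if n10 is None and y >= 0.1:
        n10 = n                           # first sample above 10%
    if y >= 0.9:
        n90 = n                           # first sample above 90%
        break
print((n90 - n10) / fs, "s")              # ~13.8 s instead of the expected 4.4 s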

 

I can't explain why the step response of my filter differs so much. For my moving average filter, the calculated and simulated rise times agree.

 

I have attached the VI that runs on the FPGA. Maybe someone can find a mistake...

 

greetz

Slev1n

Message 1 of 13

(see also 'alpha filter' or 'RC filter')

 

Is your sampling frequency (fs) correct?  If the loop timing doesn't match, that would explain it.

 

Your data types look good (enough to get alpha within 1%). But I would suggest a minor change in the implementation. As it stands, it is a little prone to round-off drifting, because (1 - alpha) is repeatedly multiplied by y[n-1]. A slightly more reliable method is y[n] = y[n-1] + alpha*(x[n] - y[n-1]). The difference is subtle, but it has often given me better results, and it eliminates one multiply.
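
In text form (a quick Python sketch rather than LabVIEW, just to show the algebra), the two update rules are:

alpha = 0.00169621   # example value from your post

def ewma_two_mult(x, y):
    """Original form: two multiplies per sample."""
    return alpha * x + (1.0 - alpha) * y

def ewma_one_mult(x, y):
    """Rearranged form: one multiply, less prone to round-off drift."""
    return y + alpha * (x - y)

# In double precision both track each other essentially exactly;
# the difference only matters with narrow fixed-point feedback.
y1 = y2 = 0.0
for _ in range(5000):
    y1 = ewma_two_mult(1.0, y1)
    y2 = ewma_one_mult(1.0, y2)
print(y1, y2)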

 

By the way, 'Reinterpret Number' does the same thing as your conversion from FXP to a Boolean array and back. It's a little less confusing, though.

 

I'm a little puzzled by the timed loop that never loops...  Does it enforce timing that way?  (I assumed it wouldn't, so I never used it; I use the Loop Timer instead.)

___________________
CLD, CPI; User since rev 8.6.
Message 2 of 13

Hey,

 

thanks for your answer.

 

1. I verify my sampling frequency with the Loop Timer. My input is 425,532 ticks, which corresponds to ~94 Hz. This tick rate is confirmed by "ticks EWMA".

--> Maybe someone can test the code and tell me?

 

2. I found your approach in the "tips and tricks" section of Lyons' book. I will give it a try, but could you explain the round-off drifting a little? I am quite new to this area.

Is there a further benefit from eliminating one multiplier besides resources? Are the frequency response, impulse response and step response the same?

 

3. I only bit-shift there; I am kind of used to this method 🙂 I am not sure whether the "Reinterpret Number" function uses fewer resources, but thanks for pointing it out.

 

4. The timed loop iterates once every 425,532 ticks. So with a frequency of 94 Hz one value is computed, since the code inside the timed loop needs only one iteration. Or am I misunderstanding your question?

 

 

kind regards

 

Slev1n

Message 3 of 13

Hi Slev1n,

 

I can't speak to the calculation you are performing, but if you see a difference in times, perhaps the simulated tick count does not match. Perhaps you can do your analysis with the Desktop Execution Node:

 

Using the LabVIEW FPGA Desktop Execution Node - National Instruments
http://www.ni.com/white-paper/51859/en/

 

Otherwise, you need to provide more information on how you are proceeding, what your measurements are, and what you are comparing.

 

Best regards,

Christoph

Staff Applications Engineer
National Instruments
Certified LabVIEW Developer (CLD), Certified LabVIEW Embedded Systems Developer (CLED)


Message 4 of 13

Hey,

 

I am not sure what further information you need. I am trying to compare the step response of a moving average with an exponential moving average (EWMA). Actually I just want to confirm the theory. As I mentioned above, to get a time constant of 2 s at a sampling rate of 94 Hz, alpha has to be 0.00169. The rise time of the step response from 10% to 90% of the final value differs from theory: it should be 4.4 s with a time constant of 2 s, but I get almost 14 s if I run my code on the FPGA.

I confirmed that with alpha = 0.00169 my code takes 1297 samples to get from 0.1 to 0.9 (the final value is 1, the start value 0).

As you can see in my code, I check the loop time with the indicator "ticks ewma" to confirm the sampling rate of the SCTL.

 

 

Can someone else confirm the 1297 samples needed at alpha = 0.00169? Because I think I need too many samples to reach the 0.9 value.

 

I already implemented the suggested EWMA version from the first reply. The same problem occurs there...

 

 

kind regards

 

Slev1n

Message 5 of 13

@Slev1n wrote:

...

1. I verify my sampling frequency with the Loop Timer. My input is 425,532 ticks, which corresponds to ~94 Hz. This tick rate is confirmed by "ticks EWMA".

--> Maybe someone can test the code and tell me?

 

2. I found your approach in the "tips and tricks" section of Lyons' book. I will give it a try, but could you explain the round-off drifting a little? I am quite new to this area.

Is there a further benefit from eliminating one multiplier besides resources? Are the frequency response, impulse response and step response the same?

 

3. I only bit-shift there; I am kind of used to this method 🙂 I am not sure whether the "Reinterpret Number" function uses fewer resources, but thanks for pointing it out.

 

4. The timed loop iterates once every 425,532 ticks. So with a frequency of 94 Hz one value is computed, since the code inside the timed loop needs only one iteration. Or am I misunderstanding your question?


I used a spreadsheet to simulate it and got almost exactly the same response (1299 cycles to go from 0.1 to 0.9).  Spreadsheets make a handy tool for testing calculations.

 

1. Okay.  I've never used the Single-Cycle Timed Loop (SCTL) with a True constant wired to the stop terminal.  It would force the math functions to complete in a single cycle, but I'm not sure whether that is any advantage.  I just wanted to make sure the timing was confirmed, and it is.

 

2. The round-off drifting probably won't show up unless your input is small (less than 0.1).  I see now that you have 40 bits (39 right of the decimal point) for the feedback.  That takes quite a bit of FPGA to multiply, but it won't have round-off issues.  Other parts only had 18 bits (17 right of the decimal point), so alpha (0.00169 +/- 0.000007) times an input of 0.1 would have been 0.000169 +/- 0.000007, or about a 7% error.  But that multiply is also 40-bit, so you shouldn't see any problems.

 

Typically, the output y[n] has fewer bits and will round off at the last bit.  But because it is in a loop, multiplied by (1 - alpha) each time, the round-off sometimes accumulates each iteration until it is large enough to affect the add's result.  It's hard to explain, but my general rule of thumb is to expect an error equal to the smallest bit divided by alpha with the original method, or about half that with the one-multiply method.

 

The responses will be almost identical, except for a small percentage difference.  The biggest advantage is saving FPGA space (and compile time).  And you can reduce your number of bits quite a bit to save even more.

 

3. They are basically identical.  And both methods are 'free' in FPGA.  The bits aren't changed, so no logic is needed; they are simply relabeled.

 

4. I think you answered it well.

 

Generally, at this point, I would adjust alpha till my results matched what I wanted, and move on.  I hate not understanding a mismatch, but don't usually have time to dive into it.

 

But, for the sake of science, let's consider that your formula may be flawed.  I think you may be using a formula for a continuous exponential decay (e^(-t/tau)), not for a discrete exponential decay ((1 - alpha)^n).  It's easier to look at this as a step from 1 to 0.  In that case, y[n] (for n >= 0) is y[n] = (1 - alpha)^n.  We can find n for y[n] = 0.9 as n = log_(1-alpha)(0.9) ≈ 62, and n for y[n] = 0.1 as ≈ 1361, for a difference of 1299.
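
Or, as a quick Python check of the same closed-form numbers (alpha rounded to 0.00169 as above):

import math

alpha = 0.00169                                      # rounded value used above
n_09 = math.log(0.9) / math.log(1.0 - alpha)         # samples for (1-alpha)^n to reach 0.9
n_01 = math.log(0.1) / math.log(1.0 - alpha)         # samples for (1-alpha)^n to reach 0.1
print(round(n_09), round(n_01), round(n_01 - n_09))  # 62, 1361, 1299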

 

___________________
CLD, CPI; User since rev 8.6.
Message 6 of 13
Solution
Accepted by topic author Slev1n

Hey ZX81,

 

thank you for your detailed answer.

 

Concerning the issue with the rise time, I think I found the error. You might be right that the formula is not correct, or, more probably, it was misunderstood by me and used in the wrong context.

 

When I was cycling home from work I remembered a handy LabVIEW function: "smoothing filter coefficients.vi". There you only have to set tau/TC and fs, and it calculates the numerator and denominator for an exponential moving average and a moving average. As the numerator is alpha, I could compare the result with the formula I used, and there was quite a difference. LabVIEW uses the following formula: alpha = 1 - exp(-1/(fs*TC)). With this formula, TC = 2 s corresponds to alpha = 0.0053.

 

And with this alpha my simulation works! Rise time 4.4 s 🙂
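
For completeness, here is a minimal Python check of the LabVIEW formula (closed-form sample counts, not my VI):

import math

fs, TC = 94.0, 2.0
alpha = 1.0 - math.exp(-1.0 / (fs * TC))     # formula used by smoothing filter coefficients.vi
n10 = math.log(0.9) / math.log(1.0 - alpha)  # samples to reach 10% of the step
n90 = math.log(0.1) / math.log(1.0 - alpha)  # samples to reach 90% of the step
print(alpha)                                 # ~0.0053
print((n90 - n10) / fs, "s")                 # ~4.4 s = 2.2*TC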

 

Quoting you: "Generally, at this point, I would adjust alpha till my results matched what I wanted, and move on." I would love to do the same, but as this is my master's thesis I have to resolve such things 🙂

 

Now back to the rounding issues. I understand that small values are a bigger problem. As this filter is used in a lock-in, the values are going to be REALLY small. But I already tested it on our measuring device and it works, so I am going to test your version as well; if I don't run into problems, I guess I will keep it at 40 bits. Simulating the following setup caused an error of 2.3%. Using 57 bits reduced the error to under 1%. I think 40 bits should be enough.

alpha = 0.000335693, input = 1.19209E-7
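
Roughly, the check looks like this (a simplified Python sketch of the one-multiply form with the feedback register quantized; I assume round-to-nearest here, so the exact percentages depend on the actual FXP rounding mode and word lengths):

def steady_state_error(alpha, x, frac_bits, n_samples=60_000):
    """Relative steady-state error of the EWMA when the feedback
    register is quantized to frac_bits fractional bits."""
    q = 2.0 ** (-frac_bits)
    y = 0.0
    for _ in range(n_samples):
        y = round((y + alpha * (x - y)) / q) * q   # quantize the new feedback value
    return abs(x - y) / x

alpha, x = 0.000335693, 1.19209e-7
print(steady_state_error(alpha, x, 39))   # ~0.023, i.e. about 2.3%, with 39 fractional bits
print(steady_state_error(alpha, x, 56))   # far below 1% with 56 fractional bits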

And regarding the resources I have no worries. Although I am using a myRIO in the end, I still have a lot of DSP slices left for the multiplication and 10% free flip-flops.

 

So I guess this topic is solved. Thanks for your great help and interesting thoughts.

 

kind regards

Slev1n

 

 

Message 7 of 13

Cool!  I'm glad it's working now.

 

I grew up in the era of FPGAs with no DSP slices and smaller cell counts, so I still tend to think in those terms.  I still prefer to spend 25 minutes programming to get my compile times down, though.  I've had cases where I cut the compile time from 90 minutes to 45 minutes by optimizing quite a bit.  With a powerful server for compiling, that's less important.

 

One of those optimizations is to reduce bit counts where I can, especially for multiplies.  For example, alpha is +/16/0, and for 0.0053 you could also use +/12/-4 (a negative integer bit count).  You may also be able to eliminate a lot of upper bits from your input.  Five minutes spent picking the smallest bit count can easily save 2-10 minutes on every compile.

 

My second optimization is to reduce multiplies, but with a DSP slice that's not as important.  I can't find good documentation about the DSP slices (if you have some, please post links), but as I understand it, multiplying numbers with larger bit counts needs multiple slices, and maybe extra time to combine the results.

 

And one more trick:  pick an alpha with a simple binary value, like 1/256 (you picked about 1/189), and change fs until you get the smoothing you want.  Then use a constant for alpha.  Multiplying by a constant 1/256 is free in the FPGA (it just shifts the bits).
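
In integer terms the trick looks like this (a Python sketch of the shift-only update; in the FPGA the same thing falls out of multiplying the FXP value by a power-of-two constant):

SHIFT = 8                      # alpha = 1/2**SHIFT = 1/256
acc = 0                        # feedback register, holds y scaled by 2**SHIFT

def ewma_shift_update(x):
    """One EWMA step using only add, subtract and shift (no multiplier)."""
    global acc
    acc += x - (acc >> SHIFT)  # acc/2**SHIFT tracks y[n] = y[n-1] + alpha*(x[n] - y[n-1]), up to truncation
    return acc >> SHIFT        # current filter output

for _ in range(3000):          # step input of 1000 counts
    y = ewma_shift_update(1000)
print(y)                       # settles at 1000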

 

For that matter, making alpha a constant may let the compiler optimize the multiplies quite a bit.  Depending on the smarts of the optimizer, it may change them to a set of adders instead.  Front-panel inputs are great for getting things to work, but constants optimize MUCH better.

 

 

___________________
CLD, CPI; User since rev 8.6.
Message 8 of 13

Hey,

 

here is one link to a user guide for the DSP48E slice: http://www.xilinx.com/support/documentation/user_guides/ug193.pdf

LabVIEW itself usually forwards you to this page.

 

The trick of using 1/2^m (with m an integer) is a good idea, but as I want to be able to set the EWMA to any time constant, I will keep the 16-bit fixed-point version. In the book "Understanding Digital Signal Processing" a multiplier-free version is possible if you apply your trick.

 

Now I have a philosophical question:

If I have a higher sampling frequency (fs) and the same alpha as before at a lower fs, the rise time is smaller, but the time constant is smaller too (the cut-off frequency fc is higher).

So is there any disadvantage if I do not downsample my signal before running it through my EWMA? Of course, the hardware must be able to process the code, but 48 kHz should be easily processable for the FPGA.
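
(To keep the same TC at the higher rate, alpha has to shrink accordingly; a quick Python sketch with the formula from above:)

import math

TC = 2.0                                     # desired time constant in seconds
for fs in (94.0, 48_000.0):
    alpha = 1.0 - math.exp(-1.0 / (fs * TC))
    print(fs, alpha)                         # 94 Hz  -> ~5.3e-3
                                             # 48 kHz -> ~1.0e-5 (needs more fractional bits for alpha)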

 

kind regards

Slev1n

Message 9 of 13

Good question.

 

If you average 16 times as many samples (fs = 16x what it was), you should include 4 more bits in your feedback.  You already have plenty, so that may not matter unless you go much faster.  Otherwise, increasing fs is probably good.

 

If the input has low-frequency noise, oversampling doesn't help eliminate that at all.  High-frequency noise, though, does reduce with oversampling.  If, for example, the noise above 10 Hz is -5 dB (that is, 10^-0.5 times the amplitude of the signal you want), and you sample at 20 S/s, you will probably pick up -5 dB in your initial readings.  If your -3 dB point (fc) is also 10 Hz, then you'll end up with around -8 dB of noise left in your signal.  If you instead take 200 S/s, average groups of 10, and then pass those averages to the filter, you won't help the noise at 10 Hz (you were measuring 10 Hz noise with no sampling effects), but you will reduce noise above 100 Hz by a factor of close to (but not quite) 10.

 

There are entire semester-long classes that discuss why, how, etc.  The short version is this:  each sample is the sum of the signal you want and noise.  If you add 10 samples, you get 10x the signal you want plus the sum of 10 noise samples.  The nature of the noise determines what you get when you add those 10 noise samples.  Gaussian noise adds one way (something like:  if 83% of samples are below X, then 83% of the sums are below 1.1X, or something like that).  Linear noise adds another way.  And repeating patterns add yet another way.  So, without knowing exactly what the noise is, no one can answer you with certainty, except that averaging multiple samples probably helps and almost never hurts.

 

There is also the issue of aliasing.  If you have a 60 Hz sine interference at -3 dB and you sample at 10.001 S/s (always assume the clocks won't match precisely), you will get something like 0.006 Hz at -3 dB added to your signal, and your filter won't remove it.  But bumping your sample rate to 100.001 S/s puts the interference at about 40 Hz, so your filter should eliminate it.
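
A minimal Python sketch of that folding arithmetic (just the alias-frequency bookkeeping, nothing more):

def alias_freq(f, fs):
    """Apparent frequency of a tone at f after sampling at fs."""
    f = f % fs
    return min(f, fs - f)

print(alias_freq(60.0, 10.001))    # ~0.006 Hz -- slides right past the EWMA filter
print(alias_freq(60.0, 100.001))   # ~40 Hz    -- well above fc, easily filtered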

 

Averaging 10 samples at a time is a type of filter (a "box", or boxcar, filter).  If you look at it in the frequency domain, you can see that some higher frequencies get shifted to lower frequencies in an odd way, and not all of them are reduced.  If you sample at 4000 S/s and average 100 samples at a time, you'll get an average 40 times per second.  With 60 Hz interference, you will get about 1/3 as much noise, shifted to 20 Hz, which won't filter out as well as 60 Hz would have.

 

So, it would be better to use the EWMA filter at the higher sample rate than to average blocks of inputs and then filter those.  And averaging is (probably) better than just using a slower sample rate.

 

If you have an input adapter with built-in electronic filters, that's even better, and there is no need to sample more than 2X the filter's frequency.

 

___________________
CLD, CPI; User since rev 8.6.
Message 10 of 13