11-21-2013 02:15 AM
I need to write code that produces a pulse train (PWM) whose duty cycle varies from pulse to pulse. It feeds a serial line on a chip that reads a bit as high when the duty cycle is 80% and as low when it's 52%. The good part is that the duty cycle is not infinitely variable; there are only two values.
The code I came up with is attached; however, this code, along with the code that calls it, requires well over 125% of the FPGA.
I have narrowed the problem area down to the CASE that passes the tick counts. (If I replace the CASE with just static constants, the utilization is around 20%.) I have also tried the Select node from the Comparison palette, and enclosing the entire pulse generation structure in a CASE, with the same results. Any time the delays are not constant, I go way over.
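For readers without LabVIEW, here is a minimal Python sketch of the encoding described above: each bit becomes one pulse with an 80% or 52% duty cycle. The period of 100 ticks, the MSB-first bit order, and the names `bit_to_ticks` and `byte_to_pulses` are assumptions for illustration; the thread does not state them, and this is not the attached VI.

```python
# Hypothetical pulse period in FPGA clock ticks (not given in the thread).
PERIOD_TICKS = 100

# Duty cycle in percent for each bit value, as described in the post.
DUTY_PERCENT = {1: 80, 0: 52}


def bit_to_ticks(bit, period_ticks=PERIOD_TICKS):
    """Return (high_ticks, low_ticks) for one pulse encoding `bit`."""
    high = period_ticks * DUTY_PERCENT[bit] // 100
    return high, period_ticks - high


def byte_to_pulses(value):
    """Encode one byte as eight pulses, MSB first (bit order is an assumption)."""
    return [bit_to_ticks((value >> i) & 1) for i in range(7, -1, -1)]
```

Only two (high, low) pairs ever occur, which is why the per-bit selection should in principle be cheap.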
11-21-2013 11:20 AM
I suspect that the nested FOR loops are part of the problem, and possibly the reuse of the same IO channel if you have enabled arbitration for that IO channel. Have you considered whether you can rewrite this as one single-cycle timed loop? If you need help with that, post actual code instead of an image and I'll take a shot at reworking it.
11-21-2013 11:44 AM
I'll have access to the code later today, and I'll post it.
The IO channel is set to never arbitrate, and there doesn't seem to be any problem with the IO; all I have to do is eliminate the CASE on the timing integers (which feed the delays), and it compiles perfectly well.
I have done plenty of input and data storage with these units, but never set up anything more than simple PWMs. This is a weird thing for me to see.
11-21-2013 01:12 PM
I don't know how that LabVIEW code is being converted to hardware, but it's possible that what you're creating here is a relatively large 160x8 lookup table built entirely in FPGA logic, which is not very efficient. Since the arrays are fixed-size (and constant in your image, although maybe that's not the case in the real code), the compiler can precompute the case structure outputs for every single for loop iteration and store them in a lookup table, so that when the code executes there's no evaluation, just a lookup using the two for loop indices. I don't know enough about FPGAs to say whether that's likely to happen, but it might explain what you're seeing. You could test it by varying the number of iterations and seeing whether FPGA utilization scales with it.
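To make the speculation concrete, here is a small Python model of the difference between a precomputed table and selecting the value at run time. Whether the LabVIEW FPGA compiler actually does this is only a guess in this thread; the data values, the tick counts 80 and 52 (which assume a 100-tick period), and the function names are hypothetical.

```python
# Hypothetical constant input: 160 bytes, 8 bits each.
DATA = bytes(range(160))


def precomputed_table(data):
    """One stored tick count per (byte index, bit index) pair."""
    return [[80 if (b >> i) & 1 else 52 for i in range(7, -1, -1)] for b in data]


def select_at_runtime(data, byte_index, bit_index):
    """Pick one of two constants when the bit is needed; nothing is stored."""
    bit = (data[byte_index] >> (7 - bit_index)) & 1
    return 80 if bit else 52


TABLE = precomputed_table(DATA)
TABLE_ENTRIES = sum(len(row) for row in TABLE)  # 160 * 8 entries
```

Both approaches give the same result for every index pair; the table just costs storage proportional to the number of iterations, which is what the scaling test would reveal.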
11-24-2013 11:20 PM
For anyone who wants to give this a shot, here is stripped down code that doesn't require all the other libraries.
This latest version creates a HUGE lookup table, and so it fails miserably.
If I switch to a WHILE loop, will the compiler skip the LUT?
11-25-2013 01:26 PM
I don't quite understand your VI. Why do you have the timeout on the DMA Read set to "1"? Can you do the DMA Read in parallel with the sort-of PWM?
For unknown reasons I can't get this to compile (it starts the compile, and it even tells me that it successfully completed estimating resource utilization - but then it doesn't show the results), and I don't have time to fight with it right now. I would try something like the code shown below, which replaces the for loop with a single-cycle timed loop.
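As a rough software model of the single-cycle timed loop idea (this is not the attached VI), the generator below does one small piece of work per clock tick: compare a counter against a threshold, then advance the counter. The 100-tick period, the thresholds, and the name `sctl_pwm` are assumptions for illustration.

```python
def sctl_pwm(bits, period_ticks=100, high_ticks_one=80, high_ticks_zero=52):
    """Yield the output level for each clock tick, one loop pass per tick.

    State is just two counters (`bit_index`, `tick`), which is what the
    shift registers of a single-cycle timed loop would hold.
    """
    bit_index = 0
    tick = 0
    while bit_index < len(bits):
        # Select the threshold for the current bit; only two values exist.
        threshold = high_ticks_one if bits[bit_index] else high_ticks_zero
        yield tick < threshold  # output is high for the first `threshold` ticks
        tick += 1
        if tick == period_ticks:  # end of this pulse: move to the next bit
            tick = 0
            bit_index += 1
```

For example, `list(sctl_pwm([True, False]))` produces 200 levels: the first 100 contain 80 high ticks and the next 100 contain 52.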
11-25-2013 01:54 PM
@nathand wrote:
I don't quite understand your VI. Why do you have the timeout on the DMA Read set to "1"? Can you do the DMA Read in parallel with the sort-of PWM?
For unknown reasons I can't get this to compile (it starts the compile, it even tells me that it successfully completed estimating resource utilization - but then it doesn't show the results) and I don't have time to fight with it right now, but I would try something like the code shown below, which replaces the for loop with a single-cycle timed loop.
I appreciate you trying to get it to compile.
For what I am doing, it makes sense to have the DMA read take place first, then perform the output (the DMA is being loaded in bursts). The 1 is an arbitrary choice for testing until I get this up and running and can test some scenarios with it.
Your suggestion of a single cycle timed loop might work; but why would the compiler treat this one differently, in terms of a lookup table?
After thinking about it a bit more, I will probably change the FIFO from a U8 to a [BOOL] (the timeout is then no longer an issue at this level) and have the U8 array converted to BOOLs on the RT side. I'll then look into implementing the loop like you said.
11-25-2013 02:10 PM
Thinking about it more, the 8192-element boolean array might be the problem. Consider using a memory block instead. A large array like that isn't very efficient on FPGA, and the array might be the reason for the excessive lookup-table use.
Be careful about changing the FIFO data type if you care about throughput. I did some testing of U8 versus U32 and found that I could transfer the same number of elements per second for either data type, meaning that in terms of throughput the U8 was 4x slower. If you change to bool, it will be 8x slower again, although maybe that's acceptable in your application.
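The arithmetic behind that comparison can be sketched as follows, assuming (as reported above) that the FIFO moves the same number of elements per second regardless of data type. The element rate of one million per second is a made-up figure, and a boolean element is assumed to carry one useful bit.

```python
def payload_bits_per_second(elements_per_second, useful_bits_per_element):
    """Useful data rate when the element rate is fixed by the FIFO."""
    return elements_per_second * useful_bits_per_element


RATE = 1_000_000  # hypothetical elements per second, same for every type

u32_rate = payload_bits_per_second(RATE, 32)
u8_rate = payload_bits_per_second(RATE, 8)
bool_rate = payload_bits_per_second(RATE, 1)

u32_vs_u8 = u32_rate // u8_rate    # how many times more payload U32 moves than U8
u8_vs_bool = u8_rate // bool_rate  # how many times more payload U8 moves than bool
```

Under these assumptions the payload per second shrinks in proportion to the useful bits per element, so packing bits into wider elements makes better use of the FIFO.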
I recommend using a timeout of either -1, so you always get all data, or 0, in which case the loop speed stays consistent, but that's mostly my personal preference. A timeout of "1" does seem very arbitrary.
If you get a chance, see if it compiles, or at least can estimate resources, on your machine. The single-cycle loop should be a tiny amount of FPGA space, but again, the large boolean array is quite possibly the problem.
11-25-2013 02:16 PM
Hmmm...
I suppose I could keep the FIFO scalar, but then use the FIFO enable/disable controls on the RT side to pump it full of data. Meanwhile, I put the FIFO Read, with a timeout of -1, inside the loop on the FPGA side so there's no lookup table. That is, the presence of data in the FIFO drives the output loop.
Seems like it would solve the LUT problem, as well as make the FPGA code far simpler.
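A minimal Python model of that FIFO-driven loop, using a blocking queue read to stand in for a FIFO Read with a -1 timeout. The names `output_loop` and `run_demo`, the `emit` callback, and the tick counts 80 and 52 (which assume a 100-tick period) are hypothetical; on the real target the producer would be the RT side and the consumer the FPGA loop.

```python
import queue
import threading


def output_loop(fifo, n_bits, emit):
    """Consume bits one at a time; each read blocks until data is available."""
    for _ in range(n_bits):
        bit = fifo.get()  # blocks like a FIFO Read with timeout -1
        emit(80 if bit else 52)  # high-tick count for this pulse


def run_demo(bits):
    """Feed `bits` into the queue while a worker thread drives the output."""
    fifo = queue.Queue()
    seen = []
    worker = threading.Thread(target=output_loop, args=(fifo, len(bits), seen.append))
    worker.start()
    for b in bits:  # stands in for the RT side pumping data into the FIFO
        fifo.put(b)
    worker.join()
    return seen
```

Because each bit is read only when it is needed, nothing about the data is known at compile time, so there is no constant array to precompute.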
11-26-2013 01:21 PM
Looks like a memory block is the way to go if you want to reduce resources without rewriting. The code below fits easily and is only slightly different than your original.