08-20-2013 11:34 AM
Hi Everyone,
I am trying to optimize my LV FPGA implementation because I have hit the 90% device utilization mark and need to come down to at least 80% to leave room for future bug fixes and new features. After doing some reading online, it is clear that LV FPGA doesn't like front panel arrays very much (they use too much space) and that an implementation that uses block memory instead of arrays is recommended (http://digital.ni.com/public.nsf/allkb/311C18E2D635FA338625714700664816). My question is whether this recommendation is restricted to the front panel controls on the parent FPGA VI, or whether it applies to subVIs as well?
Within my subVIs I have large arrays being passed from one subVI to the next for various processing operations, so I have a lot of front panel array controls/indicators on the subVIs. On my parent VI, however, I only have a couple of front panel controls that I use to initialize the FPGA from the RT host VI; I then pass the data back to the host VI using DMA (from within one of my subVIs).
On a separate note, can anyone point me to an article that explains the difference between the various metrics the LV compilation returns as part of the device utilization (map) report (see screenshot)? Am I supposed to be more concerned about the total number of slices used or about the slice registers/LUTs used? I have read in multiple places that the device utilization can be a bit misleading, since LV will not try to optimize the implementation on the FPGA unless it runs out of space. What is the threshold for this optimization to get triggered? Also, I am building my FPGA VI with timing optimization, so that would cause a bit of an increase in space used, if I am correct.
Thank you,
Aws
08-20-2013 01:17 PM
@Aws_Khudhair wrote:
After doing some reading online, it is clear that LV FPGA doesn't like front panel arrays very much (they use too much space) and that an implementation that uses block memory instead of arrays is recommended (http://digital.ni.com/public.nsf/allkb/311C18E2D635FA338625714700664816). My question is whether this recommendation is restricted to the front panel controls on the parent FPGA VI, or whether it applies to subVIs as well?
This applies only to the top-level VI. The utilization issue is due to the way the FPGA and host share data through front-panel controls. Since front-panel items of FPGA subVIs aren't exposed to the host, they are not as expensive. In fact, when you compile a top-level FPGA VI, all of its subVIs (except ones that are not reentrant) are effectively inlined or flattened into the top-level VI to create one large VI with the same logic. That said, moving large arrays around the FPGA is still not very efficient.
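As a rough illustration of why large arrays are still costly, here is a back-of-the-envelope sketch in Python. It assumes every element of a fixed-size array is held in flip-flops and that a block memory holds 18 Kb; both are simplifications, the function names are made up for this example, and the real numbers depend on your target and on what the compiler does with your design.

```python
def array_register_bits(num_elements: int, bits_per_element: int) -> int:
    """Flip-flops needed if every array element is stored in registers (assumption)."""
    return num_elements * bits_per_element


def block_ram_count(num_elements: int, bits_per_element: int,
                    bram_bits: int = 18 * 1024) -> int:
    """Block RAMs needed for the same data, ignoring width/depth constraints.

    bram_bits is an assumed block size, not a value taken from any specific device.
    """
    total_bits = num_elements * bits_per_element
    return -(-total_bits // bram_bits)  # ceiling division


# Hypothetical example: a 256-element array of 16-bit values.
flip_flops = array_register_bits(256, 16)
brams = block_ram_count(256, 16)
print(f"register-based array: {flip_flops} flip-flops")
print(f"memory-based storage: {brams} block RAM(s)")
```

Under these assumptions the same data either ties up thousands of slice registers or fits in a single block memory, which is why the block-memory recommendation tends to pay off for large arrays.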
@Aws_Khudhair wrote:
On a separate note, can anyone point me to an article that explains the difference between the various metrics the LV compilation returns as part of the device utilization (map) report (see screenshot)? Am I supposed to be more concerned about the total number of slices used or about the slice registers/LUTs used? I have read in multiple places that the device utilization can be a bit misleading, since LV will not try to optimize the implementation on the FPGA unless it runs out of space. What is the threshold for this optimization to get triggered? Also, I am building my FPGA VI with timing optimization, so that would cause a bit of an increase in space used, if I am correct.
My understanding is that each slice contains some registers (flip-flops) and some look-up tables (LUTs, which implement the combinational logic). Many slices may be used for only one or the other. The compiler will try to group related registers and look-up tables into the same slice, but it may not always be able to do so. I would worry more about overall slices than about the look-up tables and registers individually, unless for some reason they're really out of balance.
It's not so much a compiler optimization as additional compiler effort when the FPGA starts to fill. My (very simplified) understanding is that it's a little bit like packing a suitcase with odd-shaped objects. If you have a large suitcase and only a few objects, you can throw everything into the suitcase, and as long as everything fits, you're all set. If you have more objects, you might have to spend some time making them fit together neatly. At some point, no matter how cleverly you arrange them, you just can't fit all the objects into the suitcase. That's the point at which your design doesn't fit onto the FPGA. In some situations you can have high FPGA utilization and still add more logic at the cost of additional compilation time; you'll see the number of registers and look-up tables increase, but not the overall number of slices. It's not really an optimization; it's more that the compiler tries more and more possible combinations until it finds one that works.
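To put some numbers on the suitcase analogy, here is a small Python sketch that compares the slice count a report shows against the fewest slices the same registers and look-up tables could occupy if they were packed perfectly. The slice capacities (4 LUTs and 8 registers per slice) and all the figures in the example are assumptions for illustration only; check the documentation for your FPGA family for the real values.

```python
import math


def min_slices_needed(luts_used: int, regs_used: int,
                      luts_per_slice: int = 4, regs_per_slice: int = 8) -> int:
    """Lower bound on slices if LUTs and registers were packed perfectly.

    The per-slice capacities are assumed defaults, not values for a specific device.
    """
    return max(math.ceil(luts_used / luts_per_slice),
               math.ceil(regs_used / regs_per_slice))


def packing_headroom(slices_reported: int, luts_used: int, regs_used: int) -> int:
    """Slices that tighter packing could, at best, give back."""
    return slices_reported - min_slices_needed(luts_used, regs_used)


# Hypothetical report: 5400 of 6000 slices occupied (90%).
device_slices = 6000
slices_reported = 5400
luts_used, regs_used = 14000, 18000

dense = min_slices_needed(luts_used, regs_used)
print(f"reported slice utilization: {slices_reported / device_slices:.0%}")
print(f"best-case slices if fully packed: {dense}")
print(f"slices tighter packing might free: {packing_headroom(slices_reported, luts_used, regs_used)}")
```

A large gap between the reported slices and the best-case figure suggests the design is loosely packed and the compiler still has room to work with; a small gap means the suitcase really is nearly full. A real design will never reach the best-case figure, since routing and timing constraints limit how tightly things can be packed.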