04-02-2014 09:19 AM
I have a nice modular FPGA architecture running with Registers as an interface between modules.
I also have some nice high-speed loops going on to multiplex functions on limited hardware.
My problems occur due to apparent throughput problems with registers. I have a loop producing a signal @ 120MHz, the results of which are split up over three distinct registers to a loop which reads all three registers and outputs the data @ 40MHz. I'm essentially creating three distinct sine waves.
Although the code compiles, I see that I am not receiving all new values. Only every third value I receive is actually correct (see image).
This should be a sine wave. Testing the software in simulation mode does not result in these artefacts. Needless to say, the PSD of this sine wave is not very nice.
It seems that a register requires some clock cycles to actually pass data around. This results in some kind of blocking mechanism which is currently holding back my program on a very fundamental level. It took me a few days to find out that THIS was the problem which was causing my distortions on my output signal.
I tried increasing the top-level clock of the target from 40MHz to 120MHz to see if the propagation of the register is a function of that clock, but the effect remains the same. Is there any way to get around this without either
Shane
04-02-2014 09:38 AM
I just found this little gem buried in the LabVIEW help (in the "Implementing multiple Clock Domains" help, not the register help).
You must wait several clock cycles of both the writing and reading domains to handshake a new value.
Great. I need a completely new architecture.
Can I file this as a bug that this information is not present on the help page for registers?
Shane
04-02-2014 09:55 AM
For this particular example a FIFO (perhaps a very shallow one) is probably a better option since you are going across clock domains. A FIFO is optimized to stream values between clock domains without the "hiccups" you are trying to avoid.
The register implementation does not get optimized for related clock domains (two clock domains where one is a multiple of the other) so the logic behind the scenes assumes the clocks are unrelated and has to take several cycles to get the data from writer to reader in a safe way. I don't remember if this feature/optimization has been requested on the FPGA Idea Exchange, but it would be something useful to have on there.
04-02-2014 01:21 PM - edited 04-02-2014 01:22 PM
04-02-2014 01:47 PM
It would be a cool addition to the ability to pass data around at high speeds (>> 10MHz).
It is very common to pass data at rates much faster than this. Are you specifically talking about the multi-clock handshake case you originally mentioned? If so, each transfer will always take 2 to 3 clock cycles of the slowest clock (which might be where you are seeing the 10 MHz if your slowest clock was 40 MHz). Other transfer mechanisms like FIFOs should hit rates well over 100 MHz in most cases.
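As a rough illustration of the arithmetic above, here is a minimal Python sketch that estimates the best-case update rate through a handshaked register. It assumes the only limit is the 2 to 3 cycles of the slowest clock mentioned in this thread; the function name and the default cycle count are made up for the example and are not part of any LabVIEW API.

```python
def max_handshake_rate_hz(write_clock_hz, read_clock_hz, handshake_cycles=3):
    """Estimate the best-case rate at which new values cross the clock boundary.

    Assumption: one transfer costs `handshake_cycles` cycles of the slower
    of the two clocks, and nothing else limits the transfer.
    """
    slowest_clock_hz = min(write_clock_hz, read_clock_hz)
    return slowest_clock_hz / handshake_cycles


# The case from this thread: 120 MHz writer, 40 MHz reader.
for cycles in (2, 3):
    rate_hz = max_handshake_rate_hz(120e6, 40e6, cycles)
    print(f"{cycles} handshake cycles: about {rate_hz / 1e6:.1f} MHz")
```

Under these assumptions a 40 MHz reader cannot see a new value on every iteration, which would be consistent with only some of the samples arriving correctly.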
04-03-2014 04:30 AM - edited 04-03-2014 04:31 AM
I'm currently trying to replace my registers with FIFOs (Luckily I had implemented my functions nicely encapsulated).
I'm having trouble with clock domain crossing because Flip-flop implementations of FIFOs apparently don't support reading and writing in different clock domains.....
I'm getting the following message:
The VI Defined FIFO is configured to use an implementation of Flip-Flops but has read and write interfaces in different clock domains. If you want to read and write in different clock domains, select Block Memory from the Implementation pull-down menu in the FIFO Properties dialog box for the VI Defined FIFO.
I have many such FIFOs defined but only one is raising this error...... Using BRAM for my FIFOs would be extremely inefficient since I'm passing only small amounts of data each time.....
Help?
Shane
PS I need >30 such FIFOs to get things running. This will fill my Block RAM rather quickly since each one will claim a full BRAM block of 36 kb, right?
04-03-2014 04:41 AM - edited 04-03-2014 04:51 AM
@Dragis wrote:
Are you specifically talking about the multi-clock handshake case you originally mentioned? If so, each transfer will always take 2 to 3 clock cycles of the slowest clock (which might be where you are seeing the 10 MHz if your slowest clock was 40 MHz).
FIFOs allow clock crossing only as Block RAM which is really inefficient for single 64-bit values. Each FIFO would occupy a full 36kb Block RAM with only approx. 0.2% usage.
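The utilisation figure can be checked with a quick calculation. This is a small Python sketch that assumes 36 kb means 36 * 1024 bits and that each FIFO effectively holds a single 64-bit value; the helper names are hypothetical.

```python
BRAM_BITS = 36 * 1024  # assumption: one block RAM holds 36 kb = 36 * 1024 bits


def bram_utilization(payload_bits, bram_bits=BRAM_BITS):
    """Fraction of one block RAM actually used by the payload."""
    return payload_bits / bram_bits


def total_bram_bits(num_fifos, bram_bits=BRAM_BITS):
    """Block RAM bits claimed if every FIFO occupies one full block."""
    return num_fifos * bram_bits


print(f"Utilisation for one 64-bit value: {bram_utilization(64):.2%}")
print(f"Block RAM claimed by 30 FIFOs: {total_bram_bits(30) // 1024} kb")
```

With these assumptions a single 64-bit value uses a little under 0.2% of a block, so the figure quoted above is in the right range.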
Would it be theoretically feasible to interleave multiple registers across the clock boundaries and use this kind of arrangement to overcome the 3-cycle delay? It really feels like a kludge but at the moment, I'm searching for any kind of workable solution.
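To make the interleaving idea concrete, here is a toy Python model. It is only a sketch under strong assumptions: one value is written per cycle, the registers are used round-robin, a register is busy for a fixed number of cycles after each accepted write, and a write to a busy register is simply lost. None of this is confirmed behaviour of the actual LabVIEW register implementation.

```python
def simulate_interleaved_registers(num_registers, handshake_cycles, num_values):
    """Count how many values get through in a toy round-robin model.

    Assumptions (not confirmed for real hardware): a register stays busy
    for `handshake_cycles` cycles after an accepted write, and a write to
    a busy register is dropped.
    """
    free_at_cycle = [0] * num_registers  # cycle at which each register is free again
    accepted = 0
    for cycle in range(num_values):
        register = cycle % num_registers  # round-robin selection
        if cycle >= free_at_cycle[register]:
            free_at_cycle[register] = cycle + handshake_cycles
            accepted += 1
        # else: the register is still handshaking and this value is lost
    return accepted


for registers in (1, 2, 3):
    passed = simulate_interleaved_registers(registers, 3, 12)
    print(f"{registers} register(s): {passed} of 12 values passed")
```

In this simplified model a single register passes only every third value, and interleaving as many registers as the handshake has cycles passes all of them. Whether real registers behave this way is exactly the open question.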
Shane.
PS There's a new item in LV 2013 and later, a handshake item. Is it possible using this item to FORCE a 1:1 read:write frequency or is the 3 cycle minimum still present? I'm currently using LV 2012 SP1.
PPS The example picture for Registers in the LV help shows passing data between a 60MHz and an 80MHz loop which, according to my tests, will not work properly. This is not mentioned anywhere in the help. The 3 cycle handshake required would in this case run @ 60MHz, allowing a new value to be passed at a rate of 20MHz only....... This is a very misleading example for me.
04-03-2014 08:35 AM
Yep, unfortunately the only FIFO that supports clock crossings is the block RAM FIFO so, at least for now, you'll have to pick between utilizing the block ram inefficiently but getting faster transfer times or dealing with the slower transfer times.
A couple of other options you have: (1) Increase the clock rates on the loops to get a faster transfer between domains. This option might require you to pipeline your design a bit more to meet those rates. (2) Create a CLIP and drive the CLIP IO from each side of the loop and internally do a more optimized transfer based on the two clocks you are using. This is a pretty out-of-the-way solution, but it's possible if needed.
Again, please put ideas on the idea exchange for any of these features you want to see and hopefully others will pile on and vote them up.
04-03-2014 08:41 AM - edited 04-03-2014 08:46 AM
The CLIP option occurred to me today. Last resort I think.
I'm still trying interleaving registers, that may work and is a LV-only solution which will probably transport better to new LV versions.
Shane.
PS Is there a clear resource explaining exactly how this transfer works? When using related clocks, is the delay ALWAYS 3 cycles or is there some uncertainty in that? Does the Register have to be READ in each iteration to make progress on the handshake or does the Register only need to be PRESENT in the loop, and so forth?
04-03-2014 09:03 AM
The amount of time the transfer takes can change depending on the ratio/relation of the source and destination clocks. However, if the two clocks are phase aligned (both come from the same source clock directly or from a clock derived from it by an integer multiple or divisor) then the transfer time will always be the same (within some enormously small hardware error margin).
I couldn't find any documentation on this, but that's not surprising since it's an implementation detail and generally those may change between LabVIEW releases as the compiler/platform gets smarter.