LabVIEW

FIFO issue in continuous streaming data from host to target FPGA

I'm sorry, I must not have been clear. The pipelining won't help you read faster from the DMA FIFO, and you're already able to meet the 100 MHz timing with that code, so there's no reason to put a feedback node there. (If you do put one there, however, then you should also put one on the Timeout output.)

 

Pipelining might help you achieve 100 MHz timing in the case where you generate the signal on the FPGA. Shift registers and feedback nodes are interchangeable and the underlying implementation is identical; you can use either one.

 

If you want to read only on every other cycle, split the 32-bit value from the DMA FIFO. Use 16 bits immediately to set the outputs; store the other 16 bits in a shift register (or feedback node). Set a boolean shift register/feedback node to indicate that you don't need to read from the FIFO on the next cycle. Put the DMA FIFO read in a case structure; when the boolean from the previous cycle is set, use the shift register/feedback node value instead of reading a value from the DMA FIFO. It's probably easiest to use false to indicate that you do need to read from the FIFO on the next iteration; that way you can use the Timeout value to indicate whether a read is necessary on the next cycle (if it's true, you want to do the next read immediately; if it's false, use the other half of the data from the previous cycle).
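Since a LabVIEW diagram can't be pasted as text, here is a rough Python model of that read-every-other-cycle logic. It is only a sketch under assumptions: the flag polarity (`need_read` true means "read this cycle"), sending the low 16 bits first, and modelling a timeout as an empty queue are all choices made for illustration, not taken from the attached code.

```python
from collections import deque

def run_cycles(fifo, n_cycles):
    """Simulate n_cycles loop iterations; return the 16-bit value driven each cycle.

    None means nothing new was available (the FIFO read timed out).
    """
    need_read = True   # assumed polarity: True means "read the FIFO this cycle"
    stored = 0         # models the register holding the second 16-bit half
    outputs = []
    for _ in range(n_cycles):
        if need_read:
            # Case "read": an empty queue stands in for a DMA FIFO timeout.
            timed_out = not fifo
            if timed_out:
                outputs.append(None)
            else:
                word = fifo.popleft()
                outputs.append(word & 0xFFFF)      # use one half immediately
                stored = (word >> 16) & 0xFFFF     # keep the other half for next cycle
            # Timeout feeds the flag: a failed read is retried on the next cycle.
            need_read = timed_out
        else:
            # Case "no read": use the half stored on the previous cycle.
            outputs.append(stored)
            need_read = True
    return outputs

fifo = deque([0xBBBBAAAA, 0xDDDDCCCC])
result = run_cycles(fifo, 5)
```

With two words queued, the model drives four 16-bit values on four consecutive cycles and reports a timeout on the fifth, so the FIFO is only touched on every other cycle while data is flowing.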

Message 31 of 42
(2,604 Views)

Understood. I will try it out.

 

Yes. I knew pipelining helps to avoid timing violations at higher rates. It is still not very clear to me how I can use pipelining.

 

For example, below is the original (simplified) way of creating and outputting the patterns in the FPGA. You can see the chain of subVIs and how they are related to each other.

Is it okay to put a feedback node at each black cross to pipeline the process? And then can we say the depth of the pipeline is 4?

Untitleddew.png

Message 32 of 42
(2,583 Views)

The idea of pipelining is to break your code into sections that can execute within a single clock cycle. It takes longer (more cycles) from when you start the pipeline to the first output, but after that, since all the pipeline stages execute in parallel, you get your data out at full speed.
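To make the latency-versus-throughput point concrete, here is a small Python model of a pipeline with one register after each stage. The three stage functions are made up for illustration; only the timing behaviour matters.

```python
def run_pipeline(stages, inputs):
    """Clock a pipeline with one register after each stage; return per-cycle outputs."""
    regs = [None] * len(stages)   # registers start empty
    outputs = []
    # Extra empty cycles at the end let the last inputs drain out.
    for x in list(inputs) + [None] * len(stages):
        # What leaves the pipeline this cycle is whatever the last register holds.
        outputs.append(regs[-1])
        # Every stage works in parallel on the value registered on the previous cycle.
        new_regs = []
        prev = x
        for stage, reg in zip(stages, regs):
            new_regs.append(None if prev is None else stage(prev))
            prev = reg
        regs = new_regs
    return outputs

# Hypothetical stages, each assumed small enough to fit in one clock cycle.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
outputs = run_pipeline(stages, [1, 2, 3, 4])
```

The first three cycles produce nothing (the pipeline is filling), and after that one result comes out on every cycle.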

 

I would start by adding registers one stage at a time until you meet your timing goals. Your "simplified" diagram doesn't make much sense, there's no reason to put a feedback node everywhere you could possibly put one - most likely you'll use more resources than necessary for no benefit. For example, the equal comparison probably doesn't require an entire clock cycle for itself, so why put a register on both the input and the output of it? But maybe your "equal" is actually a more complicated comparison, I don't know.

 

How much of that code is in a single-cycle timed loop? Which parts need to stay synchronized? (Remember, adding a register adds a one-cycle delay, so if you have a block with two outputs and you only delay one of them, the outputs will no longer match.) What's the point of the delay before the FIFO out?
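The synchronization problem can be seen in a few lines of Python. This is just an illustration with invented data: a block with two outputs (a value and a flag saying whether that value is even), where `register` stands in for a feedback node.

```python
def register(samples, initial=None):
    """Model one feedback node: each sample appears one cycle later."""
    return [initial] + list(samples[:-1])

values = [1, 2, 3, 4]
is_even = [v % 2 == 0 for v in values]   # second output of the same block

# Delay only one output: the pairs no longer line up.
skewed = list(zip(register(values), is_even))
# Delay both outputs: the pairs line up again, one cycle late.
aligned = list(zip(register(values), register(is_even)))

def consistent(pairs):
    """True if every non-empty pair still satisfies flag == (value is even)."""
    return all(flag == (v % 2 == 0) for v, flag in pairs if v is not None)
```

Here `consistent(skewed)` is false and `consistent(aligned)` is true, which is why anything that must stay matched needs the same number of registers on every path.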

 

I finally got access to a machine with LabVIEW 2013 and tried to take a look at your code, but it's a mess and missing a lot of subVIs, so I still can't tell what you're trying to do.

Message 33 of 42
(2,565 Views)

Nathand,

 

Firstly, I tried U32 with a design very close to the one you mentioned, and it doubled my speed, as you can see in the attached screenshots. However, it only reduced the time from 3.3ms to around 1.6ms, and the target is 680ms. Because of the timeout issue, the engine still has to wait until the data is available, which delays the output on the IOs. I can try U64 instead, but it may make the target code more complicated.

 

Thanks. I got the idea behind pipelining. So I should group my subVIs and then pipeline them, keeping in mind that 1: they must stay synchronized, and 2: more resources may be needed because each register takes 1 clock cycle.

 

I also attached a cleaner version. The FPGA VI is called Target FPGA and all the subVIs are there. I would greatly appreciate it if you could quickly look at them.

 

 

Message 34 of 42
(2,552 Views)

tintin_99 wrote:

Thanks. I got the idea behind pipelining. So I should group my subVIs and then pipeline them, keeping in mind that 1: they must stay synchronized, and 2: more resources may be needed because each register takes 1 clock cycle.


This isn't quite right. Each pipeline stage requires 1 clock cycle. The register on its own does not require a clock cycle, it just transfers data from one pipeline stage to the next. The registers between pipeline stages do consume some resources, of course, so there's no reason to put more registers than necessary to meet your timing goals. By "register" here I mean either a shift register or a feedback node.


tintin_99 wrote:

I also attached a cleaner version. The FPGA VI is called Target FPGA and all the subVIs are there. I would greatly appreciate it if you could quickly look at them.


This code still needs a lot of cleanup. You still have wires going odd directions, wires hidden behind structures, and huge block diagrams with wasted space. None of these affect performance but they make the code difficult to read and understand.

 

You also have a lot of poor logic. Anywhere you have a Select node with a constant boolean input, you should use a boolean logic function. For example, a Select with a constant True wired to the True input is an Or. Likewise, there is no need to compare a boolean with a true or a false constant. I also saw one spot where you fork a wire and then put a feedback node on each branch, but you only need one feedback node. These are the most obvious examples; I suspect if I dug into your code more I'd find further opportunities to simplify.
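These equivalences are easy to check exhaustively. The Python below models the Select node as a plain function and runs through every input combination; the And case is an extra example of the same idea and was not something I specifically saw in the code.

```python
from itertools import product

def select(selector, when_true, when_false):
    """Model of a Select node: pass when_true if selector is set, else when_false."""
    return when_true if selector else when_false

checks = []
for a, b in product([False, True], repeat=2):
    checks.append(select(a, True, b) == (a or b))     # constant True on the True input: Or
    checks.append(select(a, b, False) == (a and b))   # constant False on the False input: And
    checks.append((a == True) == a)                   # comparing with True changes nothing
    checks.append((a == False) == (not a))            # comparing with False is just Not

all_equivalent = all(checks)
```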

 

I still don't know what your code is supposed to do, and maybe you can't provide a detailed specification, but I would suggest that you break up your code and separate the input and output portions if you can. Try to get just the signal generation portion working in a way that meets your timing goals, then add the input and comparison portions. These might even be separate loops, passing the data through a FIFO (I'm not saying that this is definitely better, but it could be an option).

Message 35 of 42
(2,525 Views)

nathand,

 

The attached code should be much easier to read and understand. I tried to briefly describe the functionality of each function. I also made more subVIs and replaced the Select nodes with Boolean logic.

 

Basically, I have a big array of U32 values. Each value holds: timing values + output values + an array saying whether each IO should be an input or an output + expected results for data read back + a mask for data read back.

Each of these is either a Boolean value or a Boolean array, and I packed them all into one U32 value. The FPGA should read these numbers at the cycle_clk rate (not the FPGA frequency), unpack them, generate the pattern for the IOs, and then strobe to read the DUT response and compare it with the expected result.
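To illustrate the kind of packing described above, here is a Python sketch. The field names and bit widths in `LAYOUT` are purely hypothetical (the actual split of the 32 bits is in the attached code, not in this post); only the pack/unpack mechanics are the point.

```python
# Hypothetical layout (name, width in bits), least significant field first.
LAYOUT = [("timing", 8), ("outputs", 6), ("direction", 6), ("expected", 6), ("mask", 6)]
assert sum(width for _, width in LAYOUT) == 32

def pack(fields):
    """Pack a dict of field values into one unsigned 32-bit word."""
    word = 0
    shift = 0
    for name, width in LAYOUT:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word

def unpack(word):
    """Split a 32-bit word back into its fields."""
    fields = {}
    shift = 0
    for name, width in LAYOUT:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields

sample = {"timing": 200, "outputs": 0b101010, "direction": 0b111000,
          "expected": 0b001010, "mask": 0b001111}
round_trip = unpack(pack(sample))
```

On the FPGA side the unpacking step is just bit slicing, so it costs wiring rather than logic.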

I would greatly appreciate your help to simplify this code

And I couldn’t find the one you mentioned about using feedback nodes on each branch

Message 36 of 42
(2,507 Views)

I can't take a look at this until later when I have access to LabVIEW 2013. Before I do, though: does this code work? Does it meet your timing needs? If not, how far off are you on the timing? Can you show a screenshot of where the timing violation occurs? Have you already tried pipelining? I do not have the time to understand and optimize your code (unless you'd like to pay me), and although I've done a fair amount of LabVIEW FPGA programming, without compiling your code repeatedly I can only make guesses (not provide definite answers) as to what will make the timing work.

Message 37 of 42
(2,494 Views)

This code works and I can compile it at up to 70 MHz. But if I change the clock rate to 100 MHz (my target), then I get timing violation errors. I attached screenshots of the timing violations with and without pipelining. The code I attached in my previous email is without pipelining.

 

I completely understand. It would be great if you could quickly check it. I am new to FPGA programming and I want to make sure that, overall, the code is FPGA-efficient.

 

Message 38 of 42
(2,483 Views)

Besides pipelining, I also removed the delay VI and changed the number of cycles from U32 to U16. With all those changes I get a 0.79 ns timing violation.

 

 

Message 39 of 42
(2,481 Views)

If your timing violations are all non-diagram components, I'm not sure what to do to resolve that, you'll have to experiment. Try removing parts of your code until it works - for example, remove the comparison section. Maybe if you can improve the code, it will free up timing for the non-diagram components, I'm not sure.

 

There's still room to improve your logic, here's one example:

SimplerCode.png

 

I'd also suggest that where you have several signals that all flip on the same clock cycle (for example here, in the Signal Generator):

SameClockInput.png

you should move the logic that checks for the clock edge out of the subVI, and up to this level, so that you only do that calculation once instead of 3 times. That may help reduce fan-out, unless the compiler is smart enough to identify the duplicate code and coalesce it.

 

I don't quite understand your signal generation scheme, maybe I could figure it out if I looked at it long enough. It appears that you can generate one of several different patterns, and for each pattern you're varying several parameters. I wonder if there's a way to do this with a lookup table instead, rather than all the in-range and coerce functions. Depending on the total number of patterns that you use, you might have essentially a 2-D lookup table, where the inputs are the pattern and the count, and perhaps you have an offset for the count as well if you use the same pattern but offset.
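Roughly what I have in mind, as a Python sketch. The table contents, the number of patterns, and the wrap-around of the count are all made up here, since I don't know your actual patterns; on the FPGA this would be a lookup table or block memory rather than a list.

```python
# Hypothetical 2-D table: one row per pattern, one column per count value.
TABLE = [
    [0, 0, 1, 1, 0, 0, 1, 1],   # pattern 0
    [0, 1, 0, 1, 0, 1, 0, 1],   # pattern 1
]

def output_level(pattern, count, offset=0):
    """Look up the output for a pattern at a given count, optionally shifted by offset."""
    row = TABLE[pattern]
    # Wrapping with modulo is an assumption; it lets a shifted pattern reuse the same row.
    return row[(count + offset) % len(row)]

levels = [output_level(0, count, offset=2) for count in range(8)]
```

The same row shifted by an offset gives a different waveform without any extra comparison logic, which is the appeal over chains of in-range checks.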

 

It would still help your readability to straighten out all the messy wires and generally clean up the code. Another such example: in "csb.vi" you wire Counter into the case structure twice (through two tunnels) - get rid of one of them, and branch the wire inside the case structure.

Message 40 of 42
(2,406 Views)