02-05-2013 08:33 AM
I, like a few others who have previously posted, have been running into the "Tx Underflow" error at high sample rates (>1-2 MS/s). A few questions based on previous suggestions/solutions...
1. It seems like the most popular fix had something to do with changing/creating a registry key. Following the instructions here and in the documentation, I created this key with the suggested value, but it didn't seem to fix the problem. That said, I haven't seen any explanation of exactly what this does, or why changing the value is supposed to help. Could someone explain what exactly we're doing here and what this key means?
2. My initial thought was that the "Write" VI operates essentially as the opposite of the "Read" VI. For reading, you initiate transfer of data from the radio to the PC, the PC puts packets in memory, and then you read/fetch those packets out of memory. Can we assume that the writing process works the same way but in reverse, where you write a bunch of data to memory, and then the driver accesses those memory locations and sends packets to the radio? Is this right? (A rough sketch of the model I have in mind follows this list.)
3. If I'm correct in understanding the above, then it makes sense to delay the Tx start trigger (which is, I gather, sort of the opposite of the Rx "Initiate"?). I tried this, delaying the Tx start trigger by several seconds in order to get a big head start in writing my data to memory before the radio started asking for it (if this is indeed how it works). Unfortunately, this didn't seem to eliminate the underflow error either.
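To make item 2 concrete, here's a rough sketch of the model I have in my head (plain Python, purely conceptual; none of these names correspond to anything in the actual NI-USRP driver):

    import queue
    import threading
    import time

    tx_buffer = queue.Queue()           # stands in for the host-side transmit memory

    def application_writes():
        # My code writes a big pile of samples into memory up front...
        for _ in range(100):
            tx_buffer.put([0.0] * 368)  # pretend each block is one packet's worth

    def driver_drains():
        # ...and the driver pulls from that memory and ships packets to the radio
        # at a fixed rate. Finding the buffer empty mid-stream is an underflow.
        underflows = 0
        for _ in range(100):
            try:
                tx_buffer.get_nowait()
            except queue.Empty:
                underflows += 1         # radio wanted data the host hadn't supplied yet
            time.sleep(0.001)           # stands in for the fixed IQ rate
        print("underflows:", underflows)

    threading.Thread(target=application_writes).start()
    driver_drains()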
Since neither tweaking the registry key (whose purpose I still don't fully understand) nor delaying the start trigger by several seconds seemed to remedy the problem, it became clear that I don't really understand what's going on with the driver during the write process. I was hoping someone might be able to offer a clearer picture to help me debug my application a bit more. Any insight?
Thanks again for all the helpful discussion!
---
Brandon
02-11-2013 06:42 PM
Hello Brandon,
This is straight from the NI USRP 1.0 readme:
If you are getting underflows, overflows, or other errors when sending or receiving data at high IQ rates, the following tips may improve performance:
There is an explanation of what the registry key does in the Optimizing Windows Media Services link above.
I did notice that this section is not included in the NI-USRP 1.2 readme, though I'm not sure whether that means improvements were made in that regard or not. What version of NI-USRP are you using?
What are the specs of the NIC you are using? I'm wondering if the NIC might be the bottleneck in your system.
Thanks,
Joel C
National Instruments
02-12-2013 11:34 AM
As mentioned, I already changed the FastSendDatagramThreshold, but this did not seem to help. I'm using an NI-8105 embedded controller in an NI-1042 PXI chassis.
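For reference, here's roughly the change I made, written out as a Python snippet using the standard winreg module (the key path and the 2048 value are just my understanding from the docs, so treat them as assumptions):

    import winreg  # Windows-only standard library module

    # Path where the AFD parameters live, as I understand it (assumption).
    key_path = r"SYSTEM\CurrentControlSet\Services\AFD\Parameters"
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        # 2048 is the value suggested in the documentation; I believe it's in bytes.
        winreg.SetValueEx(key, "FastSendDatagramThreshold", 0,
                          winreg.REG_DWORD, 2048)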
I read somewhere that this issue (and maybe other associated ones) was going to be addressed automatically in 1.2. I'm still rolling with 1.1, so I will update and retry.
I'm trying to brush up on the UDP protocol to understand this registry value a little better, but from what I gather, if the driver presents the network hardware with datagrams larger than a certain size, it incurs a bunch of extra overhead (that I don't quite understand). If they're smaller than that size, it's able to push them out more immediately. I suppose that's a good enough explanation.
However, if I have a huge set of data I want to transmit (like a second's worth of IQ data at a few MS/s), will increasing this registry value by a factor of 2 even help? Would it be better to present the transmit samples in chunks? Or does the driver handle that?
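Just so it's clear what I mean by chunks, here's the kind of thing I'm imagining (Python pseudologic; write_to_usrp is a hypothetical stand-in for the niUSRP Write call, and the chunk size is just an example):

    # Hypothetical chunked-write sketch -- not how the driver actually works, just my question.
    SAMPLES_PER_WRITE = 368 * 1000      # some multiple of one packet's worth of samples

    def transmit(waveform, write_to_usrp):
        # Instead of handing the driver one giant multi-megabyte block,
        # feed it packet-friendly pieces and let it stream them out.
        for start in range(0, len(waveform), SAMPLES_PER_WRITE):
            write_to_usrp(waveform[start:start + SAMPLES_PER_WRITE])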
---
BC
02-13-2013 10:03 PM
Hello Brandon,
Did the upgrade to NI USRP 1.2 help?
Can you post your code? I can try to recreate the error here.
Thanks,
Joel C
National Instruments
02-14-2013 08:51 AM - edited 02-14-2013 09:02 AM
@cochenob wrote:
I'm trying to brush up on the UDP protocol to understand this registry value a little better, but from what I gather, if the driver presents the network hardware with datagrams larger than a certain size, it incurs a bunch of extra overhead (that I don't quite understand). If they're smaller than that size, it's able to push them out more immediately. I suppose that's a good enough explanation.
Much like when your hard drive doesn't have enough contiguous free space and files get "fragmented," a similar phenomenon happens on the network. Chunks of data get sent out in the form of packets. If these packets are too large, they must be broken up into smaller pieces and reassembled at the destination (extra overhead required).
Worst case scenario is that one extra byte causes a whole new packet to be sent out. According to one source, that extra packet would contain at least 47 bytes, only 1 of them being usable data at the end.
Ideally, you want to send as much data per packet as possible without fragmentation, to lower your overhead. This is where the MTU comes in (explained briefly in the StackOverflow link). This also touches on why you can't push a full 1 GB of data through a 1 GB connection: you need that overhead to make sure things get to where they need to go. Some buses require more overhead than others.
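To put some rough numbers on it (assuming a standard 1500-byte Ethernet MTU, 16-bit IQ samples, and ignoring any headers the USRP's own protocol adds):

    MTU = 1500                  # bytes of Ethernet payload per frame (typical default)
    IP_HEADER = 20              # bytes
    UDP_HEADER = 8              # bytes
    BYTES_PER_SAMPLE = 4        # 16-bit I + 16-bit Q

    udp_payload = MTU - IP_HEADER - UDP_HEADER            # 1472 bytes of usable data
    samples_per_packet = udp_payload // BYTES_PER_SAMPLE  # 368 IQ samples per packet
    print(udp_payload, samples_per_packet)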
02-14-2013 09:58 AM
Hi Anthony-
These were great resources, thank you.
A few follow-up questions...
1. It seems strange, though, that the MTU (i.e., the value that I'm changing in the registry) can be set arbitrarily. It would seem like this would be standardized. Maybe not? Being rusty on my IP standards: does the protocol allow the flexibility for the "data" portion of each packet to be arbitrary? Assuming data transfer between only two devices, perhaps the "data" portion of the packet can be made as big or as little as desired, so long as each device is capable of it and they agree on this beforehand?
2. What units is this Datagram Threshold in? Bytes? In order to simulate my sensor, a back-of-the-envelope calculation tells me I need to generate about a second's worth of continuous data at an IQ rate north of 1-2 MS/s, or at least 4 MB of data (the arithmetic is sketched after this list). I'm guessing this is going to be a non-starter if the suggested value of the threshold (2048) is in bytes. (Though to be honest, it's becoming a non-starter in my LV code before I even get to trying to send it to the radio.)
3. If my thinking in the first two is correct, can I assume that the maximum MTU size is probably going to be determined by the USRP rather than the PC? If so, what is this number? In other words, how big can I make my datagram? I could then work backwards from this number to reliably determine what combination of data length and sample rate fits within a single transmission. I might not be able to get my one second's worth of data at >1 MS/s, but knowing exactly how I could modify my simulation to fit within the abilities of the radio would be a great tradeoff. The USRP documentation says something vague, like "make this number 2048 instead," without any indication as to why that value was suggested, or what the limit might be.
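For what it's worth, here's the back-of-the-envelope arithmetic behind item 2 (Python, 16-bit IQ pairs assumed):

    BYTES_PER_SAMPLE = 4            # 16-bit I + 16-bit Q
    iq_rate = 1e6                   # samples/s (my target is north of 1-2 MS/s)
    duration = 1.0                  # seconds of continuous data to simulate the sensor

    total_bytes = iq_rate * duration * BYTES_PER_SAMPLE  # 4 MB at 1 MS/s, 8 MB at 2 MS/s
    packets_needed = total_bytes / 1472                   # thousands of packets either way
    print(total_bytes, packets_needed)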
Thanks! This was helpful!
---
Brandon
02-14-2013 10:56 AM - edited 02-14-2013 10:57 AM
Sure, there's a tweak you can do because Microsoft provides it, but in the end, there doesn't need to be too much thought put into these things. Choose a sample rate suitable for the bandwidth you need and go with that 😉
If you're not getting the performance you expect, I would try a different computer or a different network adapter, as that is most likely the root cause.
02-14-2013 12:40 PM
Anthony-
Maybe I don't understand: if the USRP has a default size of 1472 bytes, then why bother making changes on the PC end above this number? Furthermore, assuming 16-bit IQ pairs, 1472 bytes is only 368 samples. I would guess that a large number of applications are going to have more samples than this, meaning that the majority of the time you're going to need to send several packets to the USRP rather than one big one. (That said, I can at least see where the "make your fetches/writes a multiple of 368" recommendation in the documentation comes from.)
So, if the USRP is only handling 1472-byte (368-sample) chunks, it seems like just about every application is going to have fragmentation that will require the transmission of multiple packets. I don't see how changing the PC threshold to something higher than this accomplishes anything when the MTU is set by the USRP and is lower than what the Windows default already is.
Where am I going wrong?
02-14-2013 01:11 PM
The Microsoft default is 1024, which will cause a single packet to be split into two packets (fragmented).
@cochenob wrote:
Anthony-
Maybe I don't understand: if the USRP has a default size of 1472 bytes, then why bother making changes on the PC end above this number?
@cochenob wrote:
Anthony-
it seems like just about every application is going to have fragmentation that will require the transmission of multiple packets.
Yes, it is not feasible to put all the data into one packet. We can only maximize the amount of data per packet; we can't eliminate the need for multiple packets.
Is it possible to post your VI so we can inspect and test out the code? The problem may be easy to spot by inspection.
02-14-2013 02:05 PM
You bet. Give me some time to get it into a state that would be digestible for another pair of eyes, and I'll pass it along.
I think I understand: if the PC threshold were set at 1024-byte chunks, and the USRP can handle 1472-byte chunks, then you'd always be 'under-utilizing' the rate at which you're sending data to the radio, since you're not sending full packets (and why you recommend sending N*368 samples). It would then make sense that as you head toward higher output sample rates, you could skirt underflow errors by increasing the PC datagram threshold so that you'd send full (and fewer total) packets.
That said, we're only talking about 450 bytes or so (roughly 112 IQ pairs) at a time here. It doesn't seem like a real big deal, all things considered, but I probably don't have an appreciation for all the subtleties involved.
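Putting rough numbers on that, using the figures from above (Python, back-of-the-envelope):

    BYTES_PER_SAMPLE = 4        # 16-bit I + 16-bit Q
    iq_rate = 2e6               # samples/s I'd eventually like to hit

    bytes_per_second = iq_rate * BYTES_PER_SAMPLE
    packets_at_1472 = bytes_per_second / 1472   # ~5,400 packets/s with full datagrams
    packets_at_1024 = bytes_per_second / 1024   # ~7,800 packets/s with the Windows default
    print(packets_at_1472, packets_at_1024)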
Stay tuned for code.