BreakPoint

Undebuggable - least reproducible issues

I've had my share of weird bugs raising their head over the years.  The ones which disappear as soon as you try to find out where they're coming from, or as soon as you call a colleague over to have a look are always the most annoying.

 

What are the most unreproducible (and therefore annoying) bugs you've run into?

 

Mine was as follows:

I was debugging a random RT VI when I suddenly saw the execution time of a simple "divide" primitive start changing (during execution) depending on the values being fed to it (it's best not to ask what kind of benchmarking I was doing to notice something like that).  I'm talking 10 times slower when fed numbers leading to "possible" rounding errors.  I managed to show it to a colleague, but by the time I had reported it in the forums, the problem had gone away, most likely laughing at me as it went.  LINK.

 

Anyone else?

Message 1 of 8
(11,878 Views)

I want to chime in!

I took a LabVIEW project from 8.2 to 2010. In LabVIEW, everything was fine: the upgrade warnings listed only minor issues, and the entire code worked fine in the development environment. Quite soon the time came to build the application and test it.  The built application crashed (I do not remember the exact kind of crash) in one specific scenario, which it never did when built with 8.2.  It took a good amount of time until I found the reason for it: on the BD there was a section that wrote to two property nodes of one menu ring without any data flow control between them ... Adding an error wire between the property nodes fixed the issue.

It was only the compiler improvements in 2010 that exposed this bug, which had been in the code for about 400(?) years ...
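
For readers who don't live in LabVIEW, here is a rough textual sketch of why the error wire matters. It is only an illustration in Python: the `MenuRing` class, its two properties, and the helper functions are hypothetical, since the post doesn't say which properties were written or how the crash happened. The point is that with no data dependency between the two writes, either order is a legal schedule, while chaining the error output of the first write into the second leaves only one.

```python
class MenuRing:
    """Hypothetical stand-in for a control with two dependent properties."""

    def __init__(self):
        self.items = []
        self.value = None

    def set_items(self, items, error_in=None):
        # Like a property node: do nothing and pass the error on if one arrives.
        if error_in is not None:
            return error_in
        self.items = list(items)
        return None

    def set_value(self, index, error_in=None):
        if error_in is not None:
            return error_in
        if not 0 <= index < len(self.items):
            return f"index {index} is out of range for {len(self.items)} items"
        self.value = index
        return None


def write_unordered(ring, items, index, value_first):
    """Model the two schedules that are legal when nothing links the writes."""
    if value_first:
        return [ring.set_value(index), ring.set_items(items)]
    return [ring.set_items(items), ring.set_value(index)]


def write_chained(ring, items, index):
    """The 'error wire' version: the second write depends on the first."""
    err = ring.set_items(items)
    return ring.set_value(index, error_in=err)
```

Calling `write_unordered(..., value_first=True)` on a fresh ring returns an error from the first write, while `write_chained` always sets the items before the value. A compiler change that picks a different (still legal) schedule is enough to turn the first case from "never seen" into "every build".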

Message 2 of 8
(11,865 Views)

A few years back I was working on a program for a client and ran into a wonderfully undebuggable issue - I don't remember if the guy testing the EXE with me saw it first or if I saw it first in LV, but occasionally, LV would produce different results from the same Number to Fractional String primitive, even though it was getting the exact same input. It was weird to see, because changing something in the UI which caused the code to rerun without changing the value would also cause the values in some indicators to change. Do it again, and they would change back. I even managed to create a VI which would demo this once the bug had started happening.

 

Not only was this a super-weird bug in its behavior (and scary, because it made the math come out slightly wrong), but I couldn't figure out what was causing it to manifest. There was no good set of repro steps. It would just appear sometimes and persist until LV was closed. Eventually, someone from NI figured it out - there were calls to .NET functions in the program and under specific execution paths, they were changing the precision of the processor for the thread and failing to change it back, which would then cause LV's calculation to come out wrong. I think the solution was to move all the relevant .NET calls to a separate thread.
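
A loose analogy in Python, for anyone who wants to see the shape of this failure: the `decimal` module keeps its arithmetic precision in a per-thread context, so a call that lowers it and forgets to restore it changes every later calculation on that thread. This is only a sketch of the pattern, not what the .NET code did; `leaky_call`, `well_behaved_call`, `one_third` and `run_isolated` are made-up names for illustration.

```python
import decimal
import threading
from decimal import Decimal


def leaky_call():
    """Stands in for the external call: lowers precision, never restores it."""
    decimal.getcontext().prec = 6


def well_behaved_call():
    """Same work, but the precision change is scoped and undone on exit."""
    with decimal.localcontext() as ctx:
        ctx.prec = 6
        return Decimal(1) / Decimal(3)


def one_third():
    """The 'innocent' calculation that silently depends on thread state."""
    return Decimal(1) / Decimal(3)


def run_isolated(fn):
    """Run fn on its own thread so its context changes stay on that thread."""
    worker = threading.Thread(target=fn)
    worker.start()
    worker.join()
```

Calling `leaky_call()` directly changes what `one_third()` returns afterwards on the same thread, while `run_isolated(leaky_call)` leaves the caller's results alone, which mirrors the "move the calls to a separate thread" fix described above.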


___________________
Try to take over the world!
Message 3 of 8
(11,849 Views)

I have a related comment to make, although it doesn't directly answer the OP.  It has happened that I witnessed something odd and spent hours/days debugging my software, only to find that some part of the hardware or the UUT wasn't responding properly.  The other posts are very interesting, but this type of thing happens often.

Jim
You're entirely bonkers. But I'll tell you a secret. All the best people are. ~ Alice
For he does not know what will happen; So who can tell him when it will occur? Eccl. 8:7

Message 4 of 8
(11,820 Views)

I'm suffering from one right now and it's driving me nuts!!!!

 

In my code I appear to have a memory leak. When I build the code into an executable it's worse (or so it seems).

 

I don't keep re-opening references to things (such as named queues).

I am not hoarding objects or class instances.

 

The worst thing is, as I work through a copy of the code removing small chunks and retesting to determine the culprit, the memory leak will suddenly vanish, only to not reappear when I put back the last bit of code removed! I can then revert to the original full code and remove something somewhere else, and again the memory leak will vanish, only to not come back when I reinstate that code. It's untrackable!!!! 

 

What's the emoticon for crying uncontrollably?

Thoric (CLA, CLED, CTD and LabVIEW Champion)


Message 5 of 8
(11,805 Views)

Do you use dynamic events? If you register for them but don't handle them, don't they start eating memory?

Message 6 of 8
(11,790 Views)

I am using dynamic events, but only one per event structure (with one event structure per component, total 5 components). And I handle them everywhere that I register them. Plus, I'd only heard that an unhandled registered dynamic event caused the event structure timeout to reset. Nothing about memory leaks!?

 

Looking at my code, the memory leak rate is related to the user event generation rate. Yikes...
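
To picture the hypothesis from the previous reply, here is a toy model in Python. It is not how LabVIEW implements event queues; `simulate` and its parameters are invented for illustration. It only shows why a queue that is filled but never drained would produce a leak whose rate tracks the event generation rate.

```python
from collections import deque


def simulate(event_rate_hz, seconds, handled):
    """Return how many events are still queued after the run."""
    pending = deque()
    for _ in range(int(event_rate_hz * seconds)):
        pending.append(object())   # producer fires a user event
        if handled:
            pending.popleft()      # consumer handles it right away
    return len(pending)
```

With `handled=False` the backlog is simply rate times duration, so doubling the event rate doubles the growth; with `handled=True` it stays at zero.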

 

Thoric (CLA, CLED, CTD and LabVIEW Champion)


Message 7 of 8
(11,781 Views)

I have two other things that come to mind which aren't undebuggable, but were a bit tricky:

 

  1. A few years back I wrote a fairly low level piece of code which had a variant in a feedback node (Darin probably knows now where I'm going with this) and used its attributes as a lookup table. I knew the code would be called a lot from a lot of places, so when I started, I wrote some test code and stress-tested it to make sure that it wouldn't be a problem. When it was fine, I went ahead and wrote the entire rather complex program and then proceeded to test it, only to discover that its performance was abysmal (unfortunately, the program could only be meaningfully tested as a whole). It didn't take me too long to figure out that that piece of code was the culprit, but I couldn't figure out why, because I was sure I tested it and it performed OK.

    It took a bit of playing around to realize that there was a slight difference between my test code (which I think I no longer had at that point) and the actual code - the feedback node had an empty variant going into its init terminal, and apparently LV didn't like that and would run the code very slowly (nor did it like it if you used a select node to init the variant. It had to be a case structure). Once I changed that, the code worked exactly as expected. This issue was fixed around LV 2010.

  2. At some point, that same program received a significant upgrade, but after a week or two of running on-site, the client called and said that it's sometimes running slowly, and that if they restart the PC, it goes back to running OK. I couldn't figure out why, especially as it wasn't happening when I ran the program myself and they said it only happened occasionally. It took a few cycles of this to get them to show me the issue when it was actually happening, and I finally saw that the program had consumed way too much RAM.

    I couldn't find anything in the code which would do this, so I started by logging the RAM consumption and saw that it seemed to rise fairly consistently, but I still couldn't reproduce it. Eventually I just created a version of the EXE which neutralized parts of the code on demand to see if I could make it stop behaving badly, and that allowed me to find the culprit - hardware calls, which was one of the things I didn't have when running in LV. When writing to an empty AO channel in a remote FieldPoint module, some memory would leak, which is obviously a bug in the FP code. Now, this also returned an error, but that part of the app was minor, so all it did was display the error in an array in a utility part of the program, where it wasn't noticed because the other AO channels (the ones which were actually used) worked fine. The write to the empty channel was itself new and unintended behavior introduced by some changes in the code. Once the problem was identified, the solution was simple - wrapping the FP AO VI in a case structure which would skip over it if the channel was blank.
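
The two steps from the second story (checking whether logged RAM rises steadily, and skipping the write for a blank channel) can be sketched in Python roughly as follows. All names here are hypothetical: `growth_per_sample` assumes the readings were logged at a fixed interval, and `write_fn` stands in for whatever actually performs the hardware write.

```python
def growth_per_sample(samples):
    """Least-squares slope of memory readings taken at a fixed interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def write_analog_output(channel, value, write_fn):
    """Skip the hardware call entirely when no channel is configured.

    Returns True if the write was attempted, False if it was skipped.
    """
    if not channel or not channel.strip():
        return False
    write_fn(channel, value)
    return True
```

A slope that stays clearly positive across a long log is a stronger sign of a leak than any single reading, and the guard plays the same role as the case structure around the FP AO VI.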

___________________
Try to take over the world!
Message 8 of 8
(11,775 Views)