LabVIEW Embedded

Blackfin Performance Profiling

Hello,

I'm acquiring 32-bit signals in which only 18 bits are used. Because the acquisition runs at around 4.5 MHz and lasts a few seconds, I must compress my data. I therefore wrote a simple VI that takes an array of four 32-bit values (of which only 4 × 18 = 72 bits are useful) and converts it into an array of nine U8 values. The conversion takes 0.4 µs on my computer (quad Xeon) but more than 60 µs on the Blackfin, which makes no sense. I'm using the BF548 evaluation kit running at 600 MHz, where 60 µs corresponds to about 36,000 clock cycles.
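
For comparison, here is a minimal C sketch of the same conversion, the kind of code that could later go into an Inline C Node. It is hypothetical (not the attached VI's logic): it assumes the 18 useful bits are the low-order bits of each 32-bit word and that the four samples are concatenated most-significant-bit first, and it uses a small bit accumulator that emits a byte whenever 8 pending bits are available.

```c
#include <stdint.h>

/* Hypothetical helper, not from the attached project.
   Packs the low 18 bits of four 32-bit samples into 9 bytes (4 * 18 = 72 bits).
   Assumed layout: sample 0 occupies the most significant bits of the stream,
   and every sample is written MSB first. */
void pack4x18(const uint32_t in[4], uint8_t out[9])
{
    uint64_t acc = 0;    /* bit accumulator; the low `nbits` bits are still pending */
    unsigned nbits = 0;  /* number of pending (not yet written) bits in acc */
    unsigned k = 0;      /* index of the next output byte */

    for (int i = 0; i < 4; i++) {
        acc = (acc << 18) | (in[i] & 0x3FFFFu);  /* keep only the 18 useful bits */
        nbits += 18;
        while (nbits >= 8) {                     /* emit every complete byte */
            nbits -= 8;
            /* the next 8 pending bits; the cast drops already-written bits above them */
            out[k++] = (uint8_t)(acc >> nbits);
        }
    }
    /* 72 bits is a whole number of bytes, so nbits == 0 and k == 9 here. */
}
```

With this layout, an input of `{0x3FFFF, 0, 0, 0}` produces `FF FF C0` followed by six zero bytes, and any bits above bit 17 in the inputs are ignored.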

I'm measuring the execution time with TickCount (ms), timing multiple calls of the conversion VI.
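
Because TickCount (ms) only resolves whole milliseconds, a per-call figure for a conversion of a few tens of microseconds has to come from timing many calls and dividing by the call count. The C sketch below shows that arithmetic. It uses the standard C `clock()` function as a stand-in timer (an assumption: `clock()` measures processor time and its resolution is implementation-defined, and the Blackfin target may provide better timers), and `example_work` is a hypothetical placeholder for the conversion.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical placeholder for the code being timed. */
static volatile uint32_t sink;
static void example_work(void) { sink = sink * 1664525u + 1013904223u; }

/* Average cost of one call in microseconds over `calls` iterations.
   The total run should span many clock ticks for the result to be meaningful. */
double avg_us_per_call(void (*fn)(void), unsigned long calls)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < calls; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e6 / (double)calls;
}

/* Same arithmetic for two readings of a 1 ms tick counter such as TickCount (ms).
   Unsigned 32-bit subtraction still gives the right elapsed time if the
   counter wrapped once between the two readings. */
double per_call_us_from_ticks(uint32_t start_ms, uint32_t end_ms, unsigned long calls)
{
    uint32_t elapsed_ms = end_ms - start_ms;
    return (double)elapsed_ms * 1000.0 / (double)calls;
}
```

For example, 600 ms measured across 10,000 calls works out to 60 µs per call; timing only a handful of calls would mostly measure the 1 ms tick granularity.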

Is there any known issue with the TickCount (ms) function? Is the evaluation kit expected to run slower for any reason? I've attached the LabVIEW project to this message.

Regards,

Patrick Lessnick

Patrick,

I have looked through our known issues documentation and have been unable to find a bug that would explain the behavior you are experiencing. I am going to attempt to reproduce this behavior using the project and files you provided in your first post. Before I can do that, can you please tell me which version of LabVIEW and which version of the LabVIEW Embedded Module for Blackfin Processors you are using?

Thanks,

Mark
NI App Software R&D

Hello Mark,

The customer is using LabVIEW 8.5 and ADI_Blackfin 2.5.0. I'm checking with him to see whether he can upgrade to 8.6, since we are just starting the project.

Thanks,

Patrick

Plessnick,

I apologize for the delayed response. I created several unnecessary problems for myself by having too many embedded development modules installed on one machine. I have reproduced roughly the same behavior you experienced, and I believe we can improve performance with a better implementation of your BitShiftingLowLevel (SubVI).vi. I modified this VI to create the array before inserting elements into it and replaced the stacked sequence with a state machine. This did not yield an improvement in performance.

I am going to check with R&D to see if the length of time required to complete this operation is expected or if there is another way we can improve performance.

-Mark
NI App Software R&D

Mark,

I've tried processing all the bits in parallel without any noticeable gain. I also used the Blackfin bit-shift VIs, without any performance improvement.

I'm waiting for R&D's answer. If we can't speed up the procedure, I may have to do the whole acquisition and compression procedure in an Inline C Node.

Regards,

Patrick Lessnick

Patrick,

From my discussions with R&D, this is not a bug. It is a unique challenge to streamline performance with LabVIEW for Blackfin, since there are many additional caveats we don't usually have to worry about in LabVIEW for Windows.

If performance is the chief concern, here are my recommendations:

  1. Use a shift register.
  2. Don't use a subVI. The code will be larger on the diagram, but it will be faster and smaller on the chip.
  3. Don't use Build Array in the low-level part. Instead, initialize an array outside the loop and index into it.
  4. Turn on optimizations. For example, enabling Disable parallel execution should have a significant effect.
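
Recommendations 2 and 3 translate fairly directly into C. The sketch below is hypothetical (not NI's generated code and not the original project): it packs a whole buffer of samples with the bit manipulation written inline in the loop instead of in a called function, and it writes by index into an output buffer that the caller allocates once instead of growing an array. It assumes the 18 useful bits are the low bits of each word and that each group of four samples is concatenated most-significant-bit first into 9 bytes; whether it actually beats the generated code on the BF548 is untested.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-level packer.
   - No per-group function call: the packing is written inline in the loop.
   - No growing array: `out` is preallocated by the caller with
     (n_samples / 4) * 9 bytes and filled by index.
   Assumed layout: low 18 bits of each word, first sample of each group in the
   most significant bits, MSB first. Trailing samples that do not fill a
   group of four are ignored. */
void pack_buffer_18(const uint32_t *in, size_t n_samples, uint8_t *out)
{
    for (size_t i = 0; i + 4 <= n_samples; i += 4) {
        const uint32_t a = in[i]     & 0x3FFFFu;
        const uint32_t b = in[i + 1] & 0x3FFFFu;
        const uint32_t c = in[i + 2] & 0x3FFFFu;
        const uint32_t d = in[i + 3] & 0x3FFFFu;
        uint8_t *o = out + (i / 4) * 9;          /* 4 samples -> 9 bytes */

        o[0] = (uint8_t)(a >> 10);               /* a[17:10]         */
        o[1] = (uint8_t)(a >> 2);                /* a[9:2]           */
        o[2] = (uint8_t)((a << 6) | (b >> 12));  /* a[1:0]  b[17:12] */
        o[3] = (uint8_t)(b >> 4);                /* b[11:4]          */
        o[4] = (uint8_t)((b << 4) | (c >> 14));  /* b[3:0]  c[17:14] */
        o[5] = (uint8_t)(c >> 6);                /* c[13:6]          */
        o[6] = (uint8_t)((c << 2) | (d >> 16));  /* c[5:0]  d[17:16] */
        o[7] = (uint8_t)(d >> 8);                /* d[15:8]          */
        o[8] = (uint8_t)d;                       /* d[7:0]           */
    }
}
```

The output buffer can be sized once per acquisition, e.g. `(n_samples / 4) * 9` bytes, and reused, which is the C counterpart of initializing the array outside the loop.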

If all else fails and our generated code just isn't fast enough (which I doubt), there is still the Inline C Node, which he can use for something like this.

Given these recommendations please let me know if you have any additional questions or concerns.

-Mark
NI App Software R&D

Mark,

I've updated my code with your recommendations and found that the most important one is removing Build Array (even though the array size is constant). The execution time drops from 62 µs to 13 µs, which is, however, still about 13 times slower than what I need.
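
As a rough sanity check on these numbers, assuming the 4.5 MHz figure is the rate at which 32-bit words arrive and that the BF548 core runs at 600 MHz, the time available per four-word group and the cycle counts of the measured timings are:

```latex
\begin{aligned}
t_{\text{budget}} &= \frac{4~\text{words}}{4.5\times 10^{6}~\text{words/s}} \approx 0.89~\mu\text{s}
  &&\Rightarrow\; 600\times 10^{6}~\text{Hz}\times \frac{4}{4.5\times 10^{6}}~\text{s} \approx 533~\text{cycles}\\
t_{\text{before}} &= 62~\mu\text{s}
  &&\Rightarrow\; 600\times 10^{6}~\text{Hz}\times 62\times 10^{-6}~\text{s} = 37\,200~\text{cycles}\\
t_{\text{after}} &= 13~\mu\text{s}
  &&\Rightarrow\; 600\times 10^{6}~\text{Hz}\times 13\times 10^{-6}~\text{s} = 7\,800~\text{cycles}
\end{aligned}
```

Under that assumption, 13 µs is about 14.6 times the per-group budget, the same order as the factor quoted above; the exact factor depends on how the required time is defined.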

At this point, I suspect I can probably go much faster using an Inline C Node. However, this still needs to be verified.

Thanks,

Patrick Lessnick
