Embedded Networks Performance Analysis

Publish Date: Dec 07, 2009

Overview

This article describes different ways to measure and test the performance of CAN interfaces, with a focus on NI-XNET CAN devices and their improvements over legacy NI-CAN interfaces. For each performance measurement, the document provides a description of the measurement and why it is important in an actual application, a description of the test VI used, and test results. Since these test results may differ for each test system, we have also provided a suite of VIs that you can use to perform your own benchmarking. This document is NOT a getting started document. The attached tests are not trivial and should not be used to get started with NI-XNET. For more information about CAN and NI-XNET, please follow the links at the bottom of this page.

Table of Contents

  1. How do we measure performance?
  2. Cost (CPU Utilization)
  3. Throughput
  4. Message Latency
  5. Additional Links

1. How do we measure performance?

Performance can be measured in different ways depending on the application.  If the end goal is to download new firmware to an ECU over a CAN port, then performance might be measured as the number of frames per second an interface can output (or write).  A faster output rate means a faster download time, which means better performance.  Performance is measured in a completely different way if the application requires testing the behavior of an ECU in a simulated environment (hardware in the loop).  Performance measurements are highly application dependent.  We narrow down the performance analysis of NI-XNET devices into three categories:

Cost (CPU Utilization): How long do NI-XNET calls take to execute?

Throughput: How many frames per second (or bus load) can my application handle?

Latency: What is the delay from my application to the bus (receive to read, and write to transmit)?


2. Cost (CPU Utilization)

The first and often most important type of performance measurement involves calculating the CPU cost of NI-XNET driver calls.  We do this by measuring how long it takes to read and write single-point signal values.  The reason we measure signal input and output time instead of frame input and output is that typical applications usually use signals, since the driver takes care of converting the raw frame values instead of requiring the user to program this conversion.  By using signal I/O, we include this conversion time in our benchmarks.  For more information on signal and frame data types, please see this CAN Overview.

The reason this measurement is the most important is that the faster the reads and writes are, the more time the application has for performing other tasks (such as model calculations for an HIL system, or data acquisition).  This is shown in the image below.

Figure 1: Function Cost Example

 

For a fixed loop rate, minimizing ∆t1 and ∆t3 gives the processor more time for other tasks.  Smaller ∆t1 and ∆t3 also mean we can run a higher loop rate (or smaller loop period), which can be critical for certain applications.  The NI-XNET hardware and software architecture has been optimized for these kinds of applications by using device-driven DMA technology.

Test Description: Cost

To perform this test, we create an NI-XNET session containing many frames, each containing a number of signals.  To make the test more versatile, we create the frames in memory when we run the application (in contrast to using a fixed database file).  Inside the loop, we measure the current interface timestamp using NI-XNET Read -> State -> Timestamp Current.  This function uses the onboard clock to calculate the current time, which gives microsecond resolution.  We then write all our signal values using a single-point output session.  We read the timestamp again after all the signals are written and compare it to the timestamp we got at the beginning.  This gives us ∆t1 from the diagram above.  We do the same for read.  In the test VI, we also use a queue (or RT FIFOs) to buffer the data to another loop for logging, which helps us maintain a high loop rate.  Also note that we can use the native LabVIEW get-time function to get the current operating system time.  This is useful in LabVIEW RT because that function also gives microsecond resolution.
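The attached VI implements this measurement graphically in LabVIEW.  For readers who prefer a text-based illustration, the following Python sketch shows the same pattern; write_signals and read_signals are hypothetical stand-ins for the NI-XNET single-point write and read calls, and the host timer stands in for the interface timestamp.

import statistics
import time

def write_signals(values):
    # Hypothetical stand-in for the NI-XNET single-point signal write call.
    pass

def read_signals(count):
    # Hypothetical stand-in for the NI-XNET single-point signal read call.
    return [0.0] * count

NUM_SIGNALS = 200
ITERATIONS = 10_000

values = [0.0] * NUM_SIGNALS
write_times, read_times = [], []

for _ in range(ITERATIONS):
    t0 = time.perf_counter()              # timestamp before the writes (start of dt1)
    write_signals(values)
    write_times.append(time.perf_counter() - t0)

    t0 = time.perf_counter()              # timestamp before the reads (start of dt3)
    read_signals(NUM_SIGNALS)
    read_times.append(time.perf_counter() - t0)

for name, times in (("write", write_times), ("read", read_times)):
    print(name, "avg/min/max (us):",
          statistics.mean(times) * 1e6, min(times) * 1e6, max(times) * 1e6)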

Test Results: Cost

Operating System:             LabVIEW RT
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8513/2 (in HS mode)
VI Used:                      NI-XNET Signal Single Point Performance RT.vi
Number of Frames:             64
Number of Signals per frame:  10
Total number of signals:      1280
Number of iterations:         200000

                      Write (200 signals)         Read (200 signals)
Average Speed         48.58 us (0.2429 us/sig)    33.42 us (0.167 us/sig)
Minimum               46.73 us                    32.42 us
Maximum               65.33 us                    50.07 us
Standard Deviation    1.01                        0.49
Variance              1.01                        0.24

Operating System:             LabVIEW RT
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8513/2 (in HS mode)
VI Used:                      NI-XNET Signal Single Point Performance RT.vi
Number of Frames:             20
Number of Signals per frame:  10
Total number of signals:      400
Number of iterations:         233 340 826 (Weekend Test)

                      Write (200 signals)    Read (200 signals)
Average Speed         48.0559 us             33.4289 us
Maximum               62.4657 us             41.9617 us

 

Operating System:             LabVIEW RT
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8513/2 (in HS mode)
VI Used:                      NI-XNET Signal Single Point Performance RT.vi
Number of Frames:             64
Number of Signals per frame:  10
Total number of signals:      1280
Number of iterations:         200000

                      Write (640 signals)        Read (640 signals)
Average Speed         180.29 us (0.282 us/sig)   72.96 us (0.114 us/sig)
Minimum               175.48 us                  70.57 us
Maximum               222.21 us                  132.1 us
Standard Deviation    2.37                       0.56
Variance              5.61                       0.31

Operating System:             Windows XP
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8513/2 (in HS mode)
VI Used:                      NI-XNET Signal Single Point Performance Windows.vi
Number of Frames:             20
Number of Signals per frame:  10
Total number of signals:      400
Number of iterations:         200000

                      Write          Read
Average Speed         319.5 us       43.4 us
Minimum               154.7 us       40.6 us
Maximum               3165.8 us      2435.3 us
Standard Deviation    180.7          3.3
Variance              32654.16       10.86

Frequency of occurrence of Read Time

This graph shows that most of the read times are around 43us.  However, there are still read times that take much longer.  We can look at specific parts of this graph in more detail to see the distributions.

Frequency of occurrence from 0 to 115us

Frequency of occurrence from 115 to 500us

Frequency of occurrence from 500 to 2600us

These test results really demonstrate how the NI-XNET driver is optimized for use with LabVIEW Real-Time.  The results also demonstrate the importance of using a real-time operating system to obtain deterministic performance throughout the run time of the application.  Using a conventional operating system like Windows XP, we saw random delays of up to 3 ms for writing and almost 2.5 ms for reading.  Also, please note that these tests were performed with an NI PXI-8108 controller.  Results on your system (even one of comparable performance) may be entirely different depending on which software is installed and running at any point in time.  For more information on LabVIEW RT, please see this link.

The same type of test was performed on Legacy NI-CAN hardware.

Operating System:             LabVIEW RT
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8464/2 (legacy)
VI Used:                      NI-CAN Signal Single Point Performance RT.vi
Number of Frames:             20
Number of Signals per frame:  10
Total number of signals:      400
Number of iterations:         173000

                      Write          Read
Average Speed         1527.42 us     1094.07 us
Minimum               1307.49 us     599.384 us
Maximum               1990.32 us     2084.26 us
Standard Deviation    41.01          147.53
Variance              1682.09        21765

The same NI-CAN code was also run on NI-XNET hardware using the NI-CAN compatibility layer.  This enables re-use of NI-CAN code that has already been developed while still getting the performance improvements of NI-XNET.

Operating System:             LabVIEW RT
Controller:                   NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                NI PXI-8513/2 (in HS mode)
VI Used:                      NI-CAN Signal Single Point Performance RT.vi (using the compatibility layer)
Number of Frames:             20
Number of Signals per frame:  10
Total number of signals:      400
Number of iterations:         100000

                      Write          Read
Average Speed         261.83 us      237.40 us
Minimum               257 us         234.604 us
Maximum               56809.9 us*    247.955 us
Standard Deviation    178.82         0.52
Variance              31978.01       0.27

*This maximum time is due to the first write taking a very long time, because the compatibility library has to perform additional work during the first write call.  The next-highest write time is 264.645 us.  In this case, at least one “warm-up” iteration is required to keep the jitter down.

Frequency of occurrence of Write and Read time with warmup iteration

   

As these results show, the performance of the NI-CAN compatibility layer does not match that of the native NI-XNET API.  However, it still greatly improves performance over legacy NI-CAN interfaces with no code changes.


3. Throughput

Working with Frames

Throughput is a measure of the average rate of successful data transfers for an interface.  For CAN, this can be measured in frames per second.  We can also use the term “bus load”.  However, it is important to understand the difference between the two terms.

To calculate the number of frames per second, we need to know how long a bit is.  In CAN, this is defined by the baud rate.  For a 125 000 bit/s baud rate, each bit takes 8 us on the bus.  Once we know how much time each bit takes on the bus, we need to know how long each frame is in bits.  For example, a 100-bit frame at 125 kb/s takes 800 us to be transmitted on the CAN bus.  This means that the maximum number of frames per second with this configuration is 1250.  In contrast, if we use a 1 Mb/s baud rate (1 us bit time), we can have 10 000 frames per second on the bus.
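As a quick sanity check, the arithmetic above can be reproduced with a few lines of Python:

def bit_time_us(baud_rate_bps):
    # Duration of one bit on the bus, in microseconds.
    return 1e6 / baud_rate_bps

def max_frames_per_second(frame_length_bits, baud_rate_bps):
    # Upper bound for back-to-back frames of a fixed length.
    return baud_rate_bps / frame_length_bits

print(bit_time_us(125_000))                   # 8.0 us per bit at 125 kb/s
print(max_frames_per_second(100, 125_000))    # 1250 frames/s for a 100-bit frame
print(max_frames_per_second(100, 1_000_000))  # 10000 frames/s at 1 Mb/s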

This gets a little more complicated, since a CAN frame has a variable length (in bits).  The length depends on the number of data bytes and on whether the frame uses a standard or extended ID.  Also, the CAN controller inserts a stuff bit after 5 consecutive bits of the same polarity, so the length also depends on which data and ID are being used.  In the attached suite of VIs, we have provided utility VIs that help calculate the frame length in bits and the possible frames per second for a given baud rate, ID type, and data length.  In the example below, we use a VI to calculate the number of bits for a specific frame.  You can see that the frame length changes depending on the data being sent, because of the stuff bits that have to be inserted.
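The attached utility VIs compute the exact frame length for a given ID and payload.  As a rough text-based equivalent, the nominal and worst-case lengths can be estimated as follows; the worst case assumes one stuff bit for every four bits in the stuffed region (SOF through CRC), which is the usual upper bound.

def can_frame_bits(data_bytes, extended_id=False, worst_case_stuffing=True):
    # Nominal length without stuff bits: 47 + 8*n bits for standard IDs,
    # 67 + 8*n bits for extended IDs (both including the 3-bit interframe space).
    overhead = 67 if extended_id else 47
    # Bit stuffing only applies from SOF through the CRC sequence.
    stuffed_region = (54 if extended_id else 34) + 8 * data_bytes
    stuff_bits = (stuffed_region - 1) // 4 if worst_case_stuffing else 0
    return overhead + 8 * data_bytes + stuff_bits

print(can_frame_bits(8))                               # 135 bits, worst case, standard ID
print(can_frame_bits(8, extended_id=True))             # 160 bits, worst case, extended ID
print(can_frame_bits(0, worst_case_stuffing=False))    # 47 bits, zero-byte standard frame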

The picture below shows the approximation that can be made depending on certain parameters.

The other term often used to measure throughput performance is “bus load”.  In contrast to “frames per second”, the bus load measurement abstracts away the frame length and the baud rate.  A 100% bus load means that every bit time on the bus is occupied.  For a 1 000 000 bit/s baud rate, this means that the CAN transceivers are continuously transmitting 1 million bits on the bus every second.  This can translate to about 21 276 zero-byte frames every second (using standard IDs and no stuff bits) or about 7 633 eight-byte frames per second (using extended IDs).  Note that although each of these examples represents a 100% bus load scenario, the “work” that has to be done by the CAN interface is very different.  In the first case, the interface must process (read or write) 21 276 frames every second, whereas in the second case, the interface has to process only 7 633 frames every second.
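Those two frame counts follow directly from the nominal frame lengths without stuff bits: 47 bits for a zero-byte standard-ID frame and 131 bits for an eight-byte extended-ID frame.

BAUD = 1_000_000     # 1 Mb/s

print(BAUD // 47)    # 21276 zero-byte standard-ID frames per second at 100% bus load
print(BAUD // 131)   # 7633 eight-byte extended-ID frames per second at 100% bus load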

For our tests, we will refer to % bus load.  Also, since we have complete control over the frames being sent, we can calculate the length of each frame sent.  This also lets us show an approximate frames-per-second value.  This would be very processor intensive to do in an actual application where the frame length is continuously changing.

Test Description: Receiving Frames

The first test validates that the NI-XNET interfaces can read a “100% bus load”.  We create a simple VI that uses 1-byte frames and standard arbitration IDs.  The worst case for the “frames per second” performance measure would be 0-byte frames, but we want to validate the data being sent.  We need a 1-byte frame because we increment this byte after every transmission.  On the receiving end, we perform data verification to make sure the received frame’s byte has been incremented properly.  This is a simple check to make sure we are receiving every frame.
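The data check itself is simple: each received byte must be the previous byte plus one, modulo 256.  A minimal sketch of that verification (independent of the bus API) might look like this:

def count_sequence_errors(received_bytes):
    # Count frames whose single data byte does not follow the previous one (mod 256).
    errors = 0
    expected = None
    for value in received_bytes:
        if expected is not None and value != expected:
            errors += 1
        expected = (value + 1) % 256
    return errors

print(count_sequence_errors([3, 4, 5, 7, 8]))   # -> 1 (the frame carrying 6 was lost)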

Test Results: Receiving Frames

Operating System:        Windows XP
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           12 x NI PXI-8513/2 (in HS mode)
                         4 x NI PXI-8512/HS 2 Port
                         32 ports total
VI Used:                 NI-XNET Fast Transmit with Increment.vi (on port 32)
                         NI-XNET Receive Test with Data Check.vi (on ports 1 to 31)
Frames Transmitted:      4.06075E+7 (~14.5 hours)

 

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           12 x NI PXI-8513/2 (in HS mode)
                         4 x NI PXI-8512/HS 2 Port
                         32 ports total
VI Used:                 NI-XNET Fast Transmit with Increment.vi (on another system)
                         NI-XNET Receive Test with Data Check.vi (on ports 1 to 31)
Frames Transmitted:      5.53379E+7 (~19 hours)

Test Description: 100% Bus Load Generation using Signals

In order to test if we can read 100% bus load, or a very high number of frames per second, we also need to be able to generate 100% bus load.  As seen in the previous section, this can easily be done on a single port with NI-XNET.  However, there are different ways to generate frames on the bus.  The interface can be configured to send a number of periodic frames.  The number of frames and periods will dictate the % bus load. For example, the interface could be configured with 10 frames (100 bit length for simplicity, so 100us for 1Mb/s) with a period of 1000us each.  This would create a 100% bus load.    
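The relationship between periodic frames and bus load is easy to verify; the short sketch below reproduces the 10-frame example above.

def bus_load_percent(frames, baud_rate_bps):
    # frames is a list of (frame_length_bits, period_seconds) tuples.
    bits_per_second = sum(bits / period for bits, period in frames)
    return 100.0 * bits_per_second / baud_rate_bps

# Ten 100-bit frames, each transmitted every 1000 us, on a 1 Mb/s bus.
print(bus_load_percent([(100, 0.001)] * 10, 1_000_000))   # -> 100.0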

We have also provided a VI that can generate a user-defined bus load.  This VI creates frame objects in memory and adjusts the frame periods to create a specific “bus load”.  This transfers much of the processing to the NI-XNET firmware running on the PCI/PXI board (we just update the value sent periodically).  The NI-XNET hardware takes care of sending the periodic frames, making it easy to create a high bus load.  This type of “random high bus load generation” can be used when we want to see if an ECU can still communicate properly under a high bus load.  This VI was modified to generate a “random” 100% bus load on multiple ports using 8-byte frames.

(Since this test is slightly more complex, please refer to the block diagram for more details.)

Test Results: 100% Bus Load Generation using Signals


Operating System:        Windows XP
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           12 x NI PXI-8513/2 (in HS mode)
                         4 x NI PXI-8512/HS 2 Port
                         32 ports total
VI Used:                 NI-XNET Bus Loading Test Multiple Ports.vi (with NI-XNET Bus Monitor)
Test Status:             ~100% bus load generated on all ports (~16500 frames per second)
Run Time:                261697.976 seconds (72.69 hours)

 

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           12 x NI PXI-8513/2 (in HS mode)
                         4 x NI PXI-8512/HS 2 Port
                         32 ports total
VI Used:                 NI-XNET Bus Loading Test Multiple Ports.vi (with NI-XNET Bus Monitor)
Test Status:             ~100% bus load generated on all ports
Average processor load:  CPU1: 74.65%  CPU2: 69.97%
Run Time:                85476 seconds (23.74 hours)

Test Description: 100% Bus Load Generation using Streaming Frames

The second way we can generate many frames on the bus is by using “streaming”.  This does not use any periodic frame objects.  When using streaming, we buffer many frames in a queue and tell the hardware to send the frames one by one, in a specific order.  One example of this is flashing multiple ECUs in parallel.  Flashing an ECU can often be done at a very high frame rate, and the faster the frame rate, the faster you can flash your ECU.  For this application, the streaming session mode is more appropriate since we have a large number of specific frames we want to send in a specific order.
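Conceptually, a streaming transmit is nothing more than a first-in, first-out queue of fully specified frames that the hardware drains in order.  The sketch below illustrates the idea only; send_frame is a hypothetical stand-in for the streaming-mode write call.

from collections import deque

def send_frame(arbitration_id, payload):
    # Hypothetical stand-in for handing one frame to the interface in streaming mode.
    pass

# Buffer the flash sequence up front: every frame has an explicit ID, payload, and position.
flash_sequence = deque((0x700, bytes([i % 256] * 8)) for i in range(1000))

while flash_sequence:
    arbitration_id, payload = flash_sequence.popleft()   # frames go out one by one, in order
    send_frame(arbitration_id, payload)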

(Since this test is slightly more complex, please refer to the block diagram for more details.)

Test Results: 100% Bus Load Generation using Streaming Frames

Operating System:                  Windows XP
Controller:                        NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                     12 x NI PXI-8513/2 (in HS mode)
                                   4 x NI PXI-8512/HS 2 Port
                                   32 ports total
VI Used:                           NI-XNET Frame Throughput Multiple Ports.vi
Bus load maintained on all ports:  ~97%
Run Time:                          23 hours (1376864682 frames)

 

Operating System:                  LabVIEW RT
Controller:                        NI PXI-8108 (with 2 GB of RAM)
CAN Interface:                     12 x NI PXI-8513/2 (in HS mode)
                                   4 x NI PXI-8512/HS 2 Port
                                   32 ports total
VI Used:                           NI-XNET Frame Throughput Multiple Ports.vi
Bus load maintained on all ports:  ~97%
Run Time:                          20 hours (971366613 frames)


4. Message Latency

Message latency measures how long it takes for the embedded network interface to bring a message up to the API from the moment it is actually present on the bus.  Please note that this is not usually an important factor when developing CAN applications.  The only time this type of performance measurement becomes important is when your application expects an immediate reply from the device you are communicating with.  In this case, you want your message to go out on the bus as fast as possible, and you want the response to come back to your API as fast as possible as well.  An example of this is the CCP protocol, where an application sends requests to an ECU.

Please note: For most applications, the read and write time for signal IO (Cost) is the most important benchmark.  Typical applications involve writing many (>100) signal values.  The better an interface can cope with this type of data, the more efficient an application will be.

The diagram above shows a CAN interface writing and reading a frame on the bus.  Here are a few details we can see:

Write Delay = T1 – T0

Bus Transmission delay = T2 – T1

Read Delay = T3 – T2

The bus transmission delay can usually be calculated (approximately) if we know the exact frame that will be sent.  If we take this into account, then:

Write and Read Latency = (T3 – T0) – (T2 – T1)
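In code form, with the timestamps in microseconds and the bus transmission delay estimated from the frame length and baud rate:

def write_read_latency_us(t0_us, t1_us, t2_us, t3_us):
    # Combined write + read latency: total turnaround minus the time spent on the bus.
    return (t3_us - t0_us) - (t2_us - t1_us)

def latency_from_turnaround_us(turnaround_us, frame_bits, baud_rate_bps):
    # When T1 and T2 are not observed directly, subtract the calculated bus
    # transmission time for the known frame from the measured turnaround (T3 - T0).
    bus_time_us = frame_bits * 1e6 / baud_rate_bps
    return turnaround_us - bus_time_us

print(latency_from_turnaround_us(200.0, 135, 1_000_000))   # -> 65.0 us of API and driver latency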

Test Description: Message Latency

We can perform the “Write Delay” measurement (T1 – T0) with NI-XNET interfaces.  First, it is easier if we use the same interface to read and write the frame, because both ports' timestamps are derived from the same onboard oscillator, so there is no clock drift between the timestamps.  We use one port to transmit a frame and the other port to read it.  We measure the timestamp right before the frame is sent and the timestamp right after the frame is received.  This gives us the time it took for the API to send the frame to the interface, the time for the interface to transmit the frame on the bus, and the delay for the receiving interface to read the frame and send it back up to the API.

Test Results: Message Latency

Operating System:        Windows XP
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           1 x NI PXI-8513/2 (in HS mode)
VI Used:                 NI-XNET Frame Turnaround Test.vi
Number of iterations:    200000

Session Mode             Stream      Queued      Single Point
Average Turnaround Time  73.09 us    76.81 us    67.84 us
Variance                 17.09       31.41       32.45
Standard Deviation       4.13        5.60        5.70
Minimum                  63 us       62.8 us     52.1 us
Maximum                  103.5 us    108.2 us    98.4 us

Frequency of occurrence of latency times using Stream, Queued, and Single Point session modes.

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           1 x NI PXI-8513/2 (in HS mode)
VI Used:                 NI-XNET Frame Turnaround Test.vi
Number of iterations:    200000

Session Mode             Stream       Queued       Single Point
Average Turnaround Time  64.29 us     68.74 us     60.26 us
Variance                 13.44        27.66        29.05
Standard Deviation       3.67         5.26         5.39
Minimum                  56.231 us    56.138 us    47.555 us
Maximum                  91.609 us    97.808 us    88.748 us

Frequency of occurrence of latency times using Stream, Queued, and Single Point session modes.

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           1 x NI PXI-8513/2 (in HS mode)
VI Used:                 NI-XNET Frame Turnaround Test.vi
Number of iterations:    373092988 (Overnight Test)

Session Mode             Stream        Queued        Single Point
Average Turnaround Time  61.2647 us    65.7454 us    53.9423 us
Maximum                  92.5631 us    99.7157 us    90.6558 us

 

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           NI PXI-8464/2 (legacy)
VI Used:                 NI-CAN Frame Turnaround Test.vi
                         This test uses legacy NI-CAN hardware and NI-CAN driver software.
Number of iterations:    100000

Session Mode             Stream        Single Point
Average Turnaround Time  434.69 us     435 us
Variance                 284.44        294.42
Standard Deviation       16.87         17.16
Minimum                  362.758 us    371.295 us
Maximum                  515.915 us    513.531 us

Frequency of occurrence of latency times using Stream and Single Point session modes.

Operating System:        LabVIEW RT
Controller:              NI PXI-8108 (with 2 GB of RAM)
CAN Interface:           1 x NI PXI-8513/2 (in HS mode)
VI Used:                 NI-CAN Frame Turnaround Test.vi
                         This test runs NI-CAN code on NI-XNET hardware using the NI-CAN compatibility layer.
Number of iterations:    100000

Session Mode             Stream       Single Point
Average Turnaround Time  60.95 us     61.02 us
Variance                 15.37        15.17
Standard Deviation       3.92         3.89
Minimum                  51.522 us    51.86 us
Maximum                  88.239 us    87.239 us

Frequency of occurrence of latency times using Stream and Single Point session modes.


5. Additional Links

NI-XNET CAN and FlexRay Platform Overview

Migrating NI-CAN Applications to NI-XNET


Bookmark & Share


Downloads

Attachments:

test_suite_1-0-0.zip

