Archived: Specifying the Set of CPUs Available for Automatic Load Balancing in LabVIEW Real-Time

NI does not actively maintain this document.

This content provides support for older products and technology, so you may notice outdated links or obsolete information about operating systems or other relevant products.

Overview


Note: This document describes features included in the LabVIEW 8.5 and later versions of the Real-Time Module. As of the LabVIEW 8.6 Real-Time Module, the VIs install as a subpalette of the Real-Time palette, the RT SMP CPU Utilities VIs. Refer to the LabVIEW Real-Time Module Help for your version of LabVIEW for further information.

The LabVIEW Real-Time Module includes the NI RT Extensions for Symmetric Multiprocessing (SMP), which you can install to take advantage of multi-core systems with up to 32 CPUs. Installing the NI RT Extensions for SMP adds automatic load balancing features that distribute application threads across CPUs. This document discusses automatic load balancing concepts in LabVIEW Real-Time and introduces the SMP API, which you can use to specify the set of CPUs available for automatic load balancing and to reserve certain CPUs for exclusive use by specific Timed Structures. Refer to the LabVIEW Real-Time Module Help for an introduction to multi-core programming in LabVIEW Real-Time.

Use the LabVIEW Real-Time Software Wizard in Measurement & Automation Explorer to install the NI RT Extensions for SMP. Refer to the Measurement & Automation Explorer Help for information about using the Real-Time Software Wizard.

 

Note: NI will remove support for Phar Lap for cRIO in the NI 2020 Software Release and for PXI in the NI 2022 Software Release. For more information, please see the Phar Lap RT OS EOL Road Map.

CPU Pools

With the NI RT Extensions for SMP installed, the ETS real-time operating system (RTOS) maintains two pools of CPUs available for automatic load balancing: an OS pool and a Timed Structure (TS) pool. CPUs in the TS pool are available for automatic load balancing of Timed Structures (Timed Loops and Timed Sequences) configured for Automatic CPU Assignment. CPUs in the OS pool are used for all other processes, including low-level operating system tasks.

Automatic Load Balancing

Automatic load balancing is the process of assigning threads to CPUs in a way that balances the processing load across CPUs. At boot-up, the LabVIEW Real-Time Module assigns all available CPUs to both the OS pool and the TS pool, and therefore performs automatic load balancing across all CPUs. However, you can use the SMP API introduced in this document to specify arbitrary OS and TS pools.

LabVIEW Real-Time performs automatic load balancing on all threads, with the exception of manually-assigned Timed Structure threads. LabVIEW maps code execution from each Timed Structure to its own dedicated thread, but performs automatic load balancing only on Timed Structure threads configured for automatic CPU assignment. You can manually assign a Timed Structure to a particular CPU either by wiring the CPU index to the Assigned CPU input of the Timed Structure or by using the Processor Assignment section of the corresponding Configure Timed Structure dialog box. The default Assigned CPU value of -2 indicates that the Timed Structure is configured for automatic processor assignment via automatic load balancing.

Generally, the automatic load balancing process avoids switching a given thread between CPUs, to minimize inter-CPU data transfer overhead. However, the SMP scheduler might switch a thread repeatedly from one CPU to another if the load due to other threads of equal or higher priority running on each CPU fluctuates significantly. You can use the Real-Time Execution Trace Toolkit to determine whether a Timed Structure thread moves from CPU to CPU.

To ensure that the SMP scheduler does not unexpectedly assign Timed Structures to a particular CPU, you can remove that CPU from the TS pool using the SMP API. However, to ensure that a Timed Structure has exclusive access to a manually-assigned CPU, you also must remove that CPU from the OS pool. By doing so, you can maximize the performance of a deterministic Timed Loop, as described in the Isolating a Deterministic Timed Structure section of this document.

Pool Configurations

There are four possible states for a CPU in a LabVIEW Real-Time SMP system. A CPU can belong to:

  1. The TS Pool, in which case the SMP scheduler can use the CPU to execute Timed Structures configured for Automatic CPU Assignment.
  2. The OS Pool, in which case the SMP scheduler can use the CPU to execute threads that do not correspond to Timed Structures.
  3. Both pools, in which case the SMP scheduler can use the CPU for either of the above.
  4. Neither pool (reserved), in which case the CPU is dedicated solely to executing Timed Structures that you manually assign to that CPU.

When an RT target boots up, all CPUs are placed in both pools by default. 

Note: Each pool must contain at least one CPU, although both pools can share a single CPU.
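The four states and the at-least-one-CPU rule can be sketched in a short Python model. This is purely an illustration of the pool-membership rules described above, not a LabVIEW API; the names are hypothetical.

```python
from enum import Enum

class CpuState(Enum):
    """The four possible states of a CPU in a LabVIEW Real-Time SMP system."""
    OS_ONLY = "OS pool only"
    TS_ONLY = "TS pool only"
    BOTH = "both pools"
    RESERVED = "neither pool (reserved)"

def validate_pools(states):
    """Check the rule that each pool must contain at least one CPU.

    A single CPU in the BOTH state satisfies the rule for both pools.
    """
    os_count = sum(s in (CpuState.OS_ONLY, CpuState.BOTH) for s in states)
    ts_count = sum(s in (CpuState.TS_ONLY, CpuState.BOTH) for s in states)
    return os_count >= 1 and ts_count >= 1

# Default boot-up configuration: every CPU in both pools.
assert validate_pools([CpuState.BOTH] * 4)

# Invalid: reserving every CPU leaves both pools empty.
assert not validate_pools([CpuState.RESERVED] * 4)
```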

Isolating a Deterministic Timed Structure

To maximize performance and minimize jitter in a deterministic Timed Loop, you can isolate that Timed Structure on a particular CPU. To isolate a Timed Structure, you must assign the Timed Structure to a CPU that is not contained in either the OS pool or the TS pool. When a CPU is assigned to neither pool, it is reserved to run only the Timed Structure(s) that you manually assign to that CPU. Isolating a Timed Structure on a single CPU allows you to monopolize the processing capacity of that CPU and achieve high frequency or throughput rates. For example, you can take advantage of isolation to achieve a high polling rate in a Timed Loop that performs data acquisition.

To minimize the latency of a high-performance, deterministic Timed Loop, consider isolating that Timed Loop on a high-index CPU. For example, if the system contains four CPU cores indexed 0-3, consider isolating the deterministic Timed Loop on CPU 3. When scheduling Timed Structures, the real-time operating system begins at the highest-index CPU and works downward. Therefore, if multiple Timed Structures of equal priority are scheduled to wake up at the same time, the wake-up latency of Timed Structures executing on lower-index CPUs can exceed that of Timed Structures executing on higher-index CPUs by up to several microseconds. The reverse is true for non-Timed-Structure threads, because for these threads the scheduler starts at CPU 0 and works upward. Therefore, to minimize latency, NI recommends assigning lower-numbered CPUs to the OS pool and reserving higher-numbered CPUs for deterministic Timed Loops.
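The opposite scan directions described above can be illustrated with a toy Python model. This is only a sketch of the scan order the text describes, not the actual scheduler:

```python
def ts_scan_order(num_cpus, ts_pool):
    """Order in which the scheduler considers CPUs for Timed Structure
    threads: highest index first, per the behavior described above."""
    return [cpu for cpu in range(num_cpus - 1, -1, -1) if cpu in ts_pool]

def os_scan_order(num_cpus, os_pool):
    """OS threads are considered starting at CPU 0 and counting upward."""
    return [cpu for cpu in range(num_cpus) if cpu in os_pool]

# Quad-core system: OS pool = CPUs 0-1, TS pool = CPUs 2-3.
# Timed Structures land on CPU 3 first; OS threads land on CPU 0 first.
assert ts_scan_order(4, {2, 3}) == [3, 2]
assert os_scan_order(4, {0, 1}) == [0, 1]
```

This is why the recommendation places the OS pool at the low-numbered CPUs and the reserved, deterministic CPUs at the high end: each kind of thread then finds its preferred CPU at the start of its scan.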

Preventing OS Thread Starvation

To prevent starvation of OS threads, consider reserving at least one CPU for OS threads only. For example, if you assign CPU 0 to the OS pool but not to the TS pool, and you avoid targeting any Timed Structures to CPU 0, then CPU 0 is always available to run OS threads.

Maximizing CPU Utilization

To maximize processor utilization, it helps to determine the number of CPUs in each pool based on the proportion of load that corresponds to TS threads versus non-TS threads. You can use the Get CPU Loads VI to estimate the proportion of total processing time dedicated to TS and non-TS threads and assign CPUs to the TS and OS pools accordingly. If the RT target is connected to a monitor, you also can use the CPU Load Measurement utility to estimate load distribution.
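One way to turn measured load proportions into pool sizes is a simple proportional split, keeping at least one CPU in each pool. The heuristic below is a hypothetical Python sketch, not part of the SMP API:

```python
def suggest_pool_sizes(ts_load, other_load, num_cpus):
    """Split CPUs between the TS and OS pools in proportion to measured
    load percentages, keeping at least one CPU in each pool.

    Returns (os_pool_size, ts_pool_size). Purely illustrative heuristic.
    """
    total = ts_load + other_load
    ts_cpus = round(num_cpus * ts_load / total) if total else num_cpus // 2
    ts_cpus = min(max(ts_cpus, 1), num_cpus - 1)  # each pool needs >= 1 CPU
    return num_cpus - ts_cpus, ts_cpus

# 75% of measured load comes from Timed Structure threads on 8 CPUs:
assert suggest_pool_sizes(75.0, 25.0, 8) == (2, 6)
```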

Avoiding Partial Pool Overlap

If you define the OS and TS pools such that the two pools partially overlap, the automatic load balancing process might not make optimal processor-utilization decisions, as illustrated in the following examples.

Example 1

Assume you have an application with three Timed Loops. The periods of the Timed Loops are configured such that most of the time, only one of the three Timed Loops is running. However, occasionally the periods of the Timed Loops align, and you want to ensure that when this happens all three Timed Loops can execute in parallel. You might be tempted to assign three CPUs (e.g. CPUs 0-2) of a quad-core system to the OS pool and three CPUs (e.g. CPUs 1-3) to the TS pool, assuming that CPU 3 would always run one of the three Timed Loops while CPUs 1 and 2 would generally run OS threads and would run Timed Loops only when the periods align. However, because the SMP scheduler attempts to run each thread on the same CPU from iteration to iteration, all three Timed Loops might execute on CPU 1 or CPU 2, thus reducing the processing capacity available for OS threads and leaving CPU 3 idle. In this case, a better solution would be to manually assign each Timed Loop to a different CPU, and to assign all four cores to the OS pool.

Example 2

Assume you have an application with two Timed Loops that perform continuous polling and two Timed Loops that run intermittently. You might be tempted to assign all four CPUs of a quad-core system to the TS pool and two of the CPUs, e.g. CPUs 2 and 3, to the OS pool. You might assume that the two polling Timed Loops would execute on CPUs 0 and 1, and that the intermittent Timed Loops would share processing time on CPUs 2 and 3. However, the SMP scheduler might assign the polling Timed Loops to CPUs 2 and 3, thus starving the other OS threads and leaving CPUs 0 and 1 under-utilized. In this case, the best solution would be to isolate the polling Timed Loops on two CPUs and assign the remaining two CPUs to both pools.

SMP CPU Pool Utilities API

Note: The SMP CPU Pool Utilities API is supported only on multi-CPU RT targets running the ETS Phar Lap operating system with the NI RT Extensions for SMP installed.

Get CPU Loads.vi

Monitors the distribution of load across the CPUs in the system. For each CPU in the system, this VI returns the total load as a percentage of capacity, as well as the percentage of total capacity devoted to interrupt service routines (ISRs), Timed Structures, and all other threads. The Nth element of each array corresponds to the Nth CPU in the system.

Get Number of CPUs.vi

Returns the number of CPUs in the system.

Set CPU Pool Sizes.vi

Sets the number of CPUs in each pool. You can use this VI to define the OS and TS pools by specifying the number of CPUs you want each pool to contain. This VI creates the OS and TS pools as adjacent pools of contiguous CPUs. The OS pool begins at CPU 0 and the TS pool begins where the OS pool ends.

For example, on an eight-CPU system, if you wire a value of 3 to both the OS Pool and TS Pool controls, this VI assigns CPUs 0-2 to the OS pool, CPUs 3-5 to the TS Pool, and leaves CPUs 6-7 reserved for use by Timed Structures configured for manual CPU assignment.

Note: You cannot use this VI to create empty pools or partially overlapping pools. This VI returns an error if you specify an OS pool or TS pool size of 0 or if the OS Pool and TS Pool values you specify add up to more than the number of CPUs available in the system.

Specifying a value of -1 for either pool indicates no size preference for that pool. If you specify a value of -1 for both pools, this VI creates the default pool configuration in which both pools contain every CPU in the system. If you specify a pool size of -1 for one pool, this VI assigns all remaining CPUs to that pool. For example, on a four-CPU system, if you specify a pool size of -1 for the OS pool and a pool size of 2 for the TS pool, this VI assigns CPUs 0 and 1 to the OS pool and CPUs 2 and 3 to the TS pool.
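The contiguous-pool layout and the -1 "no preference" semantics described above can be modeled in Python. This sketch mirrors the documented behavior of Set CPU Pool Sizes.vi for illustration; the function name and return convention are assumptions, not the VI's actual interface:

```python
def pool_sizes_to_assignments(os_size, ts_size, num_cpus):
    """Model the contiguous-pool layout described for Set CPU Pool Sizes.vi.

    A size of -1 means no preference: -1 for both sizes yields the default
    configuration (every CPU in both pools); -1 for one pool gives that pool
    all CPUs left over by the other. Returns (os_pool, ts_pool) as sets of
    CPU indices; CPUs in neither set are reserved.
    """
    if os_size == -1 and ts_size == -1:
        every = set(range(num_cpus))
        return every, every
    if os_size == -1:
        os_size = num_cpus - ts_size
    elif ts_size == -1:
        ts_size = num_cpus - os_size
    if os_size < 1 or ts_size < 1 or os_size + ts_size > num_cpus:
        raise ValueError("pools cannot be empty or exceed the CPU count")
    os_pool = set(range(os_size))                     # OS pool starts at CPU 0
    ts_pool = set(range(os_size, os_size + ts_size))  # TS pool follows it
    return os_pool, ts_pool

# Eight-CPU example from the text: sizes (3, 3) reserve CPUs 6 and 7.
assert pool_sizes_to_assignments(3, 3, 8) == ({0, 1, 2}, {3, 4, 5})

# Four-CPU example from the text: OS pool size -1 with TS pool size 2.
assert pool_sizes_to_assignments(-1, 2, 4) == ({0, 1}, {2, 3})
```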

Set CPU Pool Assignments.vi

Assigns CPUs to one of four possible states: OS pool only, TS pool only, both pools, or no pool (reserved). This VI outputs the bit masks that specify the CPUs assigned to each pool. On an N-CPU system, the bits of the bit mask correspond to CPUs 0 through N-1. The right-most bit of each bit mask corresponds to CPU 0 and the left-most bit corresponds to CPU 31 (if such a CPU exists in the system).

The input to this VI is an array of enums. The enum contains the four possible states of a CPU, and each element of the array represents a CPU. For example, assuming an eight-CPU system, the array of enums in the figure below assigns CPUs 0, 1, and 2 to the OS pool, assigns CPUs 3 and 6 to the TS pool, and reserves the remaining CPUs.
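The mapping from a per-CPU state array to the two output bit masks can be sketched in Python. This is an illustration of the documented bit-mask convention (bit N corresponds to CPU N, with CPU 0 as the right-most bit), not the VI itself:

```python
def assignments_to_masks(states):
    """Convert a per-CPU list of states ("os", "ts", "both", "reserved")
    into (os_mask, ts_mask), where bit N of each mask corresponds to CPU N
    and CPU 0 is the right-most bit."""
    os_mask = ts_mask = 0
    for cpu, state in enumerate(states):
        if state in ("os", "both"):
            os_mask |= 1 << cpu
        if state in ("ts", "both"):
            ts_mask |= 1 << cpu
    return os_mask, ts_mask

# Eight-CPU example from the text: CPUs 0-2 in the OS pool, CPUs 3 and 6
# in the TS pool, remaining CPUs reserved.
states = ["os", "os", "os", "ts", "reserved", "reserved", "ts", "reserved"]
os_mask, ts_mask = assignments_to_masks(states)
assert os_mask == 0b00000111   # CPUs 0, 1, 2
assert ts_mask == 0b01001000   # CPUs 3 and 6
```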

Set OS Pool.vi

Specifies the set of CPUs contained in the OS pool. The OS mask input is a bit mask in which the right-most bit corresponds to CPU 0 and the left-most bit corresponds to CPU 31. To assign a CPU core to the OS pool, set the value of the corresponding bit to 1. To specify that the OS pool should not contain a particular CPU, set the value of the corresponding bit to 0.

Note: The OS pool must contain at least one valid CPU.

Set TS Pool.vi

Specifies the set of CPUs contained in the TS pool. The TS mask input is a bit mask in which the right-most bit corresponds to CPU 0 and the left-most bit corresponds to CPU 31. To assign a CPU to the TS pool, set the value of the corresponding bit to 1. To specify that the TS pool should not contain a particular CPU, set the value of the corresponding bit to 0.
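Constructing the mask inputs for Set OS Pool.vi and Set TS Pool.vi follows the same convention in both cases. The helper below is a hypothetical Python sketch of that construction, useful for working out mask values before wiring them in LabVIEW:

```python
def cpu_mask(cpus):
    """Build a pool bit mask from a collection of CPU indices: bit N is 1
    when CPU N belongs to the pool (CPU 0 is the right-most bit)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu <= 31:  # the SMP extensions support up to 32 CPUs
            raise ValueError("CPU index must be in the range 0-31")
        mask |= 1 << cpu
    return mask

# Assign CPUs 0 and 1 to the OS pool and CPUs 2 and 3 to the TS pool:
assert cpu_mask([0, 1]) == 0b0011
assert cpu_mask([2, 3]) == 0b1100
```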

Note: The TS pool must contain at least one valid CPU.

Overview of SMP CPU Pool Utilities API

The SMP CPU Pool Utilities API provides a set of VIs that you can use to monitor CPU load and to assign CPUs to the TS and OS pools for automatic load balancing. The API provides three levels of abstraction, so you can choose the level that best matches your programming style and application requirements. The high-level VI, Set CPU Pool Sizes.vi, is built from the mid-level VI, Set CPU Pool Assignments.vi, which in turn is built from the low-level advanced VIs, Set OS Pool.vi and Set TS Pool.vi. The high-level Set CPU Pool Sizes VI is a good choice for most use cases; you can use it either to create the default pool configuration in which both pools contain all CPUs or to create contiguous, adjacent pools.

For custom CPU pool configurations, the mid-level VI, Set CPU Pool Assignments.vi, is a good choice. You can use the Set CPU Pool Assignments VI to create any pool configuration as long as each pool contains at least one CPU. If you prefer to work with bit masks, you can use the low-level VIs on the Advanced subpalette. For example, the figure below uses the low-level VIs, Set OS Pool.vi and Set TS Pool.vi, to achieve the same pool configuration as the array of enums shown in the example above.

Conclusion

The LabVIEW Real-Time Module uses automatic load balancing to distribute threads across two pools of CPUs. CPUs in the OS pool execute all threads other than those that correspond to Timed Structures. CPUs in the Timed Structure (TS) pool execute threads that correspond to Timed Structures configured for automatic CPU assignment. By default, LabVIEW Real-Time assigns all CPUs to both pools. However, you can use the SMP CPU Pool Utilities API to define the OS and TS pools in a way that meets your specific application requirements.

Refer to the LabVIEW Real-Time Module Help for an introduction to SMP programming in LabVIEW Real-Time.
