Specifying the Set of CPUs Available for Automatic Load Balancing
Updated 2025-02-17
Installing the NI RT Extensions for Symmetric Multiprocessing (SMP) on a multi-CPU RT target adds automatic load balancing features that distribute application threads across CPUs. Use the LabVIEW Real-Time Software Wizard in Measurement & Automation Explorer to install the NI RT Extensions for SMP. Refer to the Measurement & Automation Explorer Help for information about using the Real-Time Software Wizard.
Use the RT SMP CPU Utilities VIs to specify the set of CPUs available for automatic load balancing and to reserve certain CPUs for exclusive use by specific Timed Structures.
CPU Pools
With the NI RT Extensions for SMP installed, the real-time operating system (RTOS) maintains two pools of CPUs available for automatic load balancing: a Timed Structures pool and a System pool. CPUs in the Timed Structures pool are available for automatic load balancing of Timed Loops and Timed Sequences configured for automatic processor assignment. CPUs in the System pool are used for all other processes, including low-level operating system tasks.
Automatic Load Balancing
Automatic load balancing is the process of assigning threads to CPUs in a way that balances the processing load across CPUs. At startup, the LabVIEW Real-Time Module assigns all available CPUs to both the System pool and the Timed Structures pool, and therefore performs automatic load balancing across all CPUs. However, you can use the SMP CPU Utilities VIs to specify arbitrary System and Timed Structures pools.
LabVIEW Real-Time performs automatic load balancing on all threads except manually assigned Timed Structure threads. LabVIEW maps code execution from each Timed Structure to its own dedicated thread, but it performs automatic load balancing only on Timed Structure threads configured for automatic processor assignment. To manually assign a Timed Structure to a particular CPU, wire the CPU index to the Processor input of the Timed Structure, or use the Processor Assignment section of the Configure Timed Loop dialog box or the Configure Timed Sequence dialog box. The default Processor value of -2 configures the Timed Structure for automatic processor assignment through automatic load balancing.
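The Processor semantics described above can be summarized as a small Python model. This is a conceptual sketch, not an NI API: the names `AUTOMATIC` and `eligible_cpus` are hypothetical, and the model handles only the values this topic describes (-2 for automatic assignment and a non-negative CPU index for manual assignment).

```python
# Conceptual model only, not a LabVIEW API. Processor values follow this topic:
# -2 requests automatic assignment; a non-negative index pins the structure.
AUTOMATIC = -2


def eligible_cpus(processor: int, timed_pool: set[int]) -> set[int]:
    """Return the CPUs the scheduler may use for one Timed Structure."""
    if processor == AUTOMATIC:
        # Automatic load balancing: any CPU in the Timed Structures pool.
        return set(timed_pool)
    if processor >= 0:
        # Manual assignment: only the wired CPU, whatever its pool membership.
        return {processor}
    # This hypothetical model covers only -2 and non-negative CPU indices.
    raise ValueError(f"value not covered by this model: {processor}")


print(eligible_cpus(-2, {1, 2, 3}))  # {1, 2, 3}
print(eligible_cpus(3, {0, 1, 2}))   # {3}
```

Note that a manually assigned structure ignores pool membership in this model, which is what allows a CPU in neither pool to run it.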
Generally, the automatic load balancing process avoids switching a given thread between CPUs to minimize inter-CPU data transfer overhead. However, the SMP scheduler might switch a thread repeatedly from one CPU to another if the load due to other threads of equal or higher priority running on each CPU fluctuates significantly. You can use the Real-Time Trace Viewer to determine whether a Timed Structure thread moves from CPU to CPU.
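To quantify how often a thread moves, you can count CPU changes between consecutive iterations. The following Python sketch assumes you have already recorded, by some means, the CPU index that each iteration of a Timed Loop ran on; the plain list of integers is a hypothetical stand-in, not a Real-Time Trace Viewer export format, and `count_migrations` is a hypothetical helper.

```python
def count_migrations(cpu_per_iteration: list[int]) -> int:
    """Count how often consecutive iterations ran on different CPUs."""
    # Pair each iteration with the next one and count CPU changes.
    return sum(
        1
        for prev, cur in zip(cpu_per_iteration, cpu_per_iteration[1:])
        if prev != cur
    )


# Hypothetical samples: CPU index observed for each iteration of one Timed Loop.
samples = [3, 3, 3, 2, 2, 3, 3]
print(count_migrations(samples))  # 2
```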
To ensure that the SMP scheduler does not unexpectedly assign Timed Structures to a particular CPU, you can remove that CPU from the Timed Structures pool using the SMP CPU Utilities VIs. However, to ensure that a Timed Structure has exclusive access to a manually assigned CPU, you also must remove that CPU from the System pool. By doing so, you can maximize the performance of a deterministic Timed Loop, as described in the Isolating a Deterministic Timed Structure section of this topic.
Pool Configurations
There are four possible states for a CPU in an SMP-enabled RT target. A CPU can belong to:
- The Timed Structures pool, in which case the SMP scheduler can use the CPU to execute Timed Structures configured for automatic processor assignment.
- The System pool, in which case the SMP scheduler can use the CPU to execute threads that do not correspond to Timed Structures.
- Both pools, in which case the SMP scheduler can use the CPU for either of the above.
- Neither pool (reserved), in which case the CPU is dedicated solely to executing Timed Structures that you manually assign to that CPU.
When an SMP-enabled RT target starts up, LabVIEW assigns all CPUs to both pools by default. However, you can use the SMP CPU Utilities VIs to assign each CPU to any of the four states listed above.
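Because each state is a combination of two membership tests, the four states can be modeled with sets. This Python sketch is a conceptual model only; `CpuState` and `cpu_states` are hypothetical names, not part of the SMP CPU Utilities VIs.

```python
from enum import Enum


class CpuState(Enum):
    # Hypothetical labels for the four states listed in this topic.
    TIMED_ONLY = "Timed Structures pool only"
    SYSTEM_ONLY = "System pool only"
    BOTH = "both pools"
    RESERVED = "neither pool (reserved)"


def cpu_states(num_cpus: int, system_pool: set[int], timed_pool: set[int]) -> dict[int, CpuState]:
    """Map each CPU index to its state, given the two pools as sets of CPU indices."""
    states = {}
    for cpu in range(num_cpus):
        in_system, in_timed = cpu in system_pool, cpu in timed_pool
        if in_system and in_timed:
            states[cpu] = CpuState.BOTH
        elif in_timed:
            states[cpu] = CpuState.TIMED_ONLY
        elif in_system:
            states[cpu] = CpuState.SYSTEM_ONLY
        else:
            states[cpu] = CpuState.RESERVED
    return states


# Startup default: every CPU belongs to both pools.
default = cpu_states(4, set(range(4)), set(range(4)))
assert all(state is CpuState.BOTH for state in default.values())

# A custom, non-overlapping configuration with CPU 3 reserved.
for cpu, state in cpu_states(4, {0}, {1, 2}).items():
    print(f"CPU {cpu}: {state.value}")
# CPU 0: System pool only
# CPU 1: Timed Structures pool only
# CPU 2: Timed Structures pool only
# CPU 3: neither pool (reserved)
```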
Isolating a Deterministic Timed Structure
To maximize performance and minimize jitter in a deterministic Timed Loop, you can isolate that Timed Structure on a particular CPU. To isolate a Timed Structure, you must assign the Timed Structure to a CPU that is not contained in either the System pool or the Timed Structures pool. When a CPU is not assigned to a pool, the CPU is reserved to run only the Timed Structure(s) that you manually assign to that CPU. Isolating a Timed Structure on a single CPU allows you to monopolize the processing capacity of that CPU and achieve high frequency or throughput rates. For example, you can take advantage of isolation to achieve a high polling rate in a Timed Loop that performs data acquisition.
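The isolation rule above amounts to two checks: the CPU belongs to neither pool, and at least one Timed Structure is manually assigned to it. In this hypothetical Python model, `manual_assignments` maps an illustrative Timed Structure name (such as `daq_loop`) to the CPU index wired to its Processor input; neither the helper nor the names come from an NI API.

```python
def isolated_structures(cpu: int, system_pool: set[int], timed_pool: set[int],
                        manual_assignments: dict[str, int]) -> list[str]:
    """Return the Timed Structures isolated on `cpu`, or [] if it is not isolated."""
    if cpu in system_pool or cpu in timed_pool:
        # Still shared with load-balanced threads, so nothing is isolated here.
        return []
    # A reserved CPU runs only the structures manually assigned to it.
    return sorted(name for name, target in manual_assignments.items() if target == cpu)


# Hypothetical setup on four CPUs: CPUs 0-2 in both pools, CPU 3 reserved.
pools = {0, 1, 2}
pinned = {"daq_loop": 3}
print(isolated_structures(3, pools, pools, pinned))        # ['daq_loop']
print(isolated_structures(3, pools, pools | {3}, pinned))  # []
```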
To minimize the latency of a high-performance, deterministic Timed Loop, consider isolating that Timed Loop on a high-index CPU. For example, if the system contains four CPU cores indexed 0-3, consider isolating the deterministic Timed Loop on CPU 3. When scheduling Timed Structures, the real-time operating system begins at the highest-index CPU and works downward. Therefore, if multiple Timed Structures of equal priority are scheduled to wake up at the same time, Timed Structures executing on lower-index CPUs can have wake-up latencies up to several microseconds higher than Timed Structures executing on higher-index CPUs. The reverse is true for threads other than Timed Structure threads, because for these threads the scheduler starts at CPU 0 and works upward. Therefore, to minimize latency, NI recommends assigning lower-index CPUs to the System pool and reserving higher-index CPUs for deterministic Timed Structures.
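The following Python sketch restates the two scan orders described in this section and the resulting recommendation for a quad-core target. All function names are hypothetical, and the sketch models only the ordering, not actual wake-up latencies.

```python
def timed_structure_scan_order(cpus: list[int]) -> list[int]:
    # Timed Structures: scheduling starts at the highest-index CPU.
    return sorted(cpus, reverse=True)


def system_thread_scan_order(cpus: list[int]) -> list[int]:
    # Other threads: scheduling starts at CPU 0 and works upward.
    return sorted(cpus)


def recommended_split(num_cpus: int, num_isolated: int) -> tuple[list[int], list[int]]:
    """Return (system_cpus, reserved_cpus), reserving the highest-index CPUs."""
    if not 0 <= num_isolated < num_cpus:
        raise ValueError("leave at least one CPU for the System pool")
    cut = num_cpus - num_isolated
    return list(range(cut)), list(range(cut, num_cpus))


cpus = [0, 1, 2, 3]
print(timed_structure_scan_order(cpus))  # [3, 2, 1, 0]
print(system_thread_scan_order(cpus))    # [0, 1, 2, 3]
print(recommended_split(4, 1))           # ([0, 1, 2], [3])
```

Putting the System pool at the low end and reserved CPUs at the high end means each kind of thread is first considered on the CPUs set aside for it.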
Preventing Thread Starvation
To prevent starvation of System threads, consider reserving at least one CPU for System threads only. For example, if you assign CPU 0 to the System pool but not to the Timed Structures pool, and you avoid targeting any Timed Structures to CPU 0, CPU 0 is always available to run System threads.
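One way to check a planned configuration against this guidance is to list the CPUs that are in the System pool, absent from the Timed Structures pool, and not targeted by any manually assigned Timed Structure. The Python helper and the loop name `ctrl_loop` are hypothetical.

```python
def system_only_cpus(system_pool: set[int], timed_pool: set[int],
                     manual_assignments: dict[str, int]) -> list[int]:
    """CPUs that only System threads can use under this configuration."""
    # CPUs targeted by any manually assigned (hypothetical) Timed Structure.
    pinned = set(manual_assignments.values())
    return sorted(cpu for cpu in system_pool
                  if cpu not in timed_pool and cpu not in pinned)


# Hypothetical quad-core setup: CPU 0 in the System pool only, CPUs 1-3 in the Timed Structures pool.
system_pool, timed_pool = {0}, {1, 2, 3}
print(system_only_cpus(system_pool, timed_pool, {"ctrl_loop": 3}))  # [0]
print(system_only_cpus(system_pool, timed_pool, {"ctrl_loop": 0}))  # []
```

An empty result signals that no CPU is guaranteed to remain available for System threads.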
Maximizing CPU Utilization
To maximize processor utilization, base the number of CPUs in each pool on the proportion of load that corresponds to Timed Structure threads versus System threads. You can use the RT Get CPU Loads VI to estimate the proportion of total processing time dedicated to Timed Structures and System threads and assign CPUs to the Timed Structures and System pools accordingly. If the RT target is connected to a monitor, you also can use the on-screen CPU Load Measurement utility to estimate load distribution.
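As a rough starting point, you can split the CPUs in proportion to the measured loads while keeping at least one CPU in each pool. The rounding rule and the load figures below are assumptions for illustration, `split_cpus_by_load` is a hypothetical helper, and the inputs are totals you derive yourself; the sketch does not reflect the output format of the RT Get CPU Loads VI.

```python
def split_cpus_by_load(num_cpus: int, timed_load: float, system_load: float) -> tuple[int, int]:
    """Heuristic: return (timed_cpus, system_cpus) proportional to measured load."""
    total = timed_load + system_load
    if num_cpus < 2 or total <= 0:
        raise ValueError("need at least two CPUs and a nonzero total load")
    timed = round(num_cpus * timed_load / total)
    # Keep at least one CPU in each pool.
    timed = min(max(timed, 1), num_cpus - 1)
    return timed, num_cpus - timed


# Hypothetical readings: 45% of CPU time in Timed Structures, 25% in System threads.
# 4 * 45 / 70 is about 2.57, which rounds to 3.
print(split_cpus_by_load(4, 45.0, 25.0))  # (3, 1)
```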
Avoiding Partial Pool Overlap
If you define the System and Timed Structures pools such that the two pools partially overlap, the automatic load balancing process might not make optimal processor-utilization decisions, as illustrated in the following examples:
Example 1
Assume you have an application with three Timed Loops. The periods of the Timed Loops are configured such that most of the time only one of the three Timed Loops is running. However, occasionally the periods of the Timed Loops align, and you want to ensure that when this happens all three Timed Loops can execute in parallel. You might be tempted to assign three CPUs (such as CPUs 0-2) of a quad-core system to the System pool and three CPUs (such as CPUs 1-3) to the Timed Structures pool, assuming that CPU 3 would always run one of the three Timed Loops while CPUs 1 and 2 would generally run System threads but would each run a Timed Loop when the periods align. However, because the SMP scheduler starts Timed Structures at the highest-index CPU and attempts to run each thread on the same CPU from iteration to iteration, the first two Timed Loops might both execute on CPU 3 and the third on CPU 2. The first time the periods align, one of the Timed Loops would transfer to CPU 1, causing a jitter spike. In this case, a better solution would be to manually assign each Timed Loop to a different CPU and to assign all four CPUs to the System pool.
Example 2
Assume you have an application with two Timed Loops that perform continuous polling and two Timed Loops that run intermittently. You might be tempted to assign all four CPUs of a quad-core system to the Timed Structures pool and two of the CPUs, such as CPUs 2 and 3, to the System pool. You might assume that the two polling Timed Loops would execute on CPUs 0 and 1, and that the intermittent Timed Loops would share processing time on CPUs 2 and 3. However, the SMP scheduler might assign the polling Timed Loops to CPUs 2 and 3, starving System threads and leaving CPUs 0 and 1 underutilized. In this case, the best solution would be to isolate the polling Timed Loops on two CPUs and assign the remaining two CPUs to both pools.
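In both examples, the pools share some CPUs but are not identical. This hypothetical Python helper classifies a pair of pools as disjoint, identical, or partially overlapping. It uses the pool choices from the two examples and, for the fix in Example 2, assumes that CPUs 2 and 3 are the isolated ones; this topic does not specify which two CPUs to isolate.

```python
def overlap_kind(system_pool: set[int], timed_pool: set[int]) -> str:
    """Classify two CPU pools as 'disjoint', 'identical', or 'partial'."""
    if not (system_pool & timed_pool):
        return "disjoint"
    if system_pool == timed_pool:
        return "identical"
    # Shared CPUs, but the pools differ: the case this section warns about.
    return "partial"


print(overlap_kind({0, 1, 2}, {1, 2, 3}))  # partial   (Example 1)
print(overlap_kind({2, 3}, {0, 1, 2, 3}))  # partial   (Example 2)
# Example 2 fix, assuming CPUs 2 and 3 are isolated: CPUs 0-1 in both pools.
print(overlap_kind({0, 1}, {0, 1}))        # identical
```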
Using the SMP CPU Utilities VIs
Use the SMP CPU Utilities VIs to define the System and Timed Structures pools. The SMP CPU Utilities palette includes three VIs that offer three different ways to define the CPU pools, so you can choose the VI that best matches your programming style and application requirements. The RT Set CPU Pool Sizes VI is built on top of the RT Set CPU Pool Assignments VI, which is built on top of the RT Set CPU Pool VI. The RT Set CPU Pool Sizes VI is a good choice for most use cases. You can use the RT Set CPU Pool Sizes VI to create either the default pool configuration, in which both pools contain all CPUs, or contiguous, adjacent pools. For custom CPU pool configurations, the RT Set CPU Pool Assignments VI is a good choice. You can use the RT Set CPU Pool Assignments VI to create any pool configuration as long as each pool contains at least one CPU. If you prefer to work with bit masks, you can use the RT Set CPU Pool VI.
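If you work with bit masks, you need to convert between per-CPU pool assignments and one mask per pool. The following Python sketch assumes that bit i of a mask represents CPU i; confirm the actual encoding in the RT Set CPU Pool VI help before relying on it. The label strings and function names are hypothetical, and the check that each pool contains at least one CPU mirrors the requirement stated for the RT Set CPU Pool Assignments VI.

```python
# Hypothetical per-CPU labels; not the input format of any NI VI.
VALID_LABELS = {"system", "timed", "both", "reserved"}


def assignments_to_masks(assignments: list[str]) -> tuple[int, int]:
    """Convert per-CPU labels (list index = CPU index) to (system_mask, timed_mask)."""
    system_mask = timed_mask = 0
    for cpu, label in enumerate(assignments):
        if label not in VALID_LABELS:
            raise ValueError(f"CPU {cpu}: unknown label {label!r}")
        if label in ("system", "both"):
            system_mask |= 1 << cpu  # assumption: bit i represents CPU i
        if label in ("timed", "both"):
            timed_mask |= 1 << cpu
    if not system_mask or not timed_mask:
        raise ValueError("each pool needs at least one CPU")
    return system_mask, timed_mask


def mask_to_cpus(mask: int) -> list[int]:
    """List the CPU indices whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# CPU 0 System-only, CPUs 1-2 Timed-Structures-only, CPU 3 reserved.
system_mask, timed_mask = assignments_to_masks(["system", "timed", "timed", "reserved"])
print(f"{system_mask:04b} {timed_mask:04b}")  # 0001 0110
print(mask_to_cpus(timed_mask))               # [1, 2]
```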