With SMP, the multiple processors (or cores) in a system are assigned to certain “pools.” Once a processor is assigned to a pool, you can assign threads to execute on specific processors in a pool. Therefore, you can configure one pool to handle OS threads while another pool handles your application threads. In this way, SMP gives you complete control over load distribution on a multiprocessor system. Ideally, an OS that implements an SMP model not only provides the flexibility to assign threads to specific cores but also has built-in logic to load balance threads across multiple cores.
To take full advantage of the additional processing power of a multicore system, the software must be written such that the execution can be split among the available processing entities. According to Amdahl’s Law, the increase in application performance when using a multicore processor depends on the inherent parallelism within the software. Therefore, if your application has many parts that must execute serially, the increase in performance from a multicore system is minimal. However, an application that is architected to be inherently multithreaded sees a significant performance increase from a multicore system.
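Amdahl's Law is commonly written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of run time that can execute in parallel and n is the number of cores. The following is a minimal sketch of that formula in ANSI C. The function name amdahl_speedup is illustrative and is not part of LabWindows/CVI.

```c
#include <assert.h>

/*
 * Amdahl's Law: theoretical speedup of a program on `cores` processors
 * when `parallel_fraction` (0.0 to 1.0) of its single-core run time
 * can be spread evenly across those processors.
 */
double amdahl_speedup(double parallel_fraction, int cores)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);

    /* The serial part always runs at single-core speed. */
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

For example, if only half of the run time can be parallelized, four cores yield a theoretical speedup of 1.6, whereas a fully serial application (p = 0) gains nothing regardless of the number of cores.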
2. Auto-Load Balancing
In a multicore system, the responsibility for distributing threads between available cores lies with the OS. By default, standard operating systems implement auto-load balancing in multicore systems. This means that the thread scheduler within the OS automatically assigns the next thread to be executed to the most appropriate core based on various factors including processor utilization. By allowing the OS to auto-load balance, you can usually achieve good execution times on a multicore processor without placing threads manually.
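To make the idea of utilization-based placement concrete, here is a deliberately simplified sketch in ANSI C. It is a toy policy written for illustration only and does not represent how Windows or any RTOS scheduler is actually implemented; real schedulers weigh many more factors. The function name pick_least_loaded_core is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy placement policy (an assumption for illustration): return the index
 * of the core with the lowest utilization, where 0.0 means idle and 1.0
 * means fully busy. Ties go to the lowest core index.
 */
size_t pick_least_loaded_core(const double utilization[], size_t core_count)
{
    assert(utilization != NULL && core_count > 0);

    size_t best = 0;
    for (size_t i = 1; i < core_count; ++i) {
        if (utilization[i] < utilization[best]) {
            best = i;
        }
    }
    return best;
}
```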
With an RTOS, auto-load balancing depends on thread priorities to allow time-critical threads to interrupt lower-priority threads. The RTOS handles decisions about which core to use when performing round-robin scheduling of threads with the same priority. Multithreading and multitasking support in the RTOS provides the basis for taking advantage of multicore performance in real time. The LabWindows/CVI 8.5 Real-Time Module includes the NI RT Extensions for SMP, which add multicore support to the ETS real-time operating system.
Figure 1. With LabWindows/CVI Real-Time support for SMP, you can achieve automatic load balancing across multiple cores.
3. Processor Affinity on Windows and Real-Time Systems
To further increase the performance and reliability of Windows or real-time systems, you can assign timed loops or threads to specific processor cores. With LabWindows/CVI 8.5, you have the flexibility to control these pools and thread assignments programmatically.
SMP on Windows
On Windows, processor affinity provides the potential for optimized cache performance. Specifically, by isolating threads that access the same data to one processor, you can reduce the number of cache invalidations that occur when threads migrate between cores.
By using the processor affinity functions within winbase.h, you can control which cores are available for system use and which ones are reserved for specific threads. There are two main functions that you can use to influence the availability of certain cores on Windows:
BOOL WINAPI SetProcessAffinityMask(
  __in HANDLE hProcess,
  __in DWORD_PTR dwProcessAffinityMask
);
A process affinity mask is a bit vector in which each bit represents a processor that the threads of the process are allowed to run on. The value of the process affinity mask must be a subset of the system affinity mask obtained by the GetProcessAffinityMask function.
DWORD_PTR WINAPI SetThreadAffinityMask(
  __in HANDLE hThread,
  __in DWORD_PTR dwThreadAffinityMask
);
A thread affinity mask is a bit vector in which each bit represents a processor that the thread is allowed to run on. A thread affinity mask must be a subset of the process affinity mask of the thread's containing process. A thread can run only on the processors that its process can run on.
Note: On Windows, setting an affinity mask for a process or thread can result in threads receiving less processor time because the system is restricted from running the threads on certain processors. In most cases, it is better to let the Windows OS select an available processor.
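The subset rule described above can be checked before calling the Windows functions. The sketch below builds a single-core mask, validates it against the process affinity mask, and then pins the calling thread. The helper names core_mask, is_subset_mask, and pin_current_thread are hypothetical; only GetProcessAffinityMask, SetThreadAffinityMask, GetCurrentProcess, and GetCurrentThread are Windows API calls. On non-Windows builds the function performs only the mask validation against a caller-supplied mask, which is an assumption made so the logic can be compiled and exercised anywhere.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Build a mask with one bit set for the given zero-based core index. */
uintptr_t core_mask(unsigned core)
{
    assert(core < sizeof(uintptr_t) * 8);
    return (uintptr_t)1 << core;
}

/* A mask is usable only if it is nonzero and every bit in it is also
 * set in the parent mask (thread within process, process within system). */
int is_subset_mask(uintptr_t mask, uintptr_t parent)
{
    return mask != 0 && (mask & ~parent) == 0;
}

/*
 * Pin the calling thread to `core`. Returns 1 on success, 0 on failure,
 * including when the core lies outside the process affinity mask.
 * `fallback_process_mask` is used only when not building for Windows.
 */
int pin_current_thread(unsigned core, uintptr_t fallback_process_mask)
{
    uintptr_t process_mask = fallback_process_mask;
#ifdef _WIN32
    DWORD_PTR proc = 0, sys = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &proc, &sys)) {
        return 0;
    }
    process_mask = (uintptr_t)proc;
#endif
    uintptr_t wanted = core_mask(core);
    if (!is_subset_mask(wanted, process_mask)) {
        return 0;
    }
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)wanted) != 0;
#else
    return 1;
#endif
}
```

Validating the mask first gives a clear failure path instead of relying on the OS call to reject a mask that names a processor the process cannot use.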
SMP on a Real-Time System
In real-time applications, processor affinity helps you achieve increased performance and reliability because you can dedicate one core of a processor to execute a time-critical control thread and isolate it from less important threads that run on different cores. Use the symmetric multiprocessing functions in the Real-Time Utility Library to assign threads to CPUs on a real-time system.
Figure 2. The LabWindows/CVI Real-Time Module provides functions for assigning a time-critical thread to a dedicated CPU.
To add certain processors into a pool, use the ConfigureProcessorPool function. With this function, you can specify a processor mask, which determines which processors are included in the pool.
After setting up a pool of processors, you can assign certain threads to this pool with the SetProcessorAffinityForThread function. By calling this function, you can limit a thread to run only on a certain processor, thereby freeing up other processors to run other threads.
Notice that setting a thread to run on a certain processor does not reserve that processor for use exclusively by that thread. Only by combining the ConfigureProcessorPool and SetProcessorAffinityForThread functions can you remove a processor from a system pool and set up a thread to execute on it. Once you remove a CPU from the system pool, the CPU runs only the threads you assign to it.
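The difference between setting an affinity and reserving a processor can be pictured with two bit masks. The following sketch is a hypothetical model of the pool semantics described above, written in plain ANSI C. It does not call or reproduce the Real-Time Utility Library API, and the type and function names (cpu_pools, pools_init, pools_reserve_cpu, pools_system_may_use) are invented for illustration.

```c
#include <assert.h>

/* Hypothetical model only; not the Real-Time Utility Library API. */
typedef struct {
    unsigned system_pool;   /* CPUs the RTOS may use for any thread */
    unsigned reserved_pool; /* CPUs that run only explicitly assigned threads */
} cpu_pools;

/* Start with every CPU in the system pool. */
cpu_pools pools_init(unsigned cpu_count)
{
    assert(cpu_count >= 1 && cpu_count < sizeof(unsigned) * 8);
    cpu_pools p = { (1u << cpu_count) - 1u, 0u };
    return p;
}

/* Move one CPU out of the system pool into the reserved pool.
 * Returns 1 on success, 0 if the CPU is not in the system pool. */
int pools_reserve_cpu(cpu_pools *p, unsigned cpu)
{
    assert(p != 0 && cpu < sizeof(unsigned) * 8);
    unsigned bit = 1u << cpu;
    if (!(p->system_pool & bit)) {
        return 0;
    }
    p->system_pool &= ~bit;
    p->reserved_pool |= bit;
    return 1;
}

/* May a thread with no explicit assignment be scheduled on this CPU? */
int pools_system_may_use(const cpu_pools *p, unsigned cpu)
{
    assert(p != 0 && cpu < sizeof(unsigned) * 8);
    return (int)((p->system_pool >> cpu) & 1u);
}
```

In this model, pinning a thread to CPU 1 without first reserving that CPU leaves CPU 1 in the system pool, so other threads can still be scheduled there. Only after the CPU is moved to the reserved pool does it run assigned threads exclusively.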
Figure 3. Use the SetProcessorAffinityForThread function to assign a function to a particular core.
4. Debugging Multicore Applications
As applications become complex, it's important to understand at a low level how code executes on the real-time system. Adding processor cores to the system only amplifies this complexity. The NI Real-Time Execution Trace Toolkit 2.0 provides a visual representation of both function and thread execution on single-core or multicore systems, so you can find hotspots in your code and detect undesirable behaviors such as resource contention, memory allocations, and priority inversions.
Figure 4. The Real-Time Execution Trace Toolkit 2.0 provides multicore debugging capabilities.
You can also use the On-Screen CPU Monitor on LabWindows/CVI 8.5 Real-Time targets to monitor CPU utilization. This utility shows information such as Total Load, ISR (Interrupt Service Routine) usage, and CPU usage by threads directly on a display connected to the real-time target, as shown in Figure 5.
Figure 5. The Real-Time On-Screen CPU Monitor is available on LabWindows/CVI 8.5 Real-Time targets.
5. Guaranteeing Multicore Readiness
As companies migrate applications based on a single-processing software stack to one based on a multiprocessing model, they must verify that each layer of the stack is multicore-ready. This process can consume a great deal of time and resources. One great advantage of the LabWindows/CVI 8.5 Real-Time Module is that the software stack meets all the requirements for multicore readiness, including a programming language equipped to create multithreaded applications, access to thread-safe libraries and drivers, ability to maintain determinism across multiple cores, and integrated debugging tools.
| Real-Time Software Stack | What It Means to Be Multicore-Ready |
| --- | --- |
| Development Tool | Support is provided on the RTOS, and the tool allows for threading correctness and optimization. Debugging and tracing capabilities are provided to analyze real-time multicore systems. |
| Libraries | Libraries are thread-safe and can be made reentrant so they can execute in parallel. Algorithms are designed to avoid memory allocation, which would induce jitter in the system. |
| Device Drivers | Drivers are designed for optimal multithreaded performance. |
| Real-Time Operating System | The RTOS supports multithreading and multitasking, and it can load balance tasks on multicore processors with symmetric multiprocessing (SMP). |

Table 1. The real-time software stack consists of a development tool, libraries, device drivers, and an RTOS.
Many bottlenecks in parallel applications occur because of user interface, analysis, or hardware library calls not being thread-safe. Non-reentrant code might still function properly, but it cannot execute in parallel on a multicore processor and it can become a shared resource, which leads to performance problems. LabWindows/CVI overcomes these pitfalls with reentrant analysis libraries and I/O drivers, such as NI-DAQmx.
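The distinction between non-reentrant and reentrant code can be illustrated with a small ANSI C example. This is a generic sketch, not code from the LabWindows/CVI libraries; the names running_mean_shared, running_mean, and mean_state are hypothetical. The first function keeps its state in static variables, so all callers share it and concurrent use from several threads would require serialization. The second takes its state from the caller, so threads that each own a separate state can call it in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Non-reentrant: every caller shares one hidden accumulator, which makes
 * the function a shared resource when several threads call it. */
double running_mean_shared(double sample)
{
    static double sum = 0.0;
    static size_t count = 0;

    sum += sample;
    count += 1;
    return sum / (double)count;
}

/* Reentrant: the caller owns the state, so callers with separate
 * mean_state objects never interfere with one another. */
typedef struct {
    double sum;
    size_t count;
} mean_state;

double running_mean(mean_state *state, double sample)
{
    assert(state != NULL);

    state->sum += sample;
    state->count += 1;
    return state->sum / (double)state->count;
}
```

With the shared version, two independent data streams contaminate each other's results even in a single thread, which is the same hazard that forces serialized access on a multicore processor.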
Multicore processors provide the opportunity to have truly parallel operations within the same application. By creating multithreaded LabWindows/CVI applications on Windows and using the OS to auto-load balance across multiple cores, you can take full advantage of this technology.
LabWindows/CVI Real-Time opens the door for using high-performance multicore processors in your real-time applications. With LabWindows/CVI 8.5 Real-Time, you can realize the full benefits of processor affinity by isolating time-critical threads from lower-priority threads.
Simply migrating your LabWindows/CVI or LabWindows/CVI Real-Time application to a multiprocessor system does not guarantee a boost in performance. To achieve maximum performance from multiprocessor machines, an application must be multithreaded. Only by carefully planning your software architecture can you take full advantage of SMP.
7. More Resources on ANSI C Multicore Programming
- Read More about ANSI C Multithreading in LabWindows/CVI
- Learn How to Debug Multicore ANSI C Applications with LabWindows/CVI
- View the Achieving Multicore Performance in ANSI C with LabWindows/CVI Webcast
- Learn More about LabWindows/CVI
The mark LabWindows is used under a license from Microsoft Corporation. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.