Symmetric Multiprocessing in ANSI C with LabWindows™/CVI

Publish Date: Feb 02, 2012

Overview


Multicore Programming Fundamentals White Paper Series


Symmetric multiprocessing, or SMP, is an operating system feature that allows multicore processors to run a single instance of an operating system, connect to a common main memory, and execute code in parallel. With this technology, you can easily move threads between processors to balance the workload efficiently. Most general-purpose operating systems such as Windows, Linux®, and Mac OS support SMP. However, real-time OS (RTOS) support for SMP is not trivial because the deterministic behavior of an RTOS must be preserved to meet hard real-time timing constraints while distributing threads across different processors. As one of the few RTOSs that support SMP, the OS used by LabWindows/CVI Real-Time ETS targets now helps you take full advantage of the latest multicore hardware.

This white paper describes how you can use SMP support in Windows with the Windows SDK. It also covers using the LabWindows/CVI 8.5 Real-Time Module to implement high-performance real-time applications on multicore systems.

Table of Contents

  1. Introduction
  2. Auto-Load Balancing
  3. Processor Affinity on Windows and Real-Time Systems
  4. Debugging Multicore Applications
  5. Guaranteeing Multicore Readiness
  6. Conclusion
  7. More Resources on ANSI C Multicore Programming

1. Introduction

With SMP, the processors (or cores) in a system are assigned to “pools.” Once a processor is assigned to a pool, you can assign threads to execute on specific processors in that pool. For example, you can configure one pool to handle OS threads while another pool handles your application threads. In this way, SMP gives you direct control over load distribution on a multiprocessor system. Ideally, an OS that implements an SMP model not only provides the flexibility to assign threads to specific cores but also has built-in logic to load balance threads across multiple cores.

To take full advantage of the additional processing power of a multicore system, the software must be written such that the execution can be split among the available processing entities. According to Amdahl’s Law, the increase in application performance when using a multicore processor depends on the inherent parallelism within the software. Therefore, if your application has many parts that must execute serially, the increase in performance from a multicore system is minimal. However, an application that is architected to be inherently multithreaded sees a significant performance increase from a multicore system.
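Amdahl’s Law makes this concrete: if a fraction p of the execution time can run in parallel, the ideal speedup on n cores is S(n) = 1 / ((1 - p) + p / n). The small C function below is a sketch of that formula (the name amdahl_speedup is illustrative, not part of any LabWindows/CVI library) and ignores real-world overheads such as synchronization and cache effects.

```c
#include <assert.h>

/* Ideal speedup predicted by Amdahl's Law for a program whose
 * parallelizable fraction is parallel_fraction (0.0 to 1.0),
 * running on `cores` processors. Overheads are ignored. */
double amdahl_speedup(double parallel_fraction, unsigned cores)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores > 0);

    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / (double)cores);
}
```

For example, amdahl_speedup(0.5, 2) is about 1.33, and a program that is half serial stays below a 2x speedup no matter how many cores you add, which is why the serial portions of an application dominate its multicore scaling.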

Read More:
Multithreading in LabWindows/CVI In-Depth White Paper


2. Auto-Load Balancing

In a multicore system, the OS is responsible for distributing threads among the available cores. By default, standard operating systems implement auto-load balancing in multicore systems: the thread scheduler within the OS automatically assigns the next thread to be executed to the most appropriate core based on various factors, including processor utilization. By letting the OS auto-load balance, you typically get good execution times from a multicore processor without writing any core-assignment code.
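The article does not include thread-creation code, so the following is a minimal sketch under stated assumptions: it uses POSIX threads (pthread_create and pthread_join) as a portable stand-in for the platform's thread API (on Windows, the Win32 thread functions such as CreateThread play the same role). It splits a summation into independent slices, one thread per slice, and sets no affinity, leaving core placement entirely to the OS scheduler. The names parallel_sum, sum_slice, and MAX_WORKERS are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16   /* hypothetical upper bound for this sketch */

/* One contiguous slice of the input, handled by one worker thread. */
typedef struct {
    const long *data;
    size_t begin, end;
    long partial;        /* written only by the thread that owns the slice */
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = (slice_t *)arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->data[i];
    s->partial = acc;    /* each thread writes its own slot: no locking needed */
    return NULL;
}

/* Sums data[0..n) using `workers` threads. No affinity is set, so the
 * OS decides which core runs each thread (auto-load balancing). */
long parallel_sum(const long *data, size_t n, unsigned workers)
{
    pthread_t tid[MAX_WORKERS];
    slice_t slices[MAX_WORKERS];
    int created[MAX_WORKERS];
    assert(workers > 0 && workers <= MAX_WORKERS);

    for (unsigned w = 0; w < workers; ++w) {
        slices[w].data = data;
        slices[w].begin = n * w / workers;
        slices[w].end = n * (w + 1) / workers;
        slices[w].partial = 0;
        created[w] = pthread_create(&tid[w], NULL, sum_slice, &slices[w]) == 0;
        if (!created[w])
            sum_slice(&slices[w]);   /* fall back to the calling thread */
    }

    long total = 0;
    for (unsigned w = 0; w < workers; ++w) {
        if (created[w])
            pthread_join(tid[w], NULL);
        total += slices[w].partial;
    }
    return total;
}
```

Because the slices share no mutable state, the scheduler is free to run them on any cores in any order; the result is the same with one worker or sixteen.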

With an RTOS, auto-load balancing depends on thread priorities to allow time-critical threads to interrupt lower-priority threads. The RTOS handles decisions about which core to use when performing round-robin scheduling of threads with the same priority. Multithreading and multitasking support in the RTOS provides the basis for taking advantage of multicore performance in real time. The LabWindows/CVI 8.5 Real-Time Module includes the NI RT Extensions for SMP, which add multicore support to the ETS real-time operating system.

Figure 1. With LabWindows/CVI Real-Time support for SMP, you can achieve automatic load balancing across multiple cores.


3. Processor Affinity on Windows and Real-Time Systems

To further increase the performance and reliability of Windows or real-time systems, you can assign timed loops or threads to specific processor cores. With LabWindows/CVI 8.5, you have the flexibility to control these pools and thread assignments programmatically.  

SMP on Windows

On Windows, processor affinity provides the potential for optimized cache performance. Specifically, by isolating threads that access the same data to one processor, you can reduce the number of data invalidations that arise from thread swapping between multiple cores.

By using the processor affinity functions within winbase.h, you can control which cores are available for system use and which ones are reserved for specific threads. There are two main functions that you can use to influence the availability of certain cores on Windows:


BOOL WINAPI SetProcessAffinityMask(
  __in HANDLE hProcess,
  __in DWORD_PTR dwProcessAffinityMask
);

A process affinity mask is a bit vector in which each bit represents a processor on which the threads of the process are allowed to run. The process affinity mask must be a subset of the system affinity mask obtained by the GetProcessAffinityMask function.
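As a sketch of the subset rule above, the code below builds a mask from a list of processor indices and checks it against the system mask before calling SetProcessAffinityMask. The helper names make_affinity_mask, mask_is_subset, and restrict_current_process are hypothetical. The Windows-specific part compiles only when _WIN32 is defined, so the mask arithmetic can be tried on any platform; a 64-bit mask is assumed, which matches the width of DWORD_PTR on 64-bit Windows.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Builds an affinity mask with one bit set for each listed processor index. */
uint64_t make_affinity_mask(const unsigned *cpus, unsigned count)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < count; ++i) {
        assert(cpus[i] < 64);              /* one bit per logical processor */
        mask |= UINT64_C(1) << cpus[i];
    }
    return mask;
}

/* A requested mask is usable only if it is non-empty and every bit in it
 * is also set in the allowed (for example, system) mask. */
int mask_is_subset(uint64_t requested, uint64_t allowed)
{
    return requested != 0 && (requested & ~allowed) == 0;
}

#ifdef _WIN32
/* Restricts the current process to the processors in `requested`, after
 * checking the request against the system affinity mask. */
BOOL restrict_current_process(DWORD_PTR requested)
{
    DWORD_PTR process_mask, system_mask;
    HANDLE self = GetCurrentProcess();

    if (!GetProcessAffinityMask(self, &process_mask, &system_mask))
        return FALSE;
    if (!mask_is_subset((uint64_t)requested, (uint64_t)system_mask))
        return FALSE;
    return SetProcessAffinityMask(self, requested);
}
#endif
```

For example, make_affinity_mask with processors {0, 2} yields 0x5, which is a valid request on a four-processor system whose system mask is 0xF.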

DWORD_PTR WINAPI SetThreadAffinityMask(
  __in HANDLE hThread,
  __in DWORD_PTR dwThreadAffinityMask
);

A thread affinity mask is a bit vector in which each bit represents a processor on which the thread is allowed to run. A thread affinity mask must be a subset of the process affinity mask of the process that contains the thread. A thread can run only on the processors that its process can run on.
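The sketch below pins the calling thread to one processor while respecting the rule that a thread mask must be a subset of its process mask: it reads the process mask with GetProcessAffinityMask and requests only a bit that the process already allows. The helper names lowest_cpu_in_mask and pin_current_thread_to_first_cpu are hypothetical, and choosing the lowest-numbered processor is an arbitrary policy for illustration. As with the other Windows code here, the Win32 part compiles only when _WIN32 is defined.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns the index of the lowest set bit in mask, or -1 if mask is empty. */
int lowest_cpu_in_mask(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; ++cpu)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

#ifdef _WIN32
/* Pins the calling thread to the lowest-numbered processor its process may
 * use. Returns the previous thread affinity mask, or 0 on failure, which
 * matches the return convention of SetThreadAffinityMask. */
DWORD_PTR pin_current_thread_to_first_cpu(void)
{
    DWORD_PTR process_mask, system_mask;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return 0;

    int cpu = lowest_cpu_in_mask((uint64_t)process_mask);
    if (cpu < 0)
        return 0;

    /* The bit comes from the process mask, so the thread mask is
     * guaranteed to be a subset of it. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << cpu);
}
#endif
```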

Note: On Windows, setting an affinity mask for a process or thread can result in threads receiving less processor time because the system is restricted from running the threads on certain processors. In most cases, it is better to let the Windows OS select an available processor.

SMP on a Real-Time System

In real-time applications, processor affinity helps you achieve increased performance and reliability because you can dedicate one core of a processor to execute a time-critical control thread and isolate it from less important threads that run on different cores. Use the symmetric multiprocessing functions in the Real-Time Utility Library to assign threads to CPUs on a real-time system.


Figure 2. The LabWindows/CVI Real-Time Module provides functions for assigning a time-critical thread to a dedicated CPU.

To add specific processors to a pool, use the ConfigureProcessorPool function. With this function, you specify a processor mask that determines which processors are included in the pool.

After setting up a pool of processors, you can assign certain threads to this pool with the SetProcessorAffinityForThread function. By calling this function, you can limit a thread to run only on a certain processor, thereby freeing up other processors to run other threads.

Notice that setting a thread to run on a certain processor does not reserve that processor for use exclusively by that thread. Only by combining the ConfigureProcessorPool and SetProcessorAffinityForThread functions can you remove a processor from a system pool and set up a thread to execute on it. Once you remove a CPU from the system pool, the CPU runs only the threads you assign to it.
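The exact parameters of ConfigureProcessorPool and SetProcessorAffinityForThread are not shown in this paper, so the sketch below only computes the processor masks you would plan with: one mask for the system pool and one for the processors you remove from it and dedicate to time-critical threads. The helpers all_cpus_mask and partition_cpus and the cpu_partition_t type are hypothetical; passing the masks to the Real-Time Utility Library functions is left to that library’s documentation. Keeping at least one processor in the system pool is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with one bit for each of the first cpu_count processors. */
uint64_t all_cpus_mask(unsigned cpu_count)
{
    assert(cpu_count >= 1 && cpu_count <= 64);
    return cpu_count == 64 ? UINT64_MAX : (UINT64_C(1) << cpu_count) - 1;
}

/* Hypothetical plan for splitting processors between the system pool
 * and processors reserved for time-critical threads. */
typedef struct {
    uint64_t system_pool;   /* processors left for OS and ordinary threads */
    uint64_t dedicated;     /* processors removed from the system pool */
} cpu_partition_t;

cpu_partition_t partition_cpus(unsigned cpu_count, uint64_t reserved)
{
    cpu_partition_t p;
    uint64_t all = all_cpus_mask(cpu_count);

    p.dedicated = reserved & all;          /* ignore bits for absent CPUs */
    p.system_pool = all & ~p.dedicated;
    assert(p.system_pool != 0);            /* keep at least one CPU for the OS */
    return p;
}
```

On a four-core target, reserving processor 3 gives a dedicated mask of 0x8 and a system-pool mask of 0x7; a time-critical thread assigned to the dedicated processor then no longer competes with threads in the system pool.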


Figure 3. Use the SetProcessorAffinityForThread function to assign a thread to a particular core.


4. Debugging Multicore Applications

As applications become more complex, it is important to understand at a low level how code executes on the real-time system. Adding processor cores to the system only amplifies this complexity. The NI Real-Time Execution Trace Toolkit 2.0 provides a visual representation of both function and thread execution on single-core or multicore systems, so you can find hotspots in your code and detect undesirable behaviors such as resource contention, memory allocations, and priority inversions.


Figure 4. The Real-Time Execution Trace Toolkit 2.0 provides multicore debugging capabilities.

You also can use the On-Screen CPU Monitor on LabWindows/CVI 8.5 Real-Time targets to monitor CPU utilization. This utility displays information such as Total Load, ISR (interrupt service routine) usage, and CPU usage by threads directly on a display connected to the real-time target, as shown in Figure 5.


Figure 5. The Real-Time On-Screen CPU Monitor is available on LabWindows/CVI 8.5 Real-Time targets.

Read More:
Debugging Multicore ANSI C Applications with LabWindows/CVI


5. Guaranteeing Multicore Readiness

As companies migrate applications based on a single-processing software stack to one based on a multiprocessing model, they must verify that each layer of the stack is multicore-ready. This process can consume a great deal of time and resources. One great advantage of the LabWindows/CVI 8.5 Real-Time Module is that the software stack meets all the requirements for multicore readiness, including a programming language equipped to create multithreaded applications, access to thread-safe libraries and drivers, ability to maintain determinism across multiple cores, and integrated debugging tools.

Real-Time Software Stack | What It Means to Be Multicore-Ready
Development Tool | The tool supports the RTOS and helps ensure threading correctness and optimization. Debugging and tracing capabilities are provided to analyze real-time multicore systems.
Libraries | Libraries are thread-safe and can be made reentrant so that they can execute in parallel. Algorithms avoid memory allocations that would induce jitter in the system.
Device Drivers | Drivers are designed for optimal multithreaded performance.
Real-Time Operating System | The RTOS supports multithreading and multitasking, and it can load balance tasks on multicore processors with symmetric multiprocessing (SMP).

Table 1. The real-time software stack consists of a development tool, libraries, device drivers, and an RTOS.

Many bottlenecks in parallel applications occur because user interface, analysis, or hardware library calls are not thread-safe. Non-reentrant code might still function correctly, but it cannot execute in parallel on a multicore processor; instead, it becomes a shared resource that threads must take turns using, which leads to performance problems. LabWindows/CVI avoids these pitfalls with reentrant analysis libraries and I/O drivers, such as NI-DAQmx.
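The difference between non-reentrant and reentrant code can be seen in a small, hypothetical example (neither function comes from a LabWindows/CVI library). The first version keeps its state in static variables shared by every caller, so concurrent calls from two threads corrupt each other’s results unless they are serialized with a lock. The second version takes its state from the caller, so each thread can own an instance and call the function in parallel without locking.

```c
#include <assert.h>
#include <stddef.h>

/* Non-reentrant: the running state lives in static storage shared by all
 * callers, so concurrent calls from different threads interfere. */
double running_mean_shared(double sample)
{
    static double sum = 0.0;
    static size_t count = 0;
    sum += sample;
    count++;
    return sum / (double)count;
}

/* Reentrant: the caller owns the state, so each thread can keep its own
 * instance and call the function in parallel without locks. */
typedef struct {
    double sum;
    size_t count;
} mean_state_t;

double running_mean(mean_state_t *state, double sample)
{
    assert(state != NULL);
    state->sum += sample;
    state->count++;
    return state->sum / (double)state->count;
}
```

With two separate mean_state_t instances, two threads computing independent averages never touch the same memory, which is the property that lets library calls scale across cores instead of becoming a shared bottleneck.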


6. Conclusion

Multicore processors provide the opportunity to have truly parallel operations within the same application. By creating multithreaded LabWindows/CVI applications on Windows and using the OS to auto-load balance across multiple cores, you can take full advantage of this technology.

LabWindows/CVI Real-Time opens the door for using high-performance multicore processors in your real-time applications. With LabWindows/CVI 8.5 Real-Time, you can realize the full benefits of processor affinity by isolating time-critical threads from lower-priority threads.

Simply migrating your LabWindows/CVI or LabWindows/CVI Real-Time application to a multiprocessor system does not guarantee a boost in performance. To achieve maximum performance from multiprocessor machines, an application must be multithreaded. Only by carefully planning your software architecture can you take full advantage of SMP.


7. More Resources on ANSI C Multicore Programming


Multicore Programming Fundamentals White Paper Series



The mark LabWindows is used under a license from Microsoft Corporation. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
