When creating real world multicore applications using pipelining, a programmer must take several important concerns into account. In specific, balancing pipeline stages and minimizing memory transfer between cores are critical to realizing performance gains with pipelining.
In both the car manufacturing and LabVIEW examples above, each pipeline stage was assumed to take an equal amount of time to execute; we can say that these example pipeline stages were balanced. However, in real-world applications this is rarely the case. Consider the diagram below; if Stage 1 takes three times as long to execute as Stage 2, then pipelining the two stages produces only a minimal performance increase.
Non-Pipelined (total time = 4s):
Pipelined (total time = 3s):
Note: Performance increase = 1.33X (not an ideal case for pipelining)
To remedy this situation, the programmer must move tasks from Stage 1 to Stage 2 until both stages take approximately equal times to execute. With a large number of pipeline stages, this can be a difficult task.
In LabVIEW, it is helpful to benchmark each of your pipeline stages to ensure that the pipeline is well balanced. This can most easily be done using a flat sequence structure in conjunction with the Tick Count (ms) function as shown in Figure 4.
Figure 4. Benchmark your pipeline stages to ensure a well balanced pipeline.
Data Transfer Between Cores
It is best to avoid transferring large amounts of data between pipeline stages whenever possible. Since the stages of a given pipeline could be running on separate processor cores, any data transfer between individual stages could actually result in a memory transfer between physical processor cores. In the case that two processor cores do not share a cache (or the memory transfer size exceeds the cache size), the end application user may see a decrease in pipelining effectiveness.