Deterministic Synchronization of Distributed Simulation Systems

Publish Date: Jul 01, 2008


For aerospace design engineers who build simulation systems using the LabVIEW Real-Time Module, third-party distributed shared memory is a communication mechanism that lets you partition the processing load between two or more computers. Unlike TCP/IP, distributed shared memory allows for deterministic synchronization between all computing nodes.

Table of Contents

  1. Background and Scope
  2. Introduction to Distributed Simulation Systems
  3. Communication Requirements
  4. Using Distributed Shared Memory
  5. Conclusion

1. Background and Scope

Simulation systems for aerospace applications are complex devices. They incorporate very specific I/O, sophisticated simulation algorithms, and very precise timing and synchronization. With aircraft designs becoming more complex every day, control design, validation, manufacturing, and acceptance test tools must keep up in terms of performance, scalability, and ease of use. Providing the computing power to execute computer simulation models with the required timing is a big challenge for suppliers. A common trend in the industry is to distribute the processing load between multiple computers to achieve scalability and maximize performance.

This document presents the basic communication scheme used to connect simulation models that are distributed among multiple computers, with a focus on hard real-time applications.

2. Introduction to Distributed Simulation Systems

When building a simulation system, one challenge is to provide enough computing power for simulation model execution, I/O and user interface updates, actuator closed-loop control, and other tasks. For example, a simulation system could incorporate propulsion modeling coupled with a vehicle dynamics simulation in the same computer as shown in Figure 1.

Figure 1. Single Computer Simulation System.

The challenge with the architecture in Figure 1 is that the CPU may not be able to handle all the processing needs. It may also pose a barrier to system scalability.

One solution to this problem is to partition the simulation to run on two or more computers. This increases the CPU power available for each part of the simulation. Because each part has more computational power available, the step execution rate can be higher and show less jitter. Figure 2 illustrates a distributed implementation of the system shown in Figure 1.

Figure 2. Distributed Simulation System.

Note that this implementation has one user interface for both computers. Keep in mind that the system, even though it is distributed across multiple computers, behaves as a unified system. Depending on the application, you could include a local GUI per computer, but for most applications that is not necessary.

Another typical approach is to partition the system between the simulation and I/O processing. To achieve this, the physical interface to the real world is implemented as a separate I/O engine that communicates with the simulation process using synchronous communication. Figure 3 shows an example implementation using this approach.

Figure 3. Distributed Simulation System with I/O Engine.

Notice that in Figure 3, the Real-Time Computer B has a dedicated I/O CPU. Its task is only to acquire data from the I/O and place it in the communication medium. Additional functionality can be added so that the analysis CPU can send commands to control or change the acquisition parameters on the fly.

An advantage of this scheme is that the I/O is decoupled from the analysis/simulation module, which allows it to be maintained and expanded separately. It also enables the simulation engine to be coupled with various I/O engines with specific functionality for different test scenarios.

It is important to note that these two approaches are not mutually exclusive. A simulation can apply both techniques to maximize the performance of the system. The decision to use one method over the other (or both together) must be carefully analyzed during the system design phase. Keep in mind that each added partition makes the system more flexible but also adds to its complexity.

The location of each component should be transparent to each of the other components. In general, components should only need to know where to get the data that they need and where to store the data they produce. To accomplish this in a distributed system, a robust and deterministic communication mechanism must be provided.
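One way to picture this location transparency is a shared memory map that every component agrees on. The following is a minimal Python sketch, not a vendor API: the `bytearray` stands in for the distributed shared memory region, and the signal names, offsets, and the `write_value`/`read_value` helpers are all hypothetical, chosen only for illustration.

```python
import struct

# Stand-in for the distributed shared memory region. On real hardware this
# would be the board's memory, accessed through the vendor's driver.
shared_memory = bytearray(64)

# Hypothetical memory map agreed on by all nodes: name -> (offset, format).
MEMORY_MAP = {
    "thrust_newtons": (0, "<d"),   # produced by the propulsion model
    "airspeed_mps": (8, "<d"),     # produced by the vehicle dynamics model
    "step_counter": (16, "<I"),    # produced by the node that drives timing
}

def write_value(name, value):
    """Store one named value at its agreed offset."""
    offset, fmt = MEMORY_MAP[name]
    struct.pack_into(fmt, shared_memory, offset, value)

def read_value(name):
    """Fetch one named value from its agreed offset."""
    offset, fmt = MEMORY_MAP[name]
    return struct.unpack_from(fmt, shared_memory, offset)[0]

# A producer only needs to know where to store its output...
write_value("thrust_newtons", 125000.0)
write_value("step_counter", 42)

# ...and a consumer only needs to know where to read it.
print(read_value("thrust_newtons"), read_value("step_counter"))
```

Neither side refers to the computer that hosts the other component. Only the map is shared, so a model can be moved to another node without changing its consumers.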

3. Communication Requirements

In general, simulation models that interface with real-world devices must execute deterministically. On a distributed system, determinism must be guaranteed not only for each individual model, but also for the execution time of the system as a whole. Therefore, the communication between distributed nodes must be deterministic and provide minimum data transfer latency.

Another important property of the communication mechanism is its ability to signal other CPUs when some event has occurred. When distributed simulation models are executing, they must be aware of when to execute certain actions, such as reading from I/O or moving to the next execution step.

For distributed simulation applications, distributed shared memory is the most effective communication mechanism. If the system doesn't require hard determinism, UDP or TCP can also be used. For hard deterministic systems, distributed shared memory provides the minimum latency between nodes, the best data transfer determinism, and a reliable signaling mechanism via processor interrupts. The remainder of this document discusses how to use distributed shared memory in distributed simulation systems.

See Also:
Communication mechanisms for distributed real-time applications
Deterministic Data Streaming in Distributed Data Acquisition Systems

4. Using Distributed Shared Memory

The main challenge in building a distributed simulation system is to design a communication protocol that supports deterministic communication between the system partitions.

The goal is to synchronize multiple computers so they take turns in their execution. This can be accomplished by setting up an interrupt scheme where each computer generates an interrupt to inform other nodes in the system that data is ready and that it is their turn to execute. This mechanism is efficient because each simulation loop sleeps while it is waiting for its turn, allowing other processes on the same machine (for example, the user interface) to execute with minimal interruption. The following figure shows a possible execution flow for two computers in a distributed simulation system.

Figure 4. Execution flow of two computers in a distributed simulation system.

Note that the execution flow for the two computers is identical. The only difference is that computer B has an offset in the order of its operations. By forcing computer B to initially wait for an interrupt you avoid race conditions and guarantee that both computers will take turns. In this example, computer A is considered the master and computer B is the slave. This scheme is expandable to as many nodes as needed.
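The turn-taking flow described above can be sketched in plain Python. This is an illustration only, under stated assumptions: two threads stand in for computers A and B, a `threading.Semaphore` stands in for each node's hardware interrupt, a dictionary stands in for the shared memory, and the "model step" arithmetic is a placeholder. The name `run_simulation` is hypothetical and does not come from any distributed shared memory API.

```python
import threading

def run_simulation(steps, timeout_s=2.0):
    """Run two nodes that take turns; return the execution trace and data."""
    interrupt_a = threading.Semaphore(0)   # stand-in for node A's interrupt
    interrupt_b = threading.Semaphore(0)   # stand-in for node B's interrupt
    shared = {"a_out": 0, "b_out": 0}      # stand-in for shared memory
    trace = []                             # order in which steps executed

    def node_a():
        # Computer A (master): execute first, then wait for B.
        for step in range(steps):
            shared["a_out"] = shared["b_out"] + 1   # placeholder model step
            trace.append(("A", step))
            interrupt_b.release()                   # tell B its data is ready
            if not interrupt_a.acquire(timeout=timeout_s):
                trace.append(("A", "timeout"))
                return

    def node_b():
        # Computer B: same flow, offset by one operation (wait first).
        for step in range(steps):
            if not interrupt_b.acquire(timeout=timeout_s):
                trace.append(("B", "timeout"))
                return
            shared["b_out"] = shared["a_out"] * 2   # placeholder model step
            trace.append(("B", step))
            interrupt_a.release()                   # tell A its data is ready

    threads = [threading.Thread(target=node_a), threading.Thread(target=node_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return trace, shared

trace, final_values = run_simulation(3)
print(trace)
print(final_values)
```

Because B waits before it does anything else, the trace strictly alternates between A and B. Each node sleeps inside `acquire` while the other executes, which mirrors the sleeping simulation loop described earlier.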

When implementing this mechanism, different approaches will be required depending on your choice of distributed shared memory hardware. The goal is to implement node-to-node communication via interrupts.

Note for VMIC users: VMIC boards implement interrupts as a messaging scheme. The API provides mechanisms for 'sending' interrupts to different nodes in the distributed network. VMIC's protocol is very useful for these applications since no extra programming is required to signal a specific node.

Note for SCRAMNet users: SCRAMNet hardware lets you configure specific offsets to generate an interrupt whenever they are written to. For this application, when a node wakes up because of an interrupt, it needs to verify the source and cause of the interrupt. If your system has more than two nodes, it is a good idea to specify a different interrupt-generating offset for each node. For example, node 1 can wait for an interrupt at offset 0x1, and node 2 can wait for an interrupt at offset 0x2. When a node wants to inform a remote node of an event, it just needs to know the destination node ID and generate an interrupt at the appropriate offset. If all the nodes wait for an interrupt at the same offset, they will all wake up regardless of whether the interrupt was directed to them. In this case you need to implement logic that sends data along with the interrupt to tell the nodes which one should act on it. All other nodes should ignore the interrupt and go back to waiting for the next event.
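The two addressing options in the SCRAMNet note can be compared with a small Python sketch. It does not use the SCRAMNet API. The `bytearray` stands in for the shared memory, a write to an offset stands in for generating an interrupt there, and `SHARED_OFFSET`, `DEST_ID_OFFSET`, and the helper functions are hypothetical names. The sketch also assumes node IDs are smaller than `SHARED_OFFSET`, so dedicated offsets never collide with the shared one.

```python
memory = bytearray(32)   # stand-in for the distributed shared memory

SHARED_OFFSET = 0x10     # hypothetical offset that every node watches
DEST_ID_OFFSET = 0x11    # hypothetical location of the destination node ID

def notify_dedicated(memory, dest_id):
    """Option 1: each node watches its own offset (node 1 -> 0x1, ...)."""
    memory[dest_id] = 1          # this write stands in for the interrupt
    return dest_id

def notify_shared(memory, dest_id):
    """Option 2: all nodes watch one offset; send the target ID as data."""
    memory[DEST_ID_OFFSET] = dest_id   # write the data first
    memory[SHARED_OFFSET] = 1          # then the write that wakes everyone
    return SHARED_OFFSET

def should_act(node_id, interrupt_offset, memory):
    """Decide whether a node woken at interrupt_offset is the real target."""
    if interrupt_offset == SHARED_OFFSET:
        return memory[DEST_ID_OFFSET] == node_id
    return interrupt_offset == node_id

nodes = [1, 2, 3]

# Shared offset: all three nodes wake up, but only node 2 acts.
offset = notify_shared(memory, dest_id=2)
acting_shared = [n for n in nodes if should_act(n, offset, memory)]

# Dedicated offsets: only the node watching offset 0x3 is the target.
offset = notify_dedicated(memory, dest_id=3)
acting_dedicated = [n for n in nodes if should_act(n, offset, memory)]

print(acting_shared, acting_dedicated)
```

In the shared-offset variant the destination ID is written before the interrupt-generating write, so a node that wakes up reads an ID that is already in place. Nodes for which `should_act` returns false simply go back to waiting.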

Note: Make sure to use the timeout feature when waiting for an interrupt. Waiting forever may hang your program, since nothing guarantees that an interrupt will be generated. When a 'wait for interrupt' call returns, it is important to check why it woke up. If it timed out, the node must handle the error and, if the system is still running, go back and wait for the next interrupt.
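A wait loop with a timeout might look like the following Python sketch. It is a simplified illustration rather than a driver call: `threading.Event` stands in for the board's 'wait for interrupt' function, and `wait_for_interrupt`, `node_loop`, and the `max_wakeups` limit are hypothetical names introduced here so that the example terminates.

```python
import threading

def wait_for_interrupt(interrupt, timeout_s):
    """Block until the interrupt fires or the timeout elapses; report which."""
    if interrupt.wait(timeout=timeout_s):
        interrupt.clear()        # re-arm for the next interrupt
        return "interrupt"
    return "timeout"

def node_loop(interrupt, keep_running, timeout_s, max_wakeups):
    """Wait repeatedly and record why each wake-up happened."""
    log = []
    while keep_running.is_set() and len(log) < max_wakeups:
        cause = wait_for_interrupt(interrupt, timeout_s)
        if cause == "timeout":
            # Handle the error here (log it, raise a fault flag, ...),
            # then loop back and wait again while the system is running.
            log.append("timeout handled")
        else:
            # Data is ready: this is where the simulation step would run.
            log.append("step executed")
    return log

interrupt = threading.Event()
keep_running = threading.Event()
keep_running.set()

interrupt.set()   # one interrupt is already pending; no more will arrive
log = node_loop(interrupt, keep_running, timeout_s=0.05, max_wakeups=3)
print(log)
```

The first wake-up is caused by the pending interrupt and the next two by timeouts, so the loop never blocks indefinitely. Clearing `keep_running` from another thread would end the loop after the current wait returns.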

This algorithm shows the basis of the communication scheme for distributed simulation systems. It doesn't account for error handling or extra conditions that may arise. Your application must account for all these details.

Note regarding DMA: Depending on the amount of data you need to transfer from the hosts to the distributed shared memory boards, using DMA may be the most efficient way to implement your application. Unlike applications where distributed shared memory is used in high-performance data acquisition, however, data streaming techniques may not be necessary because the amount of data to transfer is usually relatively small.

5. Conclusion

Distributing a simulation system across multiple computers can maximize its performance, maintainability, and scalability. Implementing a master-slave architecture using distributed shared memory is a convenient approach to sharing data and timing between computing nodes in a distributed system.
