Comparing Common File I/O and Data Storage Approaches

For many new test systems, choosing the right data storage approach is an afterthought in the overall application design. Engineers often select the storage strategy that most easily meets the application's current needs without considering future requirements. Yet the choice of storage format can have a large effect on the efficiency of the acquisition system and on the postprocessing of the raw data over time. When evaluating storage formats, consider characteristics such as:

  • File sharing and exchangeability
  • Disk footprint
  • Simple inclusion of meta information and properties
  • Reading and writing speeds

Depending on the application, you may prioritize certain characteristics over others. Common storage formats such as ASCII, binary, and XML have strengths in different areas. The NI TDM Streaming format, based on the technical data management (TDM) data model, aims to combine the strengths of these common formats and to tailor file storage to the needs of engineers and scientists.

ASCII Files

Many engineers prefer to store data in ASCII files because these files are easy to exchange and human-readable. You can open a file written during an acquisition and view the data immediately, and you can share it with colleagues because ASCII files open in common software found on most computers today, such as Notepad, Wordpad, and Microsoft Excel. ASCII files, however, have several drawbacks. Because every digit of every value is stored as a separate character, they have a large disk footprint, which can be an issue when storage space is limited (for example, storing data on a distributed system). Reading and writing an ASCII file can also be significantly slower than with other formats. In many cases, the write speed of an ASCII file cannot keep up with the speed of the acquisition system, which can lead to data loss.
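
The footprint difference can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the sample buffer, the one-value-per-line text layout, and the function names encode_ascii and encode_binary are made up for this example and do not come from any acquisition API.

```python
import struct


def encode_ascii(samples):
    # One sample per line, written with repr() so each float round-trips exactly.
    return "".join(f"{s!r}\n" for s in samples).encode("ascii")


def encode_binary(samples):
    # Fixed 8 bytes per sample: little-endian IEEE 754 double precision.
    return struct.pack(f"<{len(samples)}d", *samples)


# Hypothetical acquisition buffer: 10,000 samples of an arbitrary ramp.
samples = [i / 7.0 for i in range(10_000)]

ascii_bytes = encode_ascii(samples)
binary_bytes = encode_binary(samples)

print(f"ASCII : {len(ascii_bytes)} bytes")
print(f"binary: {len(binary_bytes)} bytes")
print(f"ratio : {len(ascii_bytes) / len(binary_bytes):.2f}x")
```

The binary size is always eight bytes per sample, while the text size depends on how many digits each value needs. Writing fewer digits shrinks the ASCII file but also discards precision.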

Figure 1: ASCII files are easy to exchange but can be too slow and large for many applications.

Binary Files

Binary files sit at the opposite end of the spectrum from ASCII. They have a significantly smaller disk footprint and can be streamed to disk at extremely high speeds, making them ideal for high-channel-count and real-time applications. The shortcoming of a binary file is that it is not human-readable and is therefore difficult to exchange between users. Binary files cannot be opened directly by common software; they have to be interpreted by an application or program. Different applications may interpret the same binary data in different ways, which can cause confusion. One application may read the binary values as textual characters while another may interpret the values as colors. To share the files with colleagues, you must provide them with an application that interprets your specific binary layout correctly. Also, if you change how the data is written in the acquisition application, you must make the same change in the application that reads the data. Over time this can cause application versioning issues that ultimately result in lost data.
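
A short Python sketch shows how the same bytes yield different values depending on what the reader assumes. The four bytes below are arbitrary and chosen only for illustration; none of the interpretations is more "correct" than the others without knowledge of how the writer laid out the file.

```python
import struct

# Four arbitrary bytes, chosen only for illustration.
raw = bytes([0x48, 0x69, 0x21, 0x00])

# Reader 1 assumes the first three bytes are ASCII text.
as_text = raw[:3].decode("ascii")

# Reader 2 assumes one pixel with red, green, blue, and alpha components.
as_rgba = tuple(raw)

# Reader 3 assumes an unsigned 32-bit integer, in either byte order.
(as_uint32_le,) = struct.unpack("<I", raw)
(as_uint32_be,) = struct.unpack(">I", raw)

print(as_text)       # Hi!
print(as_rgba)       # (72, 105, 33, 0)
print(as_uint32_le)  # 2189640
print(as_uint32_be)
```

Even the two integer readings disagree, because byte order is itself a convention that the writer and the reader must share.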

Figure 2: Binary files are beneficial in high-speed, limited-space applications but can cause exchangeability issues.

XML Files

Over the last several years, the XML format has been gaining in popularity due to its ability to store complex data structures. Because XML is flexible, you can store formatting and additional descriptive information alongside the raw measurement values in a structured manner. XML is also relatively human-readable and exchangeable. Like ASCII files, XML files can be opened in many common text editors as well as in XML-capable Internet browsers, such as Microsoft Internet Explorer. However, in its raw form, XML includes tags within the file that describe the structures. These tags also appear when you open an XML file in these applications, which somewhat limits readability because you must be able to understand the tags. The weakness of the XML file format is that it has an extremely large disk footprint compared to other formats and cannot be used to stream data directly to disk. Also, the ability to store complex structures comes at a cost: designing the layout, or schema, of the XML structures may require considerable planning.
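
The trade-off between self-description and size can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names (measurement, channel, sample, name, unit) are a hypothetical schema invented for this example, not a standard layout.

```python
import struct
import xml.etree.ElementTree as ET


def to_xml(channel_name, unit, samples):
    # Hypothetical schema: <measurement><channel name= unit=><sample>...</sample>
    root = ET.Element("measurement")
    channel = ET.SubElement(root, "channel", name=channel_name, unit=unit)
    for value in samples:
        ET.SubElement(channel, "sample").text = repr(value)
    return ET.tostring(root, encoding="utf-8")


# Hypothetical data: 1,000 samples of an arbitrary ramp.
samples = [0.5 * i for i in range(1_000)]

xml_bytes = to_xml("Voltage", "V", samples)
binary_bytes = struct.pack(f"<{len(samples)}d", *samples)

# The metadata travels with the data and can be read back by any XML parser.
parsed = ET.fromstring(xml_bytes)
channel = parsed.find("channel")
print(channel.get("name"), channel.get("unit"))

print(f"XML   : {len(xml_bytes)} bytes")
print(f"binary: {len(binary_bytes)} bytes")
```

Each value is wrapped in an opening and a closing tag, so the markup alone costs more bytes per sample than the eight-byte binary value it describes.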

Figure 3: XML files allow users to define complex structures but are significantly larger and slower than other formats.

TDMS Files

Several years ago, National Instruments developed a new file format, TDM Streaming (TDMS), specifically to meet the needs of engineers and scientists collecting test data and to address the concerns described above. TDMS is a binary-based file format, so it has a small disk footprint and can stream data to disk at high speeds. At the same time, TDMS files contain a header component that stores descriptive information, or attributes, with the data. Some attributes, such as file name, date, and file path, are stored automatically, and you can easily add your own custom attributes as well. Another advantage of the TDMS file format is its built-in three-level hierarchy: file, group, and channel. A TDMS file can contain an unlimited number of groups, and each group can contain an unlimited number of channels. You can add attributes at each of these levels to describe and document your test data. This hierarchy gives your test data an inherent organization. Finally, although TDMS files are binary, you can open them in many common applications, such as Microsoft Excel and OpenOffice, for sharing with colleagues. Thus, TDMS files give you the benefits of easy exchangeability and attribute inclusion without sacrificing speed and size.
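
The file, group, and channel hierarchy can be pictured as a small in-memory model. The Python sketch below is purely conceptual: it does not read or write TDMS files, it does not reflect the on-disk layout, and every class, method, and property name in it (MeasurementFile, Group, Channel, add_group, add_channel, and the sample attributes) is hypothetical rather than part of any NI API.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    data: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


@dataclass
class Group:
    name: str
    properties: dict = field(default_factory=dict)
    channels: dict = field(default_factory=dict)

    def add_channel(self, name, data, **properties):
        # Attributes can be attached at the channel level.
        channel = Channel(name, list(data), dict(properties))
        self.channels[name] = channel
        return channel


@dataclass
class MeasurementFile:
    name: str
    properties: dict = field(default_factory=dict)
    groups: dict = field(default_factory=dict)

    def add_group(self, name, **properties):
        # Attributes can be attached at the group level.
        group = Group(name, dict(properties))
        self.groups[name] = group
        return group


def describe(measurement_file):
    # Walk the three levels and return one indented line per object.
    lines = [f"{measurement_file.name} {measurement_file.properties}"]
    for group in measurement_file.groups.values():
        lines.append(f"  {group.name} {group.properties}")
        for ch in group.channels.values():
            lines.append(f"    {ch.name} {ch.properties} ({len(ch.data)} samples)")
    return lines


# Hypothetical test run with attributes at all three levels.
log = MeasurementFile("engine_run", properties={"operator": "J. Doe"})
vibration = log.add_group("Vibration", sensor_location="bearing housing")
vibration.add_channel("Accel X", [0.01, 0.02, 0.015], unit="g")
vibration.add_channel("Accel Y", [0.00, 0.01, 0.012], unit="g")

print("\n".join(describe(log)))
```

Keeping the attributes next to the object they describe is what lets a reader find, for example, the unit of one channel without knowing anything else about the file.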

Figure 4: The hierarchical TDMS file format is designed to meet the needs of engineers collecting measurement data.

Choosing the Right File Format for Your Application

When you examine the common formats used to store test and measurement data (ASCII, binary, and XML), you can see that each approach has pros and cons (Table 1).

Table 1: The TDMS format was designed to address holes in the approaches of common formats.

National Instruments developed the TDMS file format to address each of these concerns. Using this format, you can document and organize your data without losing the ability to stream it to disk at high speeds.

Additional Resources