The first step toward a cohesive data management solution is ensuring that data is stored in the most efficient, organized, and scalable way. All too often, data is stored without descriptive information, in inconsistent formats, and scattered across many computers. The result is a graveyard of information in which it is extremely difficult to locate a particular data set, let alone make decisions based on it.
Depending on the application, you may prioritize some characteristics, such as exchangeability, disk footprint, or write speed, over others. Common storage formats such as ASCII, binary, and XML each have strengths and weaknesses in these areas.
Many engineers prefer to store data in ASCII (American Standard Code for Information Interchange) files because the format is easy to exchange and human-readable. However, ASCII files have several drawbacks. They have a large disk footprint, which can be an issue when storage space is limited (for example, when storing data on a distributed system). Reading and writing ASCII files can also be significantly slower than other formats, and in many cases an ASCII file cannot be written fast enough to keep up with an acquisition system, which can lead to data loss.
Figure 1. ASCII files are easy to exchange but can be too slow and large for many applications.
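To make the disk-footprint difference concrete, the following minimal sketch encodes the same 1,000 float64 samples two ways: as ASCII text with one value per line, and as packed little-endian binary. It assumes that the ASCII writer must preserve full double precision, so it uses Python's `repr()`, which produces the shortest text that reads back to the identical value. Writing fewer digits would shrink the ASCII file but lose precision; the exact ratio always depends on the number formatting chosen. The sketch does not measure write speed.

```python
import math
import struct


def ascii_bytes(samples):
    # One value per line; repr() gives the shortest text that
    # round-trips a float64 exactly, so no precision is lost.
    return "".join(repr(x) + "\n" for x in samples).encode("ascii")


def binary_bytes(samples):
    # Little-endian IEEE 754 float64: a fixed 8 bytes per sample.
    return struct.pack(f"<{len(samples)}d", *samples)


samples = [math.sin(0.01 * i) for i in range(1000)]
ascii_data = ascii_bytes(samples)
binary_data = binary_bytes(samples)
ascii_size = len(ascii_data)
binary_size = len(binary_data)
print(f"ASCII: {ascii_size} bytes, binary: {binary_size} bytes")

# The ASCII file is human-readable and parses back to the same values.
restored = [float(token) for token in ascii_data.decode("ascii").split()]
```

Because most of these values need 17 or more significant digits plus a sign, decimal point, and newline, each ASCII line is typically more than twice the 8 bytes the binary encoding uses.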
Binary files sit at roughly the opposite end of the spectrum from ASCII. In contrast to ASCII files, binary files have a significantly smaller disk footprint and can be streamed to disk at extremely high speeds, making them ideal for high-channel-count and real-time applications. The drawback of binary is that the format is not human-readable, which makes exchanging files between users harder. Binary files cannot be opened directly by common software; they must be interpreted by an application or program, and different applications may interpret the same binary data in different ways. One application may read the binary values as text characters, while another may interpret them as colors. To share the files with colleagues, you must provide them with an application that interprets your specific binary layout correctly. Also, if you change how the acquisition application writes data, the application that reads the data must be updated to match. This can lead to long-term versioning problems that may ultimately result in lost data.
Figure 2. Binary files are beneficial in high-speed, limited-space applications but can cause exchangeability issues.
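The interpretation problem with binary files can be shown in a few lines. In this minimal sketch, a hypothetical acquisition program writes a single sample, 1.5, as a little-endian float64. The same eight bytes are then read back under three different assumptions about the layout. Only the reader that matches the writer's layout recovers the original value, which is why a custom binary format needs a matching reader that is kept in step with the writer.

```python
import struct

# Eight bytes written by a hypothetical acquisition program:
# one little-endian IEEE 754 float64 sample with the value 1.5.
raw = struct.pack("<d", 1.5)

# A reader that knows the writer's layout recovers the sample.
as_float64_le = struct.unpack("<d", raw)[0]

# A reader that assumes two little-endian int32 values gets unrelated numbers.
as_int32_pair = struct.unpack("<2i", raw)

# A reader that assumes big-endian byte order gets a meaningless tiny float.
as_float64_be = struct.unpack(">d", raw)[0]

print(raw.hex())       # 000000000000f83f
print(as_float64_le)   # 1.5
print(as_int32_pair)   # (0, 1073217536)
print(as_float64_be)
```

Nothing in the bytes themselves says which interpretation is correct; that knowledge lives only in the writing and reading applications.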
Over the last several years, XML has gained popularity because of its ability to store complex data structures. With XML files, you can store data and formatting along with the raw measurement values, and the flexibility of the format lets you store additional descriptive information with the data in a structured manner. XML is also relatively human-readable and easy to exchange. Like ASCII files, XML files can be opened in many common text editors as well as in XML-capable web browsers such as Microsoft Internet Explorer. However, raw XML contains tags that describe its structures, and these tags also appear when the file is opened in such applications, which somewhat limits readability because you must understand the tags. The main weaknesses of XML are that it has an extremely large disk footprint compared to other formats and cannot be used to stream data directly to disk. In addition, the flexibility to store complex structures comes at a cost: designing the layout, or schema, of those structures may require considerable planning.
Figure 3. XML files can help define complex structures but are significantly larger and slower than other formats.
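The following sketch illustrates both sides of the XML trade-off using Python's standard `xml.etree.ElementTree` module. The schema is hypothetical: descriptive attributes such as `unit` sit on a `<channel>` element, and each value gets its own `<sample>` element. The metadata travels with the data and can be read back by any XML parser, but the per-sample tags make the encoded file several times larger than the packed binary equivalent.

```python
import math
import struct
import xml.etree.ElementTree as ET

samples = [math.sin(0.01 * i) for i in range(1000)]

# Hypothetical schema: descriptive attributes on <measurement> and
# <channel>, and one <sample> element per measured value.
root = ET.Element("measurement", {"test": "demo", "operator": "unknown"})
channel = ET.SubElement(root, "channel", {"name": "voltage", "unit": "V"})
for x in samples:
    ET.SubElement(channel, "sample").text = repr(x)

xml_bytes = ET.tostring(root, encoding="utf-8")
binary_size = len(struct.pack(f"<{len(samples)}d", *samples))
print(f"XML: {len(xml_bytes)} bytes, binary: {binary_size} bytes")

# Reading back: the structure is self-describing, so the unit and
# the values can be recovered without a custom reader.
parsed = ET.fromstring(xml_bytes)
ch = parsed.find("channel")
values = [float(s.text) for s in ch.findall("sample")]
print(ch.get("unit"), len(values))
```

Every value here carries 17 bytes of `<sample></sample>` markup on top of its text representation, which is where most of the extra size comes from.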
Database files are composed of a series of tables built from columns and rows, and information may or may not be linked between tables. Searchability makes databases advantageous; however, they can be impractical for time-based measurement applications, given the amount of data acquired and the need to either purchase a formal database solution or build one from scratch. Storing time-based measurements causes databases to become bloated, which slows down queries and defeats the main purpose of using a database.
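The tension between searchability and bloat can be sketched with Python's built-in `sqlite3` module. The two-table schema below (`tests` for descriptive information, `samples` for time-based values) is a hypothetical example, not a recommended design. Finding a test by its descriptive attributes is a one-line query, but the `samples` table gains one row per acquired value. At a hypothetical 10 kS/s, a single channel adds 10,000 × 3,600 = 36,000,000 rows per hour.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (id INTEGER PRIMARY KEY, operator TEXT, unit_under_test TEXT);
    CREATE TABLE samples (test_id INTEGER REFERENCES tests(id), t REAL, value REAL);
""")

# One test record, plus one row per acquired sample
# (1 s of data at a hypothetical 10 kS/s sample rate).
rate_hz, duration_s = 10_000, 1
conn.execute(
    "INSERT INTO tests (id, operator, unit_under_test) VALUES (1, 'alice', 'UUT-42')"
)
conn.executemany(
    "INSERT INTO samples (test_id, t, value) VALUES (1, ?, ?)",
    ((i / rate_hz, math.sin(i / rate_hz)) for i in range(rate_hz * duration_s)),
)

# The strength: searching descriptive information is a simple query.
matches = conn.execute(
    "SELECT id FROM tests WHERE operator = ?", ("alice",)
).fetchall()

# The cost: the samples table grows by one row for every sample acquired.
n_samples = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(matches, n_samples)  # [(1,)] 10000
```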
Technical Data Management Streaming (TDMS) is a binary-based file format, so it has a small disk footprint and can stream data to disk at high speeds. At the same time, TDMS files contain a header component that stores descriptive information, or attributes, with the data. Some attributes, such as file name, date, and file path, are stored automatically, and you can easily add your own custom attributes as well. Another advantage of the TDMS file format is its built-in three-level hierarchy of file, group, and channel levels. A TDMS file can contain an unlimited number of groups, and each group can contain an unlimited number of channels. You can add attributes at each level to describe and document your test data, and this hierarchy gives your test data an inherent organization.
Table 1. The TDMS file format combines the benefits of several data storage options in one file format.
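The file, group, and channel hierarchy described for TDMS can be modeled in memory to show why attributes at every level make data easier to organize and search. The sketch below is a hypothetical illustration only: the `DataFile`, `Group`, and `Channel` classes and the `find_channels` helper are invented for this example, and the code does not read or write the actual TDMS on-disk format. Channel values are kept in a compact `array("d")` (8 bytes per float64 sample), and the `name` and `datetime` entries merely stand in for the kind of attributes a format might record automatically.

```python
from array import array
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Channel:
    name: str
    data: array = field(default_factory=lambda: array("d"))
    properties: dict = field(default_factory=dict)


@dataclass
class Group:
    name: str
    channels: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)


@dataclass
class DataFile:
    name: str
    groups: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Stand-ins for attributes recorded automatically (hypothetical).
        self.properties.setdefault("name", self.name)
        self.properties.setdefault(
            "datetime", datetime.now(timezone.utc).isoformat()
        )

    def add_channel(self, group_name, channel_name, values, **props):
        # Create the group on first use, then attach the channel to it.
        group = self.groups.setdefault(group_name, Group(group_name))
        channel = Channel(channel_name, array("d", values), dict(props))
        group.channels[channel_name] = channel
        return channel

    def find_channels(self, **criteria):
        # Return (group, channel) names whose channel properties match all criteria.
        return [
            (g.name, c.name)
            for g in self.groups.values()
            for c in g.channels.values()
            if all(c.properties.get(k) == v for k, v in criteria.items())
        ]


f = DataFile("run_001")
f.properties["operator"] = "alice"  # custom file-level attribute
f.add_channel("Temperatures", "TC0", [21.5, 21.7, 21.6], unit="degC")
f.add_channel("Temperatures", "TC1", [22.0, 22.1, 22.3], unit="degC")
f.add_channel("Pressures", "P0", [101.3, 101.2, 101.4], unit="kPa")
f.groups["Temperatures"].properties["location"] = "chamber A"  # group-level attribute

print(f.find_channels(unit="degC"))  # [('Temperatures', 'TC0'), ('Temperatures', 'TC1')]
```

Keeping descriptive attributes next to compact binary channel data is the combination the TDMS format aims for; a real application would use NI's TDMS tools or a TDMS-capable library rather than a hand-rolled model like this one.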