A Comprehensive Solution to Large-Scale Data Management

Overview

In response to the challenges engineers and scientists face when managing large amounts of test and simulation data, National Instruments has built a three-piece solution for test data management. A key component of this solution, NI DataFinder technology, helps you index test files for simple retrieval and mining. This paper discusses expanding this technology for large groups and departments with SystemLink TDM DataFinder Module.

The Pain Points of Data Management

Test engineers today face increasingly challenging time and budgetary constraints when designing data acquisition systems. Consumer demand for higher-quality products at lower prices continues to force stricter system design requirements. To offset decreasing margins, data acquisition systems must be designed (or evolve) to be solutions – that is to say, you must be able to use them not only for the initial data acquisition but also the data management once you have collected the data. Without the implementation of an effective data management solution, you may wind up wasting valuable time (and therefore money) attempting to extract information from your acquired data so you can make educated engineering decisions. You may resort to manually searching through files that are likely stored in different formats and in varying locations on disk – and potentially on different machines – to find and analyze your data sets. Even just a few hours per week wasted due to an inefficient data management solution can cause budgetary increases and delays that can propagate through the entire product design cycle, drastically increasing time to market.

Traditional data management approaches such as manual file- and folder-naming conventions or standard database implementations offer unique benefits but fail to provide an all-encompassing data management solution on their own. Manually organizing and naming your files and folders to fit the application at hand has the immediate advantage of being free and easily customizable, but the organization becomes cumbersome as tests change and grow: searching for data sets gets harder, and the scheme breaks down when a file is inadvertently renamed or moved. Standard databases such as Access or Oracle are easily queried and provide swift retrieval of data but are extremely costly to design and implement and require significant maintenance and resources if you need continued customization and expandability.

Companies spend considerable time and money designing and implementing their data acquisition systems but often fail to thoroughly plan for data management. If you invest heavily in acquiring your data, be sure to invest the needed time and money in managing this data as well.

NI Technical Data Management Solution

National Instruments has identified three key pieces to the data management puzzle: flexible and organized file storage, a comprehensive data index for advanced search capabilities, and an interactive data retrieval and post-processing environment. As a result, the NI technical data management (TDM) solution consists of three components: the TDM data model for storing descriptive information with your test files, NI DataFinder for indexing test data for search regardless of file format, and NI DIAdem software for searching, mining, analysis, and reporting.

The TDM data model for file storage logically arranges data in a hierarchical fashion and stores meta information containing both native and custom properties for the data set, channel group, and channel level. The TDM file format is completely expandable and customizable, allowing for detailed and well-documented test data.

Figure 1. The TDM Data Model for Saving Well-Documented Test Data
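The three levels of the TDM data model described above can be pictured with a short Python sketch. This is only an illustration of the hierarchy, not an NI API: the class names (DataSet, ChannelGroup, Channel), the describe helper, and all property names and values are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical classes that mirror the three TDM levels; not an NI API.

@dataclass
class Channel:
    name: str
    values: List[float]
    properties: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ChannelGroup:
    name: str
    channels: List[Channel] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)

@dataclass
class DataSet:
    name: str
    groups: List[ChannelGroup] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)

def describe(data_set: DataSet) -> List[str]:
    """Flatten the hierarchy into 'path.property = value' lines."""
    lines = [f"{data_set.name}.{k} = {v}" for k, v in data_set.properties.items()]
    for group in data_set.groups:
        prefix = f"{data_set.name}/{group.name}"
        lines += [f"{prefix}.{k} = {v}" for k, v in group.properties.items()]
        for channel in group.channels:
            lines += [f"{prefix}/{channel.name}.{k} = {v}"
                      for k, v in channel.properties.items()]
    return lines

# Illustrative data only: custom properties can be attached at every level.
example = DataSet(
    name="engine_test_042",
    properties={"operator": "operator_1", "test_rig": "dyno_3"},
    groups=[ChannelGroup(
        name="run_1",
        properties={"ambient_temp_C": 21.5},
        channels=[Channel("rpm", [800.0, 1200.0, 3000.0], {"unit": "1/min"})],
    )],
)
```

Calling describe(example) lists every property together with the level it belongs to, which is the kind of descriptive information an index can later search.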

Once you have documented your test data with properties, NI DataFinder – the most overlooked component in data management solutions – provides an out-of-the-box utility for mining test data. NI DataFinder automatically searches specified areas of your hard disk and creates an index containing the valuable information within the metadata of your data files. Once indexed, test data is fully searchable using easy, Internet-like searching as well as advanced queries using DIAdem. You no longer have to manually hunt for the data sets you desire; NI DataFinder keeps track of the pertinent information on test data files, no matter how they are arranged on disk.
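The general idea of crawling search areas and building a searchable metadata index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how NI DataFinder is implemented: JSON files with a "properties" entry stand in for real test data files, and build_index and search are hypothetical helper names.

```python
import json
import os
import tempfile

def build_index(search_areas, extensions=(".json",)):
    """Walk each search area and record each file's properties, keyed by path."""
    index = {}
    for area in search_areas:
        for folder, _dirs, files in os.walk(area):
            for name in files:
                if name.lower().endswith(extensions):
                    path = os.path.join(folder, name)
                    with open(path, encoding="utf-8") as fh:
                        index[path] = json.load(fh).get("properties", {})
    return index

def search(index, **criteria):
    """Return the paths whose properties match every key=value criterion."""
    return sorted(
        path for path, props in index.items()
        if all(props.get(key) == value for key, value in criteria.items())
    )

# Usage: two illustrative files in different subfolders of a temporary folder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "rig_a", "2020"))
    samples = {
        os.path.join(root, "rig_a", "run1.json"): {"operator": "ann", "part": "pump"},
        os.path.join(root, "rig_a", "2020", "run2.json"): {"operator": "bob", "part": "pump"},
    }
    for path, props in samples.items():
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"properties": props}, fh)
    index = build_index([root])
    pump_runs = search(index, part="pump")
    bob_runs = search(index, operator="bob")
```

Because the query runs against the index rather than the folder tree, the result does not depend on how the files are arranged on disk.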

After NI DataFinder locates the appropriate data sets, you can use DIAdem to extract the information you need from your test data and take advantage of DIAdem utilities to interact with, analyze, and create professional reports from that data. The faster you can effectively analyze and report test data, the faster your team can make educated engineering decisions.

DIAdem DataFinder and SystemLink TDM DataFinder Module

A local NI DataFinder index, the DIAdem DataFinder, installs automatically with DIAdem. Once installed, DIAdem DataFinder needs to know where to find your test data on disk. You can configure DIAdem DataFinder to index your entire hard disk, but you also can identify (or exclude) specific locations on disk where you expect test data to be located. This prevents the system resource overhead that results from unnecessarily indexing files or folders while allowing for easy scalability should you need to specify new search areas in the future.

After you have configured search areas, DIAdem DataFinder automatically builds and maintains an index of all files that meet the file type and location criteria you specified in the DIAdem DataFinder configuration. The properties stored, once indexed, become instantly searchable from within the DIAdem environment. As soon as a valid data file is created, deleted, or edited, DIAdem DataFinder automatically notices and reindexes the hierarchy and properties of this file. DIAdem DataFinder dynamically manages its own data tables and updates them based on file events and the contents of each file. Therefore, unlike many expensive database solutions, you can change or add information as requirements change without reconsidering or redesigning your data management solution.

Figure 2. Using the Advanced Search, you can quickly find trends and correlations within your test data.
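The event-driven maintenance described above, in which the index follows file events and grows new property columns on demand, can be sketched as follows. This is a simplified illustration, not the DataFinder implementation: the MetadataIndex class, its event names, and the sample properties are all assumptions.

```python
class MetadataIndex:
    """Hypothetical in-memory index that is updated by file events."""

    def __init__(self):
        self.rows = {}        # path -> properties of that file
        self.columns = set()  # every property name seen so far

    def handle_event(self, event, path, properties=None):
        """Apply a 'created', 'edited', or 'deleted' file event to the index."""
        if event == "deleted":
            self.rows.pop(path, None)
        elif event in ("created", "edited"):
            self.rows[path] = dict(properties or {})
            # New property names extend the index without any schema redesign.
            self.columns.update(self.rows[path])
        else:
            raise ValueError(f"unknown event: {event}")

    def query(self, predicate):
        """Return the paths whose properties satisfy the predicate."""
        return sorted(p for p, props in self.rows.items() if predicate(props))

# Usage with illustrative file names and properties.
idx = MetadataIndex()
idx.handle_event("created", "a.tdm", {"operator": "ann"})
idx.handle_event("created", "b.tdm", {"operator": "bob", "max_temp": 90})
idx.handle_event("edited", "a.tdm", {"operator": "ann", "max_temp": 120})
idx.handle_event("deleted", "b.tdm")
hot_runs = idx.query(lambda props: props.get("max_temp", 0) > 100)
```

The max_temp property did not exist when a.tdm was first indexed; it becomes searchable as soon as the edited file is reindexed, which is the flexibility the paragraph above contrasts with a fixed database schema.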

DIAdem DataFinder can inherently index TDM and TDM Streaming files within specified search areas, but legacy and/or third-party data may exist in a different file format. The NI data management solution was designed to be modular in nature and take this reality into account. To meet the challenge of integrating multiple file formats, you can create and install DataPlugins that “translate” arbitrary data file formats into the TDM structure for easy integration into DIAdem.

Figure 3. The index stores all the descriptive information included with a file, so you can mine and search on these values.
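To make the DataPlugin idea of translating an arbitrary file format into the TDM structure more concrete, the following Python sketch converts a made-up legacy text format into a nested structure with file-level properties, a channel group, and channels. It is not the DataPlugin API: the function name load_legacy_csv, the "# key: value" header convention, and the sample contents are assumptions chosen for illustration.

```python
import csv
import io

def load_legacy_csv(text, group_name="measurement"):
    """Translate '# key: value' header lines plus a CSV table into a nested dict."""
    properties = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Header line: everything before the first colon is the property name.
            key, _, value = line[1:].partition(":")
            properties[key.strip()] = value.strip()
        elif line.strip():
            table_lines.append(line)
    rows = list(csv.reader(io.StringIO("\n".join(table_lines))))
    header, data = rows[0], rows[1:]
    # Each CSV column becomes one channel of numeric values.
    channels = {
        name.strip(): [float(row[i]) for row in data]
        for i, name in enumerate(header)
    }
    return {"properties": properties,
            "groups": {group_name: {"channels": channels}}}

# Illustrative legacy file contents.
sample = """# operator: test bench A
# date: 2020-01-15
time_s,voltage_V
0.0,1.5
0.1,1.7
"""
translated = load_legacy_csv(sample)
```

Once a file has been mapped into this common structure, its properties can be indexed and searched in the same way as those of a native file.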

The DIAdem DataFinder, without a doubt the cornerstone of the NI data management solution, was designed for individuals and occasionally does not meet the needs of larger groups accessing data across multiple machines. As a result, National Instruments offers two data management options – DIAdem DataFinder and SystemLink TDM DataFinder Module. With DIAdem DataFinder, you can easily search test data stored within your local index; however, needs and requirements change as you expand a data management solution to large groups or departments. To meet these needs, National Instruments developed the SystemLink TDM DataFinder Module specifically designed for large groups and departments.

Benefits of SystemLink TDM DataFinder Module

SystemLink TDM DataFinder Module expands on the concept and technology of DIAdem DataFinder and includes several features and capabilities that make it the ideal data management tool for large groups in which multiple engineers need to access large amounts of data possibly stored in multiple locations.

Decreased Network Traffic

If test engineers wanted to search data across multiple test stations without using SystemLink, each client machine would need an individual network connection to each test station. Each client machine’s DIAdem DataFinder would have to index the files of each of the test stations with which it communicates, consistently crawling the network to maintain up-to-date information about the data files on all test stations. This dramatically increases the strain on network resources because the actual indexing takes place over the network. Over time, this increase in used bandwidth may become unfeasible – especially in scenarios where network bandwidth needs to be conserved.

Figure 4. When using only the local DIAdem DataFinder, each client machine’s index must connect to each test station individually.

SystemLink TDM DataFinder Module alleviates this issue and leads to potentially faster indexing because it installs and functions on a common server machine. This server machine houses the single SystemLink DataFinder index, which crawls the specified search areas of all configured test stations. The server machine then functions as a single location on which the common index is housed. Client machines no longer need to interface with each test station individually because they can communicate with the intermediate server machine. When you store data files and enable SystemLink TDM DataFinder Module on an intermediate server machine, you preserve network resources because only the client queries of the index travel over the network.

Figure 5. SystemLink centralizes the metadata from multiple test stations so you can easily access and mine it with multiple clients simultaneously with the SystemLink TDM DataFinder Module.
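The difference between the two topologies in Figures 4 and 5 can be expressed as a simple counting model. The sketch below is an assumption-laden simplification: it counts connections only, ignores how much data each connection carries, and uses hypothetical function names and example numbers.

```python
def crawl_links_without_server(clients, stations):
    """Every client index crawls every test station over the network."""
    return clients * stations

def links_with_server(clients, stations):
    """One server index crawls each station; clients only send queries to the server."""
    return {"index_crawls": stations, "query_connections": clients}

# Illustrative numbers: 10 client machines and 8 test stations.
without_server = crawl_links_without_server(10, 8)
with_server = links_with_server(10, 8)
```

With these example numbers, the local-index approach needs 80 separate crawls over the network, while the central index needs 8 crawls plus 10 lightweight query connections. The first figure grows with the product of clients and stations, the second with their sum.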

Multiple Concurrent Connections

For large-scale data management, multiple engineers may need to concurrently retrieve information about existing data files. Because SystemLink TDM DataFinder Module is intended to be installed on a high-bandwidth machine running a Windows server operating system, it can support up to 25 concurrent client connections to the central index, a dramatic increase over DIAdem DataFinder and an important feature in scenarios where multiple people may need access to centralized data at one time. This allows engineers to concurrently gain access to data files – without worrying about other engineers engaging and reserving resources – and more immediately retrieve data.

Minimal Client Setup

To promote consistency and ensure that expandability requires no in-depth technical knowledge for client machine configuration, SystemLink TDM DataFinder Module gives you the ability to export client configurations from the server. With a few clicks of the mouse, you can gather all of the settings that client machines need to interface with the index created by SystemLink TDM DataFinder Module into one *.urf file. Once this *.urf file is distributed to client machines, installing the configuration is as easy as double-clicking the file from its location on disk, which automatically takes care of all client machine configuration and opens access to the SystemLink DataFinder index. And because SystemLink TDM DataFinder Module gives you the ability to export DataPlugins along with the client configuration, you can be assured that query results from one client machine are identical to those of another client machine without having to individually export each registered DataPlugin on the server (and later manage its import on each client).

Figure 6. SystemLink TDM DataFinder Module gives you the ability to export configurations, which can include DataPlugins, for easy client machine setup.

Consistency

In situations where multiple client machines are attempting to access data stored across many test stations on a network, SystemLink TDM DataFinder Module ensures consistency in search areas, search results, and DataPlugins. Without SystemLink TDM DataFinder Module, you must individually configure each client DIAdem DataFinder to index search areas that consist of multiple directories on each of the network’s test stations. As test systems grow in complexity, and you remove or add multiple client machines or test stations, you must reconfigure each client machine to account for the search areas present across the entire system at any given time. If you do not perform regular maintenance, search areas configured among client machines may become inaccurate or incomplete. Because SystemLink TDM DataFinder Module resides on one intermediate server machine with each client machine configured to communicate directly with it, you need to perform maintenance only on the common SystemLink server instead of each client.

The implementation of SystemLink TDM DataFinder Module yields one common DataFinder configuration (and therefore a common metadata index and search areas), so consistency among search results is guaranteed. Otherwise, inconsistencies between search areas and DataPlugins defined on client machines may yield inconsistencies in search results between the different machines. Different test engineers using conflicting or incomplete search results could cause communication headaches that result in costly product development delays.

User Management and Security

It is common to have dozens or even hundreds of test engineers interfacing with test systems and the data files that they generate. That being said, not all engineers involved in a project should always have access to all data files generated by the test stations. For reasons of privacy, security, or intellectual property, situations arise when you need to restrict the access of certain users to sensitive data files and folders.

Figure 7. By capitalizing on the already-configured Windows permission settings, SystemLink TDM DataFinder Module requires no additional work to restrict access to sensitive data files.

SystemLink TDM DataFinder Module directly interfaces with user management policies already in place as part of the Windows operating system. When you enable security via a simple configuration checkbox, SystemLink begins restricting access to files and folders based on the current permission settings. Without requiring any additional verification (users are authenticated only once by the operating system when they log in to the client machine), read, write, and even query access to the data files and folders exposed by the index mimics the user management setup of the operating system. With SystemLink, securing sensitive data according to policies already in place is as straightforward as using a simple checkbox to enable security.
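The effect of enabling security, where query results mirror existing permission settings, can be illustrated with a small Python sketch. It does not call any Windows or SystemLink API: the access-control table, the group names, and the visible_results function are hypothetical stand-ins for the operating system's permission checks.

```python
def visible_results(results, user_groups, acl):
    """Keep only the paths for which the user belongs to an allowed group.

    Paths without an entry in the access-control table are hidden by default.
    """
    return [path for path in results if acl.get(path, set()) & user_groups]

# Illustrative query results and permissions.
results = ["/data/a.tdm", "/data/secret/b.tdm", "/data/c.tdm"]
acl = {
    "/data/a.tdm": {"engineering", "management"},
    "/data/secret/b.tdm": {"management"},
}
engineer_view = visible_results(results, {"engineering"}, acl)
manager_view = visible_results(results, {"management"}, acl)
```

The same query returns different result lists for different users, and a user never sees a file that the existing permissions would not let them open.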

Archiving

As technology continues to evolve and improve, factors such as multicore processors, increased memory, and faster sampling rates – along with the fact that test systems are growing more complex – are resulting in the collection of ever-increasing amounts of data. Though disk storage is relatively cost-effective, situations where you are storing (and therefore backing up) large amounts of data require archiving systems that transfer data to inexpensive, high-capacity storage media such as magnetic tapes.

Based on configured rules, background system processes automatically transfer files to the archive. When a file is transferred, it is replaced on disk with an empty “stub” file that has the same name and carries attributes marking the original file as stored in the archive. When you access files later, they are automatically restored from the archive to their original locations on disk.
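The stub-file mechanism can be simulated in memory to show the sequence of steps. This sketch is not the SER archiving system or any NI component: the ArchivingStore class and its methods are hypothetical, and two dictionaries stand in for the disk and the archive medium.

```python
class ArchivingStore:
    """Hypothetical in-memory model of a disk backed by an archive."""

    def __init__(self):
        self.disk = {}     # path -> {"data": bytes, "archived": bool}
        self.archive = {}  # path -> bytes held on the archive medium

    def write(self, path, data):
        self.disk[path] = {"data": data, "archived": False}

    def archive_file(self, path):
        """Move the contents to the archive and leave an empty, flagged stub."""
        entry = self.disk[path]
        if not entry["archived"]:
            self.archive[path] = entry["data"]
            self.disk[path] = {"data": b"", "archived": True}

    def is_archived(self, path):
        """The flag an index could store next to the file's properties."""
        return self.disk[path]["archived"]

    def read(self, path):
        """Restore the contents to their original location before returning them."""
        if self.disk[path]["archived"]:
            self.disk[path] = {"data": self.archive[path], "archived": False}
        return self.disk[path]["data"]

# Usage with an illustrative file name and contents.
store = ArchivingStore()
store.write("run1.tdm", b"measurement data")
store.archive_file("run1.tdm")
flag_after_archiving = store.is_archived("run1.tdm")
stub_size = len(store.disk["run1.tdm"]["data"])
restored = store.read("run1.tdm")
```

The path stays valid the whole time, so anything that refers to the file by name keeps working whether the contents currently sit on disk or in the archive.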

In close cooperation with the company SER, National Instruments has designed SystemLink TDM DataFinder Module to integrate smoothly into the SER archiving system. Because SystemLink can recognize archived files and save the archiving flag together with descriptive TDM and TDM Streaming file data in the index, you can search and mine archived data along with data on disk.

Conclusion

SystemLink TDM DataFinder Module extends the capabilities of DIAdem DataFinder to offer a more robust solution for large-scale data management. When multiple client machines and multiple test stations are involved, as is common in today’s increasingly complex test systems, SystemLink ensures decreased strain on network resources, consistent search results among client machines, ease of installation and client configuration, and automatic integration with archiving systems and Windows user permission settings. A complement to DIAdem DataFinder, SystemLink TDM DataFinder Module can help you further streamline large-scale simulation and test data management solutions with the NI TDM solution.