A Comprehensive Solution to Large-Scale Data Management

Publish Date: Aug 01, 2014

Overview

In response to the challenges engineers and scientists face when managing large amounts of test and simulation data, National Instruments has built a three-piece solution for test data management. A key component of this solution, NI DataFinder technology, helps you index test files for simple retrieval and mining. This paper discusses expanding this technology for large groups and departments with NI DataFinder Server Edition.

Table of Contents

  1. The Pain Points of Data Management
  2. NI Technical Data Management Solution
  3. DIAdem DataFinder and NI DataFinder Server Edition
  4. Benefits of NI DataFinder Server Edition
  5. Conclusion

1. The Pain Points of Data Management

Test engineers today face increasingly challenging time and budgetary constraints when designing data acquisition systems. Consumer demand for higher-quality products at lower prices continues to force stricter system design requirements. To offset decreasing margins, data acquisition systems must be designed (or evolve) to be solutions – that is to say, you must be able to use them not only for the initial data acquisition but also the data management once you have collected the data. Without the implementation of an effective data management solution, you may wind up wasting valuable time (and therefore money) attempting to extract information from your acquired data so you can make educated engineering decisions. You may resort to manually searching through files that are likely stored in different formats and in varying locations on disk – and potentially on different machines – to find and analyze your data sets. Even just a few hours per week wasted due to an inefficient data management solution can cause budgetary increases and delays that can propagate through the entire product design cycle, drastically increasing time to market.

Traditional data management approaches such as manual file- and folder-naming conventions or standard database implementations offer unique benefits but fail to provide an all-encompassing data management solution on their own. Manually organizing and naming your files and folders to fit the application at hand has the immediate advantage of being free and easily customizable. However, the organization can become cumbersome as tests change and grow: searching for data sets gets harder, and the scheme breaks down as soon as a file is inadvertently renamed or moved. Standard databases such as Access or Oracle are easily queried and provide swift retrieval of data, but they are costly to design and implement and require significant maintenance and resources if you need continued customization and expandability.

Companies spend considerable time and money designing and implementing their data acquisition systems but often fail to thoroughly plan for data management. If you invest heavily in acquiring your data, be sure to invest the needed time and money in managing this data as well.

2. NI Technical Data Management Solution

National Instruments has identified three key pieces to the data management puzzle: flexible and organized file storage, a comprehensive data index for advanced search capabilities, and an interactive data retrieval and post-processing environment. As a result, the NI technical data management (TDM) solution consists of three components: the TDM data model for storing descriptive information with your test files, NI DataFinder for indexing test data for search regardless of file format, and NI DIAdem software for searching, mining, analysis, and reporting.

The TDM data model for file storage logically arranges data in a hierarchical fashion and stores meta information containing both native and custom properties for the data set, channel group, and channel level. The TDM file format is completely expandable and customizable, allowing for detailed and well-documented test data.

Figure 1. The TDM Data Model for Saving Well-Documented Test Data
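The three-level hierarchy described above can be pictured with a few plain data classes. This is only an illustrative sketch in Python, not the TDM file format or any NI API; the class names, property names, and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Channel:
    """Lowest level: one measured signal plus its descriptive properties."""
    name: str
    values: list[float] = field(default_factory=list)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class ChannelGroup:
    """Middle level: a named collection of channels with its own properties."""
    name: str
    channels: list[Channel] = field(default_factory=list)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class DataSet:
    """Top level: the data set (file) with file-wide properties."""
    name: str
    groups: list[ChannelGroup] = field(default_factory=list)
    properties: dict[str, Any] = field(default_factory=dict)

    def flatten_properties(self) -> list[tuple[str, str, Any]]:
        """Return (level, property name, value) rows for every level."""
        rows = [("file", k, v) for k, v in self.properties.items()]
        for group in self.groups:
            rows += [("group", k, v) for k, v in group.properties.items()]
            for channel in group.channels:
                rows += [("channel", k, v) for k, v in channel.properties.items()]
        return rows


# Hypothetical test run documented at all three levels.
run = DataSet(
    name="engine_test_001",
    properties={"operator": "J. Smith", "test_rig": "Rig 4"},
    groups=[
        ChannelGroup(
            name="warmup",
            properties={"ambient_temp_c": 21.5},
            channels=[
                Channel("rpm", [800.0, 1200.0, 1500.0], {"unit": "1/min"}),
                Channel("oil_temp", [40.0, 55.0, 70.0], {"unit": "degC"}),
            ],
        )
    ],
)
```

Calling `run.flatten_properties()` lists every custom property together with the level it belongs to, which is the kind of descriptive information an index can later search on.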

Once you have documented your test data with properties, NI DataFinder – the most overlooked component in data management solutions – provides an out-of-the-box utility for mining test data. NI DataFinder automatically searches specified areas of your hard disk and creates an index containing the valuable information within the metadata of your data files. Once indexed, test data is fully searchable using easy, Internet-like searching as well as advanced queries using DIAdem. You no longer have to manually hunt for the data sets you desire; NI DataFinder keeps track of the pertinent information on test data files, no matter how they are arranged on disk.
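To make the difference between the two search styles concrete, the following Python sketch runs a simple text search and a condition-based query against a tiny in-memory index. It does not use NI DataFinder or DIAdem; the index records, property names, and function names are hypothetical.

```python
from typing import Any, Callable

# Hypothetical index records: one dictionary of metadata per indexed file.
INDEX: list[dict[str, Any]] = [
    {"path": "D:/tests/run_001.tdm", "operator": "Smith",
     "device": "Pump A", "max_pressure_bar": 4.2},
    {"path": "D:/tests/run_002.tdm", "operator": "Jones",
     "device": "Pump A", "max_pressure_bar": 6.8},
    {"path": "E:/legacy/run_003.csv", "operator": "Smith",
     "device": "Pump B", "max_pressure_bar": 7.1},
]


def text_search(index: list[dict[str, Any]], term: str) -> list[str]:
    """Internet-like search: match the term against every property value."""
    needle = term.lower()
    return [
        record["path"]
        for record in index
        if any(needle in str(value).lower() for value in record.values())
    ]


def advanced_query(index: list[dict[str, Any]],
                   **conditions: Callable[[Any], bool]) -> list[str]:
    """Advanced query: every named property must satisfy its condition."""
    return [
        record["path"]
        for record in index
        if all(name in record and test(record[name])
               for name, test in conditions.items())
    ]


smith_runs = text_search(INDEX, "smith")
high_pressure_pump_a = advanced_query(
    INDEX,
    device=lambda d: d == "Pump A",
    max_pressure_bar=lambda p: p > 5.0,
)
```

Neither search depends on where the files live on disk; both work purely on the indexed metadata.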

After NI DataFinder locates the appropriate data sets, you can use DIAdem to extract the information you need from your test data and take advantage of DIAdem utilities to interact with, analyze, and create professional reports from that data. The faster you can effectively analyze and report test data, the faster your team can make educated engineering decisions.

3. DIAdem DataFinder and NI DataFinder Server Edition

DIAdem DataFinder, a local NI DataFinder index, installs automatically with DIAdem. Once installed, DIAdem DataFinder needs to know where to find your test data on disk. You can configure DIAdem DataFinder to index your entire hard disk, but you can also identify (or exclude) specific locations on disk where you expect test data to be located. This prevents the system resource overhead that results from unnecessarily indexing files or folders, while still allowing for easy scalability should you need to specify new search areas in the future.

After you have configured search areas, DIAdem DataFinder automatically builds and maintains an index of all files that meet the file type and location criteria you specified in the DIAdem DataFinder configuration. Once indexed, the stored properties become instantly searchable from within the DIAdem environment. As soon as a valid data file is created, deleted, or edited, DIAdem DataFinder automatically notices and reindexes the hierarchy and properties of this file. DIAdem DataFinder dynamically manages its own data tables and updates them based on file events and the contents of each file. Therefore, unlike many expensive database solutions, you can change or add information as requirements change without reconsidering or redesigning your data management solution.
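The idea of keeping an index in step with created, edited, and deleted files can be sketched in a few lines of Python. This is not how DIAdem DataFinder is implemented: the paper says it reacts to file events, whereas this simplified example polls the search areas and compares file size and modification time. The class name `FileIndex` and the file names are assumptions.

```python
import tempfile
from pathlib import Path


class FileIndex:
    """Tracks files of the configured types inside the configured search areas."""

    def __init__(self, search_areas, extensions):
        self.search_areas = [Path(area) for area in search_areas]
        self.extensions = {ext.lower() for ext in extensions}
        self.entries = {}  # path -> (modification time in ns, size in bytes)

    def refresh(self):
        """Re-scan the search areas; return (added, changed, removed) paths."""
        seen = {}
        for area in self.search_areas:
            for path in area.rglob("*"):
                if path.is_file() and path.suffix.lower() in self.extensions:
                    stat = path.stat()
                    seen[str(path)] = (stat.st_mtime_ns, stat.st_size)
        added = set(seen.keys() - self.entries.keys())
        removed = set(self.entries.keys() - seen.keys())
        changed = {
            path for path in seen.keys() & self.entries.keys()
            if seen[path] != self.entries[path]
        }
        self.entries = seen
        return added, changed, removed


def demo():
    """Create, edit, and delete one data file; count what each refresh reports."""
    with tempfile.TemporaryDirectory() as root:
        data_file = Path(root) / "run_001.tdm"
        data_file.write_text("first version")
        (Path(root) / "notes.txt").write_text("ignored: not a configured type")

        index = FileIndex([root], [".tdm"])
        added, _, _ = index.refresh()

        data_file.write_text("second, longer version")
        _, changed, _ = index.refresh()

        data_file.unlink()
        _, _, removed = index.refresh()
        return len(added), len(changed), len(removed)
```

Because the index is rebuilt from whatever the files currently contain, adding a new property to future files requires no schema change, which mirrors the contrast with fixed database designs drawn above.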

Figure 2. Using the Advanced Search, you can quickly find trends and correlations within your test data.

DIAdem DataFinder can inherently index TDM and TDM Streaming files within specified search areas, but legacy and/or third-party data may exist in a different file format. The NI data management solution was designed to be modular in nature and take this reality into account. To meet the challenge of integrating many different file formats, you can create and install DataPlugins that “translate” arbitrary data file formats into the TDM structure for easy integration into DIAdem.
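The translation role of a DataPlugin can be illustrated with a small Python sketch. It does not use the DataPlugin API; it only shows the general pattern of registering one translator per file type and mapping a foreign format onto a hierarchy of properties, groups, and channels. The legacy text format, the function names, and the output layout are all assumptions.

```python
import csv
import io
import os


def read_simple_csv(text):
    """Translate a hypothetical legacy format: '# key=value' header lines
    followed by a CSV table with one column per channel."""
    properties = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            properties[key.strip()] = value.strip()
        elif line.strip():
            table_lines.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(table_lines)))
    channels = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            channels[name].append(float(value))

    # Common structure: file-level properties, one group, named channels.
    return {"properties": properties,
            "groups": [{"name": "data", "channels": channels}]}


# Registry of translators, keyed by file extension.
TRANSLATORS = {".csv": read_simple_csv}


def load(filename, text):
    """Pick the translator registered for this file type and apply it."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in TRANSLATORS:
        raise ValueError(f"no translator registered for {suffix!r}")
    return TRANSLATORS[suffix](text)


legacy_text = "# operator=Smith\n# device=Pump A\nrpm,pressure\n800,4.2\n1200,6.8\n"
translated = load("run_003.csv", legacy_text)
```

Once every format arrives in the same structure, the indexing and search steps no longer need to know which format a file was originally written in.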

Figure 3. The index stores all the descriptive information included with a file, so you can mine and search on these values.

The DIAdem DataFinder, the cornerstone of the NI data management solution, was designed for individuals and does not always meet the needs of larger groups accessing data across multiple machines. As a result, National Instruments offers two data management options: DIAdem DataFinder and NI DataFinder Server Edition.

With DIAdem DataFinder, you can easily search test data stored within your local index; however, requirements change as you expand a data management solution to large groups or departments. To meet these needs, National Instruments developed NI DataFinder Server Edition.

4. Benefits of NI DataFinder Server Edition

NI DataFinder Server Edition expands on the concept and technology of DIAdem DataFinder and includes several features and capabilities that make it the ideal data management tool for large groups in which multiple engineers need to access large amounts of data possibly stored in multiple locations.

Decreased Network Traffic

If test engineers wanted to search data across multiple test stations without using NI DataFinder Server Edition, each client machine would have to establish an individual network connection with each test station. Each client machine’s DIAdem DataFinder would have to index the files of every test station with which it communicates, continually crawling the network to maintain up-to-date information about the data files on all test stations. This dramatically increases the strain on network resources because the actual indexing takes place over the network. Over time, this increase in used bandwidth may become infeasible – especially in scenarios when network bandwidth needs to be conserved.

Figure 4. When using only the local DIAdem DataFinder, each client machine’s index must connect to each test station individually.

NI DataFinder Server Edition alleviates this issue and leads to potentially faster indexing because it installs and functions on a common server machine. This server machine houses the single NI DataFinder Server Edition index, which crawls the specified search areas of all configured test stations. Client machines no longer need to interface with each test station individually because they can communicate with the intermediate server machine. When you store data files and NI DataFinder Server Edition on an intermediate server machine, you preserve network resources because the only information traveling over the network is the client queries of the index.
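A quick count of network links shows why the central index scales better. The Python sketch below is only back-of-the-envelope arithmetic under two assumptions: without a server, every client indexes every test station, and with a server, the server crawls each station once while each client talks only to the server. It counts links, not bandwidth, and the function names are made up for the example.

```python
def links_without_server(clients: int, stations: int) -> int:
    """Every client crawls every test station itself."""
    return clients * stations


def links_with_server(clients: int, stations: int) -> int:
    """The server crawls each station once; each client queries the server."""
    return stations + clients


# Example: 10 client machines and 5 test stations.
without_server = links_without_server(10, 5)
with_server = links_with_server(10, 5)
```

For 10 clients and 5 test stations this gives 50 crawling links without a server and 15 links with one. If the data files also live on the server machine, the station links disappear as well and only the client queries remain.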

Figure 5. NI DataFinder Server Edition centralizes the metadata from multiple test stations so you can easily access and mine it with multiple clients simultaneously.

Multiple Concurrent Connections

For large-scale data management, multiple engineers may need to concurrently retrieve information about existing data files. Because NI DataFinder Server Edition is intended to be installed on a high-bandwidth machine running a Windows server operating system, it can support up to 25 concurrent client connections to the central index, a dramatic increase over DIAdem DataFinder and an important feature in scenarios where multiple people may need access to NI DataFinder Server Edition at one time. This allows engineers to concurrently gain access to data files – without worrying about other engineers engaging and reserving resources – and more immediately retrieve data.
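The effect of a connection limit can be sketched as a simple counter. The figure of 25 comes from the text above; everything else in this Python example is an assumption, including the class name and the choice to turn away a client once the limit is reached, since this paper does not describe how the server handles additional connection attempts.

```python
class ConnectionLimiter:
    """Admits clients until the configured number of concurrent connections is reached."""

    def __init__(self, max_connections: int = 25):
        self.max_connections = max_connections
        self.active = set()

    def connect(self, client_id: str) -> bool:
        """Return True if the client is (or already was) connected."""
        if client_id in self.active:
            return True
        if len(self.active) >= self.max_connections:
            return False  # assumption: extra clients are turned away
        self.active.add(client_id)
        return True

    def disconnect(self, client_id: str) -> None:
        """Free the slot held by this client, if any."""
        self.active.discard(client_id)


server = ConnectionLimiter()
accepted = [server.connect(f"client-{n}") for n in range(26)]
server.disconnect("client-0")
accepted_after_release = server.connect("client-25")
```

In this sketch the first 25 clients are admitted, the 26th is refused, and it is admitted once another client disconnects.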

Minimal Client Setup

To promote consistency and ensure that expandability requires no in-depth technical knowledge for client machine configuration, NI DataFinder Server Edition gives you the ability to export client configurations from the server. With a few clicks of the mouse, you can bundle all of the settings necessary to configure client machines to seamlessly interface with the index created by NI DataFinder Server Edition into one *.urf file. Once this *.urf file is distributed to client machines, installing the configuration is as easy as double-clicking the file from its location on disk, which automatically takes care of all client machine configuration and opens access to the NI DataFinder Server Edition index. And because NI DataFinder Server Edition gives you the ability to export DataPlugins along with the client configuration, you can be assured that query results from one client machine are identical to those of another client machine without having to individually export (and later manage the import of) each registered DataPlugin on the server.

Figure 6. NI DataFinder Server Edition gives you the ability to export configurations, which can include DataPlugins, for easy client machine setup.

Consistency

In situations where multiple client machines are attempting to access data stored across many test stations on a network, NI DataFinder Server Edition ensures consistency in search areas, search results, and DataPlugins. Without NI DataFinder Server Edition, you must individually configure each client DIAdem DataFinder to index search areas that consist of multiple directories on each of the network’s test stations. As test systems grow in complexity, and you remove or add multiple client machines or test stations, you must reconfigure each client machine to account for the search areas present across the entire system at any given time. If you do not perform regular maintenance, search areas configured among client machines may become inaccurate or incomplete. Because NI DataFinder Server Edition resides on one intermediate server machine with each client machine configured to communicate directly with it, you need to perform maintenance only on the common NI DataFinder Server Edition instead of each client.

The implementation of NI DataFinder Server Edition yields one common DataFinder configuration (and therefore a common metadata index and search areas), so consistency among search results is guaranteed. Otherwise, differences between the search areas and DataPlugins defined on client machines may produce different search results on different machines. Different test engineers using conflicting or incomplete search results could cause communication headaches that result in costly product development delays.

User Management and Security

It is common to have dozens or even hundreds of test engineers interfacing with test systems and the data files that they generate. That being said, not all engineers involved in a project should always have access to all data files generated by the test stations. For reasons of privacy, security, or intellectual property, situations arise when you need to restrict the access of certain users to sensitive data files and folders.

Figure 7. By capitalizing on the already-configured Windows permission settings, NI DataFinder Server Edition requires no additional work to restrict access to sensitive data files.

NI DataFinder Server Edition directly interfaces with user management policies already in place as part of the Windows operating system. When you enable security via a simple configuration checkbox, NI DataFinder Server Edition begins restricting access to files and folders based on the current permission settings. Without requiring any additional verification (users are authenticated only once by the operating system when they log in to the client machine), read, write, and even query access to the data files and folders exposed by the index mimics the user management setup of the operating system. With NI DataFinder Server Edition, securing sensitive data according to policies already in place is as straightforward as using a simple checkbox to enable security.
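The principle of filtering query results by existing permissions can be sketched as follows. This Python example does not query Windows permissions or use any NI API; a hypothetical in-memory table of folders and permitted users stands in for the operating system's settings, and the user names and paths are invented.

```python
# Hypothetical stand-in for operating system permissions:
# folder -> set of users allowed to read files below it.
FOLDER_READERS = {
    "//server/tests/production": {"alice", "bob"},
    "//server/tests/prototype": {"alice"},
}


def can_read(user: str, path: str) -> bool:
    """True if the user may read a file located below a permitted folder."""
    return any(
        path.startswith(folder + "/") and user in readers
        for folder, readers in FOLDER_READERS.items()
    )


def visible_results(user: str, hits: list[str]) -> list[str]:
    """Drop every query hit the user is not allowed to read."""
    return [path for path in hits if can_read(user, path)]


query_hits = [
    "//server/tests/production/run_001.tdm",
    "//server/tests/prototype/run_900.tdm",
]
alice_sees = visible_results("alice", query_hits)
bob_sees = visible_results("bob", query_hits)
```

The same query returns both files for one user and only the production file for the other, without any separate user list maintained for the index itself.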

Archiving

As technology continues to evolve and improve, factors such as multicore processors, increased memory, and faster sampling rates – along with the fact that test systems are growing more complex – are resulting in the collection of ever-increasing amounts of data. Though disk storage is relatively cost-effective, situations where you are storing (and therefore backing up) large amounts of data require archiving systems that transfer data to inexpensive, high-capacity storage media such as magnetic tapes.

Based on configured rules, background system processes automatically transfer files to the archive. When a file is transferred, it is replaced on disk with an empty “stub” file that has the same name and carries attributes denoting that the original file is stored in the archive. When you access files later, they are automatically restored from the archive to their original locations on disk.
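The stub mechanism can be illustrated with a short Python sketch. It is not the SER archiving system or any NI component: the archive is just a second folder, the "archived" flag is kept in an in-memory dictionary instead of file attributes, and the class and file names are assumptions.

```python
import itertools
import shutil
import tempfile
from pathlib import Path


class StubArchive:
    """Moves files to an archive folder and leaves empty stubs behind."""

    def __init__(self, archive_dir):
        self.archive_dir = Path(archive_dir)
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        self._ids = itertools.count()   # keeps archived names unique
        self.archived = {}              # original path -> archived copy

    def archive(self, path):
        """Transfer the file and replace it with an empty stub of the same name."""
        path = Path(path)
        target = self.archive_dir / f"{next(self._ids)}_{path.name}"
        shutil.move(str(path), str(target))
        path.touch()
        self.archived[path] = target

    def is_archived(self, path):
        return Path(path) in self.archived

    def read_text(self, path):
        """Restore the file to its original location if needed, then read it."""
        path = Path(path)
        target = self.archived.pop(path, None)
        if target is not None:
            shutil.copyfile(target, path)
            target.unlink()
        return path.read_text()


def demo():
    """Archive one file, inspect the stub, then access the file again."""
    with tempfile.TemporaryDirectory() as root:
        data_file = Path(root) / "data" / "run_001.csv"
        data_file.parent.mkdir()
        data_file.write_text("pressure,4.2\n")

        store = StubArchive(Path(root) / "archive")
        store.archive(data_file)
        stub_size = data_file.stat().st_size
        flagged = store.is_archived(data_file)

        restored = store.read_text(data_file)
        return stub_size, flagged, restored, store.is_archived(data_file)
```

Because the stub keeps the original name and location, an index that records the archived flag can continue to list the file in search results while its contents sit in the archive.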

In close cooperation with the company SER, National Instruments has designed NI DataFinder Server Edition to integrate smoothly into the SER archiving system. Because NI DataFinder Server Edition can recognize archived files and save the archiving flag together with descriptive TDM and TDM Streaming file data in the index, you can search and mine archived data along with data on disk.

5. Conclusion

NI DataFinder Server Edition extends the capabilities of DIAdem DataFinder to offer a more robust solution for large-scale data management. When multiple client machines and multiple test stations are involved, as is common in today’s increasingly complex test systems, NI DataFinder Server Edition ensures decreased strain on network resources, consistent search results among client machines, ease of installation and client configuration, and automatic integration with archiving systems and Windows user permission settings. A complement to DIAdem DataFinder, NI DataFinder Server Edition can help you further streamline large-scale simulation and test data management solutions with the NI TDM solution.

 
