NI DataFinder Server Edition - Performance Guide

Publish Date: Oct 18, 2013

Overview

DataFinder Server Edition enables users to quickly create a data mining back-end without requiring any database, server or network technology know-how. However, even if the setup process is quite easy, the boundary conditions predefined by the technology employed will determine the resulting performance of a DataFinder server. This document discusses the influence of IT systems on performance, and provides assistance on how to set up a DataFinder with optimal performance with respect to the given boundary conditions.

Table of Contents

  1. DataFinder Indexing Performance
  2. DataFinder Query Performance
  3. DataFinder Index Performance
  4. Performance Summary

1. DataFinder Indexing Performance

DataFinder Server Edition is built to index files contained in defined search areas. Before you can start mining the content of a DataFinder server, these files must be indexed. The indexing speed is determined by:

  • Network performance to access these files
  • DataPlugin(s) used to index files
  • Number of items per file
  • Number of “optimized” custom properties
  • Indexing strategy

 

Cause 1.1: Network performance

A DataFinder server indexes the files in one or more search areas, each of which specifies a network share or a part of one. The network performance between the file server and the DataFinder server directly influences the indexing speed: the faster DataFinder can access a file, the faster it can index it.

Please note that even if the network speed between these two servers has a significant influence on indexing, it is not advisable to directly install DataFinder on a file server. The traffic caused by other users, processes, and applications will significantly decrease the overall DataFinder performance.

 

Cause 1.2: DataPlugin

DataFinder uses DataPlugins to index files. The time it takes for a DataPlugin to read a file directly influences the indexing speed. Since DataFinder only indexes the metadata (properties) of a file, the DataPlugin should be written so that the bulk or channel data section of the file can be skipped during indexing. When developing your own DataPlugin, apply the following recommendations to the DataPlugin script.

  • Use GetBinaryBlock, GetFixedWidthBlock, GetStringBlock, or GetCellBlock to define a DirectAccessChannel of the bulk or channel data section of the file. Please refer to DIAdem help to learn more about these commands.
  • If the methods listed above cannot be used because of the nature of the file, configure the DataPlugin to receive the DataFinder parameter. In the DataFinder Server Manager, go to Settings » Global Options » DataPlugins, select the DataPlugin you are modifying, expand the dialog box, and select DataFinder parameters. Then edit the DataPlugin script so that it skips the bulk or channel data section of the file, using the statement “if not DataFinderParameter then”.

Figure 1. Select the DataFinder parameter to make DataPlugins execute more efficiently. 

 

Please note that every improvement of the DataPlugin performance will have a significant influence on the overall system performance, including indexing files and loading data.
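The “if not DataFinderParameter then” guard can be illustrated outside of DIAdem. The following is a minimal sketch in Python, not DataPlugin code (DataPlugin scripts are typically written in VBScript). The toy file layout, the read_file and build_toy_file functions, and the indexing_only flag are all hypothetical; the flag only stands in for the DataFinder parameter.

```python
import io
import struct


def build_toy_file(properties, samples):
    """Create a toy file: [uint32 header length][header text][float64 samples]."""
    header = ";".join(f"{k}={v}" for k, v in properties.items()).encode("utf-8")
    bulk = struct.pack(f"<{len(samples)}d", *samples)
    return struct.pack("<I", len(header)) + header + bulk


def read_file(stream, indexing_only):
    """Parse the toy file; skip the bulk section when only metadata is needed."""
    (header_len,) = struct.unpack("<I", stream.read(4))
    header = stream.read(header_len).decode("utf-8")
    # The metadata is always parsed: "key=value" pairs separated by ";".
    properties = dict(pair.split("=", 1) for pair in header.split(";") if pair)
    result = {"properties": properties, "samples": None}
    if not indexing_only:
        # Only a full load pays for reading and decoding the bulk data.
        raw = stream.read()
        result["samples"] = list(struct.unpack(f"<{len(raw) // 8}d", raw))
    return result


data = build_toy_file({"name": "Temp1", "unit": "degC"}, [1.0, 2.5, 4.0])
indexed = read_file(io.BytesIO(data), indexing_only=True)   # metadata only
loaded = read_file(io.BytesIO(data), indexing_only=False)   # metadata and samples
```

In the indexing case the reader stops right after the header, so the cost of the call no longer depends on how much channel data the file contains.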

 

Cause 1.3: Number of items per file

Since DataFinder only indexes the metadata of a file, the amount of metadata has a significant influence on the time it takes to index a file. The overall performance depends on the number of elements (channel groups, channels) in a file and on the number of custom properties attached to them. This means that a file with many channels that contains a lot of bulk data but only a few properties on the group level will index faster than a file with only a few channels where each channel carries hundreds of custom properties.
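To make the comparison concrete, the metadata of two files can be counted. This is a rough sketch with invented numbers: the metadata_items function and the two example files are hypothetical, and the count ignores the built-in properties that every element carries. It only shows that the second file holds far more metadata despite having far fewer channels.

```python
def metadata_items(groups):
    """Count elements and custom properties in one file.

    `groups` is a list of dicts of the form
    {"properties": <custom properties on the group>,
     "channels": [<custom properties on each channel>, ...]}.
    """
    elements = 0
    properties = 0
    for group in groups:
        elements += 1 + len(group["channels"])          # the group plus its channels
        properties += group["properties"] + sum(group["channels"])
    return elements, properties


# File A: one group with 5 properties, 1000 channels without custom properties.
file_a = [{"properties": 5, "channels": [0] * 1000}]
# File B: one group, 10 channels with 300 custom properties each.
file_b = [{"properties": 0, "channels": [300] * 10}]

print(metadata_items(file_a))  # (1001, 5)
print(metadata_items(file_b))  # (11, 3000)
```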

 

Cause 1.4: Number of “optimized” custom properties

Indexing performance is influenced not only by the number of elements and custom properties per file, but also by whether specific custom properties are “optimized”. Optimizing a custom property improves the performance of queries whose conditions contain that property. The flip side of this improvement is a slowdown in indexing speed. Therefore, optimizing custom properties should only be considered when the query performance is low (or when it is necessary to enumerate the possible values of these specific custom properties).

A good approach would be to optimize custom properties after the first complete indexing of the DataFinder search areas.

 

Cause 1.5: Indexing strategy

The indexing strategy should match the number of files being indexed at a given time and whether the newly indexed changes need to be available to queries immediately. You can set the indexing trigger on the Indexer tab of the Configuration dialog box.

Figure 2. Set up how DIAdem automatically indexes files in the Configuration dialog box.

 

The different options for automatically indexing the DataFinder are: 

    • Index on data file changes
      Choose this option to let DataFinder automatically index files as soon as they appear in a search area. This method is based on a Windows notification service, which may not reliably trigger the indexer if, for example, many changes have occurred in the observed search areas or if other listeners have subscribed to this service.
    • Index files/folders defined in a job file
      Choose this option if the files contained in a search area are generated by an automated process that can also create the required job file. This method reliably triggers the indexing of files or whole folders, even if DataFinder is not running while the files and their job files are created. It also lets you trigger the indexing process at the very moment all files have been successfully copied into the search area.
    • No automatic indexing
      Turn off automatic indexing if the indexing process is triggered by a DIAdem script or a LabVIEW DataFinder Toolkit-based application running on that server.
    • Scheduled indexing
      Run scheduled indexing in the background to guarantee that the index is updated from time to time regardless of the chosen indexing algorithm. It is also possible to schedule the indexing process individually per search area.

For a more detailed look at each of the triggering options, you can read more in the whitepaper NI DataFinder Server Edition – Choosing DataFinder Indexing Procedures.

 


2. DataFinder Query Performance

As is the case with the indexing speed, the query performance is also influenced by several factors, such as:

  • Complexity of query
  • Optimized custom properties

 

Cause 2.1: Complexity of query

The complexity of a query is mainly determined by:

  • Number of objects to return
    A query that returns a large number of results executes more slowly because time is required to send the query results and to build the corresponding result objects (files, channel groups, or channels) on the client side.
  • Included hierarchy levels
    Using conditions with different hierarchy levels within a single query may decrease the query speed. This means that a query for channels with a specific channel name and channel unit might perform better than a query for channels belonging to a group with a specific name and the description of a specific file with a given extension.
  • Wildcards at the beginning of a string
    Using wildcards at the beginning of a condition comparison value, for instance channel.name = *temp, will defeat all index optimization methods and decrease the query speed.
  • Order By
    Applying “Order by” to a query forces DataFinder to determine the whole result set before reducing it to the requested number of results. In addition, the sorting algorithm has to be executed. Both cause a noticeable decrease in query speed.
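The effect of a wildcard at the beginning of a comparison value can be reproduced with any sorted index. The sketch below is a generic illustration in Python, not a description of how DataFinder stores its index: the names list and both functions are hypothetical. A prefix condition such as temp* can jump to the right place in the sorted list, whereas a suffix condition such as *temp has to look at every entry.

```python
import bisect

# A sorted list stands in for an index on channel names.
names = sorted(["pressure_in", "speed", "temp_inlet",
                "temp_outlet", "oil_temp", "voltage"])


def prefix_search(sorted_names, prefix):
    """Match 'prefix*': binary search to the first candidate, then stop early."""
    start = bisect.bisect_left(sorted_names, prefix)
    matches, examined = [], 0
    for name in sorted_names[start:]:
        examined += 1
        if not name.startswith(prefix):
            break  # sorted order guarantees no later entry can match
        matches.append(name)
    return matches, examined


def suffix_search(sorted_names, suffix):
    """Match '*suffix': the sort order does not help, so scan every entry."""
    matches = [name for name in sorted_names if name.endswith(suffix)]
    return matches, len(sorted_names)


print(prefix_search(names, "temp"))  # (['temp_inlet', 'temp_outlet'], 3)
print(suffix_search(names, "temp"))  # (['oil_temp'], 6)
```

With six entries the difference is negligible, but the number of entries examined by the suffix search grows with the size of the index, while the prefix search only touches the matching range.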

 

Cause 2.2: Optimized custom properties

If custom properties are used in a query condition, it is a good idea to optimize these specific custom properties to increase query performance (and allow enumerating possible values of these specific custom properties).

On the flip side, optimized custom properties decrease indexing performance and increase the overall index size, so custom properties should be optimized sparingly.
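The trade-off resembles that of a secondary index in any database. The following Python sketch is a generic illustration under that assumption and does not show DataFinder internals: the ToyIndex class, its methods, and the sample records are hypothetical. Each optimized property adds work and storage when a record is added, but allows direct lookups and the enumeration of existing values.

```python
from collections import defaultdict


class ToyIndex:
    def __init__(self, optimized=()):
        self.records = []
        # One value -> record positions table per optimized property.
        self.lookup = {prop: defaultdict(list) for prop in optimized}

    def add(self, record):
        position = len(self.records)
        self.records.append(record)
        # Extra work and storage at indexing time for each optimized property.
        for prop, table in self.lookup.items():
            if prop in record:
                table[record[prop]].append(position)

    def find(self, prop, value):
        if prop in self.lookup:
            # Optimized property: direct lookup, no scan.
            return [self.records[i] for i in self.lookup[prop].get(value, [])]
        # Non-optimized property: every record has to be inspected.
        return [r for r in self.records if r.get(prop) == value]

    def distinct_values(self, prop):
        """Enumerate existing values; only cheap for optimized properties."""
        return sorted(self.lookup[prop])


index = ToyIndex(optimized=["operator"])
index.add({"operator": "Smith", "rig": "A"})
index.add({"operator": "Jones", "rig": "B"})
index.add({"operator": "Smith", "rig": "C"})

print(index.find("operator", "Smith"))    # direct lookup
print(index.find("rig", "B"))             # full scan
print(index.distinct_values("operator"))  # ['Jones', 'Smith']
```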

 


3. DataFinder Index Performance

The overall DataFinder index performance is determined by the physical and process boundary conditions of the DataFinder environment, such as:

  • File server volatility
  • Index optimization
  • Hard disk
  • Multicore
  • RAM

 

Cause 3.1: File server volatility

Moving and deleting files are the most expensive operations that can be performed on a file server from the DataFinder perspective, because they cause a massive reorganization of the DataFinder index and result in poor indexing and query speed for the overall system. This reorganization starts immediately if indexing is set to take place on file changes, and it will probably go unnoticed by the person or process moving or deleting the files.

 

Cause 3.2: Index optimization

Depending on the ongoing indexing, the query performance will degrade over time as the index gradually fragments.

Optimizing the index from time to time will bring back the original query performance. This is why DataFinder has a default function to schedule this optimization process.

Figure 3. By default DataFinder is set to optimize the index every eight weeks, but you can configure the index optimization to meet your needs in this dialog box.

To optimize the index manually, select your DataFinder and choose DataFinder » Optimize Index from the menu bar. To set up a recurring index optimization, go to Settings » Schedule Index Optimization… and select how often and at what time the optimization should occur.

 

Cause 3.3: Hard disk

The access speed to the DataFinder index is directly linked to the performance of the hard disk on which the index is stored. Consider a dedicated disk with a very good access speed, such as an SSD, for the DataFinder index.

 

Cause 3.4: Multicore

DataFinder is a massively parallel program that uses several parallel threads and processes. Therefore, a multicore system is recommended to increase the overall performance.

 

Cause 3.5: RAM

Because DataFinder uses several parallel processes and also tries to cache as much of its index in memory as possible, the server should be equipped with a generous amount of RAM.

 


4. Performance Summary

As discussed in this document, the overall DataFinder performance is determined by several parameters. An adequately equipped server and a well-designed file storage process, especially for very large file servers, help to ensure that a self-contained and optimized DataFinder server meets the desired performance requirements.

 

Additional Resources

Learn more about NI DataFinder Server Edition

Read more details about index optimization in DataFinder
