Configuring the Monitoring and Processing of Raw Data

Last Modified: November 20, 2020

A Data Preprocessor instance scans the raw data areas to detect new, changed, or deleted files. The processing rules you configure in »Manage»Processing Rules determine whether the Data Preprocessor instance also processes the data.
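
To illustrate what detecting new, changed, or deleted files can look like, the following is a minimal sketch that compares two directory snapshots. It is not how SystemLink implements scanning; the function names `snapshot` and `diff_snapshots` are hypothetical, and the comparison by modification time and size is an assumption made for this example.

```python
import os


def snapshot(root):
    """Map every file path under root to its (modification time, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of created, modified, and deleted paths."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    # A file counts as modified when its timestamp or size differs.
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted
```

Calling `snapshot` once per scan and passing the previous and current results to `diff_snapshots` yields the three kinds of change that a scan reports.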

  1. In Data Preparation, click Data Preprocessor Instances.
  2. Select an instance and click »Manage»Monitoring Raw Data Areas.
  3. Configure the scanning process with the following options.
    Setting
        Description

    Changes in Raw Data Areas
        Scans and processes the data in the raw data areas when files are created, modified, or deleted.

        The operating system notifies SystemLink about every new, modified, or deleted file. Therefore, do not use this setting if you expect a very large number of files or file changes.

    File Scan Schedule
        Creates a raw data scanning schedule.

    Continuous Scan
        Checks at regular intervals whether data has been added or changed in the raw data area.

    Timeout per File
        Maximum amount of time, in seconds, for processing a file.

    Number of Parallel Requests to the Computing Nodes
        Number of requests that the Data Preprocessor instance sends in parallel to the computing nodes for execution. The maximum value for this setting is 64. The default setting for a new Data Preprocessor instance is 4. Increasing this number raises throughput but also increases processing resource utilization on the machine.

    Index Adaptor
        Configures the adaptor. Adaptors connect the Data Preprocessor instance with a database, where they store the instance data. The connected databases may differ in performance and functionality.

        This setting is displayed only if the Data Preprocessor instance uses a database other than the standard database.

    Job Files
        Processes specific files or folders as a batch.
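
To show how a per-file timeout and a limit on parallel requests interact, the following is a minimal sketch using Python's asyncio. It is an illustration under stated assumptions, not SystemLink's implementation or API: `process_file`, `run_batch`, and the simulated durations are hypothetical, and only the default of 4 and the maximum of 64 come from the settings above.

```python
import asyncio

DEFAULT_PARALLEL_REQUESTS = 4   # default for a new instance, per the settings above
MAX_PARALLEL_REQUESTS = 64      # maximum value, per the settings above


async def process_file(path, duration):
    """Hypothetical stand-in for processing one file on a computing node."""
    await asyncio.sleep(duration)
    return path


async def run_batch(jobs, max_parallel=DEFAULT_PARALLEL_REQUESTS, timeout_s=60.0):
    """Process (path, duration) jobs with bounded parallelism and a per-file timeout."""
    if not 1 <= max_parallel <= MAX_PARALLEL_REQUESTS:
        raise ValueError("max_parallel must be between 1 and 64")
    slots = asyncio.Semaphore(max_parallel)

    async def guarded(path, duration):
        # At most max_parallel jobs hold a slot at the same time.
        async with slots:
            try:
                # The timeout starts when the job actually begins running.
                await asyncio.wait_for(process_file(path, duration), timeout=timeout_s)
                return path, "ok"
            except asyncio.TimeoutError:
                return path, "timeout"

    # gather returns the results in the same order as the jobs.
    return await asyncio.gather(*(guarded(p, d) for p, d in jobs))
```

For example, `asyncio.run(run_batch([("a.dat", 0.01), ("b.dat", 0.5)], timeout_s=0.1))` marks the slow job as timed out while the fast one completes. A higher `max_parallel` lets more jobs run at once, which mirrors the throughput and resource trade-off described for the setting.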

