The SystemLink DataFrame Service stores and indexes tabular data using Dremio.

Dremio is a query engine. The DataFrame Service uses this engine to ingest data and execute queries against data tables. Dremio requires special attention when you initially deploy SystemLink and whenever DataFrame Service usage increases.

Tainting Dremio Nodes for Resource Efficiency

Configure Dremio deployments to avoid node resource contention with other deployments in the cluster.

Dremio requires nodes with a minimum of 32 GB of RAM. For optimal query performance, use nodes with 128 GB of RAM and at least 16 CPU cores.
  1. Apply a taint named dremio with a value of true and a NoSchedule effect.
    kubectl: kubectl taint nodes <your-node-name> dremio=true:NoSchedule
  2. Apply a label named dremio with a value of true.
    kubectl: kubectl label nodes <your-node-name> dremio=true
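  3. To clear pods that Kubernetes already scheduled to this node, manually drain the node.
    kubectl: kubectl drain --ignore-daemonsets <your-node-name>
    Note Draining also cordons the node, which marks it unschedulable. Uncordon the node afterward so that Dremio pods can schedule to it.
    kubectl: kubectl uncordon <your-node-name>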
  4. Open systemlink-values.yaml.
  5. Configure dataframeservice.sldremio.zookeeper.count to the number of nodes with the dremio label.
  6. Configure dataframeservice.sldremio.nodeSelector to dremio: "true".
  7. Optional: Adjust the following parameters so that the tainted nodes can accommodate the pods.
    • dataframeservice.sldremio.coordinator.cpu
    • dataframeservice.sldremio.coordinator.memory
    • dataframeservice.sldremio.executor.cpu
    • dataframeservice.sldremio.executor.memory
    • dataframeservice.sldremio.executor.count
    Note Reducing resource requests and executor counts substantially from the defaults might diminish DataFrame Service query performance.
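The values-file steps above can be sketched as the following fragment. It assumes that the dotted parameter names map to nested YAML keys, as is conventional for Helm values files, and that three nodes carry the dremio label. The zookeeper count is a placeholder, and the optional overrides are left commented out because the chart defines their units and defaults.

```yaml
# Sketch of a systemlink-values.yaml fragment. Values are placeholders, not recommendations.
dataframeservice:
  sldremio:
    zookeeper:
      count: 3          # assumed: three nodes carry the dremio=true label
    nodeSelector:
      dremio: "true"    # matches the label applied in step 2
    # Optional overrides from step 7. Uncomment and set values that fit your nodes.
    # coordinator:
    #   cpu: <value>
    #   memory: <value>
    # executor:
    #   cpu: <value>
    #   memory: <value>
    #   count: <value>
```

The quotes around "true" keep the selector value a string, which matches the string value that kubectl label stores on the node.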
For information on how to configure SystemLink Enterprise to store your Dremio files, refer to Configuring File Storage.

Setting Concurrent Query Limits

Adjust the default query limits for Dremio.

SystemLink routes all requests to query data from a Dremio table to one of two processing queues.
  • A low-cost queue that processes small queries.
  • A high-cost queue that processes large and complex queries.
    Note This queue typically handles queries that scan one million or more rows.

Each queue enforces a limit on the number of queries that can execute in parallel. By default, the high-cost queue has a limit of 10 queries and the low-cost queue has a limit of 100 queries. When a queue reaches its limit, the DataFrame Service holds excess requests until Dremio is able to process them, which results in higher query latency.

Note For an example of a Dremio configuration, refer to the Data Management Sizing Example on GitHub.
  1. Open the systemlink-values.yaml file.
  2. Adjust the high-cost queue limit by configuring the dataframeservice.queryEngine.workloadManagement.highCostUserQueriesQueue.concurrencyLimit value.
  3. Adjust the low-cost queue limit by configuring the dataframeservice.queryEngine.workloadManagement.lowCostUserQueriesQueue.concurrencyLimit value.
  4. Optional: Adjust the following parameters to ensure Dremio has the resources to support the new limits.
    • dataframeservice.sldremio.executor.cpu
    • dataframeservice.sldremio.executor.memory
    • dataframeservice.sldremio.executor.count
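As a minimal sketch, the queue limits in steps 2 and 3 might appear in systemlink-values.yaml as shown below. The nesting assumes that the dotted parameter names map to nested YAML keys. The limits of 20 and 200 are illustrative values chosen only to show a change from the defaults.

```yaml
# Sketch of a systemlink-values.yaml fragment. Limits are illustrative only.
dataframeservice:
  queryEngine:
    workloadManagement:
      highCostUserQueriesQueue:
        concurrencyLimit: 20     # illustrative; the default is 10
      lowCostUserQueriesQueue:
        concurrencyLimit: 200    # illustrative; the default is 100
```

If the file already contains a dataframeservice section, merge these keys into it rather than adding a second top-level dataframeservice key.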
After setting the query limits, continue to monitor resource usage and application performance under production loads. Use this information to determine the required resource allocations.