The SystemLink DataFrame Service stores and indexes tabular data using Dremio.

Dremio is a query engine that ingests data and executes queries against data tables. It requires special attention during initial deployment and when DataFrame Service usage increases.

Node Resource Requirements

Dremio requires dedicated nodes with sufficient resources to avoid contention with other cluster deployments.

Minimum requirements
  • 32 GB RAM per node
Optimal performance
  • 128 GB RAM per node
  • At least 16 CPU cores per node

Node Tainting and Labeling

NI recommends running Dremio on dedicated nodes to ensure optimal performance and resource isolation.

Configure node taints and labels to dedicate specific nodes to Dremio workloads. For each of your designated Dremio nodes, apply the following Kubernetes configurations.

| Configuration Type | Command | Purpose |
| --- | --- | --- |
| Node Taint | kubectl taint nodes <your-node-name> dremio=true:NoSchedule | Prevents other pods from scheduling on Dremio nodes. |
| Node Label | kubectl label nodes <your-node-name> dremio=true | Identifies the nodes for Dremio pod placement. |
| Node Drain | kubectl drain --ignore-daemonsets <your-node-name> | Removes existing pods from the designated node. |
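
To confirm that the taint and label took effect, you can inspect the node object, for example with kubectl get node <your-node-name> -o yaml. The excerpt below is a sketch of the fields to look for. The node name dremio-node-1 is hypothetical, and a real node object contains many more fields.

```yaml
# Excerpt of a Node object after tainting and labeling.
# "dremio-node-1" is a hypothetical node name used for illustration.
apiVersion: v1
kind: Node
metadata:
  name: dremio-node-1
  labels:
    dremio: "true"          # set by: kubectl label nodes ... dremio=true
spec:
  taints:
    - key: dremio           # set by: kubectl taint nodes ... dremio=true:NoSchedule
      value: "true"
      effect: NoSchedule
```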

Dremio Helm Configuration Parameters

For a Dremio deployment, configure the following parameters in your systemlink-values.yaml file.

| Parameter | Details | Required |
| --- | --- | --- |
| dataframeservice.sldremio.zookeeper.count | Set to the number of nodes with the dremio label. | Yes |
| dataframeservice.sldremio.nodeSelector | Set to dremio: "true" to schedule pods on labeled nodes. | Yes |
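
As a minimal sketch, the two required parameters might appear in systemlink-values.yaml as follows. The count of 3 is an assumption that three nodes carry the dremio label. Substitute the number of nodes you labeled.

```yaml
dataframeservice:
  sldremio:
    zookeeper:
      # Assumption: three nodes carry the dremio=true label.
      count: 3
    nodeSelector:
      # Matches the label applied with: kubectl label nodes ... dremio=true
      dremio: "true"
```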

Resource Allocation Parameters

To optimize resource allocation for tainted nodes, adjust the following optional parameters.

| Parameter | Component | Details |
| --- | --- | --- |
| dataframeservice.sldremio.coordinator.resources.requests.cpu | Coordinator | The CPU resource requests for the coordinator component. |
| dataframeservice.sldremio.coordinator.resources.requests.memory | Coordinator | The memory resource requests for the coordinator component. |
| dataframeservice.sldremio.coordinator.resources.limits.memory | Coordinator | The memory resource limits for the coordinator component. |
| dataframeservice.sldremio.executor.count | Executor | The number of executor instances. |
| dataframeservice.sldremio.executor.resources.requests.cpu | Executor | The CPU resource requests for executor instances. |
| dataframeservice.sldremio.executor.resources.requests.memory | Executor | The memory resource requests for executor instances. |
| dataframeservice.sldremio.executor.resources.limits.memory | Executor | The memory resource limits for executor instances. |
| dataframeservice.sldremio.executor.engineOverride.iceberg.count | Iceberg Engine | The number of Iceberg engine instances. |
| dataframeservice.sldremio.executor.engineOverride.iceberg.resources.requests.cpu | Iceberg Engine | The CPU resource requests for Iceberg engine instances. |
| dataframeservice.sldremio.executor.engineOverride.iceberg.resources.requests.memory | Iceberg Engine | The memory resource requests for Iceberg engine instances. |
| dataframeservice.sldremio.executor.engineOverride.iceberg.resources.limits.memory | Iceberg Engine | The memory resource limits for Iceberg engine instances. |
Note: NI does not recommend substantially reducing resource requests or executor counts below the defaults. Either action might degrade the performance of DataFrame Service queries or cause errors.
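
The following sketch shows how the resource allocation parameters nest in systemlink-values.yaml. Every number here is a placeholder chosen for illustration, not an NI default or recommendation. Compare the values against the defaults for your release before changing anything.

```yaml
dataframeservice:
  sldremio:
    coordinator:
      resources:
        requests:
          cpu: "4"         # placeholder value
          memory: 16Gi     # placeholder value
        limits:
          memory: 16Gi     # placeholder value
    executor:
      count: 3             # placeholder value
      resources:
        requests:
          cpu: "8"         # placeholder value
          memory: 32Gi     # placeholder value
        limits:
          memory: 32Gi     # placeholder value
      engineOverride:
        iceberg:
          count: 1         # placeholder value
          resources:
            requests:
              cpu: "8"     # placeholder value
              memory: 32Gi # placeholder value
            limits:
              memory: 32Gi # placeholder value
```

Setting each memory limit equal to its request is only a choice made for this sketch. It keeps the pod's memory footprint predictable on a dedicated node.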

Query Processing Queues

SystemLink routes each data query to one of two processing queues based on query complexity.

  • The low-cost queue processes small queries. The default limit is 100 concurrent queries.
  • The high-cost queue processes large and complex queries. These queries typically scan one million or more rows. The default limit is 10 concurrent queries.
Note: When a queue reaches its concurrency limit, the DataFrame Service withholds additional requests from Dremio and resumes sending them only when Dremio can process them. This queuing results in higher query latency.

Concurrent Query Limit Parameters

To adjust query limits, configure the following parameters in your systemlink-values.yaml file.

| Parameter | Queue Type | Default Value | Details |
| --- | --- | --- | --- |
| dataframeservice.queryEngine.workloadManagement.highCostUserQueriesQueue.concurrencyLimit | High-cost | 10 | Sets the maximum number of concurrent queries for large, complex operations. |
| dataframeservice.queryEngine.workloadManagement.lowCostUserQueriesQueue.concurrencyLimit | Low-cost | 100 | Sets the maximum number of concurrent queries for small operations. |
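
For reference, the two limits nest in systemlink-values.yaml as sketched below. The values shown are the documented defaults. Change them only after confirming that Dremio has the resources to handle the additional load.

```yaml
dataframeservice:
  queryEngine:
    workloadManagement:
      highCostUserQueriesQueue:
        concurrencyLimit: 10    # documented default for large, complex queries
      lowCostUserQueriesQueue:
        concurrencyLimit: 100   # documented default for small queries
```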

Supporting Resource Parameters

When adjusting query limits, ensure that Dremio has adequate resources by configuring the following parameters.

| Parameter | Details |
| --- | --- |
| dataframeservice.sldremio.executor.count | The number of executor instances that handle an increased query load. |
| dataframeservice.sldremio.executor.resources.requests.cpu | The CPU resource requests for executor instances. |
| dataframeservice.sldremio.executor.resources.requests.memory | The memory resource requests for executor instances. |
| dataframeservice.sldremio.executor.resources.limits.memory | The memory resource limits for executor instances. |
Note: Monitor resource usage and application performance with production loads to determine optimal resource allocations.