Performance Metrics for the Data Frame Service
- Updated 2025-05-23
Monitor the health of the SystemLink Enterprise Data Frame Service using OpenTelemetry metrics and Prometheus metrics.
The following tables list the metrics that the Data Frame Service emits and where to find information about the metrics of its dependencies. You can deploy the OpenTelemetry Collector and configure it to expose all OpenTelemetry metrics as Prometheus metrics. You can then view the Prometheus metrics in a tool such as Grafana.
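When the collector exposes these metrics to Prometheus, the names you query in Grafana usually differ from the OpenTelemetry names in the table, because Prometheus has traditionally restricted metric names to letters, digits, underscores, and colons. The following Python sketch approximates that translation so you can find a metric by name. It assumes the default behavior of the collector's Prometheus exporter (unsupported characters such as dots become underscores, and counters gain a `_total` suffix). `to_prometheus_name` is a hypothetical helper, not part of SystemLink or OpenTelemetry, and exporter versions and settings (for example, unit suffixes) can change the final name.

```python
import re

# Instrument type as written in the Type column of the metrics table.
COUNTER = "Counter"


def to_prometheus_name(otel_name: str, instrument_type: str) -> str:
    """Approximate the Prometheus name for an OpenTelemetry metric (hypothetical helper).

    Assumes the default exporter translation: characters outside
    [a-zA-Z0-9_:] become underscores, and counters gain a "_total" suffix.
    Unit suffixes such as "_seconds" or "_bytes" are not modeled here.
    """
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    if instrument_type == COUNTER and not name.endswith("_total"):
        name += "_total"
    return name


if __name__ == "__main__":
    # Prints ni_dataframe_staged_row_data_processor_claims_lost_count_total
    print(to_prometheus_name(
        "ni.dataframe.staged_row_data_processor.claims.lost.count", "Counter"))
    # Prints ni_dataframe_tables_appendable_count
    print(to_prometheus_name("ni.dataframe.tables.appendable.count", "Gauge"))
```

Histograms such as ni.dataframe.iceberg_operations.duration are typically exposed as several series that share a base name with `_bucket`, `_sum`, and `_count` suffixes, so check your Prometheus instance for the exact names.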
Data Frame Service
KPI? | Metric | Type | Description | Labels |
---|---|---|---|---|
Yes | ni.dataframe.staged_row_data_processor.staging.files.found.count | Counter | The number of staging files detected in storage. Use with ni.dataframe.staged_row_data_processor.staging.files.orphaned.count to determine whether the service is falling behind in processing files. | None |
Yes | ni.dataframe.staged_row_data_processor.staging.files.orphaned.count | Counter | The number of staging files deleted as orphans. Use with ni.dataframe.staged_row_data_processor.staging.files.found.count to determine whether the service is falling behind in processing files. In ideal operation, this metric is zero. Several situations can cause a value greater than zero. | None |
Yes | ni.dataframe.staged_row_data_processor.staging.files.missing.count | Counter | The number of staging files that are missing. This metric indicates one of several possible issues. | None |
Yes | ni.dataframe.staged_row_data_processor.claims.lost.count | Counter | The number of claims lost during processing. This metric indicates one of several possible issues. | None |
Yes | ni.dataframe.staged_row_data_processor.claims.with.errors.count | Counter | The number of claims that encountered errors during processing. Treat a value greater than zero as equivalent to the service returning HTTP 500 errors. | ni_dataframe_staged_row_data_processor_phase: [1, 2] |
No | ni.dataframe.staged_row_data_processor.skipped.storage.ids.count | Counter | The number of discovered storage IDs that were not processed | None |
No | ni.dataframe.staged_row_data_processor.failed.to.claim.count | Counter | The number of discovered storage IDs that were not claimed | None |
No | ni.dataframe.staged_row_data_processor.claims.processed.count | Counter | The number of claims processed | ni_dataframe_staged_row_data_processor_phase: [1, 2] |
No | ni.dataframe.staged_row_data_processor.sent.notifications.count | Counter | The number of notifications sent | None |
No | ni.dataframe.row_data_store.s3_stream_pool.blocks.count | Counter | The number of free blocks in the S3 stream pool | None |
No | ni.dataframe.row_data_store.s3_stream_pool.allocations.count | Counter | The number of blocks allocated in the S3 stream pool | None |
No | ni.dataframe.row_data_store.s3_stream_pool.discards.count | Counter | The number of buffers discarded from the S3 stream pool | None |
No | ni.dataframe.row_data_store.s3_stream_pool.free.size.bytes | Counter | The number of bytes allocated but unused in the S3 stream pool | None |
No | ni.dataframe.row_data_store.s3_stream_pool.used.size.bytes | Counter | The number of bytes currently in use by the S3 stream pool | None |
Yes | ni.dataframe.table_reaper.tables.reaped.count | Counter | The number of tables deleted. Use this metric to monitor the cleanup of tables. | ni_dataframe_table_reaper_reaped_result: [deleted, skipped, failed] |
Yes | ni.dataframe.tables.appendable.count | Gauge | The number of active tables that can be appended to. Use this metric to compare the number of appendable tables against the limit for appendable tables. | None |
Yes | ni.dataframe.iceberg_operations.duration | Histogram | The duration of Iceberg operations. | |
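The KPI rows in the table can be combined into a simple health check. The following Python sketch compares two snapshots of cumulative metric values taken at the start and end of an interval. It flags any increase in the counters that the table describes as indicating a problem, reports the orphaned-to-found ratio that the table suggests pairing, and warns when the appendable-table gauge approaches a limit. The input format (a dictionary keyed by OpenTelemetry metric name), the `appendable_limit` parameter, and the 90% threshold are all assumptions for illustration, because the actual limit on appendable tables is not stated here. In Prometheus itself, a comparable check is usually written as an alerting rule that uses the PromQL `increase()` function over the translated metric name.

```python
"""Hypothetical KPI check for Data Frame Service metrics (a sketch, not an NI tool)."""

FOUND = "ni.dataframe.staged_row_data_processor.staging.files.found.count"
ORPHANED = "ni.dataframe.staged_row_data_processor.staging.files.orphaned.count"
APPENDABLE = "ni.dataframe.tables.appendable.count"

# Counters whose increase the metrics table describes as a problem.
PROBLEM_COUNTERS = (
    ORPHANED,
    "ni.dataframe.staged_row_data_processor.staging.files.missing.count",
    "ni.dataframe.staged_row_data_processor.claims.lost.count",
    "ni.dataframe.staged_row_data_processor.claims.with.errors.count",
)


def counter_increase(prev: float, curr: float) -> float:
    """Return how much a cumulative counter grew; a drop means it was reset."""
    return curr if curr < prev else curr - prev


def check_kpis(prev: dict, curr: dict, appendable_limit: int,
               threshold: float = 0.9) -> list:
    """Compare two snapshots of cumulative values keyed by OpenTelemetry metric name.

    appendable_limit and threshold are assumptions supplied by the caller.
    """
    def delta(name: str) -> float:
        return counter_increase(prev.get(name, 0), curr.get(name, 0))

    warnings = []
    for name in PROBLEM_COUNTERS:
        if delta(name) > 0:
            warnings.append(f"{name} increased by {delta(name):g}")

    # Pair found and orphaned staging files, as the table suggests, to spot a backlog.
    found, orphaned = delta(FOUND), delta(ORPHANED)
    if found > 0 and orphaned > 0:
        warnings.append(f"orphaned/found ratio for this interval: {orphaned / found:.2f}")

    # The appendable-table count is a gauge, so compare its current value directly.
    appendable = curr.get(APPENDABLE, 0)
    if appendable >= threshold * appendable_limit:
        warnings.append(
            f"{APPENDABLE} is {appendable:g}, at or above "
            f"{threshold:.0%} of the limit ({appendable_limit})"
        )
    return warnings


if __name__ == "__main__":
    before = {FOUND: 120, ORPHANED: 0, APPENDABLE: 40}
    after = {FOUND: 150, ORPHANED: 3, APPENDABLE: 46}
    for warning in check_kpis(before, after, appendable_limit=50):
        print(warning)
```

Treating a drop in a counter as a reset mirrors how Prometheus handles counter resets when it computes increases, so a service restart does not produce a negative delta.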
Data Frame Service Dependencies
Dependency | Where to Find Information |
---|---|
ASP.NET | For a list of ASP.NET metrics, refer to ASP.NET Core Metrics and ASP.NET Runtime Metrics. |
Kubernetes | For a list of Kubernetes metrics, refer to Kubernetes Metrics Reference, cAdvisor Metrics, and the kube-state-metrics Documentation. |
Dremio | For a list of Dremio metrics, refer to Available JMX Metrics. |
Related Information
- OpenTelemetry Website
- Prometheus Website
- cAdvisor Metrics
- Performance Metrics for the Alarm Service
- ASP.NET Core Metrics
- ASP.NET Runtime Metrics
- Kubernetes Metrics Reference
- kube-state-metrics Documentation
- Available JMX Metrics