DataFrame Service Metrics
- Updated 2025-11-26
Monitor the health of the SystemLink Enterprise DataFrame Service using OpenTelemetry metrics and Prometheus metrics.
For metrics whose names contain ni.dataframe.row_data_store.{object_storage}_stream_pool, the service replaces {object_storage} with s3 or azure, depending on the object storage provider that the service is connected to. The service makes this replacement automatically when it emits the metrics.
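As a minimal sketch of that substitution, the following Python snippet expands a templated metric name for each provider, which can help when writing dashboards or alert rules that need the concrete names. The helper names `expand_metric_name` and `to_prometheus_name` and the `PROVIDERS` tuple are hypothetical and not part of the service. The Prometheus-style name assumes the common convention of replacing dots with underscores. Exporters may also append unit or `_total` suffixes, so verify the names against your own scrape output.

```python
# Hypothetical helpers for illustration; the service performs the
# {object_storage} replacement itself when it emits metrics.
PROVIDERS = ("s3", "azure")  # the two values listed in this document


def expand_metric_name(template: str, provider: str) -> str:
    """Return the concrete metric name for one object storage provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown object storage provider: {provider!r}")
    return template.replace("{object_storage}", provider)


def to_prometheus_name(otel_name: str) -> str:
    """Assumed convention: dots become underscores in Prometheus names."""
    return otel_name.replace(".", "_")


template = "ni.dataframe.row_data_store.{object_storage}_stream_pool.blocks.count"
for provider in PROVIDERS:
    name = expand_metric_name(template, provider)
    print(name, "->", to_prometheus_name(name))
```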
DataFrame Service
| KPI? | Metric | Type | Description | Labels |
|---|---|---|---|---|
| Yes | ni.dataframe.staged_row_data_processor.staging.files.found.count | Counter | The number of staging files found in storage. Use with ni.dataframe.staged_row_data_processor.staging.files.orphaned.count to understand if the service is falling behind in processing files. | None |
| Yes | ni.dataframe.staged_row_data_processor.staging.files.orphaned.count | Counter | The number of staging files deleted as orphans. Use with ni.dataframe.staged_row_data_processor.staging.files.found.count to understand if the service is falling behind in processing files. In an ideal operation, this metric is zero. One of the following situations can cause a value greater than X. | None |
| Yes | ni.dataframe.staged_row_data_processor.staging.files.missing.count | Counter | The number of staging files that are missing. This metric indicates one of the following issues. | None |
| Yes | ni.dataframe.staged_row_data_processor.claims.lost.count | Counter | The number of claims lost during processing. This metric indicates one of the following issues. | None |
| Yes | ni.dataframe.staged_row_data_processor.claims.with.errors.count | Counter | The number of claims that encountered errors during processing. Treat values greater than zero as the service returning 500 errors. | ni_dataframe_staged_row_data_processor_phase: [1, 2] |
| No | ni.dataframe.staged_row_data_processor.skipped.storage.ids.count | Counter | The number of discovered storage IDs that were not processed. | None |
| No | ni.dataframe.staged_row_data_processor.failed.to.claim.count | Counter | The number of discovered storage IDs without a claim. | None |
| No | ni.dataframe.staged_row_data_processor.claims.processed.count | Counter | The number of claims processed. | ni_dataframe_staged_row_data_processor_phase: [1, 2] |
| No | ni.dataframe.staged_row_data_processor.sent.notifications.count | Counter | The number of notifications sent. | None |
| No | ni.dataframe.row_data_store.{object_storage}_stream_pool.blocks.count | Counter | The number of free blocks in the stream pool for object storage. | None |
| No | ni.dataframe.row_data_store.{object_storage}_stream_pool.allocations.count | Counter | The number of blocks allocated in the stream pool for object storage. | None |
| No | ni.dataframe.row_data_store.{object_storage}_stream_pool.discards.count | Counter | The number of buffers discarded from the stream pool for object storage. | None |
| No | ni.dataframe.row_data_store.{object_storage}_stream_pool.free.size.bytes | Counter | The number of bytes allocated but unused in the stream pool for object storage. | None |
| No | ni.dataframe.row_data_store.{object_storage}_stream_pool.used.size.bytes | Counter | The number of bytes currently in use by the stream pool for object storage. | None |
| Yes | ni.dataframe.table_reaper.tables.reaped.count | Counter | The number of tables deleted. Use this metric to monitor the cleanup of tables. | ni_dataframe_table_reaper_reaped_result: [deleted, skipped, failed] |
| Yes | ni.dataframe.tables.count | Gauge | The total number of data tables. Use this metric to monitor data table growth. MongoDB resource requirements increase with the number of data tables. | None |
| Yes | ni.dataframe.tables.appendable.count | Gauge | The number of active tables that can be appended to. Use this metric to compare the number of appendable tables to the appendable table limit. | None |
| Yes | ni.dataframe.iceberg_operations.duration | Histogram | The duration of Iceberg operations. | |
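The KPI guidance in the table can be turned into a simple check. The sketch below is an illustration under stated assumptions, not part of the service. It takes two snapshots of counter values, collected however you prefer and already summed across any labels, computes the increase over the interval, and flags two conditions the table describes: orphaned staging files above zero and claims with errors above zero. The function names, the snapshot format (a dictionary of metric name to value), and the message text are hypothetical.

```python
FOUND = "ni.dataframe.staged_row_data_processor.staging.files.found.count"
ORPHANED = "ni.dataframe.staged_row_data_processor.staging.files.orphaned.count"
CLAIM_ERRORS = "ni.dataframe.staged_row_data_processor.claims.with.errors.count"


def counter_increase(previous: dict, current: dict, name: str) -> float:
    """Increase of a monotonic counter between two snapshots.

    A current value lower than the previous one is treated as a counter
    reset (for example, after a restart), so the current value is returned.
    """
    before = previous.get(name, 0)
    after = current.get(name, 0)
    return after if after < before else after - before


def kpi_warnings(previous: dict, current: dict) -> list:
    """Return human-readable warnings for the interval between snapshots."""
    warnings = []
    orphaned = counter_increase(previous, current, ORPHANED)
    found = counter_increase(previous, current, FOUND)
    errors = counter_increase(previous, current, CLAIM_ERRORS)
    if orphaned > 0:
        # The table states that this metric is zero in ideal operation.
        warnings.append(
            f"orphaned staging files: {orphaned:g} "
            f"(files found in the same interval: {found:g})"
        )
    if errors > 0:
        # The table says to treat values above zero like 500 errors.
        warnings.append(f"claims with errors: {errors:g}")
    return warnings


# Example snapshots with made-up values.
earlier = {FOUND: 10, ORPHANED: 0, CLAIM_ERRORS: 0}
later = {FOUND: 25, ORPHANED: 2, CLAIM_ERRORS: 0}
for message in kpi_warnings(earlier, later):
    print(message)
```

Working on increases rather than raw totals keeps the check meaningful for counters, which only grow until the process restarts.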
DataFrame Service Dependencies
| Dependency | Where to Find Information |
|---|---|
| ASP.NET | For a list of ASP.NET metrics, refer to ASP.NET Core Metrics and ASP.NET Runtime Metrics. |
| Kubernetes | For a list of Kubernetes metrics, refer to Kubernetes Metrics Reference, cAdvisor Metrics, and the kube-state-metrics Documentation. |
| Dremio | For a list of Dremio metrics, refer to Available JMX Metrics. |
Related Information
- OpenTelemetry Website
- Prometheus Website
- cAdvisor Metrics
- Alarm Service Metrics: Monitor the health of the SystemLink Enterprise Alarm Service using OpenTelemetry metrics and Prometheus metrics.
- ASP.NET Core Metrics
- ASP.NET Runtime Metrics
- Kubernetes Metrics Reference
- kube-state-metrics Documentation
- Available JMX Metrics