Several SystemLink Enterprise services require an object storage provider.

SystemLink Enterprise supports the following storage providers:
  • Amazon S3 Storage
  • Amazon S3 Compatible Storage
  • Azure Blob Storage
Note An Amazon S3 compatible object storage provider must implement the full Amazon S3 API. For more information, refer to the Amazon S3 API Reference. The DataFrame Service does not support the GCS Amazon S3 interoperable XML API.

You can share the parameters in the following tables for Amazon S3 storage and Azure Blob storage across multiple configurations. Sharing occurs through YAML anchor syntax in the Helm values files. This syntax provides a convenient way to share a common configuration throughout your values files. You can override individual references to these values with custom values.
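As a minimal sketch of the anchor syntax, the following values fragment defines anchors on the File Ingestion Service S3 settings and reuses them for two other services. The anchor names (s3Host, s3Port, s3Region) and the endpoint and region values are illustrative assumptions, not required names. YAML anchors resolve only within the file that defines them, so an anchor and its aliases must be in the same values file.

```yaml
fileingestion:
  storage:
    type: "s3"
    s3:
      # "&name" defines an anchor on a value. The anchor names are illustrative.
      host: &s3Host "s3.us-east-1.amazonaws.com"
      port: &s3Port 443
      region: &s3Region "us-east-1"

feedservice:
  storage:
    type: "s3"
    s3:
      # "*name" is an alias that reuses the anchored value.
      host: *s3Host
      port: *s3Port
      region: *s3Region

nbexecservice:
  storage:
    type: "s3"
    s3:
      # To override one reference, replace the alias with a custom value.
      host: "s3.dualstack.us-east-1.amazonaws.com"
      port: *s3Port
      region: *s3Region
```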

Amazon S3 and Amazon S3 Compatible Storage Providers

Note You can encrypt objects in Amazon S3 storage using either SSE-S3 or SSE-KMS with a bucket key. For more information, refer to Protecting Amazon S3 Data with Encryption.

Set the following configuration in your aws-supplemental-values.yaml Helm configuration file or storage-values.yaml Helm configuration file.

You can configure secret references in the aws-secrets.yaml file, the storage-secrets.yaml file, or directly on the cluster.

Table 33. Configurable Parameters
Parameters before the 2025-07 release: Not applicable
Parameters after the 2025-07 release:
  • dataframeservice.storage.type
  • fileingestion.storage.type
  • feedservice.storage.type
  • nbexecservice.storage.type
Details: This value represents the service storage type. Set the value to s3.

Parameters before the 2025-07 release:
  • dataframeservice.s3.port
  • fileingestion.s3.port
  • feedservice.s3.port
  • nbexecservice.s3.port
Parameters after the 2025-07 release:
  • dataframeservice.storage.s3.port
  • fileingestion.storage.s3.port
  • feedservice.storage.s3.port
  • nbexecservice.storage.s3.port
Details: This value represents the storage provider service port number.

Parameters before the 2025-07 release:
  • dataframeservice.s3.host
  • fileingestion.s3.host
  • feedservice.s3.host
  • nbexecservice.s3.host
Parameters after the 2025-07 release:
  • dataframeservice.storage.s3.host
  • fileingestion.storage.s3.host
  • feedservice.storage.s3.host
  • nbexecservice.storage.s3.host
Details: This value represents the hostname of the storage provider service.

Parameters before the 2025-07 release:
  • dataframeservice.s3.schemeName
  • fileingestion.s3.scheme
  • feedservice.s3.scheme
  • nbexecservice.s3.scheme
Parameters after the 2025-07 release:
  • dataframeservice.storage.s3.schemeName
  • fileingestion.storage.s3.scheme
  • feedservice.storage.s3.scheme
  • nbexecservice.storage.s3.scheme
Details: This value represents the scheme of the storage provider service. This value is typically https.

Parameters before the 2025-07 release:
  • dataframeservice.s3.region
  • fileingestion.s3.region
  • feedservice.s3.region
  • nbexecservice.s3.region
Parameters after the 2025-07 release:
  • dataframeservice.storage.s3.region
  • fileingestion.storage.s3.region
  • feedservice.storage.s3.region
  • nbexecservice.storage.s3.region
Details: This value represents the AWS region in which the S3 bucket is located.

Parameters before the 2025-07 release:
  • dataframeservice.sldremio.distStorage
Parameters after the 2025-07 release: Unchanged
Details: Resolve the <ATTENTION> flags. These settings configure the distributed storage that is required for the DataFrame Service.

Parameters before the 2025-07 release:
  • dataframeservice.storage.s3.auth.secretName
  • fileingestion.storage.s3.secretName
  • feedservice.storage.s3.secretName
  • nbexecservice.storage.s3.secretName
Parameters after the 2025-07 release: Unchanged
Details: Secret name for credentials used to connect to the storage provider service.
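
The following sketch illustrates how the File Ingestion Service S3 settings move under the storage key when you adopt the parameter names that follow the 2025-07 release. The host, port, and region values are placeholders chosen for illustration. The two YAML documents are alternatives separated by ---, not the contents of a single values file.

```yaml
# Parameter names before the 2025-07 release.
fileingestion:
  s3:
    host: "s3.us-east-1.amazonaws.com"
    port: 443
    region: "us-east-1"
---
# Parameter names after the 2025-07 release.
fileingestion:
  storage:
    type: "s3"   # New parameter that selects the storage provider.
    s3:
      host: "s3.us-east-1.amazonaws.com"
      port: 443
      region: "us-east-1"
```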

Beginning with the 2025-11 release, fileingestioncdc adds the following parameters.

Table 34. 2025-11 Release Parameters
Parameter Details
fileingestioncdc.highAvailability.storage.s3.port This value represents the port number of the storage provider service.
fileingestioncdc.highAvailability.storage.s3.scheme This value represents the scheme of the storage provider service. This value is typically https.
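
As a minimal sketch, the two parameters in the table nest as follows in a values file. The port value 443 is an assumption that matches an https endpoint on the default port; use the port of your storage provider.

```yaml
fileingestioncdc:
  highAvailability:
    storage:
      s3:
        port: 443        # Assumed value; set to the port of your provider.
        scheme: "https"  # The table describes https as the typical value.
```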

Connecting Services to S3 through IAM

Assign an IAM role to connect services to Amazon S3. Configure service accounts and IAM role annotations in your Helm values file.

Note Beginning with the 2026-03 release, the AWS_WEB_IDENTITY_TOKEN authentication type is deprecated. SystemLink retains it as an alias of AWS_IAM. Use AWS_IAM as the S3 authentication type for IAM role-based authentication.
  • Create a service account for each service by setting serviceAccount.create: true in your Helm values.
    Note Flink services do not require this configuration. The Flink Operator manages the service account.
  • Create an IAM policy with the following statement:
    {
      "Action": [
        "s3:PutObject",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Effect": "Allow",
      "Resource": [
        "<s3_bucket_ARN>/*",
        "<s3_bucket_ARN>"
      ]
    }
    Note The <s3_bucket_ARN> placeholder represents the Amazon Resource Name for the S3 bucket of the service.
  • Create an IAM role that applies the IAM policy.
    Note Most IAM roles use the following naming convention: <release-name>-<service-name>-role. For example, systemlink-feedservice-role. Flink services share the same configuration as the Flink Operator and use: <release-name>-flink-role.
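
The first step in the list above can be sketched in a values file as follows. The sketch covers two services as examples and assumes that each service exposes serviceAccount at the same level as in the service configuration table in this section. Flink services are omitted because the Flink Operator manages their service account.

```yaml
feedservice:
  serviceAccount:
    create: true   # Creates a dedicated service account for the Feed Service.

fileingestion:
  serviceAccount:
    create: true   # Creates a dedicated service account for the File Ingestion Service.
```
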
Note Only include the following service account annotations when using IRSA. Pod identity does not require these annotations.
Table 35. Service Configurations
Service Configuration
Asset Service CDC
assetservicecdc:
  highAvailability:
    storage:
      s3:
        authType: "AWS_IAM"
flinkoperator:
  flink-kubernetes-operator:
    jobServiceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-flink-role"
DataFrame Service
dataframeservice:
  storage:
    s3:
      authType: "AWS_IAM"
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-dataframeservice-role"
dataframeservice:
  sldremio:
    distStorage:
      aws:
        authentication: "metadata"
Note Beginning with the 2026-05 release, Dremio supports EKS pod identity for distributed storage. To use pod identity, set sldremio.distStorage.aws.authentication to podIdentity instead of metadata.
dataframeservice:
  sldremio:
    storage:
      s3:
        authType: "EC2_METADATA"
        roleArn: "arn:aws:iam::<account-id>:role/<release-name>-dataframeservice-role"

For additional Dremio IAM configuration steps, refer to Configuring Dremio Authentication for S3.

Feed Service
feedservice:
  storage:
    s3:
      authType: "AWS_IAM"
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-feedservice-role"
File Ingestion Service
fileingestion:
  storage:
    s3:
      authType: "AWS_IAM"
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-fileingestion-role"
File Ingestion CDC
fileingestioncdc:
  highAvailability:
    storage:
      s3:
        authType: "AWS_IAM"
flinkoperator:
  flink-kubernetes-operator:
    jobServiceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-flink-role"
Notebook Execution Service
nbexecservice:
  storage:
    s3:
      authType: "AWS_IAM"
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-executions-role"
System Service CDC
systemscdc:
  highAvailability:
    storage:
      s3:
        authType: "AWS_IAM"
flinkoperator:
  flink-kubernetes-operator:
    jobServiceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-flink-role"

Configuring Dremio Authentication for S3

Dremio requires authentication for two S3 storage paths.

  • Data storage: The DataFrame Service S3 bucket that stores table data. Dremio accesses this bucket through EC2 instance metadata by assuming the DataFrame Service IAM role.
  • Distributed storage: A separate S3 bucket used by Dremio for query coordination and caching. Beginning with the 2026-05 release, NI recommends authenticating distributed storage through EKS pod identity. For earlier releases, or when pod identity is not available, use EC2 metadata authentication.
Note NI recommends using separate S3 buckets for data storage and distributed storage. NI recommends narrowly scoping access for each IAM role.
Note When changing the value of dataframeservice.sldremio.storage.s3.authType, you must reset Dremio after applying the update. This requirement does not apply to first-time installations. For more information, refer to Resetting Dremio.

Complete the following steps to configure Dremio authentication.

  1. Ensure that the trust policy for the DataFrame Service IAM role allows both of the following principals.
    • The EKS pod identity agent (pods.eks.amazonaws.com) must have the sts:AssumeRole permission and the sts:TagSession permission.
    • The EC2 node group role that runs Dremio pods must have the sts:AssumeRole permission.
    Note The DataFrame Service role requires the EC2 node group role as a trusted principal. Dremio accesses the DataFrame Service S3 bucket by assuming this role through EC2 metadata authentication. If you created the DataFrame Service role following Connecting Services to S3 through IAM, update the trust policy to include the EC2 node group role ARN.
  2. Grant the IAM role for the EC2 node group that runs Dremio pods permission to assume the DataFrame Service IAM role. To grant this permission, add the following statement to the role policy for the node group.
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Effect": "Allow",
      "Resource": "<dataframeservice_role_ARN>"
    }
  3. Set dataframeservice.storage.s3.authType to AWS_IAM.
  4. Set dataframeservice.sldremio.storage.s3.authType to EC2_METADATA.
  5. Set dataframeservice.sldremio.storage.s3.roleArn to the DataFrame Service IAM role ARN.
  6. Configure the distributed storage authentication using one of the following options.
    Table 36. Options for Configuring Distributed Storage Authentication
    Option Steps
    EKS Pod Identity
    Note NI recommends this option.
    1. Create a Dremio distributed storage IAM role.
      1. Create an IAM policy with the following statements.

        Add a statement to allow global S3 bucket discovery.

        {
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListAllMyBuckets"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::*"
        }

        Add a statement to allow access to the distributed storage bucket.

        {
          "Action": [
            "s3:ListBucket",
            "s3:ListMultipartUploadParts",
            "s3:ListBucketMultipartUploads",
            "s3:AbortMultipartUpload",
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          "Effect": "Allow",
          "Resource": [
            "<distributed_storage_bucket_ARN>",
            "<distributed_storage_bucket_ARN>/*"
          ]
        }
      2. Create the IAM role with a trust policy that allows the EKS pod identity agent (pods.eks.amazonaws.com) with sts:AssumeRole and sts:TagSession permissions.
      3. Attach the IAM policy to the role.
    2. Create EKS pod identity associations between the Dremio distributed storage role, the dremio-coordinator service account, and the dremio-executor service account.
    3. Set dataframeservice.sldremio.distStorage.aws.authentication to podIdentity.
    EC2 Metadata
    Note Use this option when pod identity is not available.
    1. Add a statement to the role policy for the EC2 node group to allow global S3 bucket discovery.
      {
        "Action": [
          "s3:GetBucketLocation",
          "s3:ListAllMyBuckets"
        ],
        "Effect": "Allow",
        "Resource": "arn:aws:s3:::*"
      }
    2. Add a statement to the EC2 node group role policy to allow access to the distributed storage bucket.

      {
        "Action": [
          "s3:ListBucket",
          "s3:ListMultipartUploadParts",
          "s3:ListBucketMultipartUploads",
          "s3:AbortMultipartUpload",
          "s3:PutObject",
          "s3:GetObject",
          "s3:DeleteObject"
        ],
        "Effect": "Allow",
        "Resource": [
          "<distributed_storage_bucket_ARN>",
          "<distributed_storage_bucket_ARN>/*"
        ]
      }
      Note If the DataFrame Service S3 bucket and Dremio distributed storage bucket are the same, use the same bucket ARN in the following areas.
      • The role policy of the DataFrame Service.
      • The previous statements.
    3. Set dataframeservice.sldremio.distStorage.aws.authentication to metadata.
  7. If your S3 buckets use SSE-KMS encryption, ensure that the following roles have kms:GenerateDataKey and kms:Decrypt permissions on the KMS key.
    • DataFrame Service IAM role
    • Dremio distributed storage role
      Note If your distributed storage authentication uses EC2 metadata, grant these permissions to the EC2 node group role instead.
  8. Remove the Dremio S3 access key configuration from the secrets when using IAM-based authentication.
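
Steps 3 through 6 reduce to a small set of Helm values. The following sketch combines them into one fragment and assumes the EKS pod identity option for distributed storage. For the EC2 metadata option, set authentication to metadata instead. The role ARN uses the same placeholders as the service configuration table.

```yaml
dataframeservice:
  storage:
    s3:
      authType: "AWS_IAM"              # Step 3
  sldremio:
    storage:
      s3:
        authType: "EC2_METADATA"       # Step 4
        # Step 5: ARN of the DataFrame Service IAM role.
        roleArn: "arn:aws:iam::<account-id>:role/<release-name>-dataframeservice-role"
    distStorage:
      aws:
        authentication: "podIdentity"  # Step 6, EKS pod identity option
```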

For security reasons, NI recommends running Dremio on a dedicated node group.

Note If Dremio shares a node group with other workloads, those workloads can access the same EC2 instance profile credentials through node metadata. Use a dedicated node group to limit credential access to Dremio workloads.

To use EC2 metadata authentication from pods, you must set the Instance Metadata Service (IMDS) hop limit to 2 or greater. With a hop limit of 1, only the node can retrieve credentials. A hop limit of 2 or greater also allows pods to retrieve them.

Note This setting is relevant on Amazon EKS nodes that use AL2023, where the default hop limit can prevent pods from accessing IMDS credentials. For more information, refer to IMDS Access Considerations.
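
How you set the hop limit depends on how you provision the node group. As one possible sketch, assuming the node group uses an EC2 launch template that you manage through AWS CloudFormation, the metadata options could look like the following. The resource name DremioNodeLaunchTemplate is hypothetical, and a real launch template needs additional properties that are omitted here.

```yaml
Resources:
  DremioNodeLaunchTemplate:          # Hypothetical resource name.
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        MetadataOptions:
          HttpEndpoint: enabled
          # A hop limit of 2 lets pods on the node reach IMDS.
          HttpPutResponseHopLimit: 2
```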

Connecting Services to S3 using Access Keys

Connect your services to S3 through access keys in the following scenarios.

Note NI only recommends using access key authentication when your system requires static credentials.
  • S3-compatible storage providers: Only Amazon S3 supports IAM role-based authentication. For other S3 compatible providers, use access key authentication.
  • AWS deployments without an IAM role configuration: If Pod Identity and IRSA are not available, you can use access key authentication.
Table 37. File Configurations for Access Keys
Configuration Description
Values file

In your systemlink-values.yaml or aws-supplemental-values.yaml file, specify the S3 connection parameters and secret reference.

feedservice:
  storage:
    s3:
      secretName: "feeds-s3-credentials"
      accessKeyIdName: "aws-access-key-id"
      accessKeyName: "aws-secret-access-key"
      authType: "ACCESS_KEY"
      bucket: "systemlink-feeds"
      scheme: "https://"
      host: "s3.amazonaws.com"
      port: 443
      region: "us-east-1"
Secrets file

In your systemlink-secrets.yaml or aws-secrets.yaml file, provide the access credentials.

feedservice:
  secrets:
    s3:
      accessKeyId: "<your-access-key-id>"
      accessKey: "<your-secret-access-key>"

The same pattern applies to other services when IAM authentication is not available.
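
As a sketch of how the same pattern might look for the File Ingestion Service, the following fragment mirrors the Feed Service example. Only the storage.s3 parameters listed in the S3 parameter table are confirmed for this service. The accessKeyIdName, accessKeyName, and bucket keys, the secrets layout, the secret name, and the bucket name are assumptions carried over from the Feed Service example, so verify them against the chart values for your release. The two YAML documents belong in separate files.

```yaml
# Values file (assumed to mirror the Feed Service example).
fileingestion:
  storage:
    s3:
      secretName: "fileingestion-s3-credentials"  # Hypothetical secret name.
      accessKeyIdName: "aws-access-key-id"
      accessKeyName: "aws-secret-access-key"
      authType: "ACCESS_KEY"
      bucket: "systemlink-files"                  # Hypothetical bucket name.
      scheme: "https://"
      host: "s3.amazonaws.com"
      port: 443
      region: "us-east-1"
---
# Secrets file (layout assumed to mirror the Feed Service example).
fileingestion:
  secrets:
    s3:
      accessKeyId: "<your-access-key-id>"
      accessKey: "<your-secret-access-key>"
```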

Note When deploying on AWS with Amazon S3, NI recommends using IAM authentication where supported for improved security and credential management.

Azure Blob Storage Providers

Note For the storage account of the DataFrame Service, you must disable blob soft delete and hierarchical namespace.

Set the following configuration in your azure-supplemental-values.yaml Helm configuration file or storage-values.yaml Helm configuration file.

You can configure secret references in the azure-secrets.yaml file, the storage-secrets.yaml file, or directly on the cluster.

Table 38. Configurable Parameters
Parameters Starting with the 2025-07 Release Details
  • dataframeservice.storage.type
  • fileingestion.storage.type
  • fileingestioncdc.highAvailability.storage.type
  • feedservice.storage.type
  • nbexecservice.storage.type

This value represents the storage type of the service. Set the value to azure.

  • dataframeservice.storage.azure.blobApiHost
  • fileingestion.storage.azure.blobApiHost
  • fileingestioncdc.highAvailability.storage.azure.blobApiHost
  • feedservice.storage.azure.blobApiHost
  • nbexecservice.storage.azure.blobApiHost

This value represents the host of the Azure Blob storage without the account name. For example, you can set the value to blob.core.windows.net or blob.core.usgovcloudapi.net.

If your storage does not use the default port, add the port to the end of the host. For example, blob.core.windows.net:1234.

  • dataframeservice.storage.azure.dataLakeApiHost

This value represents the host of the Azure Data Lake Storage without the account name. For example, you can set the value to dfs.core.windows.net.

If your storage does not use the default port, add the port to the end of the host. For example, dfs.core.windows.net:1234.

  • dataframeservice.storage.azure.accountName
  • fileingestion.storage.azure.accountName
  • fileingestioncdc.highAvailability.storage.azure.accountName
  • feedservice.storage.azure.accountName
  • nbexecservice.storage.azure.accountName

This value represents the storage account for your service. NI recommends using different storage accounts for different services.

Connecting Services to Azure Blob Storage

To configure Azure Blob Storage authentication, you must configure both the values file and the secrets file.

In your azure-supplemental-values.yaml or storage-values.yaml file, specify the Azure storage parameters.

feedservice:
  storage:
    type: "azure"
    azure:
      accountName: "<your-azure-storage-account-name>"
      blobApiHost: "blob.core.windows.net"

In your azure-secrets.yaml file, provide the access credentials.

feedservice:
  secrets:
    azure:
      accessKey: "<your-azure-storage-access-key>"

The same pattern applies to other services.
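
For the DataFrame Service, the values file also carries the Data Lake host. The following sketch uses only parameters from the Azure parameter table, with the example hosts given there. The account name is a placeholder.

```yaml
dataframeservice:
  storage:
    type: "azure"
    azure:
      accountName: "<your-azure-storage-account-name>"
      blobApiHost: "blob.core.windows.net"
      # Only the DataFrame Service takes a Data Lake host.
      dataLakeApiHost: "dfs.core.windows.net"
```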

Limits and Cost Considerations for File Storage

To adjust limits and cost considerations for file storage services, refer to the following configurations.

Table 39. File Storage Considerations
Consideration Configuration
Reducing storage costs.
Configure your storage service to clean up incomplete multipart uploads. If you are using Amazon S3, configure the AbortIncompleteMultipartUpload lifecycle rule action on your S3 buckets.
Note Azure storage automatically deletes uncommitted blocks after seven days. For other S3 compatible providers, refer to the provider documentation.
Adjusting the number of files a single user can upload per second.

Configure the fileingestion.rateLimits.upload value.

By default, the value is 3 files per second per user. Because requests are load balanced across replicas, the effective rate can be higher than the specified rate.

Adjusting the maximum file size that users can upload.

Configure the fileingestion.uploadLimitGB value.

By default, the value is 2 GB.

Adjusting the number of concurrent requests that a single replica can serve for ingesting data.

Configure the dataframeservice.rateLimits.ingestion.requestLimit value.
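
For the storage cost consideration in the table above, one way to configure AbortIncompleteMultipartUpload is a bucket lifecycle rule. The following sketch assumes that you manage the bucket through AWS CloudFormation. The resource name, the rule ID, and the seven-day window are illustrative assumptions.

```yaml
Resources:
  FileIngestionBucket:                       # Hypothetical resource name.
    Type: AWS::S3::Bucket
    Properties:
      LifecycleConfiguration:
        Rules:
          - Id: AbortIncompleteMultipartUploads
            Status: Enabled
            AbortIncompleteMultipartUpload:
              # Assumed window; uploads still incomplete after this many days are aborted.
              DaysAfterInitiation: 7
```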