Configuring File Storage

Updated 2025-12-18

Several SystemLink Enterprise services require a file storage provider.

The following list contains the supported providers:

Amazon S3 Storage
Amazon S3 Compatible Storage
Azure Blob Storage

Note An Amazon S3 compatible file storage provider must implement the full Amazon S3 API. For more information, refer to the Amazon S3 API Reference. The Data Frame Service does not support the GCS Amazon S3 interoperable XML API.

Amazon S3 storage and Azure Blob storage typically share the parameters in the following tables across multiple configurations. Sharing occurs through YAML anchor syntax in the Helm values files. This syntax provides a convenient way to share a common configuration throughout your values files. You can override individual references to these values with custom values.
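
As a minimal sketch of the anchor syntax described above, the following values fragment defines each S3 setting once and reuses it for a second service. The anchor names (s3Host, s3Port, s3Scheme, s3Region) and all values are hypothetical placeholders, and the value types are assumptions. The key paths follow the parameter names used after the 2025-07 release. The shipped values files may name and place their anchors differently.

```yaml
# Hypothetical example: anchor names and values are placeholders.
fileingestion:
  storage:
    type: s3
    s3:
      host: &s3Host "s3.us-east-1.amazonaws.com"   # define the anchor once
      port: &s3Port 443
      scheme: &s3Scheme "https"
      region: &s3Region "us-east-1"

feedservice:
  storage:
    type: s3
    s3:
      host: *s3Host        # alias: reuses the anchored value
      port: *s3Port
      scheme: *s3Scheme
      region: "us-west-2"  # override: a literal value replaces the alias
```

YAML resolves anchors and aliases within a single file only, so an alias must appear in the same values file as the anchor it references.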

Amazon S3 and Amazon S3 Compatible Storage Providers

Note You can encrypt objects in Amazon S3 storage using either SSE-S3 or SSE-KMS with a bucket key. For more information, refer to Protecting Amazon S3 Data with Encryption.

Set the following configuration in your AWS/aws-supplemental-values.yaml Helm configuration file or OnPrem/storage-values.yaml Helm configuration file. For more information on deploying configurations to your environment, refer to Updating SystemLink Enterprise.

You can configure secret references in the AWS/aws-secrets.yaml file, the OnPrem/storage-secrets.yaml file, or directly on the cluster. For more information on managing the secrets that the configuration requires for file storage, refer to Required Secrets.

Table 9. Amazon S3 and Amazon S3 Compatible Storage Parameters
Parameters Before the 2025-07 Release	Parameters After the 2025-07 Release	Details
Not applicable	dataframeservice.storage.type fileingestion.storage.type fileingestioncdc.highAvailability.storage.type feedservice.storage.type nbexecservice.storage.type	This value represents the service storage type. Set the value to s3.
dataframeservice.s3.port fileingestion.s3.port feedservice.s3.port nbexecservice.s3.port	dataframeservice.storage.s3.port fileingestion.storage.s3.port feedservice.storage.s3.port nbexecservice.storage.s3.port	This value represents the port number of the storage provider service.
dataframeservice.s3.host fileingestion.s3.host feedservice.s3.host nbexecservice.s3.host	dataframeservice.storage.s3.host fileingestion.storage.s3.host fileingestioncdc.highAvailability.storage.s3.host feedservice.storage.s3.host nbexecservice.storage.s3.host	This value represents the hostname of the storage provider service.
dataframeservice.s3.schemeName fileingestion.s3.scheme feedservice.s3.scheme nbexecservice.s3.scheme	dataframeservice.storage.s3.schemeName fileingestion.storage.s3.scheme feedservice.storage.s3.scheme nbexecservice.storage.s3.scheme	This value represents the scheme of the storage provider service. This value is typically https.
dataframeservice.s3.region fileingestion.s3.region feedservice.s3.region nbexecservice.s3.region	dataframeservice.storage.s3.region fileingestion.storage.s3.region fileingestioncdc.highAvailability.storage.s3.region feedservice.storage.s3.region nbexecservice.storage.s3.region	This value represents the AWS region where the S3 bucket is located.
dataframeservice.sldremio.distStorage	Unchanged	Resolve the <ATTENTION> flags. These settings configure the distributed storage that is required for the Data Frame Service.
dataframeservice.storage.s3.auth.secretName fileingestion.storage.s3.secretName feedservice.storage.s3.secretName nbexecservice.storage.s3.secretName	dataframeservice.storage.s3.auth.secretName fileingestion.storage.s3.secretName fileingestioncdc.highAvailability.storage.s3.secretName feedservice.storage.s3.secretName nbexecservice.storage.s3.secretName	Secret name for credentials used to connect to the storage provider service.

Beginning with the 2025-11 release, fileingestioncdc adds the following parameters.


Parameter	Details
fileingestioncdc.highAvailability.storage.s3.port	This value represents the port number of the storage provider service.
fileingestioncdc.highAvailability.storage.s3.scheme	This value represents the scheme of the storage provider service. This value is typically https.
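
The dotted parameter names in the tables above map to nested keys in the values file. The following sketch shows one possible layout for two services using the parameter names from the 2025-07 release and later. The host, port, region, and secret names are hypothetical placeholders, and the value types are assumptions.

```yaml
# Hypothetical example: all values are placeholders.
dataframeservice:
  storage:
    type: s3
    s3:
      host: "s3.us-east-1.amazonaws.com"
      port: 443
      schemeName: "https"   # this service uses schemeName, not scheme
      region: "us-east-1"
      auth:
        secretName: "dataframeservice-s3-credentials"   # hypothetical name

fileingestioncdc:
  highAvailability:
    storage:
      type: s3
      s3:
        host: "s3.us-east-1.amazonaws.com"
        port: 443           # parameter added in the 2025-11 release
        scheme: "https"     # parameter added in the 2025-11 release
        region: "us-east-1"
        secretName: "fileingestioncdc-s3-credentials"   # hypothetical name
```

Note the two naming differences visible in the tables: the Data Frame Service uses schemeName and auth.secretName, while the other services use scheme and secretName.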

Connecting Services to S3 through IAM

Assign an IAM role to connect services to Amazon S3.

Your system must meet the following prerequisites to connect each service through IAM.

Create a service account for each service by setting the serviceAccount.create Helm value to true.
Note Flink services do not require this Helm value. The Flink Operator manages the service account.
Create an IAM policy with the following statement:
```
{
  "Action": [
    "s3:PutObject",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload"
  ],
  "Effect": "Allow",
  "Resource": [
    "<s3_bucket_ARN>/*",
    "<s3_bucket_ARN>"
  ]
}
```
Note The <s3_bucket_ARN> placeholder represents the Amazon Resource Name for the S3 bucket of the service.
Create an IAM role that applies the new IAM policy.
Note Most IAM roles use the following naming convention: <release-name>-<service-name>-role. For example, systemlink-feedservice-role. Flink services do not follow this rule. Instead, IAM roles for Flink services share the same configuration as the Flink Operator. These roles use the following naming convention: <release-name>-flink-role.

After meeting these prerequisites, update the Helm values file to include the following configurations.


Service	Configuration
Data Frame Service	This service does not currently support IAM.
Feed Service	feedservice: storage: s3: authType: "AWS_WEB_IDENTITY_TOKEN" feedservice: serviceAccount: annotations: eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-feedservice-role"
File Ingestion Service	fileingestion: storage: s3: authType: "AWS_WEB_IDENTITY_TOKEN" fileingestion: serviceAccount: annotations: eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-fileingestion-role"
File Ingestion CDC	fileingestioncdc: highAvailability: storage: s3: authType: "AWS_WEB_IDENTITY_TOKEN" flinkoperator: flink-kubernetes-operator: jobServiceAccount: annotations: eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-flink-role"
Notebook Execution Service	nbexecservice: storage: s3: authType: "AWS_WEB_IDENTITY_TOKEN" nbexecservice: serviceAccount: annotations: eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-executions-role"
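
The configurations in the table above are shown on a single line. Expanded into a values file, the Feed Service and File Ingestion CDC entries might look like the following sketch. The nesting is inferred from the flattened text, with the repeated service key merged into one block and flinkoperator treated as a top-level key. Placing serviceAccount.create under the service key is also an assumption. The <account-id> and <release-name> placeholders are kept as written.

```yaml
# Sketch only: nesting is inferred from the single-line table entries.
feedservice:
  serviceAccount:
    create: true   # prerequisite; placement under the service key is assumed
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-feedservice-role"
  storage:
    s3:
      authType: "AWS_WEB_IDENTITY_TOKEN"

fileingestioncdc:
  highAvailability:
    storage:
      s3:
        authType: "AWS_WEB_IDENTITY_TOKEN"

# Flink services take the role from the Flink Operator job service account.
flinkoperator:
  flink-kubernetes-operator:
    jobServiceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<release-name>-flink-role"
```

The File Ingestion Service and Notebook Execution Service entries follow the same shape as the Feed Service entry, with the service key and role name changed as listed in the table.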

Azure Blob Storage Providers

Note For the Data Frame Service storage account, you must disable blob soft delete and hierarchical namespace.

Set the following configuration in the Azure/azure-supplemental-values.yaml Helm configuration file for Azure Blob Storage.

You can configure secret references in the Azure/azure-secrets.yaml file or directly on the cluster. For more information on deploying these configurations to your environment, refer to Updating SystemLink Enterprise.

Table 10. Azure Blob Storage Parameters
Parameters Starting with the 2025-07 Release	Details
dataframeservice.storage.type fileingestion.storage.type fileingestioncdc.highAvailability.storage.type feedservice.storage.type nbexecservice.storage.type	This value represents the storage type of the service. Set the value to azure.
dataframeservice.storage.azure.blobApiHost fileingestion.storage.azure.blobApiHost fileingestioncdc.highAvailability.storage.azure.blobApiHost feedservice.storage.azure.blobApiHost nbexecservice.storage.azure.blobApiHost	This value represents the host of the Azure Blob storage without the account name. For example, you can set the value to blob.core.windows.net or blob.core.usgovcloudapi.net. If your storage does not use the default port, add the port to the end of the host. For example, blob.core.windows.net:1234.
dataframeservice.storage.azure.dataLakeApiHost	This value represents the host, and optionally the port, of the Azure Data Lake Storage endpoint without the account name. For example, you can set the value to dfs.core.windows.net. If your storage does not use the default port, add the port to the end of the host. For example, dfs.core.windows.net:1234.
dataframeservice.storage.azure.accountName fileingestion.storage.azure.accountName fileingestioncdc.highAvailability.storage.azure.accountName feedservice.storage.azure.accountName nbexecservice.storage.azure.accountName	This value represents the storage account for your service. NI recommends using different storage accounts for different services.
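
As an illustration of the parameters in Table 10, the following sketch configures two services for Azure Blob Storage with separate storage accounts, as recommended above. The account names are hypothetical placeholders. The host values are the examples given in the table.

```yaml
# Hypothetical example: account names are placeholders.
dataframeservice:
  storage:
    type: azure
    azure:
      blobApiHost: "blob.core.windows.net"
      dataLakeApiHost: "dfs.core.windows.net"   # Data Frame Service only
      accountName: "<dataframe-storage-account>"

fileingestion:
  storage:
    type: azure
    azure:
      blobApiHost: "blob.core.windows.net"
      accountName: "<fileingestion-storage-account>"
```

If the storage endpoint uses a non-default port, append the port to the host value, for example blob.core.windows.net:1234.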

Limits and Cost Considerations for File Storage

To adjust limits and manage costs for file storage services, refer to the following configurations.

Table 11. File Storage Considerations
Consideration	Configuration
Reduce storage costs	Configure your storage provider to clean up incomplete multipart uploads. If you are using Amazon S3, configure the AbortIncompleteMultipartUpload lifecycle action on your S3 buckets. For other S3 compatible providers, refer to the provider documentation. Note Azure storage automatically deletes uncommitted blocks after seven days.
Adjust the number of files a single user can upload per second	Configure the fileingestion.rateLimits.upload value. By default, the value is 3 files per second per user. Because uploads are load balanced across replicas, the effective rate is higher than the specified rate.
Adjust the maximum file size that users can upload	Configure the fileingestion.uploadLimitGB value. By default, the value is 2 GB.
Adjust the number of concurrent requests that a single replica can serve for ingesting data	Configure the dataframeservice.rateLimits.ingestion.requestLimit value.
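
The limits in Table 11 are set in the same Helm values files as the storage parameters. The following sketch assumes that the dotted names map directly to nested keys and that the values are plain numbers. The requestLimit value is a hypothetical placeholder, not a documented default. The fileingestion.rateLimits.upload value is left out because its value format is not described here.

```yaml
# Sketch only: value formats are assumptions.
fileingestion:
  uploadLimitGB: 2        # documented default: 2 GB

dataframeservice:
  rateLimits:
    ingestion:
      requestLimit: 20    # hypothetical value; applies per replica
```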

Related Information

Amazon S3 API Reference
Protecting Amazon S3 Data with Encryption
Updating SystemLink Enterprise
Modify the configuration or upgrade to a newer version of the SystemLink Enterprise application.
Required Secrets
Secrets are Kubernetes objects that are used to store sensitive information. The secrets listed in this topic are required and have the Opaque type unless otherwise specified.
SystemLink values Helm template
SystemLink Azure supplemental values Helm template
SystemLink AWS supplemental values Helm template
SystemLink Secrets Helm Template
SystemLink Azure Secrets Helm Template
Configuring a Bucket Lifecycle Configuration to Delete Incomplete Multipart Uploads in GCS
GCS Amazon S3 Interoperability API Reference
IAM permissions for XML requests
Soft Delete for Blobs
Azure Data Lake Storage Hierarchical Namespace