Normalizing Data for Efficient Storage and Access
Data tables are a read-optimized, columnar storage format designed to hold tables with millions of rows of data. You can use the Data Frame Service API to normalize data from multiple file formats into one common format. Using one format for all data allows you to create reusable analysis routines and visualizations.
You can use data tables to store waveform data or time-series data. Data tables that contain time-series data do not require a constant time interval.
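For example, the following sketch creates a data table over HTTP with a timestamp index column and one float column per channel. The server URL, route, credential header, and payload field names are illustrative assumptions, not the documented API surface; consult the Data Frame Service API reference for the actual endpoint and schema.

```python
import requests

# Hypothetical server URL, route, and credential header; check the Data
# Frame Service API reference for the real endpoint and payload schema.
BASE_URL = "https://my-server/nidataframe/v1"
HEADERS = {"x-ni-api-key": "<api-key>"}

# One common shape for every data source: a timestamp index column plus
# one float column per measured channel.
table_definition = {
    "name": "engine_test_42",
    "columns": [
        {"name": "timestamp", "dataType": "TIMESTAMP", "columnType": "INDEX"},
        {"name": "rpm", "dataType": "FLOAT64"},
        {"name": "oil_temp_c", "dataType": "FLOAT64"},
    ],
}

response = requests.post(f"{BASE_URL}/tables", json=table_definition, headers=HEADERS)
response.raise_for_status()
table_id = response.json()["id"]  # assumed response shape
```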
You can perform the following actions with data tables.
- Use multiple extract, transform, and load (ETL) pipelines to convert different file types with different data structures into a data table.
- Use common analysis routines and visualization techniques to interact with the single, normalized format.
- Append multiple new rows to a data table in a single API call. A data table can have an unlimited number of rows, and rows can be written in any order because any column can be used to sort the table when it is read. Note: Row data might not be available to read for up to five minutes after it is written. The sketch after this list shows an append followed by a read.
- Use the Data Frame Service API to read data within a data table, specifying which columns and how many rows to return.
- Query table metadata to return one or more data tables that match the parameters of the query. This is useful to identify tables that are associated with a test result or other test metadata.
- Query within the table for specific data. This is useful when you are searching for a particular characteristic in your data that is not captured in the table metadata. For example, you can query within a data table to find the first instance of a value above a particular threshold.
- Export queried table data as a comma-separated values (CSV) file to view the normalized data in a spreadsheet editor.
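The sketch below appends rows to an existing table in one call and then reads a slice back, naming only the columns and row count needed. As above, the routes, credential header, and payload shapes are assumptions for illustration.

```python
import requests

BASE_URL = "https://my-server/nidataframe/v1"  # hypothetical server URL
HEADERS = {"x-ni-api-key": "<api-key>"}        # placeholder credential
table_id = "<table-id>"                        # ID returned when the table was created

# Append several rows in one call. Rows can be written in any order,
# since any column can sort the table when it is read. Newly written
# rows may take up to five minutes to become readable.
new_rows = {
    "frame": {
        "columns": ["timestamp", "rpm", "oil_temp_c"],
        "data": [
            ["2025-11-26T08:00:00Z", "1500.0", "74.2"],
            ["2025-11-26T08:00:01Z", "1525.0", "74.3"],
        ],
    }
}
requests.post(
    f"{BASE_URL}/tables/{table_id}/data", json=new_rows, headers=HEADERS
).raise_for_status()

# Read back only the columns and row count we need.
query = {"columns": ["timestamp", "rpm"], "take": 1000}
result = requests.post(f"{BASE_URL}/tables/{table_id}/query-data", json=query, headers=HEADERS)
result.raise_for_status()
frame = result.json()["frame"]  # assumed response shape
```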
When querying within a table, you can decimate the data before returning it to the caller. Decimation is useful when visualizing large data sets, and in analysis when the shape of the data is more critical than each individual point. For example, you might use the MAX_MIN decimation method to find outliers without returning all data in the data table. Use one of the following methods to decimate the data.
| Method | Description |
|---|---|
| LOSSY | Returns a set maximum number of points from a uniform sample of the result set. If there are fewer data points than the maximum specified, the decimation returns all points. This method allows you to see the general shape of the data more quickly but is less accurate. Spikes are not guaranteed to appear when plotting the result set. |
| MAX_MIN | Returns the points where the selected Y-channel reaches its maximum and minimum values in each interval of data. This decimation allows you to plot the data using continuous lines. Using continuous lines maintains the shape of the data, including spikes. |
| ENTRY_EXIT | Returns a similar set of points as MAX_MIN, except that it adds the entry and exit points for each interval. Entry is the left-most point in a graph, where the x-value is minimum in an interval. Exit is the right-most point, where the x-value is maximum in an interval. |
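The following is a minimal sketch of MAX_MIN-style decimation in plain Python, assuming equal-width x intervals; the service's own implementation may differ. ENTRY_EXIT would additionally emit each interval's left-most and right-most points.

```python
def max_min_decimate(points, intervals):
    """Keep only the y-minimum and y-maximum point of each x-interval,
    so spikes survive even though most points are dropped.

    points: list of (x, y) tuples; intervals: number of equal-width buckets.
    """
    xs = [x for x, _ in points]
    x_lo, x_hi = min(xs), max(xs)
    width = (x_hi - x_lo) / intervals or 1.0  # guard against a zero-width range
    buckets = [[] for _ in range(intervals)]
    for x, y in points:
        i = min(int((x - x_lo) / width), intervals - 1)
        buckets[i].append((x, y))
    decimated = []
    for bucket in buckets:
        if not bucket:
            continue
        y_min = min(bucket, key=lambda p: p[1])
        y_max = max(bucket, key=lambda p: p[1])
        # Emit in x order so the result plots cleanly with continuous lines.
        decimated.extend(sorted({y_min, y_max}, key=lambda p: p[0]))
    return decimated
```

For example, decimating a 1,000,000-point waveform into 500 intervals returns at most 1,000 points while still preserving the extremes of every spike.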
Null values and NaN values can disrupt the shape of the data in the following ways if there are too few data points per interval.
- NaN values can appear as the minimum or maximum value for the column containing these values.
- Null values are treated like Infinity and appear as the maximum value.
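A short illustration of why NaN distorts min/max selection: comparisons against NaN are always false, so a naive scan can return the NaN itself as an interval's extreme. This uses Python's built-in behavior only to illustrate the hazard; the service's actual handling of nulls and NaNs is described in the notes above.

```python
import math

values = [float("nan"), 1.0, 3.0]

# max() keeps its running candidate when a comparison is False, and every
# comparison against NaN is False, so the NaN it starts with "wins" here.
print(max(values))  # nan

# Filtering NaN values out first restores the expected extreme.
print(max(v for v in values if not math.isnan(v)))  # 3.0
```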
Related Information
- Normalizing Incoming Data Automatically
Create a routine to automatically convert incoming data files into a data table.