If we start with a raw data set that combines data from multiple sources, we typically run into four problems:
Figure 1: The Inconsistent Data Problem.
- Multiple Sources and Formats
- Different Metadata Names and Values
- Different Engineering Units
- Data Might Contain Errors
If these issues exist in the raw data, they ultimately result in inconsistent analysis for the following reasons.
- Consolidating File Formats Manually Leads to Errors
Test setups use multiple file formats and sources for a variety of reasons, most often because different groups within the company test different subsystems or components. These groups may use different hardware vendors or different processes. Unless the process is automated, consolidating data from multiple file formats is tedious manual work that wastes time and invites human error.
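As a rough illustration of what an automated consolidation step could look like, the sketch below reads two small in-memory samples, one CSV and one JSON, into a single common record layout. The sample contents, the field names (`time`, `temperature`, `t`, `temp`), and the helper functions are all hypothetical and chosen only for this example; a real setup would register one reader per file format actually in use.

```python
import csv
import io
import json

# Hypothetical sample data from two groups that log the same quantity
# in different file formats and with different column names.
CSV_TEXT = "time,temperature\n0.0,20.5\n1.0,20.7\n"
JSON_TEXT = '{"samples": [{"t": 0.0, "temp": 21.0}, {"t": 1.0, "temp": 21.2}]}'


def read_csv(text):
    """Parse the CSV layout into the common record layout."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"time": float(row["time"]), "temperature": float(row["temperature"])}
        for row in reader
    ]


def read_json(text):
    """Parse the JSON layout into the common record layout."""
    samples = json.loads(text)["samples"]
    return [{"time": float(s["t"]), "temperature": float(s["temp"])} for s in samples]


# One reader per format; adding a format means adding one entry here.
READERS = {"csv": read_csv, "json": read_json}


def consolidate(sources):
    """Run each (format, text) pair through its reader and merge the records."""
    rows = []
    for fmt, text in sources:
        rows.extend(READERS[fmt](text))
    return rows


combined = consolidate([("csv", CSV_TEXT), ("json", JSON_TEXT)])
print(len(combined), "records in a common layout")
```

Keeping the format-specific logic in small reader functions means the consolidation itself is no longer done by hand, which is where the errors described above tend to creep in.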
Data querying tools, like DataFinder Server, can reduce the time and effort it takes to find the data you need to analyze. A query is built from metadata names and values. If the names and values stored in the files don't match those used in the query, and no additional queries are written to widen the search, the data won't be found and therefore won't be analyzed.
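The effect of mismatched metadata on a query can be shown with a small sketch. This is plain Python, not the DataFinder API; the records, property names, alias table, and helper functions are invented for illustration.

```python
# Hypothetical metadata records from two test groups. Both describe runs by
# the same operator, but the property names and value casing differ.
records = [
    {"file": "run_001.tdms", "Operator": "J. Smith", "UUT": "Pump-A"},
    {"file": "run_002.csv", "operator_name": "j. smith", "unit_under_test": "Pump-A"},
]

# Assumed mapping from each group's property names to one agreed-upon name.
NAME_ALIASES = {
    "operator_name": "Operator",
    "unit_under_test": "UUT",
}


def query(records, name, value):
    """Return the files whose metadata property `name` equals `value` exactly."""
    return [r["file"] for r in records if r.get(name) == value]


def standardize(record):
    """Rename aliased properties and normalize the casing of operator names."""
    clean = {NAME_ALIASES.get(key, key): val for key, val in record.items()}
    if "Operator" in clean:
        clean["Operator"] = clean["Operator"].title()
    return clean


# Querying the raw metadata silently misses the second file.
raw_hits = query(records, "Operator", "J. Smith")

# After standardizing names and values, the same query finds both files.
clean_hits = query([standardize(r) for r in records], "Operator", "J. Smith")

print(raw_hits)
print(clean_hits)
```

Note that the raw query does not fail; it simply returns fewer files, which is why this kind of omission is easy to overlook.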
- Analysis Results are Incorrect
Incorrect engineering units and erroneous raw data can drastically change the output of analysis procedures and invalidate the results. Different engineering units can be a serious issue for teams at different sites where no standardized unit has been agreed upon. The problem also tends to occur when testing is performed by groups outside the company. Sensor failures, data corruption, and human error can have the same effect. These problems are typically discovered only after a critical error occurs.
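A small worked example makes the unit problem concrete. It assumes two hypothetical sites that record the same temperatures, one in degrees Celsius and one in degrees Fahrenheit; the site data, unit labels, and conversion table are made up for this sketch.

```python
# Assumed conversion table to a single agreed-upon unit (degrees Celsius).
TO_CELSIUS = {
    "degC": lambda v: v,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}


def to_celsius(values, unit):
    """Convert a list of readings to degrees Celsius, rejecting unknown units."""
    if unit not in TO_CELSIUS:
        raise ValueError(f"no conversion defined for unit {unit!r}")
    return [TO_CELSIUS[unit](v) for v in values]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical readings of the same three temperatures from two sites.
site_a = {"unit": "degC", "values": [20.0, 25.0, 30.0]}
site_b = {"unit": "degF", "values": [68.0, 77.0, 86.0]}

# Naive analysis: pool the numbers without looking at the units.
naive_mean = mean(site_a["values"] + site_b["values"])

# Unit-aware analysis: convert everything to one unit first.
converted = to_celsius(site_a["values"], site_a["unit"]) + to_celsius(
    site_b["values"], site_b["unit"]
)
correct_mean = mean(converted)

print(f"naive mean:   {naive_mean:.1f}")
print(f"correct mean: {correct_mean:.1f} degC")
```

Both sites measured the same temperatures, yet the naive mean is roughly double the correct one. Nothing in the naive calculation raises an error, which fits the observation that such problems tend to surface only after something goes wrong.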