In test and measurement applications, engineers and scientists can collect vast amounts of data every second of every day. For every second that the Large Hadron Collider at CERN runs an experiment, the instrument can generate 40 terabytes of data. For every 30 minutes that a Boeing jet engine runs, the system creates 10 terabytes of operations information. For a single journey across the Atlantic Ocean, a four-engine jumbo jet can create 640 terabytes of data. Multiply that by the more than 25,000 flights flown each day, and you get an understanding of the enormous amount of data that exists (Rogers, 2011). That’s “Big Data.”
Drawing accurate and meaningful conclusions from such a large amount of data is a growing problem, and the term Big Data describes this phenomenon. Big Data brings new challenges to data analysis, search, data integration, reporting, and system maintenance that must be met to keep pace with the exponential growth of data. The technology research firm IDC recently performed a study on digital data, which includes measurement files, video, music files, and so on. This study estimates that the amount of data available is doubling every two years. In 2011 alone, 1.8 zettabytes (1 zettabyte = 10^21 bytes) of data were created (Gantz, 2011). To get a sense of the size of that number, consider this: if all 7 billion people on Earth joined Twitter and continually tweeted for one century, they would generate one zettabyte of data (Hadhazy, 2010). Almost double that amount was generated in 2011.
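The zettabyte thought experiment above can be sanity-checked with a little arithmetic. The sketch below assumes roughly 140 bytes of text per tweet (an assumption for illustration, not a figure from the source) and asks how often each of 7 billion people would have to tweet to reach one zettabyte in a century:

```python
# Back-of-envelope check of the "7 billion tweeters for a century" estimate.
# BYTES_PER_TWEET is an assumption: one byte per character at the old
# 140-character limit.

ZETTABYTE = 1e21                      # bytes
PEOPLE = 7e9
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600
BYTES_PER_TWEET = 140                 # assumed average tweet size

bytes_per_person_per_second = ZETTABYTE / (PEOPLE * SECONDS_PER_CENTURY)
seconds_between_tweets = BYTES_PER_TWEET / bytes_per_person_per_second

print(f"{bytes_per_person_per_second:.1f} bytes/s per person")   # ~45 bytes/s
print(f"one tweet every {seconds_between_tweets:.1f} s")         # ~3 s
```

Under these assumptions, each person would need to tweet about once every three seconds, nonstop, for a hundred years, which makes the scale of 1.8 zettabytes in a single year easier to appreciate.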
Big Data is collected at a rate that approximately parallels Moore’s law.
The fact that data is doubling every two years mimics one of electronics’ most famous laws: Moore’s law. In 1965, Gordon Moore stated that the number of transistors on an integrated circuit doubled approximately every two years, and he expected the trend to continue “for at least 10 years.” Forty-five years later, Moore’s law still influences many aspects of IT and electronics. As a consequence of Moore’s law, technology is more affordable, and the latest innovations help engineers and scientists capture, analyze, and store data at rates faster than ever before. Consider that in 1995, 20 petabytes of total hard drive space was manufactured; today, Google processes more than 24 petabytes of information every single day. Similarly, the cost of storing all of this data has decreased exponentially, from $228/GB in 1998 to $0.06/GB in 2010. Changes like this, combined with the advances in technology resulting from Moore’s law, undoubtedly fuel the Big Data phenomenon and raise the question, “How do we extract meaning from that much data?”
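The storage-cost figures quoted above imply their own exponential trend. Using only the two data points in the text ($228/GB in 1998 and $0.06/GB in 2010), a quick calculation shows how fast the price was halving:

```python
import math

# Implied halving time of storage cost, from the figures in the text:
# $228/GB in 1998 falling to $0.06/GB in 2010.
cost_1998, cost_2010 = 228.0, 0.06
years = 2010 - 1998

halvings = math.log2(cost_1998 / cost_2010)   # number of price halvings
halving_time = years / halvings               # years per halving

print(f"{halvings:.1f} halvings over {years} years")       # ~12 halvings
print(f"cost halves roughly every {halving_time:.1f} years")
```

By this estimate, storage cost halved about once a year over that period, an even steeper curve than the two-year doubling of data volume.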
1. What is the value of Big Data?
One intuitive value of ever-larger data sets is simply that statistical significance increases; small data sets often limit the accuracy of conclusions and predictions. Consider a gold mine where only 20 percent of the gold is visible. The remaining 80 percent is in the dirt, where you can’t see it, and mining is required to realize the full value of the mine’s contents. This is the idea behind the term “digital dirt”: digitized data can hold concealed value. Hence, Big Data analytics and data mining are required to extract insights that have never before been seen.
A generalized, three-tier solution to the “Big Analog Data” challenge includes sensors or actuators, distributed acquisition and analysis nodes, and IT infrastructures or big data analytics/mining.
2. What does Big Data mean to engineers and scientists?
The sources of Big Data are many. However, the most interesting is data derived from the physical world: analog data captured and digitized by NI products. Thus, you can call it “Big Analog Data,” derived from measurements of vibration, RF signals, temperature, pressure, sound, image, light, magnetism, voltage, and so on. Engineers and scientists generate this kind of data in enormous volumes, in a wide variety of forms, and often at high velocity.
NI helps customers acquire data at rates as high as many terabytes per day. Big Analog Data is an ideal challenge for NI data acquisition products such as NI CompactDAQ, CompactRIO, and PXI hardware, together with tools like NI LabVIEW system design software and NI DIAdem for organizing, managing, analyzing, and visualizing data. A key advantage of these products is the ability to process data at the source of capture, often in real time.
You can change this processing dynamically as needed to meet evolving analytical needs. Embedded programmable hardware such as FPGAs offers extremely high-performance, reconfigurable processing at the hardware pins of the measurement device. This allows the results of data analytics from back-end IT systems to direct a change in the type of processing that happens in NI products at the source of data capture.
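As a rough illustration of that idea, the sketch below models an acquisition node whose per-block analysis routine can be swapped at run time, for example when back-end analytics decide that a different data reduction is needed at the source. All class and function names here are invented for this example and do not correspond to any real NI API:

```python
# Hypothetical sketch of "processing at the source": an acquisition node
# reduces each raw block to a summary value before it leaves the device,
# and a back-end system can swap the analysis routine at run time.
import math
from typing import Callable, List

class AcquisitionNode:
    def __init__(self, analysis: Callable[[List[float]], float]):
        self.analysis = analysis            # current per-block reduction

    def reconfigure(self, analysis: Callable[[List[float]], float]) -> None:
        """Back-end analytics can direct a change in source-side processing."""
        self.analysis = analysis

    def process_block(self, samples: List[float]) -> float:
        return self.analysis(samples)

def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude of one acquired block."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

node = AcquisitionNode(rms)
block = [0.0, 1.0, 0.0, -1.0] * 256         # stand-in for a DAQ buffer
print(node.process_block(block))            # RMS of the block

node.reconfigure(lambda s: max(abs(x) for x in s))   # switch to peak detect
print(node.process_block(block))            # peak amplitude
```

The design choice being illustrated is simply that the reduction runs next to the sensor, so only a few bytes per block, rather than the raw stream, need to cross the network to the IT tier.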
Big Analog Data solutions depend strongly on IT equipment such as servers, storage, and networking for data movement, analytics, and archiving. You increasingly face challenges in creating end-to-end solutions, which require a close relationship between DAQ and IT equipment.
As an industry leader, NI is best suited to help you step up to Big Data challenges by providing solutions that are IT friendly and that publish data that is “Big Data-ready” for analytics on either in-motion or at-rest data. One thing is certain: NI is continually expanding its capabilities in data management, systems management, and collaboration with IT providers to meet the Big Data challenge.
Dr. Tom Bradicich
Dr. Tom Bradicich is an R&D fellow at National Instruments.
Stephanie Orci is a product marketing engineer for DIAdem at National Instruments.
Rogers. “Big Data is Scaling BI and Analytics.” 1 Sep 2011. Web. 30 Aug 2012.
Gantz, John, and David Reinsel. “Extracting Value from Chaos.” June 2011. Web. 8 Aug 2012.
Hadhazy. “Zettabytes Now Needed to Describe Global Data Overload.” 4 May 2010. Web. 31 Aug 2012.
This article first appeared in the Q4 2012 issue of Instrumentation Newsletter.