In May 2011, the management consulting firm McKinsey & Company released a research report titled "Big data: The next frontier for innovation, competition, and productivity." This was a seminal report, but it certainly did not cause the big data explosion. The concept of big data is as old as computing, and the term has long been used to describe data volumes exceeding the cost or capability of existing systems.
After 2004, the term “big data” began taking on its more modern meaning, following Google's release of two papers on computing models for dealing with large data sets. The McKinsey report simply happened to be well timed to the explosion of awareness and interest in big data. After staying flat for many years, the number of Google searches for “big data” tripled in the 12 months after the report was released and had increased tenfold after 36 months.
Figure 1, a chart from a more recent McKinsey report, breaks down data volumes by industry. Manufacturing is far and away the leader, with 1,812 petabytes of data produced: 1,072 from discrete manufacturing and 740 from process manufacturing. These numbers have grown exponentially over the past five years or so, as the costs of creating, collecting, and storing data have fallen just as steeply.
Note that the figure 1 data is presented in petabytes, a volume of data that was considered almost science fiction just a decade ago. Wired magazine published an article in 2008 surveying the explosion in data volumes and the innovative techniques available to gain insights from this data, declaring the arrival of the “Petabyte Age.” Less than a decade later a petabyte is, if not trivial, at least an unremarkable volume of data to store and process, with terabytes relegated to memory sticks handed out as trade show trinkets. But enough about the origins of big data. Let’s look at how it is affecting manufacturing, specifically on the process side.
Process industries can reap substantial benefits from intelligent big data implementations. As figure 2 illustrates, McKinsey sees a $50 billion opportunity in upstream oil and gas alone, and other process industries can expect similar outcomes.
In addition to having the largest volume of data compared to other sectors of the economy, manufacturing organizations also have the longest history of generating and storing large volumes of data. The digitization efforts of plants implementing programmable logic controllers, distributed control systems, and supervisory control and data acquisition systems in the 1970s and '80s gave the industry a head start over later data generators.
This is why some vendors refer to manufacturing sensor data as "the original big data," or claim they have been supporting big data for years. These claims obscure some important facts about big data. Big data is not just about data volume, although data volume in manufacturing will certainly continue to grow. Pervasive sensor networks, low-cost wireless connectivity, and an insatiable demand for improved performance metrics will all continue to drive increased data volumes as big data's partner in hype, the Industrial Internet of Things (IIoT), continues its momentum.
Despite the long history and large volumes of data associated with process manufacturing, the reality is that manufacturing organizations are considered laggards in exploiting big data technologies. Big data solutions in other industries are easy enough to find: credit card companies with fraud-detection algorithms, phone companies with customer churn analytics, and websites with product recommendation engines.
In our lives as consumers, we encounter the results of these big data solutions every day in experiences as simple as using Google. Yet in manufacturing plants, the experience with big data has been a mix of slow adoption, limited accessibility, and confusion.
Here are some of the main reasons that process plants have often been slow to exploit the potential of big data:
To overcome the confusion and blocking issues associated with big data in manufacturing, a new approach is needed, as outlined below. First, it is helpful to frame how "big data" is used: both as an umbrella term for the big data phenomenon and in three distinct contexts.
With these three points as a framework, we turn next to the question of what is really different today from the past. If process manufacturing firms have been storing vast amounts of data, what is new now or will be different in the future?
The first innovation is that growth in data volumes and types is directly tied to the falling cost of data generation and collection, so new solutions powered by big data will cost less in aggregate than previous generations of solutions. This is not always apparent as the market transitions to this new model, but what is expensive now in either cost (such as data expertise) or in impact (such as data movement and architectures) will become less expensive. A partial list of factors driving prices down is:
The second significant big data innovation is the range and depth of algorithms and approaches available to organizations to find meaning in their data sets. Just as big data is a neighbor to the IIoT phenomenon, it is also tied to advances in machine learning and artificial intelligence. Advances in cognitive computing (a composite term including machine learning, artificial intelligence, and deep learning) will therefore become available to process manufacturers to accelerate and focus their analytics efforts. And the algorithms and tools available today, including regression, principal component analysis (PCA), and multivariate analysis, will be made easier and more accessible to engineers through software that converts big data analytics into easy-to-use features and experiences.
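As a small illustration of the kind of analysis named above, here is a minimal sketch of an ordinary least-squares regression in plain Python. The tag names and readings are hypothetical, invented only for this example, and the function `fit_line` is not drawn from any particular analytics product.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical readings: steam flow plotted against feed rate
feed_rate = [10.0, 12.0, 14.0, 16.0, 18.0]
steam_flow = [25.0, 29.0, 33.0, 37.0, 41.0]

slope, intercept = fit_line(feed_rate, steam_flow)
print(slope, intercept)  # prints: 2.0 5.0
```

In practice an engineer would rarely write this by hand; the point of the software described above is to put such calculations behind a few clicks.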
The third big data innovation is a more flexible model for analytics across data sources. This flexibility may be needed because the data is to be consolidated and indexed as a single unit, or because disparate data silos need to be connected and accessed more easily. In either case, the desired outcome is the same: unfettered integration and access to disparate data sources and types. "Contextualization" is the typical term for this capability in manufacturing environments; other terms for this flexibility in data types and architectures include data fusion, data harmonization, and data blending.
Big data solutions must include a flexible data model to accelerate and enable analytics across any set of data sources. There are as many models for accomplishing this as there are organizations, but the most basic types are a data lake and a distributed model.
A data lake is the modern instance of a data warehouse, except the data is of many types and typically indexed or architected for use by data scientists or developers. Data lakes are usually the domain of centralized IT departments that can afford the infrastructure and expertise, and that can manage the governance, security, and data models of these corporate-wide solutions.
Not every company needs or can afford this top-down approach, however, so an alternative is simply to enable connections across data silos in situ. This enables engineers to tap any resource on demand. The second approach is more bottom-up and user driven, because there is no data lake required.
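To make the idea of connecting silos in place more concrete, the following is a minimal sketch of contextualization in plain Python: each sensor reading from one source is tagged with the batch that was running at that time according to another source. The two data sets, their layouts, and the function `contextualize` are assumptions made up for illustration; real historians and batch systems expose their own interfaces.

```python
from bisect import bisect_right

# Hypothetical silo 1: historian readings as (timestamp in minutes, temperature)
historian = [(0, 71.2), (5, 74.8), (10, 79.1), (15, 80.3), (20, 72.0), (25, 75.5)]

# Hypothetical silo 2: batch records as (start, end, batch id), sorted by start
batches = [(0, 12, "B-101"), (18, 30, "B-102")]


def contextualize(readings, batch_records):
    """Attach to each reading the id of the batch active at its timestamp."""
    starts = [start for start, _, _ in batch_records]
    tagged = []
    for ts, value in readings:
        # Index of the last batch that started at or before this timestamp
        i = bisect_right(starts, ts) - 1
        batch_id = None
        if i >= 0 and ts < batch_records[i][1]:
            batch_id = batch_records[i][2]
        tagged.append((ts, value, batch_id))
    return tagged


tagged = contextualize(historian, batches)
for row in tagged:
    print(row)
```

Neither source is copied into a central store here; the join happens on demand, which is the essence of the bottom-up approach. Readings that fall between batches are kept and tagged `None` rather than dropped.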
As ever more data is generated, there are often fewer experts and resources available to interpret it. The retirement of seasoned engineers and the squeezing of budgets mean the big data equation in many industries is "more data and more demand for analysis and information, with fewer resources." Can the gap between engineers and data be closed, so that executives can start to see real results using the limited personnel on hand?
The software innovations required to deliver on this promise are similar to those already in place in numerous commercial software apps and web-based tools:
Asset optimization, overall equipment effectiveness, and uptime are not new concepts. There have been generations of preventative maintenance, enterprise asset management, asset performance management, computerized maintenance management, and other systems offered to ensure higher availability of critical resources in production facilities. What is changing with big data is that asset expertise is now available as a service from automation and equipment vendors.
There are examples of this already, but now the cost of data collection, storage, and analytics will make these offerings more accessible. In addition, the new services will have more advanced algorithms and be run across more data to improve the accuracy of the system.
For example, who knows the most about the performance of your turbine: the manufacturer, the local sales rep, or you? The easy answer is the manufacturer. It is not a formal theorem, but a common rule of thumb in computer science holds that what best powers accuracy is greater amounts of data, and the manufacturer can typically draw on data from far more units than any single site. Instead of relying on an on-site engineer with limited time and capacity to become an expert in an asset class and its history, organizations can tap the specific expertise of vendors to manage their most critical assets worldwide.
And for organizations that do have the capacity and resources to develop in-house expertise, their engineers can take advantage of both vendor expertise and local process context to address asset optimization within their manufacturing facilities. Remote monitoring, predictive analytics, and field management systems will therefore become an increasing part of the budget and operational plans for asset-centric organizations, which of course describes many process industry firms.
Big data represents the present and future of data management for all industries. Given the quantity of data and the long history of data centricity, big data has particular relevance to the process industries. Being able to see through the hype and understand how data and analytics can improve outcomes is a critical step for engineers and plant managers alike to realize actionable insights and improve production.
A version of this article was also published in InTech magazine.