The following discussion is part of an occasional series showcasing the ISA Mentor Program, authored by Greg McMillan, industry consultant, author of numerous process control books, 2010 ISA Life Achievement Award recipient, and retired Senior Fellow from Solutia, Inc. (now Eastman Chemical). Greg will be posting questions and responses from the ISA Mentor Program, with contributions from program participants.
The following question by Mark Darby, key ISA Mentor Program resource, is designed to start a conversation on what can be done to improve PID control loop performance monitoring.
Mark Darby is a consultant to the petrochemical, refining, and oil and gas industries where he provides process control services in the following areas: benefit studies, project implementation, and technology evaluation/development. Previously, he held several technical and management positions at Setpoint and AspenTech. Mark has also been a lecturer in the Chemical Engineering Department at the University of Houston. He collaborates with the Chemical Engineering Departments at the University of Houston and Brigham Young University (BYU) and serves on the external advisory board of the Chemical Engineering Department at Texas Tech University.
Mark Darby’s Question
What are the work processes, techniques and solutions, and the people who should be involved for comprehensive control loop monitoring of all components and aspects including: measurement, final control element, filtering, update rate, tuning, disturbances, and interactions? It may be appropriate to distinguish between short-term and long-term issues.
Michel Ruel’s Answer
I suggest different levels of monitoring with specialized software or using simple measurements in archiving software or in data acquisition systems. The report should be sent automatically to the appropriate person with a specific format for each category. A process engineer needs a score for an area and the top five loops with problems. A process control engineer should receive a list of problematic loops or equipment. Finally, a manager should receive a score or a color-coded bar graph for each area.
The key is to select simple statistics on different metrics. For example:
- Percentage of loops corresponding to a certain criterion 95% or more of the time,
- Average percentage of time a group of loops is within selected boundaries,
- Percentage of abnormal loops, etc.
- Loop in highest mode (Auto/Manual, Cascade/Auto/Manual, Remote/Local/Manual, etc.)
- Loop not saturated (PV and CO)
  - Min < Actual PV < Max AND Min < CO < Max
  - Min and Max could be transmitter/valve limits or a range defined by normal operation
- PV signal is normal
  - Noise below a certain value
  - PV movement detected within the last x minutes (not frozen)
- Loop not in Alarm
  - Min < Actual PV < Max AND Min < CO < Max
Loop Behavior Metrics:
- CO signal is normal
- No stick/slip cycling detected
- No oscillation detected
- Absence of stiction (special functions exist using power spectral analysis and advanced statistics)
- Process gain is within limits
- Loop interactions detected
- Abnormal operator activity
- Calculation of SP changes
- Calculation of Mode changes
- Abnormal statistics on PV and CO
- Valve travel per day
- Valve reversals per day
- Percentage of time valve is within normal range
- Statistics on SP, PV, and CO
- Set-Point activity
- Secondary cascade loops
- Standalone or master loop in a cascade system
- Operator activity (keystrokes per shift)
- Mode changes per day
- Statistics on key variables
- Alarm management metrics
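Two of the metrics listed above, valve travel and valve reversals, can be computed directly from archived CO samples. The following is a minimal Python sketch, not the method of any particular monitoring package. The function name, the `deadband` threshold used to ignore noise, and the sample data are assumptions made for illustration.

```python
def valve_travel_and_reversals(co_samples, deadband=0.1):
    """Return (total travel, reversal count) for CO samples in percent.

    Moves no larger than `deadband` (an assumed noise threshold) are ignored,
    so small jitter is not counted as valve movement. Expects at least one sample.
    """
    travel = 0.0
    reversals = 0
    last_direction = 0            # +1 opening, -1 closing, 0 not yet known
    reference = co_samples[0]     # last position counted as a real move
    for value in co_samples[1:]:
        delta = value - reference
        if abs(delta) <= deadband:
            continue
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            reversals += 1
        travel += abs(delta)
        last_direction = direction
        reference = value
    return travel, reversals


# Hypothetical CO samples (percent) covering part of a day
co = [50.0, 52.0, 55.0, 53.0, 53.05, 56.0]
travel, reversals = valve_travel_and_reversals(co)
print(f"travel = {travel:.1f} %, reversals = {reversals}")
```

Running the same calculation over each 24-hour window of history would give the per-day figures used in a report.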
An example of a report for supervisors and process engineers for an area:
- Percentage of loops operated normally
- Combination of indices 95% of time or more
- Loop not in Alarm AND
- Loop not oscillating AND
- Loop in highest mode AND
- PV not saturated AND
- CO not saturated AND
- Valve problems (combination of indices) not detected AND
- SP activity below limit AND
- Mode changes below limit
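As a rough illustration of how such a combined index could be calculated, the Python sketch below treats a loop as "operated normally" when every index is true in at least 95% of its samples, then reports the percentage of such loops for an area. The flag names, loop tags, and data layout are hypothetical; a commercial monitoring package would define its own.

```python
def fraction_normal(samples):
    """Fraction of samples in which every index (boolean flag) is True."""
    if not samples:
        return 0.0
    normal = sum(1 for flags in samples if all(flags.values()))
    return normal / len(samples)


def percent_loops_normal(area, threshold=0.95):
    """Return (percent of loops operated normally, tags needing attention)."""
    needing_attention = sorted(
        tag for tag, samples in area.items()
        if fraction_normal(samples) < threshold
    )
    percent = 100.0 * (len(area) - len(needing_attention)) / len(area)
    return percent, needing_attention


# Hypothetical index values for two loops, 20 samples each
good = {
    "not_in_alarm": True,
    "not_oscillating": True,
    "in_highest_mode": True,
    "pv_not_saturated": True,
    "co_not_saturated": True,
}
bad = dict(good, in_highest_mode=False)
area = {
    "FIC-101": [good] * 20,             # normal 100% of the time
    "TIC-202": [good] * 18 + [bad] * 2, # normal 90% of the time
}
percent, attention = percent_loops_normal(area)
print(percent, attention)
```

The single percentage suits a supervisor's report, while the list of tags is the natural starting point for the process control engineer.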
An example of a report for a process control engineer for an area; for each element, percentage of time:
- Percentage of loops needing attention
- List of loops in oscillation
- List of loops with valve problems
- List of loops with transmitter problems
If a plant is using software to supervise process control loops and control strategies, such reports are already configured, and the users can modify these reports. Reports can be sent to managers, supervisors, process engineers, planners, process control engineers, and instrumentation technicians. To avoid too many numbers, reports should present “digested numbers,” for example, percentage of loops operating normally where “operating normally” is defined by equations and sets of metrics.
The frequency of the reports varies according to each role. The reports should be very concise, ideally less than one page, but with hyperlinks to dig deeper and analyze.
- Manager: Monthly, or once a week if statistics are below a certain number
- Process engineer: Weekly and daily
- Process control engineer/instrumentation technician: Daily
The main difficulty is to ensure that people take action. If the reports are simple enough, the results can be used to calculate annual bonuses. For example, for a process engineer, the measure could be average annual performance (percentage of loops in normal operation).
Nicholas Sands’ Answer
The list of metrics provided by Michel Ruel above is excellent. The challenge is to focus on actions that make improvements with real benefits.
Tracking the benefits is much easier with a software tool. A good software tool will use process data (PV, SP, OP) and event data (alarms and operator actions such as mode, OP, or SP changes). Combining these data can point to loops where improvement has benefits versus loops where it does not. For example, if a reactor feed flow control loop is frequently in manual and the operator is frequently changing the output, there is likely benefit in addressing the issues with the loop. On the other hand, if the loop is a nitrogen purge flow loop used only during shutdowns, and the loop stays in manual with no operator changes to the output, there is likely little value in working on the loop. For this reason, it is helpful to have a way to categorize the value of the loops, and even more helpful if the software tool has this functionality.
If the right loops are identified with issues, then the data can be used in a few different ways for different audiences. A list of the top opportunities is good for a team of operations, maintenance, and technical staff to review and develop a plan. A trend of key metrics is useful for management to understand if the overall performance is getting better or worse. Metrics could include percent of critical loops with good performance, time to address critical loop issues or how long a loop stays on the list, and the number of critical loop issues addressed, or business value captured.
Luis Navas Guzman’s Answer
My Control Talk article, “A structured approach to control system diagnostics,” addresses the levels of a control system for a situational analysis, taking into account ANSI/ISA-18.2-2016, Management of Alarm Systems for the Process Industries and ANSI/ISA-95 IEC 62264 Enterprise-Control System Integration. One should assess the current situation, analyze the needs, implement new measures, and ensure sufficient diagnostics for each layer.
Greg McMillan’s Answer
The diagnostics provided by Michel, Nick and Luis are extensive and well worth the effort to implement. Care must be taken to recognize PID controller modes and outputs that are set by shutdowns, startups, transitions, batch operations, and procedural automation so that false diagnostics are not generated.
I am particularly interested in diagnostics that indicate how well the loop is doing and the challenges it is facing. I developed a DeltaV block that is used in Digital Twin simulations to compute the peak error and integrated error during normal operation to show the capability of the PID in rejecting load disturbances. The block can be reset based on operating modes and conditions. The block also captures the overshoot and undershoot and computes the rise time and settling time for setpoint changes.
Additionally, the block counts the number of communications which can be important in terms of battery life for wireless transmitters. A nice extension would be to capture the peak movement of the final control element that corresponds to a peak error to give an indication of the size of the disturbance and to compute dither as reversals in PID output that are greater than the deadband and resolution of the final control element, which is important for wear and tear and is indicative of excessive noise.
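The peak error and integrated error mentioned above are simple to compute from sampled SP and PV data. The Python sketch below is only an illustration of the arithmetic and is not the DeltaV implementation. It assumes evenly spaced samples, uses a rectangular-rule sum, and adds the integrated absolute error as an extra figure; the function name and data are hypothetical.

```python
def load_rejection_metrics(sp, pv, dt):
    """Return (peak error, integrated error, integrated absolute error).

    sp, pv: equal-length sequences of setpoint and process variable samples
    dt:     sample interval (assumed constant), e.g. in seconds
    """
    errors = [p - s for s, p in zip(sp, pv)]
    peak_error = max(abs(e) for e in errors)
    integrated_error = sum(errors) * dt                     # signed area
    integrated_abs_error = sum(abs(e) for e in errors) * dt
    return peak_error, integrated_error, integrated_abs_error


# Hypothetical response to a load disturbance, sampled every 2 seconds
sp = [50.0, 50.0, 50.0, 50.0, 50.0]
pv = [50.0, 52.0, 49.0, 51.0, 50.0]
peak, ie, iae = load_rejection_metrics(sp, pv, dt=2.0)
print(peak, ie, iae)
```

To mirror the reset behavior described above, the calculation would be restarted whenever the operating mode or condition changes, so that startups and transitions do not distort the result.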
Finally, I would love to see the ability to recognize a limit cycle and compute its period and amplitude, deciphering whether it is caused by dead band or resolution. The ability to do online metrics of process capacity and efficiency, outlined in my answer to the February ISA Mentor Program Blog, “What Can be Done to Increase Innovation in PID Control?”, greatly increases motivation for diagnostics by showing the effect on the bottom line. Temperature, pH, and composition control loops generally have the greatest effect on the bottom line and should therefore be monitored more closely.
The easiest parameter to estimate is the total loop dead time, seen as the time it takes for the process variable to start to respond in the correct direction for a change in a controller’s setpoint or manual output. Periods of attenuating oscillations that are not present when the PID is in manual can then be analyzed as to their cause.
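One simple way to estimate the dead time from historical data is to find the first sample after a step in setpoint or manual output at which the PV has moved in the expected direction by more than a noise band. The Python sketch below takes that approach. The noise band, the function name, and the data are assumptions, and the result is only as fine as the sample interval.

```python
def estimate_dead_time(times, pv, step_time, direction, noise_band):
    """Estimate total loop dead time from a step test.

    times, pv:  equal-length sequences of timestamps and PV samples
    step_time:  time at which the setpoint or manual output was stepped
    direction:  +1 if the PV is expected to rise, -1 if expected to fall
    noise_band: PV change that must be exceeded to count as a response
    Returns the elapsed time to the first response, or None if none is seen.
    """
    before = [p for t, p in zip(times, pv) if t <= step_time]
    if not before:
        raise ValueError("need at least one PV sample at or before the step")
    baseline = sum(before) / len(before)
    for t, p in zip(times, pv):
        if t > step_time and direction * (p - baseline) > noise_band:
            return t - step_time
    return None


# Hypothetical step test: output stepped at t = 2 s, PV sampled every second
times = list(range(11))
pv = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.05, 10.3, 10.8, 11.5, 12.0]
dead_time = estimate_dead_time(times, pv, step_time=2, direction=+1, noise_band=0.2)
print(dead_time)
```

A wider noise band makes the estimate more robust against noise but biases it toward longer dead times, so the band should be set from the observed PV noise before the step.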
In Process/Industrial Instruments and Controls Handbook Sixth Edition 2019, Chapter 7 – Answers to Questions on Control System Fundamentals offers the following:
“The total loop dead time can be used to provide the guidance based on the following simple rules of thumb in determining the critical frequency (ultimate period), and the source of oscillations observed. This guidance is for loops that are not dead time dominant where a PID controller with Standard or Series Form is used. Item 2 can be relaxed if tight control is not needed. However, an extremely large filter or measurement lag can cause oscillations.
- Resonance can occur for disturbance oscillation periods 2x => 10x dead time
- PID execution rate and filter time should be < 0.2x and 0.1x dead time, respectively
- Oscillation periods < 4x dead time are indicative PID rate time too large
- Oscillation periods < 5x dead time are indicative PID gain too large
- Oscillation periods 5x => 10x dead time are indicative of a reset time too fast
- Oscillation periods > 10x dead time are indicative valve and VSD problems (dead band, resolution limit, slow stroking or rate limiting, and poor actuator and positioner sensitivity). Limit cycles (constant amplitude oscillations) can be caused by resolution limits and dead band if there are more than 1 and 2 integrating components, respectively. The integrating components can be due to integral action in controllers and positioners and integrating action in the process (e.g., level).
- Oscillation periods approaching or exceeding 40x dead time are indicative of a PID gain too low for near-integrating, integrating, and runaway processes.”
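These rules of thumb lend themselves to a simple lookup once the oscillation period and total loop dead time are known. The Python sketch below returns the candidate causes for a given ratio of period to dead time. The function name and the exact handling of the boundaries are assumptions, in particular treating "approaching or exceeding 40x" as 40x or more. As the quoted guidance states, it applies only to loops that are not dead time dominant.

```python
def oscillation_causes(period, dead_time, low_gain_ratio=40.0):
    """Return (ratio, list of candidate causes) for an observed oscillation.

    period and dead_time must be in the same time units.
    low_gain_ratio is an assumed cutoff for "approaching or exceeding 40x".
    """
    if period <= 0 or dead_time <= 0:
        raise ValueError("period and dead_time must be positive")
    ratio = period / dead_time
    causes = []
    if ratio < 4:
        causes.append("PID rate time too large")
    if ratio < 5:
        causes.append("PID gain too large")
    if 5 <= ratio <= 10:
        causes.append("PID reset time too fast")
    if ratio > 10:
        causes.append("valve or VSD problem")
    if ratio >= low_gain_ratio:
        causes.append("PID gain too low")
    return ratio, causes


# Hypothetical loop with a 2 s dead time and a 14 s oscillation period
ratio, causes = oscillation_causes(period=14.0, dead_time=2.0)
print(ratio, causes)
```

Because the ranges overlap, more than one cause can be returned, and the list is a starting point for investigation rather than a diagnosis.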
For more on PID performance, see the ISA Mentor Program webinar series “PID Options and Solutions,” the Control Talk column, “The concealed PID revealed, part 4,” and the book Tuning and Control Loop Performance Fourth Edition 2014.
George Buckbee’s Answer
I will address a few points that have not yet been discussed.
First, the control loop monitoring system must be configured to understand the process context and loop status. Loop performance metrics don’t mean much if the unit is offline, or in some other abnormal state. Each individual control performance metric or diagnostic may depend on the state of the unit, or the state of the loop. For example, it usually doesn’t make sense to monitor deviation from setpoint while the loop is in manual. It is critically important to configure the monitoring system to have the right logic to deliver quality metrics and diagnostics. If this step is skipped, then you will have a lot of numbers, but they will be hard to interpret.
Second, there is a difference between metrics and diagnostics. Metrics (Key Performance Indicators, or KPIs) can be helpful to track performance over time, to report progress to management, and to select areas for focus. Adding the economic weighting helps this tremendously. Knowing that 20% of critical loops are in manual mode is great information. It helps management to prioritize, but the engineer or technician needs more information to act. Why are the loops in manual? It would be a huge mistake to simply act by returning the loops to Auto mode!
Diagnostics, on the other hand, are used to pinpoint and solve specific problems. Some metrics can be used to drive diagnostics. For example, if the maximum PV over one hour is equal to the minimum PV over one hour, then the sensor is likely broken. Nick’s comment about focusing on actions is one of the most important points in this dialog. Diagnostics should drive specific actions to make improvement. Use metrics to manage the program, and diagnostics to drive daily actions.
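The frozen-sensor check mentioned above, combined with the earlier point about process context, could look like the following Python sketch. The function name, the `unit_running` flag, and the optional tolerance are assumptions made for illustration.

```python
def sensor_likely_frozen(pv_last_hour, unit_running=True, tolerance=0.0):
    """Flag a PV whose maximum equals its minimum over the window.

    pv_last_hour: PV samples covering the last hour
    unit_running: skip the check when the unit is offline (process context)
    tolerance:    assumed allowance for values that differ only by rounding
    """
    if not unit_running or len(pv_last_hour) < 2:
        return False
    return max(pv_last_hour) - min(pv_last_hour) <= tolerance


# Hypothetical one-hour windows of PV data
print(sensor_likely_frozen([75.2, 75.2, 75.2, 75.2]))                      # True
print(sensor_likely_frozen([75.2, 75.4, 75.1, 75.3]))                      # False
print(sensor_likely_frozen([75.2, 75.2, 75.2, 75.2], unit_running=False))  # False
```

Suppressing the check while the unit is offline avoids flagging a flat signal that is simply the result of a shutdown.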
Lastly, I will circle back to Mark’s original question about work processes. To get value from control loop performance monitoring, we need to think about how to integrate this information into the daily, weekly, monthly, and shutdown planning work processes. Defining standard workflows along with roles and responsibilities clarifies expectations, and makes the process stick in the organization. In my experience, sites that skip workflow planning are usually dependent on the actions of a single champion, and the gains tend to fade over time. Sites that take the time to develop and document workflows tend to consistently deliver improvements over the long haul.
Mark Darby’s Follow-Up
A few follow-up thoughts:
All loops should be monitored and inspected on a periodic basis (with or without a dedicated monitoring system), but some loops are normally more important to the proper operation of a process unit and, therefore, are a higher priority for monitoring.
Examples are the temperature controllers and pressure controllers in a distillation train. From my MPC work, I spot check these to ensure they are behaving correctly before looking at the behavior of the MPC. Also, the temperature controller will often show the effect of disturbances that might originate upstream. Setting up dedicated trends of these key loops (SP, PV, CO) over a meaningful time period is helpful in getting a quick impression of performance and in determining whether additional analysis or troubleshooting is required. These loops are also the ones whose performance (and therefore tuning) is most important for minimizing disturbances to composition control. These can easily be benchmarked using the criteria mentioned above. A useful way of gauging disturbance rejection is to assess loop recovery from a measured disturbance, like feed rate, using historical data.
A good way to start a loop improvement campaign, which can become a key part of work processes, is to target loops by the service factor based on loop status. The assumption behind this is that if a control loop is performing adequately (as judged by the actions of the operator), it will be kept in its normal mode a majority of the time. A given loop may not have one normal mode. The normal mode may be condition-based (for example, dependent on throughput magnitude), and therefore conditional logic will be required. A combination of a specified loop importance (based on safety or economics) and a low service factor can then be used to prioritize loops for further investigation, leading to possible retuning, measurement improvement, valve improvement, or loop reconfiguration.
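A service factor and a simple priority ranking could be computed along the following lines. This Python sketch is illustrative only: the scoring formula (importance multiplied by the fraction of time out of normal mode), the importance scale, and the loop tags are assumptions, not a prescribed method.

```python
def service_factor(modes, normal_modes):
    """Fraction of mode samples in which the loop was in a normal mode."""
    if not modes:
        return 0.0
    return sum(1 for m in modes if m in normal_modes) / len(modes)


def prioritize(loops):
    """Rank loops for investigation, highest priority first.

    Each loop is a dict with keys: tag, importance, modes, normal_modes.
    Assumed score: importance * (1 - service factor).
    """
    scored = []
    for loop in loops:
        sf = service_factor(loop["modes"], loop["normal_modes"])
        scored.append((loop["importance"] * (1.0 - sf), loop["tag"], sf))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(tag, sf) for _, tag, sf in scored]


# Hypothetical loops with mode samples taken at regular intervals
loops = [
    {"tag": "TIC-202", "importance": 3, "modes": ["CAS"] * 10,
     "normal_modes": {"CAS", "AUTO"}},
    {"tag": "FIC-101", "importance": 5, "modes": ["AUTO"] * 6 + ["MAN"] * 4,
     "normal_modes": {"AUTO"}},
    {"tag": "FIC-303", "importance": 1, "modes": ["MAN"] * 10,
     "normal_modes": {"AUTO"}},
]
ranking = prioritize(loops)
print(ranking)
```

Where the normal mode depends on operating conditions, `normal_modes` would need to be evaluated per sample using the conditional logic mentioned above instead of a fixed set.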