My first post regarding good automation system design concepts approached the issue of robust design from the field level, looking at field devices and wiring practices. This time we move up the automation food chain to look at the control platform, which is made up of controllers, input/output (I/O) modules, and the networks that interconnect these devices.
Traditional or classic I/O signals (either the on/off discrete or the variable analog variety) rely on field wiring, which can fail or be damaged. Today’s I/O modules, especially the analog versions, usually offer some level of diagnostics, and more advanced versions incorporate circuit-monitoring methods for identifying open and short circuits. However, while these diagnostics may be viewable from configuration or diagnostic software, they are often not fully put to use as delivered from the factory. If diagnostics are available, good practice dictates indicating them to the operator, even if only as a common “trouble” alarm.
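As a minimal illustration, the common trouble alarm can be as simple as OR-ing together whatever channel diagnostic bits the module exposes. The sketch below uses plain Python to stand in for what would normally be a few rungs of ladder logic or a structured text routine; the tag names are hypothetical.

```python
# Combine the available channel diagnostic bits (open circuit, short
# circuit, over range, etc.) into a single common "trouble" alarm.
def common_trouble(channel_diagnostics: dict) -> bool:
    """Return True if any channel reports a diagnostic fault."""
    return any(channel_diagnostics.values())

# Hypothetical diagnostic bits read from the I/O module:
diags = {
    "TT101_open_circuit": False,
    "TT101_short_circuit": False,
    "FT205_over_range": True,
}
trouble_alarm = common_trouble(diags)  # True -- one bad channel is enough
```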
How to handle trouble signals
Some trouble signals are transient in nature. The best way to handle any possibly intermittent fault is to latch the condition and require an operator to acknowledge/reset it once the cause has been identified and cleared. It is important to trap these situations because otherwise they will pass unseen and mask the root cause of a problem.
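A minimal sketch of that latch-and-acknowledge pattern follows, again in plain Python rather than ladder logic; the scan method would be called once per program scan, and the signal names are assumptions.

```python
class LatchedFault:
    """Latch a possibly intermittent fault until an operator resets it."""

    def __init__(self) -> None:
        self.latched = False

    def scan(self, fault_active: bool, operator_reset: bool) -> bool:
        if fault_active:
            self.latched = True    # trap even a single-scan glitch
        elif operator_reset:
            self.latched = False   # reset only runs once the fault has cleared
        return self.latched
```

Note that the reset branch only executes when the fault is no longer active, so an operator cannot clear an alarm whose cause is still present.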
An even better level of design than simply indicating trouble is to configure the control system to respond gracefully to faults. For instance, if a critical temperature signal goes over range, don’t just disable the associated heater. Alarm the condition and drive the control sequence for the entire equipment area to the safest possible state.
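One way to structure that response is a dedicated safe-state routine that every critical fault can call, as in this hypothetical sketch (the output names and the 0–500 range are placeholders, not values from the example above):

```python
TEMP_VALID_RANGE = (0.0, 500.0)  # assumed engineering limits for the signal

def drive_area_to_safe_state(outputs: dict) -> None:
    """Force every output in the equipment area to its safest value."""
    outputs["heater"] = False
    outputs["feed_pump"] = False
    outputs["vent_valve"] = True   # energize-to-vent assumed safe here

def handle_temperature(temp: float, outputs: dict, alarms: dict) -> None:
    lo, hi = TEMP_VALID_RANGE
    if not lo <= temp <= hi:               # over range, or the signal failed
        alarms["temp_signal_fault"] = True
        drive_area_to_safe_state(outputs)  # not just the heater: the whole area
```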
Just as field devices must be selected to fail safe upon power or signal loss, control system output signals must be configured to go to a safe value upon control system failure. That failure could be a true detected fault, or simply a controller reboot. The most common fail-safe output state is “off,” but there are cases where “hold last state” or even “on” better protects the field equipment.
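A per-output fail-safe table makes those choices explicit and reviewable. The following sketch is one way to express it; the tag names and chosen states are illustrative assumptions, and on a real platform these settings usually live in the I/O module configuration.

```python
# Fail-safe state for each output: True/False force on/off,
# "hold" keeps the last commanded value.
FAIL_SAFE_STATES = {
    "heater":      False,   # de-energize: the general-purpose safe choice
    "cooling_fan": True,    # keep running to protect hot equipment
    "flow_valve":  "hold",  # hold last state to avoid a pressure transient
}

def apply_fail_safe(last_outputs: dict) -> dict:
    """Compute the output values to use when the control system fails."""
    return {
        tag: (last_outputs[tag] if state == "hold" else state)
        for tag, state in FAIL_SAFE_STATES.items()
    }
```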
Within the controller itself, a number of diagnostic features are likely available. Major product lines include significant capabilities, which may need to be enabled and exposed to the operator in some manner. Minor faults are informative in nature; examples are low backup battery conditions, module failures, and bad communications counters. Major faults may be recoverable or unrecoverable, but they will cause the processor to stop unless additional measures are taken. Users can develop controller and program fault routines that intercept and capture the fault code, increment a fault counter, and attempt to reset and recover from the fault.
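The skeleton of such a fault routine might look like the following; the retry limit is an assumption, and on a real platform this logic would live in the controller’s designated fault handler rather than in ordinary program code.

```python
fault_counts: dict = {}   # occurrences of each fault code
last_fault_code = 0       # captured for later troubleshooting

def fault_routine(fault_code: int, max_retries: int = 3) -> bool:
    """Return True to clear the fault and resume, or False to let the
    processor stop because the same fault keeps recurring."""
    global last_fault_code
    last_fault_code = fault_code                                # capture it
    fault_counts[fault_code] = fault_counts.get(fault_code, 0) + 1
    return fault_counts[fault_code] <= max_retries              # retry a few times
```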
Operators need to be in the know
Keep the audience in mind when exposing faults via a human-machine interface (HMI) or operator interface terminal (OIT) system. Operators need enough information to solve process problems and keep the plant running. For control platform faults, alarms are not intended for operators to try to solve the problem themselves, but rather to prompt them to engage maintenance personnel or engineers.
The increasing intelligence of equipment and the variety of robust industrial networking protocols and media deserve special attention. Even the smallest instruments and devices can be capable of communicating via industrial bus systems, and larger packaged equipment is now far more likely to be provided with communication-ready controls.
Whenever smart devices are incorporated into a control system, the data quality must be validated using the available diagnostic signals. These are analogous to the classic I/O diagnostic signals, but they indicate whether a device is properly online. Bad data quality must be trapped and handled in the programming, just as with classic I/O signals.
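In practice that means never using a smart-device value without first checking its quality or online status, roughly as sketched here (the safe substitute value and tag naming convention are assumptions):

```python
SAFE_DEFAULT = 0.0  # assumed fail-safe substitute for a bad reading

def validated_value(value: float, quality_good: bool,
                    alarms: dict, tag: str) -> float:
    """Pass the value through only when its quality is good;
    otherwise raise a comm-fault alarm and substitute a safe default."""
    alarms[tag + "_comm_fault"] = not quality_good
    return value if quality_good else SAFE_DEFAULT
```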
Deploying “heartbeat” logic
Two programmable controllers communicating with each other typically exchange a larger quantity of data, often bidirectionally. It is not always clear whether a communication failure will result in data that is all zeros or data that holds the last state, so the failure may not be easy to recognize. Diagnostic signals may be available to indicate the health of the connection, but the best practice when both ends of the link are programmable is to implement a form of “heartbeat” logic on each end. For instance, each controller can look for a known changing value in the other controller (like the time), or the two controllers can pass an incrementing value back and forth. In this way, each controller can independently determine whether the other is responding, and can locally indicate and act upon a communication failure, as the sketch below shows.
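Here is a minimal sketch of one side of that heartbeat scheme, using plain Python and a wall-clock timeout in place of PLC timer instructions; the 5 second timeout and the rollover limit are illustrative assumptions.

```python
import time

class HeartbeatMonitor:
    """Declare a comm failure if the peer's heartbeat stops changing."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        self.last_value = None
        self.last_change = time.monotonic()

    def scan(self, peer_heartbeat: int) -> bool:
        """Call every scan with the peer's value; True means healthy."""
        if peer_heartbeat != self.last_value:
            self.last_value = peer_heartbeat
            self.last_change = time.monotonic()
        return (time.monotonic() - self.last_change) < self.timeout_s

# Each controller also publishes its own incrementing heartbeat
# for the peer to watch:
my_heartbeat = 0
def publish_heartbeat() -> int:
    global my_heartbeat
    my_heartbeat = (my_heartbeat + 1) % 32768  # roll over before overflow
    return my_heartbeat
```

Because the monitor only cares that the value keeps changing, it catches both failure modes described above: data frozen at its last state and data forced to zero.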
Today’s powerful automation platforms offer more diagnostic and situational awareness capabilities than ever before; they just need to be put into service. My next post will move away from hardware-related issues and focus on some software programming best practices that help make a robust design.