A previous post on good automation system design concepts considered robust design at the controller and communication levels. This time we shift the focus to software programming best practices, at both the controller and the operator interface levels, which can help produce a solid automation system.
Probably the No. 1 programming recommendation is to validate all information before using it. Within the control system platform, validate any process signal data, classic or networked, to confirm it has good quality. This especially applies to ensuring that analog input values are scaled properly and fall within the allowable range. Incorporate debounce logic on discrete inputs so that a brief time delay confirms a stable field signal has actually been received.
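As a rough illustration of these two checks, here is a minimal Python sketch. Real controllers would normally do this in ladder logic or Structured Text; Python is used only for readability. The 4-20 mA signal, the 0.2 mA tolerance, and the names `scale_analog` and `Debounce` are assumptions made for the example, not part of any particular platform.

```python
from dataclasses import dataclass


@dataclass
class AnalogReading:
    value: float  # scaled value in engineering units
    good: bool    # False when the raw signal is outside the valid band


def scale_analog(raw_ma, eu_min, eu_max, lo_ma=4.0, hi_ma=20.0, tol_ma=0.2):
    """Scale an assumed 4-20 mA signal and flag out-of-band readings as bad."""
    good = (lo_ma - tol_ma) <= raw_ma <= (hi_ma + tol_ma)
    # Clamp before scaling so downstream logic never sees an over-range value.
    clamped = min(max(raw_ma, lo_ma), hi_ma)
    value = eu_min + (clamped - lo_ma) * (eu_max - eu_min) / (hi_ma - lo_ma)
    return AnalogReading(value, good)


class Debounce:
    """Report True only after the input has been True for `scans` consecutive scans."""

    def __init__(self, scans):
        self.scans = scans
        self.count = 0

    def update(self, raw):
        # Any False scan resets the counter; the counter saturates at `scans`.
        self.count = min(self.count + 1, self.scans) if raw else 0
        return self.count >= self.scans
```

A reading of 2 mA, for example, would come back flagged as bad (a likely broken wire) rather than being silently scaled to a plausible-looking number.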
Validate operator inputs
Many kinds of data are entered by operators via an HMI/OIT or other device, in order to command the control system operation. Always validate operator inputs. This primarily applies to checking that operator-entered setpoint values are within a legal range. However, it can even apply to discrete push-button presses. For instance, incorporating a brief “push and hold” delay on a start button can guard against accidental pushes. Furthermore, multi-mode push-buttons should be made mutually exclusive and configured so that the safest choice (if possible) dominates. For a start/stop pair of buttons, the “stop” function would prevail over the “start” function.
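The start/stop behavior described above might be sketched as follows. This is only an illustration under assumed names (`MotorCommand`, `hold_scans`) and an assumed scan-based timing model; it is not taken from any specific HMI or PLC package.

```python
class MotorCommand:
    """Start requires a push-and-hold; stop always wins over start."""

    def __init__(self, hold_scans=5):
        self.hold_scans = hold_scans  # scans the start button must be held
        self.held = 0
        self.running = False

    def scan(self, start_pressed, stop_pressed):
        # Stop dominates: it is checked first and also resets the hold counter.
        if stop_pressed:
            self.held = 0
            self.running = False
            return self.running
        # Count consecutive scans with start pressed; releasing resets the count.
        self.held = min(self.held + 1, self.hold_scans) if start_pressed else 0
        if self.held >= self.hold_scans:
            self.running = True
        return self.running
```

Because the stop branch is evaluated first, pressing both buttons together can never result in a start.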
When it comes to validating operator-entered data, many HMI/OIT packages offer this capability. An even more rigorous method is to limit-test the data within the control logic before accepting it. Individual project needs will dictate whether improper data is clamped at the limit, or rejected in a way that warns the operator. For logic that calculates derived values during runtime, if these values are used for subsequent operations they need to be validated just as if they were operator-entered. The most common validation scenarios are ensuring a value is not over-range or under-range, or making sure a value is not negative.
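The two policies mentioned above, clamping versus rejecting, could look something like this in a minimal sketch. The function names and the message text are hypothetical, chosen only for illustration.

```python
import math


def clamp_setpoint(value, low, high):
    """Clamp policy: force an out-of-range entry to the nearest limit."""
    return max(low, min(high, value))


def accept_setpoint(value, low, high, current):
    """Reject policy: keep the current setpoint and return a warning message.

    Returns (setpoint_to_use, message); message is None when the entry is valid.
    """
    if math.isnan(value) or not (low <= value <= high):
        return current, f"Setpoint {value} rejected: allowed range is {low} to {high}"
    return value, None
```

Clamping is quieter but can hide a typing mistake; rejecting keeps the last good value and gives the operator something to act on. The same functions can be applied to values calculated at runtime before they are used elsewhere.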
Consequences of bad data
What are the consequences of bad data? Improperly ranged or negative values can cause control loops to wind up. An unexpected zero can trigger a processor-halting divide-by-zero error in a calculation. Or, an improperly set or incremented value used as an indirect array address can actually point to an illegal location outside the valid range, causing a processor to stop running.
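Both the divide-by-zero and the bad-index cases can be guarded with a few lines of checking. The sketch below uses hypothetical helper names (`safe_ratio`, `read_recipe_step`) and returns a status flag alongside the value so the calling logic can raise an alarm instead of faulting.

```python
def safe_ratio(numerator, denominator, fallback=0.0, eps=1e-9):
    """Return (result, ok); use the fallback instead of dividing by (near) zero."""
    if abs(denominator) < eps:
        return fallback, False
    return numerator / denominator, True


def read_recipe_step(steps, index):
    """Return (step, ok); refuse any index outside the valid range of the list."""
    # Python would silently wrap a negative index, so check the lower bound too.
    if isinstance(index, int) and 0 <= index < len(steps):
        return steps[index], True
    return None, False
```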
At a higher level, functional systems should be programmed to enter a safe state (usually “off”) upon system boot up, or on any critical error. Startup routines can be created in order to initialize data and values, and to drive control logic to a safe state. Rarely should sequences automatically restart after an alarm condition is removed. Instead, consider a two-step procedure where operators clear the error, then trigger a restart.
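One way to picture the safe-state and two-step restart idea is a small state machine. The state names and the `Sequence` class below are assumptions for the sake of the example; an actual project would define its own states and interlocks.

```python
from enum import Enum


class State(Enum):
    SAFE_OFF = "safe_off"
    RUNNING = "running"
    FAULTED = "faulted"


class Sequence:
    def __init__(self):
        # Startup routine: always come up in the safe state.
        self.state = State.SAFE_OFF

    def trip(self):
        """Any critical error drives the sequence to FAULTED."""
        self.state = State.FAULTED

    def reset(self, alarm_active):
        """Step 1: the operator clears the fault. This never restarts the sequence."""
        if self.state is State.FAULTED and not alarm_active:
            self.state = State.SAFE_OFF
            return True
        return False

    def start(self):
        """Step 2: a separate, deliberate start command, allowed only from SAFE_OFF."""
        if self.state is State.SAFE_OFF:
            self.state = State.RUNNING
            return True
        return False
```

Note that the alarm condition going away does nothing by itself; the sequence stays faulted until the operator resets it and then issues a start.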
During detailed design, consider adding some enhanced alarming that can help operators identify unusual trouble. For instance, a system with a pump filling a tank may have high level, low level, and pump fail alarms. If it is known that the tank should only take 5 minutes to fill, then a “slow fill” alarm can be incorporated. This alarm will not specifically define why a slow fill is happening, but could prompt the operator to go look for a broken hose or fitting that is spilling water to the floor.
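A slow fill alarm of this kind is essentially a timer that runs while the pump is on and the tank is not yet full. In the sketch below, the 5-minute expected fill time comes from the example above; the 60-second margin, the non-latching behavior, and the class name are assumptions.

```python
class SlowFillAlarm:
    """Raise an alarm when filling takes longer than expected."""

    def __init__(self, expected_fill_s=300.0, margin_s=60.0):
        self.limit_s = expected_fill_s + margin_s
        self.elapsed_s = 0.0
        self.active = False

    def update(self, pump_running, tank_full, dt_s):
        """Call once per scan with the time since the previous scan."""
        if not pump_running or tank_full:
            # Not filling: reset the timer and clear the (non-latching) alarm.
            self.elapsed_s = 0.0
            self.active = False
        else:
            self.elapsed_s += dt_s
            if self.elapsed_s > self.limit_s:
                self.active = True
        return self.active
```

Whether the alarm should latch until acknowledged is a project decision; this version clears itself once the fill stops.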
Configure software to self-recover
In addition to disallowing any invalid operational modes, software should be configured to recover to a safe mode if it is ever inadvertently driven to an illegal mode. Sometimes it makes sense to give the operator a “reset” or “abort” control that can stop and re-initialize a problematic sequence. Keep in mind that some would consider this type of logic to be a Band-Aid intended to make up for other poor programming practices.
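Recovery from an illegal mode can be as simple as checking the mode value against the list of legal ones on every scan. The mode names and numbering here are hypothetical; the point is only that anything unrecognized falls back to the safe choice and is reported.

```python
from enum import IntEnum


class Mode(IntEnum):
    OFF = 0
    MANUAL = 1
    AUTO = 2


def resolve_mode(raw_mode):
    """Return (mode, recovered); any value that is not a legal mode becomes OFF."""
    try:
        return Mode(raw_mode), False
    except ValueError:
        # Illegal mode: fall back to the safe mode and flag it so it can be alarmed.
        return Mode.OFF, True
```

The `recovered` flag is worth logging or alarming, since an illegal mode usually points to a bug elsewhere that still needs to be found.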
Fault and alarm indications on HMI/OIT stations must be clear and understandable. Cryptic messages or codes cannot be reliably acted upon. System reactions to operator inputs must be responsive enough to prevent operators from making multiple selections which could trigger undesired operation. Just as with consumer devices like phones and DVRs, a lagging response will cause the frustrated user to keep pressing buttons fruitlessly.
Develop a test plan
How do you know if your good engineering efforts are sufficient to defend against the unexpected? Test, test, test! Develop a test plan, preferably around the time the system is designed, so that it tests all key features. Attempt to trigger or simulate various failures and potentially illegal operator actions. Execute the test plan, and don’t be afraid to use it as a springboard for developing additional specific test cases that look useful. Make some test actions faster and slower than typical to search out bad interactions.
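A test plan can start as nothing more than a table of cases, including the awkward ones. The sketch below tests a hypothetical push-and-hold rule (`start_accepted`, with an assumed threshold of 5 scans) using presses that are shorter, exactly at, and much longer than the threshold.

```python
def start_accepted(press_scans, hold_scans=5):
    """Hypothetical logic under test: a start press counts only if held long enough."""
    return press_scans >= hold_scans


# Each case: (description, press duration in scans, expected result).
TEST_CASES = [
    ("no press", 0, False),
    ("brief bump", 1, False),
    ("just under threshold", 4, False),
    ("exact threshold", 5, True),
    ("very long hold", 500, True),
]


def run_test_plan(cases):
    """Run every case and return a list of (description, expected, actual) failures."""
    failures = []
    for name, press_scans, expected in cases:
        actual = start_accepted(press_scans)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures
```

New cases discovered during testing can simply be appended to the table, so the plan grows along with what is learned about the system.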
We started this blog series comparing an automation engineer’s tasks with those of a driver on a challenging road. In both cases, it is clear that training, planning, practice, and experience will lead to the most successful outcome. Not every bad situation can be prevented, but a multi-layered approach for defending against and reacting to the unknown is the best bet. There are usually few arguments against building a resilient automation system that can respond safely and defensively to abnormal conditions. Always be on the lookout for opportunities to improve your designs by challenging them with various conditions that flush out potential weaknesses.