The following discussion is part of an occasional series, "Ask the Automation Pros," authored by Greg McMillan, industry consultant, author of numerous process control books, and 2010 ISA Life Achievement Award recipient. Program administrators will collect submitted questions and solicit responses from automation professionals. Past Q&A videos are available on the ISA YouTube channel. View the playlist here. You can read all posts from this series here.
Looking for additional career guidance, or to offer support to those new to automation? Sign up for the ISA Mentor Program.
See Part 1 here. See Part 2 here. See Part 4 here. See Part 5 here.
Here we continue our series with enlightening and entertaining answers by Hunter Vegas, the co-founder of the ISA Mentor Program. For more, see Hunter’s 50 tips in the ISA book, 101 Tips for a Successful Automation Career.
Greg McMillan’s Question
What are the biggest mistakes you have seen in automation system design, configuration, calibration, installation, checkout, commissioning, and maintenance? What were the consequences and the fixes, and what can be done to prevent future occurrences?
Hunter Vegas’ Answer
Over the years, I have certainly seen (and made) my share of mistakes. Fortunately, most of the epic blunders were ones I witnessed rather than created, and when those occurred, I worked hard to understand what happened and to determine how I could avoid a similar situation in the future.
Here are some of the more memorable snafus I have seen:
1. Assuming plug-and-play communications are in fact plug-and-play. Any kind of control system communication is an enormous red flag for me. I cannot begin to count the number of marketing ads I have read promising that one need only plug in a few wires to establish system-to-system communications instantly. News flash: It almost never works that way!
In this case, a systems integrator was doing a control system replacement that had to be completed in about a week. Rather than rewiring the existing field wiring to new IO cards, the integrator decided to use a very new and untested interface that was supposed to plug into the existing IO card communication networks and bring all the data into the new system. They did no advance testing to confirm this would work as promised.
During the outage, they pulled all the old computers and processing equipment and set the boxes outside while they installed the new equipment. Then they plugged in the wires, and nothing worked. No communications, no data flow, nothing. After a few days of frantic phone calls and panic, they had to start hauling in the old equipment and trying to get it working again. Fortunately, it had not rained too much, and after another week they were able to get the old system running again. Six months later, they tried and failed again! It would take three outages to get the new control system online.
Key learning points:
- Always pre-test and prove any critical communication in advance of a shutdown.
- Avoid being serial number one. Let somebody else iron out the bugs before you adopt bleeding-edge technology.
- Have a Plan B.
2. Failing to have a Plan B. There are so many instances where I have seen this play out over my career. Most of my automation projects are retrofits where we are ripping out an old system and installing its replacement in a very short time. The logistics are often very tricky, and the new equipment must come up flawlessly. However, if for whatever reason something goes wrong, you must have a backup plan.
Perhaps the most spectacular display of this blunder occurred at a paper mill. Many paper machines employ a very specialized analyzer to continuously scan the sheet as it comes through the machine and adjust various points in the process to maintain consistent density, thickness, moisture, etc. This equipment is the heart of the machine and critical for maintaining quality.
A vendor sold the mill a new version of the analyzer and did a replacement project during the annual outage. The old system was ripped out (nothing was saved) and the new equipment installed. Then, they tried to run the machine and it simply did not work. The company spent weeks working 24/7 on the machine trying to get it to operate. In the meantime, the paper machine was either making off-spec product or not running at all. Ultimately, it took nearly a month past the original startup date to get the equipment running.
Key learning points:
- Do as much advance testing and planning as you can to improve the odds of success.
- However, plan for the worst and build in alternatives if something unexpected happens. If the company had simply saved the original equipment, they could have reinstalled the old analyzer and had it operating in just a few days, buying themselves time to resolve their equipment problems.
- There are lots of ways to insert a Plan B into the design. Some projects have gone as far as installing a large number of switches to allow changing between the old and new control systems in just a few minutes. Often, just some advance planning and careful removal of the old system provides a means to recover it if absolutely necessary.
3. Accepting a skid control package as is. Perhaps there are great skid manufacturers who always use quality instrumentation and well-designed controls for their equipment. Unfortunately, I never seem to get those folks bidding on my projects.
Early in my career, I was working on a plant that incorporated several skid packages (flame controls, blower package units, startup heaters, etc.). Being relatively new to the game, I naturally assumed that the vendors had been making this equipment for years and would furnish a well-designed, long-lasting control package. However, I did specify certain manufacturers and models of acceptable instrumentation to minimize our spare parts.
The equipment was inspected by our mechanical engineer, so I did not see it until it arrived. It was a nightmare! The electrical installation was laughably bad and bore little resemblance to code. They had ignored our instrument list and outfitted the skids with an array of the cheapest instrumentation they could find. Worst of all, very few, if any, of the packages worked, and none of the drawings were accurate. It was a long and painful startup for me.
Key learning points:
- Be very specific on your list of acceptable instrumentation, down to the model number. Make sure the contract clearly states that this is a requirement to get paid.
- Perform a complete functional test at the vendor's site prior to shipping. Allow nothing to be shipped until it has passed these tests.
- Check the documentation as part of the functional test.
- Always check any necessary communications as part of the functional test as well.
- Insist on getting the equipment and controls you want. It may cost more, but it will cost less than replacing the entire controls package, which is what usually happens within a few years.
4. Assuming grounding is straightforward. How hard can grounding be? Hook a wire up to a grounded rod and call it a day. Right? Over my 36-year career, I do not think I can recall a single time when two grounding experts agreed. Every controls company seems to have its own idea of how grounding should be done, and often they change their minds repeatedly over the years.
I was working in a very large, continuous plant in Louisiana that typically ran three to five years without a shutdown. If the plant tripped, it took three days and a lot of money to get it running again, so inadvertent plant trips were a very bad thing. The control system vendor had issued several different grounding strategies over the years, and as new sections of the plant were added, the grounding generally followed the "grounding instructions du jour." Over time, grounding practices became a mishmash of various methods.
Then the plant started tripping during major lightning storms, which can be a daily occurrence in southern Louisiana! A short time later, there was a major electrical fault that sent 4160V to ground. The fault burnt the traces off several distributed control system (DCS) control boards yet managed not to pop a single 1/16A fuse.
My job, as plant engineer, was to fix this problem while the plant was running. Ultimately, I consulted with the vendor to determine the correct grounding method and then checked the grounds on every instrument, junction box, and DCS IO card in the entire plant. It took weeks! I was down to one last instrument: a speed command transducer controlling a very critical air machine. All I had to do was clip the ground wire and tape it.
I discussed my plan with the production supervisor, who was not happy at the thought of me messing with his machine. I assured him it would be quick and easy, and he wouldn’t even know it happened. He again asked if I was sure this would be OK. I confidently responded, “Trust me.”
I went to the machine, opened the speed transducer cover, and clipped the wire. Suddenly, chaos ensued. The compressor went from 85% to 0% speed in an instant. The production supervisor screamed at me and then into his radio, trying to help the operators keep the plant from tripping. During all of this, I was frantically tying the shield wires back together, and when I did, the compressor speed was restored. The plant was rocked very hard, but the operators kept it from tripping. My nickname was "Trust Me" for years afterward.
Key learning points:
- Never say “Trust me!”
- Never underestimate the impact of grounding and never take it lightly.
- Anticipate the seemingly impossible. The series of conditions that led to this event was astounding:
- The vendor had at one point suggested that the shield of each analog output be tied to the negative wire at the instrument.
- Then they decided they did not want the shield tied to the negative but instead wanted the shield to be grounded at one point on the loop.
- In this case, the shield was grounded in one place and tied to the negative wire at the instrument.
- But then the negative wire broke. Because the loop was using the shield as a path back to ground, the loop still worked.
- Until I cut the shield, opening the loop and sending the compressor speed to zero!
As a footnote, I will say that the grounding effort was successful. A few months later, the plant took a direct lightning strike to the flare and kept running!