Tale from the production floor - the processor that starts and stops unexpectedly - EDN

2022-05-28 15:32:26 By : Mr. Tony Chen

Many years ago I worked in the R&D department of a company which developed and manufactured specialized consumer products. One of our products had a separate power supply which included an LCD display and programmable controls. Those were the days of 8-bit microcontrollers and monochrome LCD displays.

I was tasked with designing a test jig for the main board of the power supply. The test setup I designed was based around a National Instruments data acquisition system, with some specialized control software running on a PC, and an in-circuit programmer which programmed the board with the final software version at the end of the test cycle. Connection to the board was through a manually operated ICT (in-circuit test) jig with pogo pins contacting test points on the bottom of the board.

Operation of the test setup was very simple. The operator would place the board on the jig, lower the top plate, scan the serial number and press “test.” For a good board the result “test passed” would appear on the screen, otherwise a message with the nature of the board fault would be shown.


The jig was built by a contractor; I built the interface board, wrote the test software, and did the debugging. Everything went well. The boards were tested and programmed, and the tester diagnosed board faults. So, off it went to the production floor.

Trouble started immediately. The tester was flagging the boards as good, but when assembled in the power supplies, the LCD screens were displaying random characters and general gibberish. Some of the power supplies wouldn’t operate properly. Of course, it all came straight back to my desk with a demand for an explanation.

Finding the cause wasn’t difficult, actually, and involved examination of the memory contents.

When testing the jig, I had been working at my own pace. But on the production floor, the technicians worked much faster. As soon as the screen flashed the “pass” message, they would lift the board off the jig within a few seconds. This released the “reset” pin of the microprocessor, and the capacitors on the board still held enough charge to power up the circuit, so the processor would immediately start to run its program for the very first time.

The program would go to the flash memory to retrieve the working parameters, and, finding the memory empty, would start to write default parameters. At this point, the capacitors would run out of charge, and the processor would power off in the middle of a flash memory write cycle, leaving the memory with undefined contents. When powered up after assembly, the processor would find parameters in the memory, not realize that they were corrupted, and put them up on the screen.

The correct fix would have been to rewrite the operational software to add more memory checks, but time was short and the software people were overworked and in a bad mood, so I simply added a 2-second delay between powering off the test jig and displaying the “test passed” message. After that, it was all plain sailing.
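The kind of memory check mentioned above might look something like the following. This is only a minimal sketch in Python, not the product's firmware: the record layout (three 16-bit parameters followed by a CRC-32), the default values, and the names `encode_record` and `load_params` are all assumptions made for illustration. The idea is that a parameter block whose checksum does not match is treated as empty or corrupted and replaced by defaults, rather than being trusted and shown on the display.

```python
import struct
import zlib

# Hypothetical layout: three 16-bit parameters followed by a CRC-32 of those bytes.
PARAM_FORMAT = "<HHH"
RECORD_SIZE = struct.calcsize(PARAM_FORMAT) + 4
DEFAULTS = (120, 50, 3)            # made-up default parameter values
ERASED = b"\xff" * RECORD_SIZE     # erased flash typically reads back as all 0xFF


def encode_record(params):
    """Pack the parameters and append a CRC-32 so corruption can be detected."""
    payload = struct.pack(PARAM_FORMAT, *params)
    return payload + struct.pack("<I", zlib.crc32(payload))


def load_params(record):
    """Return (params, valid); fall back to DEFAULTS if the record fails its check."""
    if len(record) != RECORD_SIZE:
        return DEFAULTS, False
    payload = record[:-4]
    stored_crc = struct.unpack("<I", record[-4:])[0]
    if zlib.crc32(payload) != stored_crc:
        return DEFAULTS, False
    return struct.unpack(PARAM_FORMAT, payload), True
```

With a check like this, a block left half-written by a power loss would most likely fail the comparison, and the firmware could rewrite the defaults on the next clean power-up. A checksum alone does not make the write itself safe against power loss; it only lets the software notice the damage afterwards.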

Benny Attar currently works as engineering team leader in an EMC test and certification laboratory.

A similar setup (ICT) to the story above, but with a test of the oscillator circuit. About 40% of the boards would fail to start the crystal oscillator. A retry, sometimes two, would wake it up. I was a new arrival to the company, and had some small circuit boards made to hold a divide-by-100 counter, mounted close to the pogo pins that connected to the crystal. This dramatically reduced the parasitic capacitance on the crystal, and cut the failure/retry rate to less than 1%. I had been criticised for these failures and explained the reason to my manager; then, about 6 months later, EDN published an article on this very problem. Another problem involved component failures around one corner of a double-sided board. For about a month, I was getting the blame for a horrendous failure rate: “the ICT saying that they were failing, couldn’t be an assembly problem”. Having insisted that it was, I did a very leisurely assessment of the actual assembly process, watching each step very carefully until I came to the point where the board was flipped over and pressed into the conveyor channel … with the troublesome corner snagging on the side! Every single board was being stressed; the infant mortality rate must have been horrendous! Not my fault, and not even my responsibility or expertise to debug the assembly process, but simple observation eliminated the problem instantly.

NEVER use time delays to patch up software (or hardware, for that matter). If people would STOP doing that, things would have a prayer of working. If something has to happen (such as a voltage coming into spec, or a data set being written), then that voltage or the write success should be the thing that inhibits the function, not some random time passing. The time delay you impose is too long unless it is too short! May the project be plagued with bugs until it is fixed properly!
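As a rough illustration of the condition-based approach argued for here, test software could hold back the “test passed” message until the board's supply rail has actually decayed, instead of sleeping for a fixed time. The sketch below is an assumption-laden example in Python: `read_voltage` is a hypothetical callable standing in for whatever the data acquisition system provides, and the 0.5 V threshold, 5-second timeout, and 10 ms polling interval are made-up values.

```python
import time


def wait_for_discharge(read_voltage, threshold_v=0.5, timeout_s=5.0,
                       poll_s=0.01, clock=time.monotonic, sleep=time.sleep):
    """Poll until the measured rail voltage falls below threshold_v.

    read_voltage is a hypothetical callable returning the rail voltage in volts.
    Returns True once the board is discharged, False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        if read_voltage() < threshold_v:
            return True          # safe to tell the operator to lift the board
        if clock() >= deadline:
            return False         # rail never decayed; report a fault instead
        sleep(poll_s)
```

The timeout is still a time limit, but it only bounds the failure case; the pass message is gated on the measured voltage rather than on a guess about how long the capacitors take to discharge.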

Benny’s phrase “When testing the jig, I had been working at my own pace. But on the production floor, the technicians worked much faster.” reminds me of an opposite problem from early in my career.

I was trying to diagnose a problem with a lock-in amplifier that we were using to extract a useful signal from a very noisy biological specimen that we were stimulating with a known signal. This was around 1985 and the instrument had nearly a dozen ten-turn pots for the operator to adjust. I would take the instrument to my lab and wildly “twiddle” every pot through its range, trying to go through at least some of the millions of possible pot setting combinations. Every time I decided the instrument was working perfectly, it would fail again when used for an actual experiment.

Finally, I spent a good part of a day watching an actual experiment. The researcher painstakingly adjusted each ten-turn pot very slowly, trying to get the optimal signal. That’s when it dawned on me, one of the pots was failing intermittently – but only when turned very *slowly*. I replaced that pot and all was well.
