Difficult Firmware Problem Rapidly Resolved
In this case, the client was a supplier of industrial and laboratory infrared temperature measurement equipment. The client was experiencing a problem with one product in a line of firmware-based, handheld temperature measurement devices that let the user store a temperature reading by aiming the device at the item to be measured and pressing a button. The client had recently launched a new version in the model lineup that used the core software of the other models in the line, with some modifications to address its unique performance characteristics. Previous models, which had been on the market for several years, appeared to function correctly, but the new version would occasionally hang during use.
The client assumed the problem was in the storage routines. After weeks of troubleshooting by the staff programmer, this consultant was asked to review the embedded 8051 assembly code and locate any errors in those routines. This consultant compared the source code for different models in the line, including the new version, and single-stepped through the code on simulators, but could neither duplicate the problem nor find anything in the storage routines that might cause it.
The consultant then traveled to the client's facility to see the system hang during use and to understand the conditions that caused the problem. After observing the fault and finding conditions that reproduced it consistently, this consultant connected an in-circuit emulator and data analyzer to the same hardware that had exhibited the fault. When the fault occurred, the consultant witnessed unexpected, seemingly random code jumps. By repeatedly triggering the problem, noting where the code jumped to, and carefully analyzing the internal state of the system right up to the moment of failure, the consultant traced the fault to a stack overflow caused by insufficient stack size and incomplete interrupt handling. The problem did not occur in previous models because their subroutines executed fast enough to service each interrupt and pop its data back off the stack before the next interrupt arrived. The new model required more computation time and saved additional registers on the stack during each interrupt, so a rapid series of interrupts from the user could, under certain conditions, overflow the stack and cause the unit to hang.
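The failure mechanism described above can be illustrated with a small simulation. The following C sketch is not the client's firmware (which was 8051 assembly); it is a simplified model in which every interrupt pushes one frame and may preempt a handler that is still running. The `Model` structure, the function names, and every numeric value used with them are hypothetical and chosen only to show the effect.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NEST 64  /* hypothetical cap on the simulated nesting depth */

/* Hypothetical model parameters; none of these come from the case study. */
typedef struct {
    int stack_bytes;    /* RAM reserved for the stack */
    int frame_bytes;    /* bytes pushed per interrupt (return address + saved registers) */
    int service_ticks;  /* time a handler needs before it returns and pops its frame */
} Model;

/* Simulate `count` interrupts arriving every `interval_ticks`.
   Each new interrupt preempts whatever handler is still running.
   Returns the peak stack usage in bytes. */
int peak_stack_usage(const Model *m, int count, int interval_ticks)
{
    int remaining[MAX_NEST];  /* unfinished work of each nested handler */
    int depth = 0;
    int peak = 0;

    assert(count >= 0 && count <= MAX_NEST);

    for (int i = 0; i < count; i++) {
        /* A new interrupt pushes a frame on top of any unfinished handlers. */
        remaining[depth++] = m->service_ticks;
        if (depth > peak) {
            peak = depth;
        }

        /* Run until the next interrupt arrives, innermost handler first. */
        int budget = interval_ticks;
        while (depth > 0 && budget > 0) {
            if (remaining[depth - 1] <= budget) {
                budget -= remaining[depth - 1];
                depth--;                       /* handler finished: frame popped */
            } else {
                remaining[depth - 1] -= budget;
                budget = 0;                    /* still busy when the next one arrives */
            }
        }
    }
    return peak * m->frame_bytes;
}

/* True if the simulated burst needs more stack than the model reserves. */
bool overflows(const Model *m, int count, int interval_ticks)
{
    return peak_stack_usage(m, count, interval_ticks) > m->stack_bytes;
}
```

With illustrative values, a model whose handler finishes within the interrupt interval never holds more than one frame, while a model with a slower handler and a larger frame accumulates one frame per interrupt and exceeds its stack only once the burst is long enough. That is consistent with a fault that appears only under certain conditions.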
The problem was resolved by (a) repositioning the stack in RAM to increase the stack size and (b) limiting the number of sequential interrupts allowed. Once sequential interrupts reached the limit, the system would momentarily suspend the user interface to complete all pending subroutine activity before resuming normal operation. The resolution was found in less than two days.
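The second part of the fix, limiting sequential interrupts, might look roughly like the C sketch below. This is an illustration under assumptions rather than the actual 8051 code: the `IrqGate` type, the function names, and the limit `MAX_PENDING` are hypothetical, and the original firmware's limit is not stated in this case study.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 3  /* hypothetical limit on sequential interrupts */

typedef struct {
    int pending;      /* interrupts accepted but not yet fully serviced */
    bool ui_enabled;  /* false while the user interface is suspended */
} IrqGate;

void gate_init(IrqGate *g)
{
    g->pending = 0;
    g->ui_enabled = true;
}

/* Called when a user interrupt arrives. Returns true if it was accepted. */
bool gate_on_interrupt(IrqGate *g)
{
    if (!g->ui_enabled) {
        return false;               /* user interface suspended: ignore the request */
    }
    g->pending++;
    if (g->pending >= MAX_PENDING) {
        g->ui_enabled = false;      /* limit reached: suspend the user interface */
    }
    return true;
}

/* Called when one handler has finished all of its work. */
void gate_on_complete(IrqGate *g)
{
    assert(g->pending > 0);
    g->pending--;
    if (g->pending == 0) {
        g->ui_enabled = true;       /* backlog drained: resume normal operation */
    }
}
```

Because the interface is re-enabled only after the pending count returns to zero, the worst-case nesting depth is bounded by the limit, which in turn bounds how much stack the interrupt frames can consume.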