How To Debug Complex Software

As a mobile app developer, it’s imperative to be able to troubleshoot and find the root cause of an issue quickly and efficiently.  In working with more junior developers, I’ve found that the steps for effective debugging are much simpler than most would think.  The key is taking a logical, step-by-step approach to finding the problem.

Here are some tips and techniques that have proven helpful for pinpointing trouble:

Having Enough Data Or Reproducible Steps

This one is probably the toughest.  If there’s a sporadic or hard-to-reproduce issue, you may have to go back to the well multiple times to ask questions.  The important part is knowing what kind of questions to ask.  The steps the user took, how often the issue occurs, what screen they were on, what operating system, what kind of device, and what version of the app are all good baseline details to have.  When working with forms, a sample of the data the user entered is a wonderful thing to have.  Since IoT is becoming more and more popular, you may need to ask about the kind of network (3G, 4G, 2.4 GHz Wi-Fi, 5 GHz Wi-Fi, etc.).  My best advice is to make sure you have enough actionable data to investigate before continuing.
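
One way to keep yourself honest here is to write the baseline details down as a checklist and see what is still missing before you start digging. The sketch below is only an illustration: the `BugReport` class, its field names, and the rule that only the form sample is optional are my own assumptions, not a standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BugReport:
    """Baseline details to collect before investigating (illustrative field names)."""
    steps_to_reproduce: Optional[str] = None
    frequency: Optional[str] = None      # e.g. "every time", "1 in 10 attempts"
    screen: Optional[str] = None
    os_version: Optional[str] = None
    device_model: Optional[str] = None
    app_version: Optional[str] = None
    network_type: Optional[str] = None   # e.g. "4G", "2.4 GHz Wi-Fi"
    sample_input: Optional[str] = None   # only relevant when a form is involved


# Assumed policy for this sketch: everything except sample_input is required.
OPTIONAL_FIELDS = {"sample_input"}


def missing_details(report: BugReport) -> List[str]:
    """Return the names of required fields that are still empty."""
    return [
        f.name
        for f in fields(report)
        if f.name not in OPTIONAL_FIELDS and not getattr(report, f.name)
    ]


report = BugReport(
    steps_to_reproduce="Open settings, tap Save twice",
    screen="Settings",
    app_version="2.3.1",
)
print(missing_details(report))
# prints ['frequency', 'os_version', 'device_model', 'network_type']
```

Anything in that list is a question to take back to the client or end user.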

Last Point Of Success

So there you are investigating a reported issue.  You have the details provided by the client or end user, and you’ve been able to reproduce it. Of course, having never seen this in development, you are not sure why you are hitting this newly discovered failure path.  Maybe an edge case was missed, maybe a regression was introduced, or maybe a library was updated and there’s some compatibility issue afoot.  Moral of the story: now it’s time to figure out what is going wrong and why.
To accomplish this, it helps to look at how far back the issue goes (is it also in the last release? was it always there?).  You can use source control to figure out when the breaking change occurred and then look to see what the breaking change is.
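
If you know one commit that works and a later one that doesn’t, a binary search between them finds the breaking change in a handful of checks. This is the idea that `git bisect` automates. The sketch below is a toy version: the `first_bad_commit` function, the `is_good` callback, and the commit names are hypothetical, and it assumes the bug stays present once it has been introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary search for the first failing commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the bug never disappears once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Made-up history where everything from "d4" onward is broken.
history = ["a1", "b2", "c3", "d4", "e5", "f6"]
broken_from = history.index("d4")
print(first_bad_commit(history, lambda c: history.index(c) < broken_from))
# prints d4
```

In practice `is_good` would mean building that commit and running your reproduction steps against it.
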
You can also apply this methodology to logs.  Let’s say the logs look “normal,” but then, when the issue occurs, a part is missing or incorrect.  Starting at that last point of success in the logs and walking forward can be very insightful.
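
As a rough sketch of that log comparison, the function below walks a known-good run and a failing run side by side and reports the last event they agree on. It assumes you have already stripped timestamps and other noise so that matching events compare equal; the function name and the log lines are made up for illustration.

```python
from typing import Optional, Sequence, Tuple


def last_point_of_success(
    good_run: Sequence[str], failing_run: Sequence[str]
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (last matching event, expected next event, actual next event)."""
    last_ok = None
    for i, expected in enumerate(good_run):
        actual = failing_run[i] if i < len(failing_run) else None
        if actual != expected:
            return last_ok, expected, actual
        last_ok = expected
    # The failing run matched the whole good run; report any extra event after it.
    extra = failing_run[len(good_run)] if len(failing_run) > len(good_run) else None
    return last_ok, None, extra


good = ["login ok", "token refreshed", "profile loaded", "sync complete"]
bad = ["login ok", "token refreshed", "sync failed"]
print(last_point_of_success(good, bad))
# prints ('token refreshed', 'profile loaded', 'sync failed')
```

Here the investigation starts right after "token refreshed", which is a far smaller haystack than the whole log.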

Tracers

Tracers are not a bad idea for timing-sensitive issues that may disappear once a debugger is attached.  I’ve dealt with multi-threaded systems, and to pinpoint any rogue thread behavior, I have found that throwing plenty of log messages into the system during the debug phase can be helpful.  Once you determine something like the system dies at spot x, or thread b overlaps thread a at this point, you have narrowed down what’s wrong.
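
Here is a minimal sketch of what such a tracer might look like. The `Tracer` class, the "enter"/"exit" message convention, and `find_overlaps` are all invented for this example; the barrier exists only to force the two threads to overlap so the demo has something to find.

```python
import threading
import time
from typing import List, Tuple

Event = Tuple[float, str, str]  # (timestamp, thread name, message)


class Tracer:
    """Collects trace events in memory, tagged with thread name and time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.events: List[Event] = []

    def trace(self, message: str) -> None:
        with self._lock:  # keeps the event list in a consistent order
            name = threading.current_thread().name
            self.events.append((time.monotonic(), name, message))


def find_overlaps(events: List[Event]) -> List[Tuple[str, str]]:
    """Return (entering thread, thread already inside) pairs."""
    inside: List[str] = []
    overlaps: List[Tuple[str, str]] = []
    for _, thread, message in events:
        if message == "enter":
            overlaps.extend((thread, other) for other in inside)
            inside.append(thread)
        elif message == "exit":
            inside.remove(thread)
    return overlaps


tracer = Tracer()
both_inside = threading.Barrier(2)


def worker() -> None:
    tracer.trace("enter")
    both_inside.wait()  # demo only: guarantees both threads are inside at once
    tracer.trace("exit")


threads = [threading.Thread(target=worker, name=n) for n in ("thread-a", "thread-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(find_overlaps(tracer.events))  # one pair; which thread entered first can vary
```

Appending to an in-memory list is cheap compared with pausing at a breakpoint, which is why this kind of tracing is less likely to hide a timing problem.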

Process Of Elimination / Isolation

When I speak of process of elimination, I simply mean boiling the system down to the smallest working version.  If that’s not possible, you have to start thinking about your last success point and then eliminating pieces of the puzzle.  If the mobile app is supposed to talk to the server but that communication is not happening correctly, is the mobile app not sending the message, or is the server not receiving or processing it? With logs you can determine which side is not behaving as expected.
I’ve even had to use tools like Postman to verify latency and confirm that the server was processing requests correctly, which isolated the issue to the hardware.
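
To make the app-versus-server question concrete, here is a small sketch that lines up logs from both sides. It assumes the app and the server both log a shared request ID, which is my assumption and not something every system has; the function name and the IDs are made up.

```python
from typing import Dict, Iterable


def classify_requests(
    client_sent: Iterable[str],
    server_received: Iterable[str],
    server_processed: Iterable[str],
) -> Dict[str, str]:
    """Say, per request ID, how far each request got."""
    received = set(server_received)
    processed = set(server_processed)
    result: Dict[str, str] = {}
    for request_id in sorted(set(client_sent)):
        if request_id not in received:
            result[request_id] = "sent by app, never reached server"
        elif request_id not in processed:
            result[request_id] = "received by server, not processed"
        else:
            result[request_id] = "ok"
    return result


status = classify_requests(
    client_sent=["req-1", "req-2", "req-3"],
    server_received=["req-1", "req-2"],
    server_processed=["req-1"],
)
for request_id, outcome in status.items():
    print(request_id, "->", outcome)
# prints:
# req-1 -> ok
# req-2 -> received by server, not processed
# req-3 -> sent by app, never reached server
```

A request that never shows up on the app side at all points back at the app, which is the third possibility this sketch leaves for you to check by hand.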

Metrics

Over recent months I have become quite fond of big data and the patterns you can spot when a large set of users, devices, etc. is reporting.  There will always be strange one-off errors, but when a system scales and reports details, it becomes incredibly powerful to spot that all users in case y have the same thing happen.
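
A simple version of that pattern spotting is grouping reports by one attribute and comparing failure rates. The sketch below uses a handful of made-up reports; the `failure_rate_by` function and the `os_version` and `failed` keys are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def failure_rate_by(reports: Iterable[Mapping[str, object]], key: str) -> Dict[str, float]:
    """Return the fraction of failing reports for each value of `key`."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for report in reports:
        value = str(report[key])
        totals[value] += 1
        if report["failed"]:
            failures[value] += 1
    return {value: failures[value] / totals[value] for value in totals}


reports = [
    {"os_version": "14", "failed": True},
    {"os_version": "14", "failed": True},
    {"os_version": "15", "failed": False},
    {"os_version": "15", "failed": False},
    {"os_version": "15", "failed": False},
]
print(failure_rate_by(reports, "os_version"))
# prints {'14': 1.0, '15': 0.0}
```

Running the same grouping over device model, app version, or network type is how the "all users in case y" pattern tends to surface.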

Remote Logging

Remote logging makes sense for events that happen on the user’s device and that you would otherwise have no knowledge of. A good example is adding remote logging to learn whether users are triggering retry logic excessively, so the system can be improved.  Alarms and timed events are another thing you can add remote logging for, so you know they are in fact getting triggered. Be wary of overusing remote logging, as reporting too much can make users uneasy.
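
Here is a small sketch of the retry example. The `RetryReporter` class, the threshold of three, and the event fields are all hypothetical, and `send` stands in for whatever remote logging call your app actually uses. It reports once per streak of retries and sends only an operation name and a count, in keeping with the warning about reporting too much.

```python
from typing import Callable, Dict


class RetryReporter:
    """Reports remotely when an operation is retried too many times in a row."""

    def __init__(self, send: Callable[[Dict[str, object]], None], threshold: int = 3) -> None:
        self._send = send            # placeholder for the real remote logging call
        self._threshold = threshold
        self._counts: Dict[str, int] = {}

    def record_retry(self, operation: str) -> None:
        count = self._counts.get(operation, 0) + 1
        self._counts[operation] = count
        if count == self._threshold:
            # Report once when the threshold is reached, with no user data attached.
            self._send({"event": "excessive_retries", "operation": operation, "count": count})

    def record_success(self, operation: str) -> None:
        self._counts.pop(operation, None)  # a success ends the streak


sent = []  # stands in for the remote endpoint in this demo
reporter = RetryReporter(send=sent.append, threshold=3)
for _ in range(5):
    reporter.record_retry("upload_photo")
print(sent)
# prints [{'event': 'excessive_retries', 'operation': 'upload_photo', 'count': 3}]
```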

Gather what information you can, examine the differences from the last time you know it worked, and add various checks to your project. It’s rare that an issue doesn’t require some sort of sleuthing to uncover a solution. Having some ideas and processes prepared in your toolbelt will make finding the solution to your next coding problem easier. What are your favorite tips for troubleshooting?

 
