Heisenbug - A Tester’s Nightmare!

How do you react when your developer says, “Well, I can’t see any crash! It runs just fine on my system,” in response to a bug report you submitted recently? Worse still, he sends your bug report back with a resolution such as “Can’t Reproduce”! I have seen such replies quite a number of times in my testing career, and trust me, this can be quite frustrating for a tester. Things can get messier if your development team is based in a different location from yours. It can be frustrating for a number of reasons:

1. Your developer finds the crash hard to reproduce, while you get it easily on your test machine.
2. You can get the crash on a fellow tester’s system by following the test steps, but the developer says he doesn’t see anything unusual after following those same steps!
3. Every time you “Reopen” the bug with a comment saying that you are still able to get the crash, the developer sends it back with a “Can’t Reproduce” resolution!

These are *testing times* for a *tester*. These are times when you start doubting the credibility of the developer and wondering if (s)he is doing this on purpose! Times when you want to shout at the top of your voice and show the developer the crash screen when it crashes on your system. Times when you want to send angry emails to the developer to let out your frustration.

But doing such things may jeopardize your testing career! So a better option, in my opinion, is to try to find out why the developer is not seeing the crash and act accordingly. A few important context-driven questions that we usually forget to ask in such situations are:

1. Why is the developer saying that (s)he is not able to get the crash? Why is (s)he repeatedly sending my bug report back with a “Can’t Reproduce” resolution?
2. Is (s)he really unable to reproduce the crash? If so, what could be the reason?
3. Does my bug report contain clear and concise steps to reproduce the crash? Does it contain all the details required to get the crash? Is something missing from it? Have I failed to include some important pre-conditions or test data required to trigger the crash?
4. Why is the crash so easily reproducible on my (and other testers’) systems, and yet so hard to reproduce on the developer’s machine?
5. Is the developer using the same build as mine while trying to reproduce the crash? Is (s)he using a different build from VSS?
6. What might be the major differences between a tester’s machine and a developer’s machine (apart from possible differences in hardware configuration)?
7. Similar questions that best suit your context.

If you ask such questions at the right time, chances are good that you will find out why the developer is not able to reproduce the crash. Instead of blaming developers for rejecting our bug reports with a “Can’t Reproduce” tag, we should spend our energy on finding the cause. Makes sense? Read on.

The last time I was trying to figure out why the developer could not reproduce my crash, I found an interesting answer: the crash I had reported turned out to be a heisenbug! Confused by the strange terminology? I was equally confused when I heard the term for the first time! A heisenbug is a computer bug that disappears or alters its behavior when one attempts to probe or isolate it.

In our case, it turned out that running the program under a debugger on the developer’s machine altered its operating environment enough that the buggy code (the result of uninitialized variables) behaved quite differently from the compiled release-mode build I was using. Running the program in debug mode cleared memory before the program started and forced variables into stack locations instead of keeping them in registers. To make things worse, the debugger’s user interface was changing the state of the program while the code was executing!

This was why my developer friend was not getting the crash on his system, where he ran the program under the debugger, while I was getting it with ease using the release-mode build! Once we found this out, it was not a big task for the developer to fix the crash. Had we not discovered that this crash was a heisenbug, I might have been forced to “Close” the issue one day without ever knowing why the developer was struggling to reproduce it. Worse still, as it was a crash in a basic functionality of the application, chances were high that end users would have run into it too!

Well, this is one type of heisenbug, which occurs in a compiled release-mode build but not when investigated in debug mode. There are other types of heisenbugs, usually caused by race conditions, a pointer running out of bounds, and so on. The term heisenbug is a play on Heisenberg’s Uncertainty Principle from quantum physics, often explained as the idea that the act of measuring a particle (such as bouncing a photon of light off it) also disturbs the particle. In other words, the very act of observing something affects what is being observed, which makes experiments difficult because simply looking at what’s happening changes the result!

So the next time a developer sends back a bug report with a “Can’t Reproduce” tag, we should investigate further and see what might be preventing the developer from reproducing the problem. If you have had a similar experience, do share it with us by leaving a comment.


Happy Testing...


About Debasis Pradhan

Debasis has over a decade of experience in the field of Software Quality Assurance, Software Development and Testing. He writes here to share some of his interesting experiences with fellow testers.

3 Comments:

  1. Nice post. It is important to treat “not reproducible” comments with suspicion; there could be hidden reasons, as you mentioned above. What we have in our team is an agreement that before marking any defect as not reproducible, developers need to test it on a clean VM image with the same version of the software, and not in the dev environment. We found that after introducing this simple policy, we reduced “not reproducible” resolutions significantly.

  2. You are right, Debasis. I too have experienced such situations at work, but only found out the exact terminology from your post. Thanks!

    raji

  3. Interesting. I have seen this on numerous occasions and have found it most useful to run a system dump tool such as userdump on the offending module when it crashes, and pass the dump to the developers.

