Race issues – application crashing and user interaction

Today was amazing. I was in the office again, had some really promising chats with team members, and everything looked good. I was just about to head back to the airport to catch my flight home to Düsseldorf, when…

Murphy showed up. Quite visibly, in fact. He materialized in a log file like this:

2014-11-18 09:48:34,236 [6] FATAL FatalExceptionManager FailedToCloseDocumentException: While trying to close document xy.xlsx the application got unresponsive.
2014-11-18 09:48:34,236 [7] FATAL FatalExceptionManager FailedToCloseDocumentException: While trying to close document YZ.xlsx the application got unresponsive.

Let me explain that first. This is an application that can run multiple Excel instances side by side as a service. Excel is a beast (it is, really), so there is robust handling of exceptional situations. In this case the program considered two Excel instances, controlled by two different threads, to be unresponsive because it had waited for a configurable time without getting a response. Once an instance is considered unresponsive, its Excel process is killed to ensure that all jobs in the queue still get processed. That “normally” works like a charm. But…

2014-11-18 09:48:34,236 [7] ERROR ICommandExecutor – ExternallyStoppedException: Process has been externally stopped.
2014-11-18 09:48:34,236 [6] ERROR ICommandExecutor – ExternallyStoppedException: Process has been externally stopped.

Since the system is controlled via a web page and Excel may well run on a different machine, the user naturally has the ability to stop the whole processing run. Doing so leads to ExternallyStoppedExceptions.
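To make the timeout handling a bit more tangible, here is a minimal Python sketch of a watchdog that waits a configurable time and then kills the process. It is not the real code: the actual application drives Excel, while this sketch uses a plain child process as a stand-in, and the names `run_with_watchdog` and `hanging_command` are made up for illustration.

```python
import subprocess
import sys


def hanging_command():
    # Hypothetical stand-in for an unresponsive Excel instance:
    # a child Python process that simply sleeps for a long time.
    return [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_watchdog(command, timeout_seconds):
    """Start a process and kill it if it does not finish in time."""
    process = subprocess.Popen(command)
    try:
        process.wait(timeout=timeout_seconds)
        return "finished"
    except subprocess.TimeoutExpired:
        # The process is considered unresponsive: kill it so the
        # remaining jobs in the queue can still be processed.
        process.kill()
        process.wait()  # reap the killed process
        return "killed"
```

Calling `run_with_watchdog(hanging_command(), 0.5)` would kill the sleeping child after half a second, while a command that exits promptly is reported as finished.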

Stopping also works like a charm… normally. But have a look at the milliseconds in the log timestamps: all four messages were logged at 09:48:34,236. This is where Murphy struck. He put his finger on a single line of code that wasn’t wrong in itself, just in the wrong position.

Because the user pressed the big red button to stop the processing at that very moment, the actual command that kills the Excel process was skipped. That led to pretty drastic behavior. And the fix?

Move a single line of code five lines down, so that the exception in question is raised after the command has been processed, not before.
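The original code is not shown here, so the following is only a rough Python sketch of the ordering problem as I described it. `ExternallyStoppedException` is the name from the log; `FakeExcelProcess`, `close_buggy`, `close_fixed` and `run` are hypothetical names invented for the illustration.

```python
class ExternallyStoppedException(Exception):
    """Raised when the user stops the processing from outside."""


class FakeExcelProcess:
    """Hypothetical stand-in for an Excel process; only tracks the kill."""

    def __init__(self):
        self.killed = False

    def kill(self):
        self.killed = True


def close_buggy(process, stop_requested):
    # The stop check comes first, so a user stop skips the kill below
    # and the unresponsive process stays alive.
    if stop_requested:
        raise ExternallyStoppedException("Process has been externally stopped.")
    process.kill()


def close_fixed(process, stop_requested):
    # Same lines, different order: kill first, then report the stop.
    process.kill()
    if stop_requested:
        raise ExternallyStoppedException("Process has been externally stopped.")


def run(close, stop_requested):
    """Run one close variant and report whether the process was killed."""
    process = FakeExcelProcess()
    try:
        close(process, stop_requested)
    except ExternallyStoppedException:
        pass  # the caller would log this as an ERROR
    return process.killed
```

With `stop_requested=True`, `run(close_buggy, True)` leaves the process alive, whereas `run(close_fixed, True)` kills it and still raises the exception for the caller to log.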

Let’s just say this one is pretty hard to reproduce.
Guys, Murphy lives.

Holger