Can a software testing tool be trusted to test itself? A toolsmith answers.
Building tools for software builders sets the bar high for quality. If compilers had as many bugs as typical applications, we couldn’t get anything done. I’m sharing my insights from building software that tests the functionality of other software: Usetrace.
Pulling yourself up by your bootstraps
The first compilers were built in handwritten machine code because there were no high-level languages to use. The Navy Electronics Laboratory developed the world’s first self-hosting compiler in 1958: a first, simple version was written in assembly language, the compiler was then rewritten in its own language, and the bootstrapped compiler was used to compile itself. After that tedious bootstrapping, developing compilers and new programming languages became easier, which eventually led to the languages we use today.
Back to the present. Compilers are typically self-hosting, meaning they are used to compile new versions of themselves. The same can be said of software testing tools: testing tools are trusted to test themselves. For example, Selenium WebDriver has extensive automated tests written in WebDriver. The same goes for the tool I am building. We have so far automated the 41 most critical use cases as UI regression tests using the testing tool itself, and we run a subset of those regression tests as live/production tests at a 1-minute interval.
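To make the idea concrete, here is a minimal sketch of what a UI regression test along these lines can look like, written with Selenium WebDriver’s Python bindings. The URL, element IDs, and assertion are illustrative assumptions, not Usetrace’s actual test code:

```python
# Minimal, illustrative Selenium WebDriver regression test (Python bindings).
# The URL and element locators below are hypothetical placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

def test_login_page_loads():
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")  # hypothetical URL
        driver.find_element(By.ID, "email").send_keys("tester@example.com")
        driver.find_element(By.ID, "password").send_keys("secret")
        driver.find_element(By.ID, "submit").click()
        # Assert on the page the user should land on after logging in.
        heading = driver.find_element(By.TAG_NAME, "h1").text
        assert "Dashboard" in heading
    finally:
        driver.quit()

if __name__ == "__main__":
    test_login_page_loads()
```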
But can a software testing tool really test its own reliability? Obviously there are serious problems: a broken regression testing tool can give misleading results about its own functionality, for example by passing tests that should fail. A monitoring tool monitoring itself does not really work either: in addition to the previous problem, the monitoring loop can halt or the reporting functionality can break, in which case there will be no alerts about the meltdown.
In reality, software testing happens at multiple levels, so there is never a single technology responsible for all of the testing. Before a release: unit testing, integration testing, API testing, and UI testing. After a release: smoke testing and monitoring. There is also the human component, exploratory testing, the insanely expensive but most effective way to catch new and unexpected problems. If one of the levels fails, the other levels are supposed to catch the critical problems.
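As a rough sketch of how the levels back each other up, the same critical path covered by the UI test above could also be covered by an API-level check; if one level misses a regression, the other can still catch it. The endpoint and expected payload here are hypothetical:

```python
# Illustrative API-level check for the same critical path the UI test covers.
# The endpoint and the expected "token" field are hypothetical placeholders.
import requests

def check_login_api() -> bool:
    resp = requests.post(
        "https://example.com/api/login",  # hypothetical endpoint
        json={"email": "tester@example.com", "password": "secret"},
        timeout=10,
    )
    # Even if the UI suite misses a regression, this check can still catch
    # a broken login backend (and vice versa).
    return resp.status_code == 200 and "token" in resp.json()

if __name__ == "__main__":
    print("login API ok:", check_login_api())
```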
Monitoring monitoring
A monitoring tool’s core task is to keep checking that predefined resources respond as expected and to raise alerts when something unexpected happens. We continuously monitor Usetrace by running Usetrace against itself. We get an email right away, and an SMS when any check has been failing for 5 consecutive minutes. If something drastic happened in the Usetrace testing infrastructure, the monitoring loop itself would halt. It is also possible, and it has happened, that the third-party integrations used for alerting, such as AWS Simple Email Service or its client libraries, have problems.
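A minimal sketch of such a monitoring loop might look like the following. The check function, the alerting function, and the thresholds are illustrative assumptions, not our production code:

```python
# Illustrative monitoring loop: run a check every minute and alert once the
# check has failed a number of times in a row. check_usetrace() and
# send_alert() are hypothetical stand-ins for the real UI check and the
# email/SMS integrations.
import time

CHECK_INTERVAL_SECONDS = 60
FAILURE_THRESHOLD = 5  # roughly five minutes of consecutive failures

def check_usetrace() -> bool:
    """Placeholder: run one Usetrace check against Usetrace itself."""
    return True  # replace with a real check

def send_alert(message: str) -> None:
    """Placeholder for the email/SMS alerting integration."""
    print("ALERT:", message)

def monitoring_loop() -> None:
    consecutive_failures = 0
    while True:
        try:
            ok = check_usetrace()
        except Exception as exc:
            ok = False
            send_alert(f"Check raised an exception: {exc}")
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= FAILURE_THRESHOLD:
            send_alert(f"Check has failed {consecutive_failures} times in a row")
        time.sleep(CHECK_INTERVAL_SECONDS)
```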
To find out when our own monitoring loop has problems, we rely on Pingdom. We have implemented an API endpoint for checking that a given monitoring task has run without problems within a given time window, and Pingdom polls it every minute. From this we can deduce that 1) our monitoring service is up, 2) our monitoring loop is active, and 3) our reports are being sent.
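A sketch of that kind of heartbeat endpoint, assuming the monitoring loop records a timestamp after each successful pass; the framework, path, and 10-minute staleness window are illustrative assumptions:

```python
# Illustrative heartbeat endpoint for an external poller such as Pingdom.
# It returns 200 only if the monitoring loop has completed a run recently.
import time
from flask import Flask

app = Flask(__name__)
last_successful_run = time.time()  # updated by the monitoring loop
MAX_AGE_SECONDS = 10 * 60          # how stale a heartbeat may become

def record_successful_run() -> None:
    """Called by the monitoring loop after each successful pass."""
    global last_successful_run
    last_successful_run = time.time()

@app.route("/health/monitoring")
def monitoring_health():
    age = time.time() - last_successful_run
    if age <= MAX_AGE_SECONDS:
        return f"OK, last run {int(age)}s ago", 200
    return f"STALE, last run {int(age)}s ago", 503

if __name__ == "__main__":
    app.run(port=8080)
```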
If a partial system regression happens, our internal monitoring catches it and alerts us. If the system is too broken to notice and report its own state, we rely on a third party to notify us.
Final word
Returning to compilers: they have become practically invisible to regular developers in recent years. Much of today’s software is written in interpreted languages with no separate compilation phase, yet compiler technology is still at work behind the scenes. Maybe in the future the bulk of the work software testers do today will also become invisible, blending into that basic technology fabric that just works without anyone having to worry about it.