Keeping Your Test Data Clean

There are many things one can learn from this story:

Symantec Corp.’s early-warning system gave its enterprise customers a brief scare late Friday when it erroneously sent an alert that said an Internet-crippling attack was in progress.

However, the one I want to tease out is this lesson: always keep your junk data clean and readable and clearly defined as test data.

In the Symantec alert, the message sort of indicated that it might be a test:

In the body of the e-mailed alert, however, careful readers found the words “Summary: threatcon test threatkhanh otrs” buried among several links.

Whenever you’re testing in a live environment or in an environment where databases are copied from dev or test environments into production (not a best practice, I know, but it happens too often), you must structure your test data so that it’s marked as test data somehow.

For example, on Web data collection forms that use e-mail as a uniqueish identifier, I recommend using a standard domain for your test submissions; in the past, I’ve used the company’s domain.  This way, your dev teams can (and should, but whether they will or not depends upon the caliber of your persuasion–I recommend .45) filter out the test records before the records go to the fulfillment house, e-mail vendor, or mere usage reports.
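To make the filtering idea concrete, here is a minimal sketch in Python. It assumes each submission is a dictionary with an "email" key and that test submissions all use one agreed-upon domain; the domain example.com and the function names are placeholders of my own, not anything from a real system.

```python
# Assumption: every test submission uses this one agreed-upon domain.
# "example.com" is a placeholder; substitute your company's domain.
TEST_EMAIL_DOMAIN = "example.com"


def is_test_record(record):
    """Return True if the record's e-mail address uses the test domain."""
    email = record.get("email") or ""
    # rpartition splits on the last "@"; everything after it is the domain.
    _, _, domain = email.rpartition("@")
    return domain.strip().lower() == TEST_EMAIL_DOMAIN


def filter_out_test_records(records):
    """Return only the records that should go on to the vendor or report."""
    return [r for r in records if not is_test_record(r)]


submissions = [
    {"email": "qa.tester@example.com", "comment": "This is a test entry."},
    {"email": "real.person@customer.org", "comment": "Please send a catalog."},
    {"email": "QA2@EXAMPLE.COM", "comment": "This is a test entry."},
]
real_submissions = filter_out_test_records(submissions)
```

The comparison is case-insensitive on purpose, since testers will not type the domain the same way twice.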

On free-text sorts of data entry, it’s best to stick with a simple line of “This is a test entry.” or something similar.  At least Mr. Khan did enter the word “test” in his data entry for Symantec; other times, developers or testers pound on the keyboards with the subtleties of bonobos.  If these strings of characters get passed on to actual users, they might look like defective behavior on the part of your application.  Worse, if you subversive QA types use gallows humor in your test entries, it might make it past the gatekeepers and onto the end user’s or client’s computer screen.

I’m not saying that I ever did anything like that, but regularly-scheduled production testing included some feeble humor that the data aggregation and screening person filtered out before passing relevant entries on to the client.  When the data aggregation and screening duties passed on to admin staff, one of the almost plausible punchlines made it to the client.  The humor wasn’t anything that damaged the relationship with the client, but it’s better to be safe than sorry.

So treat that test data as though your client or end user were going to read it.  That way, when your tech team inevitably makes the mistake that presents test data as gospel, at least it will be good spelled.

If you use automated testing, make sure that the test data used by your scripts is clean and friendly, too.  Because if your client gets to see that information, your client is going to see it a lot.  Automated test data that indicates its source in automated testing will explain to the uninformed why he or she is seeing this test message every day at 1am.
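As a rough illustration of self-identifying automated test data, a script could build its free-text entries with a small helper like the one below. The helper name, the script name, and the wording of the message are all hypothetical; the point is only that the entry says what it is, where it came from, and when.

```python
from datetime import datetime, timezone

# Assumption: a fixed, polite marker that every test entry starts with.
TEST_MARKER = "This is a test entry."


def make_test_entry(script_name, now=None):
    """Build a readable test entry that names the script that created it."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"{TEST_MARKER} Generated by automated test script "
        f"'{script_name}' at {stamp}. No action is required."
    )


# Hypothetical nightly script running at 1am UTC.
entry = make_test_entry(
    "nightly_form_check",
    datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc),
)
```

Anyone who finds that entry in a report every morning can at least tell which script to ask about.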

It’s not a bad idea to earmark load testing data as well so that its origin is easily identified; a load test mistakenly sent to a production server should trip some potential Denial of Service alarms with your hosting provider.  If you’re doing it the right way, that is.
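One way to earmark load test traffic, sketched here with Python's standard library, is to stamp every request with headers that name the load test. The header names X-Load-Test and X-Load-Test-Run, the User-Agent string, and the URL are my own inventions for illustration; use whatever your team and hosting provider agree on.

```python
import urllib.request


def load_test_headers(run_id):
    """Headers that mark a request as load test traffic (names are hypothetical)."""
    return {
        "User-Agent": "LoadTest/1.0 (internal QA; this is test traffic)",
        "X-Load-Test": "true",
        "X-Load-Test-Run": run_id,
    }


def build_load_test_request(url, run_id):
    """Build, but do not send, a request stamped with the load test headers."""
    return urllib.request.Request(url, headers=load_test_headers(run_id))


# The request is only constructed here; nothing goes over the network.
request = build_load_test_request("http://test.example.com/form", "run-001")
```

If this traffic does land on a production server by mistake, whoever reads the logs can see at a glance that it came from a load test and which run produced it.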
