Testing With Real Data

An article at Dark Reading, “Real Data in App Testing Poses Real Risks,” explains:

If you use real, live customer data in your testing and development of applications, you may want to think twice about the risks of exposing that data.

Organizations that use live data in their testing do so basically because it makes the testing more real-world and better puts the app through its paces. Trouble is, it also can expose sensitive data to engineering staff who normally wouldn’t have access to that data, as well as to consultants and other outside contractors working with your organization on the testing process.

But you don’t have to use the real thing in app testing and development: “It needs to be real enough, but it’s better if it’s not people’s confidential information,” says Gary McGraw, CTO of Cigital.

Still, it’s common practice among many organizations today. According to a new study from the Ponemon Institute, which was commissioned by Compuware, 69 percent of the over 800 IT professionals surveyed said they use live data for testing their applications, and 62 percent say they do so in their software development. Over 50 percent outsource their app testing, and of that group, 49 percent of them share live data with the outsourcing organization.

The article conflates using real data with using live data, but they’re really two different things, each of which comes with its own risks.

I know that the risky behavior that the article talks about, exposing real data for development and testing, occurs frequently; I have worked on a multitude of projects where the technology team gets its verisimilitude by copying a snapshot of the current, real, production database into a development environment or a testing environment. This has a number of risks and problems:

  • These environments usually aren’t as secure as production environments and might not be as well monitored.
  • The environment isn’t usually as robust as a production environment, so the developers will blame performance problems on anemic machines processing enormous databases instead of, perhaps, problems with their impeccable code.
  • Unvetted users can get access to the client data, including project managers, testers, and anyone drafted in the office for that last minute panicked testing push.

As bad as that is, the worse of the two occurs when QA is asked to test not only with real data, but also in a live environment. I’ve worked on projects where the code is promoted hastily, directly into production, and the powers that be direct QA to test the application in production, where the data is live and any cool things QA could do (such as bringing down the horse) would be very, very bad. Testing an application when it’s live, particularly an untested application, comes with its own set of risks:

  • Any defects found can have an adverse impact on the actual users and the live system instead of on a nice, safe development environment.
  • Any defects not found but triggered can muck up real data without anyone catching it, which will impact other people’s lives.
  • Test data entries will exist in the live system, impacting any functions that aggregate the test data with real data. Someone might find an Ace Frehley who lives at 1976 Kiss Avenue and come to question the veracity of all data in the system.

Of course, the risks in testing with real data or with live applications and data aren’t the only possible combinations of disasters waiting to happen. You could get configuration problems where your test environments point to live data (that is, someone has entered the wrong database connection information in the stage environment) or where your live applications point to your test databases (that is, someone pushed the code into production along with connection information for the test data). A host of other potential problems arises when dealing with SOA or service-based applications where, instead of one database and one application, you have many.
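
One way to catch that kind of crossed connection before any test touches data is a small guard that compares the configured database against an allow-list for the environment. What follows is only a minimal sketch in Python: the host names, database names, environment labels, and the check_database_target function are all hypothetical, and a real project would load the allow-list from its own configuration.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: the (host, database name) pairs each
# environment is permitted to connect to. These values are made up
# for illustration only.
ALLOWED_TARGETS = {
    "test": {("db-test.internal.example", "app_test")},
    "stage": {("db-stage.internal.example", "app_stage")},
    "production": {("db-prod.internal.example", "app")},
}


def check_database_target(environment: str, database_url: str) -> None:
    """Raise ValueError if database_url is not approved for environment."""
    allowed = ALLOWED_TARGETS.get(environment)
    if allowed is None:
        raise ValueError(f"unknown environment: {environment!r}")

    # Pull the host and database name out of a URL-style connection string.
    parsed = urlparse(database_url)
    target = (parsed.hostname, parsed.path.lstrip("/"))

    if target not in allowed:
        raise ValueError(
            f"{environment} must not connect to {target[0]}/{target[1]}"
        )
```

Calling check_database_target("stage", url) at application startup, and again at the top of a test run, would refuse to proceed if the stage environment had been handed the production connection string. It is a cheap check, and it does not replace looking at the connection settings yourself.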

The moral? Be very attentive to the data and the environment in which you’re testing, and when the developers say it’s ready, have some way to check to ensure it’s ready, whether it’s verifying the connections and database names yourself or using only test data you enter yourself.

The result? Your humble narrator continues to struggle with the fear that somewhere, somehow, some intern is going to refinance my mortgage at an interest rate of 999.99% while performing a standard boundary analysis sort of test in a live environment.
