July 08, 2005

Testing Absence

Welcome to the fourth edition of "High-Tech Programming", in which we discuss some of the fleeting-edge concepts and problems of modern-world technological computer development. In this issue, I would like to take a time-out in order to step back and reflect on one of the nastiest sources of error in programming today, namely, the test for absence.

The test for absence is the situation, occurring at least once in every program, object, module, function, method, variable and parameter, of making the computer do something depending on whether something else exists or not.

There are numerous complexities to the concept of "absence", whence the problem. Testing if a value equals another value is generally easy, but testing whether a value exists is mind-boggling, and most languages are specifically designed to make it as incomprehensible as possible.

Say I want to test if my social security number is greater than π, for example. It's easy. I just go:

if mySocialSecurityNumber > π

and I'm done. That works for credit card numbers, phone numbers, PIN numbers, recipes, you name it. But I start to have problems when I want to test if my social security number exists. What do I do? This:

if mySocialSecurityNumber = 0


That'll work in QuickBASIC. Sometimes. But in C it will reset your stupid number to 0 and tell you everything's fine. That's because... who cares what it's because. No language even bothers to follow any sort of standard for this sort of thing. Testing absence, something on which probably trillions of dollars of transactions and millions of medical records depend every second, is a completely random operation. To know how to do it right, you have to:

  • know what language you're using,
  • know how that language represents data structures internally,
  • know how the language's declaration mechanism works internally, and, by extension,
  • know how that language represents memory access; you also have to know:
  • what operators are available for each data type in each language,
  • if and how those operators can be defined, redefined or overridden,
  • the behaviour of the operator once it's been overridden, and
  • @&$#@% Perl!
  • You also need to know the distinction between:
      • whether a value has been declared,
      • whether a value has been defined,
      • whether a value is empty, or
      • whether a value is null.
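
To make the list less abstract, here is a minimal sketch in C of two of the traps mentioned so far: the "=" that resets the number instead of testing it, and the difference between a value that is null and one that is merely empty. The names (looks_absent_buggy, looks_absent, classify, enum presence) are made up for illustration, and treating 0 as "no number" is an assumption carried over from the test above, not a rule of the language.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the states of a value held as a C string. */
enum presence { VALUE_NULL, VALUE_EMPTY, VALUE_PRESENT };

/* Null means "no string at all"; empty means "a string with nothing in it". */
enum presence classify(const char *s)
{
    if (s == NULL)
        return VALUE_NULL;
    if (s[0] == '\0')
        return VALUE_EMPTY;
    return VALUE_PRESENT;
}

/* The trap: '=' assigns. The number is overwritten with 0, the condition
   evaluates to 0 (false), and the "absent" branch is silently skipped.
   The doubled parentheses only silence the usual compiler warning. */
int looks_absent_buggy(long *ssn)
{
    assert(ssn != NULL);
    if ((*ssn = 0))
        return 1;
    return 0;
}

/* What was meant: '==' compares and leaves the number alone. */
int looks_absent(const long *ssn)
{
    assert(ssn != NULL);
    return *ssn == 0;
}

/* "Declared but not defined" has no test at all in C: a local variable
   declared without an initializer holds an indeterminate value, and
   there is no portable way to ask whether it was ever set. */
```

Calling looks_absent_buggy on a perfectly good number reports "not absent" and leaves the number at 0, which is the "everything's fine" case.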

I think this is a lot to keep in mind for a totally elementary operation. And it's not just a question of intellectual effort: sure, if you spent enough time thinking, you could work out in your head which type of test is necessary when, depending on the combination of the above factors, but in doing so, I think you end up working across several layers of abstraction of the computer language at once. Really, when a language provides you with an equals sign, it should be capable of working on one level.

This is not a joke. A language like C can be very handy if you're willing to keep track of which level of abstraction your code is performing each operation on, but when we claim to be moving towards more normalised, readable and maintainable code, a language could easily make a clear distinction between processes at the level of memory and processes at the level of, say, representing a bank customer. In other words, "Smith, Joe" has nothing to do with RX0DX.
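
One way to picture that separation, sketched in C under heavy assumptions: keep the question "does this customer have a number?" in an explicit field at the level of the customer, instead of reading it off a magic value in memory. The struct customer type and its helper functions are hypothetical names invented for this sketch; they are not a standard API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical customer record: presence is stored explicitly,
   so 0 can be an ordinary number rather than a code for "missing". */
struct customer {
    char name[32];
    bool has_ssn;   /* domain-level answer to "does it exist?" */
    long ssn;       /* only meaningful when has_ssn is true */
};

/* Build a customer whose number is absent. */
struct customer customer_without_ssn(const char *name)
{
    struct customer c;
    assert(name != NULL);
    snprintf(c.name, sizeof c.name, "%s", name);
    c.has_ssn = false;
    c.ssn = 0;
    return c;
}

/* Build a customer whose number is present, whatever its value. */
struct customer customer_with_ssn(const char *name, long ssn)
{
    struct customer c = customer_without_ssn(name);
    c.has_ssn = true;
    c.ssn = ssn;
    return c;
}

/* The test for absence no longer touches the number itself. */
bool customer_has_ssn(const struct customer *c)
{
    assert(c != NULL);
    return c->has_ssn;
}
```

The cost is one more field to keep in sync by hand, which is rather the point: C leaves that bookkeeping to you.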
