February 19, 2018
By: Jeff Fowler
A Case Against Using Nulls in Marketing Databases
Most marketing databases are built using a relational database management system. That means somewhere, buried deep beneath all those sexy screens in whatever application you choose to employ, lurks a confusing and cantankerous beast known as Structured Query Language. This is more commonly known as SQL. Originally invented by IBM in the 1970’s, SQL has become accepted as the industry standard database access language. For marketers, it is one of the few four letter words in existence with only three letters.
My case in point for today is the SQL term used to describe the absence of data: null. According to relational purists, if you do not have data for a field in your database, then that field contains a null. Null means nothing, empty, zippo, nada. It does NOT mean blank (a hard space) or zero. So, if you think your checking account balance became null after Christmas, you’ll be relieved to know that this is not so. Purists will argue that if a field is blank or zero it means you DO know something about it, but if it is null it means that you don’t. Post-holiday spending, you will be happy to hear that zero only means that you are out of money. Less than zero means you are in trouble. Null means you don’t even know if you have a checking account, or worse yet, a bank.
The problem with nulls is that they do not jive with what most marketers want or expect in terms of answering marketing questions. According to SQL, a null value cannot be tested against anything but the keyword NULL. So, let’s say you have a field called gender in your database and it contains three values: B for businesses, F for female, and M for male, as well as null when you don’t know. Let’s further suppose that you want to run a campaign targeting every customer, except for the ones you know are businesses. You might think you can do this by entering something like gender <> ‘B’ (gender not equal to ‘B’). Wrong! Guess what? Null is neither equal to nor unequal to B. So, to get what you want, you have to say gender <> ‘B’ OR gender is null. Get it? Me neither.
Another confusing fact is that null fields do not equal each other; for example, field1 = field2 fails if either or both fields are null. This is in direct contrast to what Mr. McAlpine taught me in my two years of 7th grade Algebra: if A = B and B = C, then A = C. With due apologies to Mr. McAlpine, when fields contain nulls you have to ask the question this way:
field1 = field2 OR field1 is null AND field2 is null
Finding the Formula
If you’re still reading this article, we are about to get to the fun part. You may notice in both scenarios that I have described so far, to get correct results I had to add an OR something is null condition. Therein lies a subtle but nasty problem: the way OR’s are evaluated inside databases. Just for fun, let’s put both of our examples into a single query: gender <> ‘B’ AND field1 = field2. Based on what we have discussed thus far, you have learned that to get the correct answer you need to ask the question this way: gender <> ‘B’ OR gender is null AND field1 = field2 OR field1 is null AND field2 is null
And the answer is… Wrong again! With SQL databases (as well as programming languages), AND conditions are evaluated first, followed by OR conditions. Here is how the database interprets your question, with parenthesis added for clarity: (gender <> ‘B’) OR (gender is null AND field1 = field2) OR (field1 is null AND field2 is null)
As you can see, this is clearly not what we thought we were asking. Unfortunately, the database does not care what we mean. The database obnoxiously does what we say instead. To get what we want, we must ask it this way: (gender <> ‘B’ OR gender is null) AND (field1 = field2 OR field1 is null AND field2 is null)
The Final Say
Here’s my inflammatory anti-relational theory advice for today: now that you know how not to ask for nothing (or not nothing), don’t do it. Do not use nulls in your marketing database unless you truly feel that you must. Instead, use blank for character fields (like gender) and zero for numeric fields. Trust me, ninety-nine percent of the time you will get what you want by doing this. In over 25 years of designing marketing databases, the only area where I have seen a need for using null values is with date fields. While you can assign a default value for dates (we like to use January 1, 1900), it seems bad form to store a deceased date for someone who is still alive. On the bright side, it does cut down on your mail volume a lot if you exclude people with a deceased date and everyone in your database has one.
Some of the more technical readers know that SQL omits null values from calculations and may object to filling null numeric fields with zero due to the impact it has on arithmetic. For marketers, this most often translates into computing sums and averages of dollar amounts, because by far the most common fields that we like to total, or average involve money. If we don’t know how much money somebody paid us for something, one might argue that assuming it is zero will have an adverse impact on computing their average purchase. While this is true, my advice is that if you want to get accurate answers, you can never truly escape having some knowledge about your data. If there are certain items (such as premiums) that have no cost, you are better off excluding them specifically than you are relying on them being automatically excluded, because they are null. And it is a heck of a lot more intuitive to say price <> 0 than it is to say price <> 0 and price is not null.
In closing, remember that in the battle of good versus evil, there are two kinds of fields in an SQL database: those that allow nulls and those that don’t. You are better off sticking with those that don’t.