A very common issue that I see on forums is that SQL Server NOT IN clause not working. Here is a quick repro:
Note: These are dummy tables just for a quick demo so please do not try to find the logic between the two tables 🙂
I create a customers table with custid and custname columns. I insert three records in it.
USE tempdb GO create table tbl_customers (custid int, custname varchar(50)) go insert into tbl_customers values(1,'A') insert into tbl_customers values(2,'B') insert into tbl_customers (custid) values(3)
Note that a NULL gets inserted in the third record for custname columns. This is intentional.
This is what the customers table looks like, note the NULL there.
I create another table: Orders table with two columns, ordered and custname and insert three records.
create table tbl_orders (orderid int, custname varchar(50)) go insert into tbl_orders values(1,'A') insert into tbl_orders values(2,'B') insert into tbl_orders values(3,'C')
Note that in the third order I insert a customer name ‘C’ that does not exist in customers table. (I told you not to find the logic, just understand the technical issue)
This is what the Orders table looks like:
Now my final query: Give me all the customer names from Orders table that do not exist in customers table.
select custname from tbl_orders where custname NOT IN(select custname from tbl_customers)
And I get an empty result set.
I was expecting ‘C’ to be returned.
The technical issue is:
NOT IN clause in SQL Server is nothing but a series of NOT EQUAL TO. One of the values from the subquery is a NULL. The result set (custname from tbl_customers) contains A, B & NULL. Every value from the outer query is compared with every value from the inner query. And when the comparison is done with NULL, it returns UNKNOWN because NULL is an unknown value and anything compared with unknown can only result in unknown. And UNKNOWN in this case of Logical comparison is treated as FALSE and an empty result set is being returned.
Solution 1: Put an additional filter.
select custname from tbl_orders where custname NOT IN(select custname from tbl_customers where custname is not null)
Solution 2: Use Left Join
SELECT t.custname FROM tbl_orders t LEFT JOIN tbl_customers c ON t.custname = c.custname WHERE c.custname IS NULL
Solution 3: Use NOT EXIST clause
select custname from tbl_orders o where NOT EXISTS(select custname from tbl_customers c where c.custname = o.custname)
Solution 4: Using EXCEPT operator
select custname from tbl_orders EXCEPT select custname from tbl_customers
All the solutions give you this output:
Note that EXCEPT operator behaves a little differently. If there were two instances of C, it will only show one instance. It applies a DISTINCT SORT and removes duplication.
Now, if you run all four of them and check their execution plans, the first three seem to perform equally well. However the EXCEPT solution is expensive because of the DISTINCT SORT.
So choice is yours and what you expect from the resultset!
And that’s why it is very important to know your data well. You should be knowing whether the target column allows NULL or is it NULLABLE or not 🙂
Want to watch this in action? Watch here!
You should always use “not exists” for this kind of selects instead of “not in”.
My approach for this kind of query is that i would use a left join
FROM tbl_orders t LEFT JOIN
tbl_customers c ON t.custname = c.custname
WHERE c.custname IS NULL
Or use a left join
Use a left join and the nulls don’t matter
select o.custname from tbl_orders o
left join tbl_customers c on o.custname = c.custname
where c.custname is null
Thanks Toni, Aldo, Andy, Duncan for the comments – the blog post is updated with all solutions (possibly) 🙂
Use Isnull operator to overcome the null operations.