The phrase Sargability is quite popular in SQL Server world.
Wikipedia says: “In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE.”
Does this mean “the ability of the SQL Server engine to use the search argument to perform an index seek operation?”.
Yes, kind of. Let’s understand with some examples.
Execute the following queries.
-- turn on actual execution plan use AdventureWorks2016 GO -- turn on Statistics time and io on SET STATISTICS IO ON --SET STATISTICS TIME ON -- sargable select LastName from PERSON.Person where LastName = 'Murphy' -- not sargable select LastName from Person.Person where UPPER(LastName) = 'MURPHY'
Let’s observe the execution plans for both of them. (There is a non-clustered index on LastName column if it was not obvious already).
Two things will catch your immediate attention.
#1 The cost factor. 3% vs 97%
#2 Index Seek vs Index Scan
But these differences would not have been evident only if either one of the queries would run. You would prefer to have the first over the other. At least, in an ideal world which is never the case. So, the first quick learning here is: Always compare multiple variations of your T-SQL code. Well this is a PRO Tip, not for the lazy ones.
Anyway, let’s move forward. So, what went wrong?
Well, this is a classic example of Sargability. In the first case, the predicate in sargable. The storage engine is able to compare the literal/constant (user defined value) directly to the column value contained in the index record. And the resolution (the match) occurs. Because of this ability to perform a direct comparison, SQL Server is able to use the Seek operation (the access method). In the second case, a function is applied on the column, so before a comparison can be made, first, all the column values need to be converted by the execution of the function and then only comparisons can be made. Therefore, the storage engine decides to use the Scan access method because it has to touch every record. This results in higher IO, higher CPU – expensive plan, as clearly visible by observing plan cost.
Let’s go the messages tab and pull out IO costs.
You can clearly see that much more work was done by SQL in case of scan.
My understanding is that both the predicates were Sargable in nature. Both predicates were identical =’Murphy’. It was the application of the function on the left side of the operator that spoilt the show. Let me elaborate.
Consider these two examples:
-- sargable select LastName from Person.Person where LastName like 'Murphy%' -- not sargable select LastName from Person.Person where LastName like '%Murphy%'
If we look into the execution plans and IO costs, we will see a similar pattern as before.
And similar pattern with cardinality estimation also (not putting the screenshot here).
So, what went wrong this time? There was no function applied on the column? Well, this time the predicate itself was non-sargable. The second one. We applied a wild card character and that made the query go really wild. So, with a leading wild card character the engine cannot perform a seek operation as we are only trying to match a part of the string, and to do that SQL has to touch every record and try to find a match. Remember the index on Lastname column is sorted in ascending fashion (the default). And seeking happens from left to right. Makes sense?
But anyway, in both cases, we just say that the query is non-sargable. Period.
Well, the purpose of this blog post is not to explain sargability in-depth, though, I tried to touch on some key observations and tried to provide a different angle. SQL community has written much better blogs on Sargability.
I want to seek your attention on some other aspects.
One immediate conclusion people derive is that SEEK is always better than SCAN. In most cases, that may be true. But it all depends on the amount of work SQL Server has to do. And the amount of work is actually defined by you, by your predicate. (You seem to be quite happy that you are assigning work to SQL Server, you are the boss).
Let’s take this example now. SalesOrderDetail has a column LineTotal, lot of numeric values. Let us create an index on it.
-- lets take another table, another column select * from sales.SalesOrderDetail -- create index create nonclustered index ix_linetotal on sales.salesorderdetail (linetotal)
This is how the non-clustered index looks like, right?
-- this how index looks like, data sorted select LineTotal from sales.SalesOrderDetail order by LineTotal
Data is sorted in ascending fashion, the default.
Now, let’s write our queries, sargable and non-sargable, both versions.
-- sargable select linetotal from sales.SalesOrderDetail where LineTotal > 23020.131792 -- non-sargable select linetotal from sales.SalesOrderDetail where abs(LineTotal) > 23020.131792
And you will observe a similar pattern. Seek vs Scan. Less IO vs more IO. Less cost vs higher cost.
And now you know very well what is going on. You know why this is happening. Hey, why is there a Compute Scalar operator there? It wasn’t there in the previous example? This is because LineTotal is a computed column. Compute Scalar has to compute the formula defined for that column. You can script out the table and see it. Compute Scalar is interesting. Back in the days (SQL 2000 for example), Compute Scalar was used to compute the resultant value after implicitly converting the literal to the required data type. And it used to show up in the execution plan. Now it does not show up as a separate operator but the operation is still performed. We can hover the mouse over Index Seek/Scan and observe this.
Let’s discuss the cardinality factor a little more. Because of non-sargability, the storage engine is reading many more rows than actually needed. That is not OK but still OK, but the bad estimation there is a bit worrying. For scan, it estimated 36K and actual, just 7 records. For Seek, it’s 111 records estimation and actual 7. What if you ask the optimizer to sort the data? Are you getting the hint? No? Let’s try this…
-- create index create nonclustered index ix_linetotal_qty on sales.salesorderdetail (linetotal,orderqty) -- sargable select linetotal from sales.SalesOrderDetail where LineTotal > 23020.131792 order by OrderQty desc -- non-sargable select linetotal from sales.SalesOrderDetail where abs(LineTotal) > 23020.131792 order by OrderQty desc
Observe the following:
#1 Warning on the SELECT operator in the scan plan
#2 Cardinality estimation for Seek plan (number of rows going into the Sort operator)
#3 Cardinality estimation for Scan plan (number of rows going into the Sort operator)
Below screen shot summarizes all.
So, what’s going on?
Because of bad estimation (the result of non-sargability), in the scan plan, SQL estimates 36K rows would go into the sort operator and so it reserves 6448 KB memory for the sort operation. This reservation is done by a specific memory clerk during optimization phase. But in actual number of rows are just 7. The warning clearly calls this out “ExcessiveGrant” and lands up using only 16 KB, what is really needed. Hover over the sort operator and observe 7 rows. In case of seek plan, the estimate is 111, and that is much better, therefore, no excessive grant. Look at the sort operator in the image above for seek plan, it says, 7 of 111 (actual 7 rows of estimated 111). If you further investigate in the properties window, you will see that even seek plan landed up using 16 KB only.
Again, sorry for digressing again, but it was important to show the side effects of non-sargability.
-- drop the index drop index ix_linetotal_qty on sales.salesorderdetail
So, this is a reaffirmation that seeks are better than scans. But hold on, do not jump to conclusions.
So, what exactly is Seeking? Seems to be some kind of magic. Drastically low IO and extreme performance! Well, it all depends on the predicate value, the literal you specify, the constant, the parameter – whatever you call it (programming world is full of terminologies).
Let’s dissect a bit more.
The literal we specified resulted in a highly selective result set. Only 14 records out of 121,317 records. When SQL seeks, it goes to a particular position and starts reading the data from that point. Remember, the data is sorted because there is an index on that column. So, the storage engine goes to the first occurrence of 23020.131792 and starts reading all the records from that position. So, if I may say so, this is like Seek + Scan or Range Scan? And 23020.131792 is a high value, far away at the end with a few pages, so the number of records is just so less, IO (page reads) has to be less. What if we change the literal to 2.290000? Now, the storage engine will seek the first occurrence of 2.290000 and start reading all the records from that point. And you can expect that it will read a lot more pages now as 2.290000 is somewhere in the beginning set of pages. In contrast SCAN will always touch all the pages, irrespective of the value we specify.
Now think about it. A lot depends on the operator and the literal we specify. So, SEEK could be as expensive as SCAN. A lot also depends on the data distribution you have got in the column values – symmetrical or asymmetrical, or the extent of skewness in asymmetrical distribution of data.
Let’s run this now.
-- sargable select linetotal from sales.SalesOrderDetail where LineTotal > 2.290000 -- non-sargable select linetotal from sales.SalesOrderDetail where abs(LineTotal) > 2.290000
Let’s observe the execution plan and IO stats.
On another note (digressing a bit again), the predicate and the selectivity pattern plays an important role. The idea is simple. If the selectivity is quite high, SQL may choose SEEK. High selectivity means a smaller number of records. And if the selectivity is low, which means more records, SQL may choose to SCAN. This is a cost-based decision by the optimizer while estimating the number of rows, and for this estimation the optimizer is heavily dependent on statistics. So, thee is a tipping point beyond which SQL decides to use a different access method. It’s like; if you have to travel a few meters, you may prefer to walk. But if you have to travel a few miles, you may prefer a vehicle. So, there is a tipping point there beyond which you may not prefer to walk. Yes, you can run also 😉
Coming back to Sargability. All is good till now dealing with character and numeric data. Life changes when we are dealing with datetime data and the commonly used functions, CAST and CONVERT.
Before we move forward, please clean up.
-- clean up drop index ix_linetotal on sales.salesorderdetail
Now, let’s work with the OrderDate column in SaleOrderHeader table and create an index on it, very similar to what we have done earlier.
-- lets work with datetime -- check indexes on the table salesorderheader sp_helpindex 'sales.salesorderheader' -- orderdate column select OrderDate from sales.SalesOrderHeader -- create index on OrderDate create nonclustered index ix_orderdate on sales.salesorderheader (OrderDate)
Now, let’s see a one example of non-sargable query and it’s sargable counterpart. I will not put up screenshots, as the examples are quite simple and similar to the ones above.
-- non-sargable select OrderDate from sales.SalesOrderHeader where year(OrderDate) = 2011 -- sargable select OrderDate from sales.SalesOrderHeader where OrderDate between '2011/01/01' and '2011/12/31'
Quite simple, get me all the orders for 2011. The non-sargable scans, the sargable seeks. Or you could write this way too, it’s all the same.
--sargable select OrderDate from sales.SalesOrderHeader where OrderDate >='2011/01/01' and orderdate <='2011/12/31'
Similar to the above, there can be dozens of examples where date & time functions can be used to crush the sargability of the query.
Now, lets see an interesting behavior with CAST & CONVERT functions. First, CONVERT.
-- non-sargable select OrderDate from sales.SalesOrderHeader where convert(char(10), OrderDate,121) = '2011-05-31'
Any surprises? Not at all. We applied a function on the column, left side of the operator, and the result is same as all previous examples we have seen before. Check the execution plan, it is scanning the index, not seeking on the index.
But now look at interesting behavior. Run the below two queries and check execution plans.
-- sargable select OrderDate from sales.SalesOrderHeader where convert(date, OrderDate,121) = '2011-05-31' -- sargable select OrderDate from sales.SalesOrderHeader where cast(OrderDate as date) = '2011-05-31'
Hurray, now suddenly we have a sargable query. You can check IO stats too, it read only 2 pages. Well, consider this as an exception. Two common ways of explicitly converting data types in SQL Server is CAST & CONVERT. Convert is SQL Server implementation and CAST is ANSI SQL standard. Convert also provides additional formatting capability. As it turns out, when you use these functions on the column (on left side of the operator) and you are converting datetime to date or vice versa, SQL internally ensures that sargability is maintained. Well CAST is internally implemented as CONVERT only, so the behavior is same. But if OrderDate was converted to, say varchar, sargability would be lost. So, what you are seeing here with CAST & CONVERT is an exception.
Does this mean that we can use CAST & CONVERT without thinking about Sargability? Not really. Stick to the best practices and guidelines. These functions internally perform Dynamic Seek and that may not always help good cardinality estimations in contrast to the full range queries with literals . In our example here, we cannot see that. Some other time.
With this exception, the guidelines/best practices, etc, do not change. You want your predicates to be sargable, you want your queries to be sargable.
Thanks for reading. If you liked the content, do share it. I am on Twitter if you wish to connect: https://www.twitter.com/A_Bansal
You can also subscribe to my exclusive newsletter ConnectWithAB – https://sqlmaestros.com/stay-connected-with-amitbansal/
Very detailed !