SQL Server Cardinality Estimation is a complicated topic until you understand the internals of SQL Server Statistics. Here are some quick notes that will answer some common questions that you generally have:
How does SQL Server Query Optimizer estimate a default cardinality?
Let’s see and example (turn ON Actual Execution Plan or press Ctrl + M)
use AdventureWorks2012 go -- Turn On Actual Execution Plan select * from sales.SalesOrderHeader where TotalDue = 1457.3288 go
You can observe that optimizer estimated 1.7 rows (approx) vs 1 actual row returned.
Let us take another example:
select * from sales.SalesOrderHeader where TotalDue = 472.3108 go
This time optimizer estimates 1.9 rows (approx) vs 21 actual row returned.
What’s going on? Answer is simple: SQL Server estimates the number of rows based on Statistics. Either statistics already exist on TotalDue column or SQL Server will automatically create one if AUTO_CREATE_STATS is turn ON. Either way, SQL Server uses this statistics object and more precisely, the histogram, to compute its estimation. You can read this blog post to learn more about histogram.
Now, let’s make some changes to our code:
declare @td float set @td = 1457.3288 select * from sales.SalesOrderHeader where TotalDue = @td
Estimation: 6.61864 (observe that the same code, same predicate value, gave us an estimation of 1.7)
Let’s change the variable value and run the same code again:
declare @td float set @td = 472.3108 select * from sales.SalesOrderHeader where TotalDue = @td
Estimation: 6.61864 (observe that the same code, same predicate value, gave us an estimation of 1.9)
Will the estimation be always 6.61864? What’s going on?
In the previous two examples, the optimizer was able to extract the value from the query to compute its estimation. In the next two example, since we are using local variables, the optimizer does not have the value until runtime. So will the estimation always be 6.61864 whenever we use local variable? Yes, if the table data does not change or more precisely, if the statistics behind the table does not change. When does statistics change, etc, etc is another blog post. But let’s come to the main question, where does 6.61864 come from? How does the optimizer compute this default value?
The formula for this default estimation is:
Total Rows * Density (where Density is 1/number of distinct values)
SQL Server, apart from marinating the histogram also maintains density information for the columns.
So let us compute this:
select count(*) from sales.SalesOrderHeader select count(distinct totaldue) from sales.SalesOrderHeader
Density = 1/4754 = 2.103491796381994e-4
Total Rows * Density = 31465 * 2.103491796381994e-4 = 6.6186….
select count(*)*cast((1.*1/count(distinct totaldue))as decimal(18,17)) from sales.SalesOrderHeader
That’s how the default estimation was computed in this case. Hope the quick note was helpful.
Want to watch this in action? Watch here!