Here is a short explanation of the SQL Server statistics histogram. I am using AdventureWorks2008R2 database. There is a table Sales.SalesOrderHeader with a column TotalDue. Let us have a look at that column.
USE AdventureWorks2008R2 GO -- first observe the rows select TotalDue from Sales.SalesOrderHeader GO select DISTINCT TotalDue from Sales.SalesOrderHeader GO
You will observe that out of some 31,000+ rows, there are some 4700+ distinct values – good data to demo statistics histogram.
I’ll go ahead and create a statistics object on this column.
CREATE STATISTICS st_TotalDue ON Sales.SalesOrderHeader(TotalDue) GO</pre> Now, let’s observe the statistics histogram. <pre class="lang:default decode:true">DBCC SHOW_STATISTICS(N'Sales.SalesOrderHeader', st_TotalDue) GO
The histogram output is as follows:
Let us understand this metadata.
As the name implies, Histogram divides those 31,000+ values in TotalDue column into various buckets. The RANGE_HI_KEY columns shows you the boundary values of those buckets. More precisely, each value there is the upper boundary value of a histogram step. The column RANGE_ROWS shows the number of rows where the TotalDue value is between two RANGE_HI_KEY values (mean between 2 boundaries). For example, there are 2 records between 2.5305 and 4.409; meaning there are 2 rows where the TotalDue column value is between 2.5305 and 4.409 (excluding the boundaries). Let us verify this.
select COUNT(*) from sales.SalesOrderHeader where TotalDue>2.5305 and TotalDue<4.409
The EQ_ROWS shows the number of rows where TotalDue column value is equal to the RANGE_HI_KEY value. For example, there are 139 rows equal to 2.5305; meaning there are 139 rows where the TotalDue column value is equal to 2.5305 which is a boundary value. Let us verify:
select COUNT(*) from sales.SalesOrderHeader where TotalDue=2.5305
The DISTINCT_RANGE_ROWS shows the number of distinct records between two RANGE_HI_KEY values; meaning the number of records where the TotalDue column value is unique between RANGE_HI_KEY values. For example, there is one distinct value between 2.5305 and 4.409. Let us verify:
select DISTINCT TotalDue from sales.SalesOrderHeader where TotalDue>2.5305 and TotalDue<4.409
You can see that there is only one row output. Recollect that there are only 2 rows that falls between this range – remember RANGE_ROWS?
Which means, these 2 records should be identical considering TotalDue column value? Let us verify:
select TotalDue from sales.SalesOrderHeader where TotalDue>2.5305 and TotalDue<4.409