Hi Friends,

Here is a short explanation of the SQL Server statistics histogram. I am using AdventureWorks2008R2 database. There is a table Sales.SalesOrderHeader with a column TotalDue. Let us have a look at that column.

USE AdventureWorks2008R2
GO
 
-- first observe the rows
select TotalDue from Sales.SalesOrderHeader
GO
 
select DISTINCT TotalDue from Sales.SalesOrderHeader
GO

You will observe that out of some 31,000+ rows, there are some 4700+ distinct values – good data to demo statistics histogram.

I’ll go ahead and create a statistics object on this column.

CREATE STATISTICS st_TotalDue ON Sales.SalesOrderHeader(TotalDue)
GO</pre>
Now, let’s observe the statistics histogram.
<pre class="lang:default decode:true">DBCC SHOW_STATISTICS(N'Sales.SalesOrderHeader', st_TotalDue)
GO

The histogram output is as follows:

SQL Server Statistics Histogram

Let us understand this metadata.

As the name implies, Histogram divides those 31,000+ values in TotalDue column into various buckets. The RANGE_HI_KEY columns shows you the boundary values of those buckets. More precisely, each value there is the upper boundary value of a histogram step. The column RANGE_ROWS shows the number of rows where the TotalDue value is between two RANGE_HI_KEY values (mean between 2 boundaries). For example, there are 2 records between 2.5305 and 4.409; meaning there are 2 rows where the TotalDue column value is between 2.5305 and 4.409 (excluding the boundaries). Let us verify this.

select COUNT(*) from sales.SalesOrderHeader
where TotalDue&gt;2.5305 and TotalDue&lt;4.409

SQL Server Statistics Histogram

The EQ_ROWS shows the number of rows where TotalDue column value is equal to the RANGE_HI_KEY value. For example, there are 139 rows equal to 2.5305; meaning there are 139 rows where the TotalDue column value is equal to 2.5305 which is a boundary value. Let us verify:

select COUNT(*) from sales.SalesOrderHeader
where TotalDue=2.5305

SQL Server Statistics Histogram

The DISTINCT_RANGE_ROWS shows the number of distinct records between two RANGE_HI_KEY values; meaning the number of records where the TotalDue column value is unique between RANGE_HI_KEY values. For example, there is one distinct value between 2.5305 and 4.409. Let us verify:

select DISTINCT TotalDue from sales.SalesOrderHeader
where TotalDue&gt;2.5305 and TotalDue&lt;4.409

SQL Server Statistics Histogram

You can see that there is only one row output. Recollect that there are only 2 rows that falls between this range – remember RANGE_ROWS?

Which means, these 2 records should be identical considering TotalDue column value? Let us verify:

select TotalDue from sales.SalesOrderHeader
where TotalDue&gt;2.5305 and TotalDue&lt;4.409

SQL Server Statistics Histogram

Data Platform Virtual Summit 2020

Subscribe to SQLMaestros YouTube channel. If you want more learning content in your inbox, subscribe to SQLMaestros Bulletin.

SQLMaestros YouTube | SQLMaestros Bulletin | SQLMaestros Twitter