In one of my previous notes, I had talked about the histogram, which is one of the most critical things in the stats object. Histogram, as the name suggests, is a bucketing technique of how the column data is distributed – the highest value of a bucket, which defines a boundary, how many rows are between two boundary values, how many rows are equal to a specific boundary value, how many unique values are there between a range and, how many rows are there on average per distinct value. All of this helps the optimizer make the right estimates. Here is a sample histogram on TotalDue column of SalesOrderHeader table (not putting down all the columns for brevity):

RANGE_HI_KEY RANGE_ROWS EQ_ROWS
============= ============ =========
26.2769 40 142
30.1444 17 202
.
.
(more rows)
Now, let’s we write a query:

SELECT * FROM Sales.SalesOrderHeader
WHERE TotalDue = 30.1444

The above query will return 202 rows. If check the cardinality estimation from the execution plan, you will observe the Estimated Number of Rows = 202 and the Actual Number of Rows=202. This is perfect and the best-case scenario for the optimizer where the estimate and the actual matched 100%.

How did this happen? Well, this is called Histogram Step Hit. The predicate value mentioned in the query (WHERE condition) has a step representation in the histogram and the optimizer does a perfect estimation with EQ_ROWS (the number of rows equal to the step value), which is 202.

The above is just a quick explanation of the ways how the optimizer leverages the histogram.

Sounds interesting? Watch the video below.

Get new content directly in your inbox – Click here

You can also visit DataPlatformGeeks (DPG) Video Library to watch more videos and to download video resources (presentation & demo code).

Hope the video was worth your time.

Happy Learning.

Learn more about SQLMaestros Products & Services: Video Courses| Hands-On-Labs | Learning Kits | Health Check Service