In one of my previous notes, I had talked about the histogram, which is one of the most critical things in the stats object. Histogram, as the name suggests, is a bucketing technique of how the column data is distributed – the highest value of a bucket, which defines a boundary, how many rows are between two boundary values, how many rows are equal to a specific boundary value, how many unique values are there between a range and, how many rows are there on average per distinct value. All of this helps the optimizer make the right estimates. Here is a sample histogram on TotalDue column of SalesOrderHeader table (not putting down all the columns for brevity):
RANGE_HI_KEY RANGE_ROWS EQ_ROWS
============= ============ =========
26.2769 40 142
30.1444 17 202
.
.
(more rows)
Now, let’s we write a query:
SELECT * FROM Sales.SalesOrderHeader
WHERE TotalDue = 30.1444
The above query will return 202 rows. If check the cardinality estimation from the execution plan, you will observe the Estimated Number of Rows = 202 and the Actual Number of Rows=202. This is perfect and the best-case scenario for the optimizer where the estimate and the actual matched 100%.
How did this happen? Well, this is called Histogram Step Hit. The predicate value mentioned in the query (WHERE condition) has a step representation in the histogram and the optimizer does a perfect estimation with EQ_ROWS (the number of rows equal to the step value), which is 202.
The above is just a quick explanation of the ways how the optimizer leverages the histogram.
Sounds interesting? Watch the video below.
Get new content directly in your inbox – Click here
You can also visit DataPlatformGeeks (DPG) Video Library to watch more videos and to download video resources (presentation & demo code).
Hope the video was worth your time.
Happy Learning.
Leave A Comment