Simon Ogden recently wrote an excellent post on Out of Range Histogram Adjustments. While he explains in detail how the out-of-range selectivity is computed, I think we can make it a little clearer. I hope Simon doesn’t mind! 🙂
The output of optdiag shows two unique values in the stored statistics (histogram) for table a:
[text]
Step     Weight          Value
   1     0.00000000      <  "C"
   2     0.01000000      =  "C"
   3     0.00000000      <  "N"
   4     0.99000001      =  "N"
[/text]

Remember the weight values for the two unique values. Think of each weight as the percentage of the table, not counting out-of-range values. So if we run the following query, we could expect 1% of the table to contain that value:

[sql gutter="false"]select * from a where a1 = "C"[/sql]

Let's move on to out-of-range values. The general equation to determine the selectivity of a single out-of-range unique value is:
s = 1 / ((h + 1) * o)

where:

h = the number of unique values in the histogram
o = the number of unique values outside of the histogram
s = the selectivity of a single out-of-range unique value

In other words, the out-of-range values are treated as one extra "unique value": collectively they get 1 / (h + 1) of the table, split equally among the o values.
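To make the equation concrete, here is a minimal Python sketch of it. The function name is my own, and the equal split among out-of-range values follows the formula as written above; this is an illustration, not ASE's actual code:

```python
def out_of_range_selectivity(h, o):
    """Selectivity assigned to a single out-of-range unique value.

    h -- number of unique values stored in the histogram
    o -- number of unique values searched for that fall outside
         the histogram

    The o out-of-range values share 1 / (h + 1) of the table
    equally, so each one gets 1 / ((h + 1) * o).
    """
    return 1.0 / ((h + 1) * o)

# Histogram stores C and N (h = 2); we search for T and Y (o = 2).
print(out_of_range_selectivity(2, 2))  # 0.16666666666666666
```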
Let’s assume we’ve added data to the table but haven’t run update statistics yet. The stored statistics won’t contain any metrics on the new data. So, ASE will estimate what it thinks the selectivity could be for the values we are searching on.
In the following query, we are searching for T and Y, neither of which appears in the histogram:

[sql gutter="false"]select * from a where a1 in ("T", "Y")[/sql]
So how does ASE determine the selectivity for T and Y? We can use the selectivity equation, substituting h = 2 (C and N) and o = 2 (T and Y), to compute s:

s = 1 / ((2 + 1) * 2) = 1 / 6 = 0.1666667
So, the selectivity for T is 0.1666667 and for Y is 0.1666667. Since the sum of the histogram weights and the out-of-range selectivities must equal 1, we need to adjust the histogram weight values.
Subtracting the two out-of-range selectivities from 1 (1 - 0.1666667 - 0.1666667), we have 0.6666667 left over for the values in the histogram. If you remember from the histogram, the value C composes 1% of the table and the value N 99% of the table. All we have to do is plug those in:

C: 0.01 * 0.6666667 = 0.0066667
N: 0.99 * 0.6666667 = 0.6600000
So we get the adjusted selectivities that we're looking for:

[text]
Value    Selectivity
"C"      0.0066667
"N"      0.6600000
"T"      0.1666667
"Y"      0.1666667
[/text]
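Putting the whole walk-through together, a short Python sketch (the variable names are mine) reproduces the adjustment and confirms the four selectivities still sum to 1:

```python
# Stored histogram weights for the in-histogram unique values.
histogram = {"C": 0.01, "N": 0.99}

# Out-of-range values we are searching for.
out_of_range = ["T", "Y"]

h = len(histogram)      # unique values in the histogram (2)
o = len(out_of_range)   # unique values outside the histogram (2)

# Each out-of-range value gets 1 / ((h + 1) * o) of the table.
s = 1.0 / ((h + 1) * o)
adjusted = {value: s for value in out_of_range}

# The leftover is split among the in-histogram values in
# proportion to their stored weights.
leftover = 1.0 - s * o
for value, weight in histogram.items():
    adjusted[value] = weight * leftover

for value in sorted(adjusted):
    print(value, round(adjusted[value], 7))

print(round(sum(adjusted.values()), 7))  # 1.0
```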