Home » Databases » Sybase » ASE » Commentary on Simon Ogden’s Out of range histogram adjustments

Commentary on Simon Ogden’s Out of range histogram adjustments

Simon Ogden recently wrote up an excellent post regarding Out of Range Histogram Adjustments. While he explains in detail how the out of range selectivity is computed, I think we can make it a little bit clearer. I hope Simon doesn’t mind! 🙂

The output of optdiag shows two unique values in the stored statistics (histogram) for the a table. :

     Step     Weight                    Value
        1     0.00000000        < "C    "
        2     0.01000000        =       "C    "
        3     0.00000000        <       "N    "
        4     0.99000001        =       "N    "&#91;/text&#93;

Remember the weight values for the unique values.  Think of the weight values as percentage of the table not counting out of range values.  So if we have the following query then we could expect 1% of the table to contain that value:
&#91;sql gutter="false"&#93;select * from table a where a1 = "C"&#91;/sql&#93;

Let's move on to out of range values.  The general equation to determine the selectivity of a single out of range unique value is:
<a name="id-1924473240"></a><p class="ql-center-displayed-equation" style="line-height: 86px;"><span class="ql-right-eqno"> (1) </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-c6fb272a20ff13eb6207be5f54cccdbc_l3.png" height="86" width="110" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;&#32; &#115;&#32;&#61;&#32;&#92;&#99;&#102;&#114;&#97;&#99;&#123;&#92;&#99;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#117;&#32;&#43;&#32;&#110;&#125;&#125; &#32;&#32;&#32;&#32;&#32;&#123;&#49;&#32;&#43;&#32;&#92;&#99;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#117;&#32;&#43;&#32;&#110;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
We can simplify that using a bit of algebra:
<a name="id-542211357"></a><p class="ql-center-displayed-equation" style="line-height: 38px;"><span class="ql-right-eqno"> (2) </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-719b2542fba5b6cabcbbc554d157ec42_l3.png" height="38" width="86" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;&#32; &#115;&#32;&#61;&#32;&#92;&#99;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#117;&#32;&#43;&#32;&#50;&#110;&#125; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-43fe27dc3e528266a619764d90fce60b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#117;" title="Rendered by QuickLaTeX.com" height="8" width="10" style="vertical-align: 0px;"/> = the number of unique values in the histogram
<img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-b170995d512c659d8668b4e42e1fef6b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#110;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> = the number of unique values outside of the histogram
<img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#115;" title="Rendered by QuickLaTeX.com" height="8" width="8" style="vertical-align: 0px;"/> = selectivity of a single out of range unique value

Let's assume we've added data to the table but haven't run update statistics yet.  The stored statistics won't contain any metrics on the new data.  So, ASE will estimate what it thinks the selectivity could be for the values we are searching on.

In the following query, we are searching for T and Y.  
[sql highlight_lines="4"]select * 
from a 
where 
a1 in ('T', 'Y')[/sql]
So how does ASE determine the selectivity for T and Y?  We can use the selectivity query (<a href="#id-542211357">2</a>) replacing <img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-43fe27dc3e528266a619764d90fce60b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#117;" title="Rendered by QuickLaTeX.com" height="8" width="10" style="vertical-align: 0px;"/> and <img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-b170995d512c659d8668b4e42e1fef6b_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#110;" title="Rendered by QuickLaTeX.com" height="8" width="11" style="vertical-align: 0px;"/> to compute <img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#115;" title="Rendered by QuickLaTeX.com" height="8" width="8" style="vertical-align: 0px;"/>:
<p class="ql-center-displayed-equation" style="line-height: 38px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-31a1efd0a230fd9a0c78432b89cb17f7_l3.png" height="38" width="172" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#48;&#46;&#49;&#54;&#54;&#54;&#54;&#54;&#55;&#32;&#61;&#32;&#92;&#99;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#32;&#43;&#32;&#50;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#50;&#125;&#32;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>
So, the selectivity for T is 0.1666667 and Y is 0.1666667. Since the sum of the histogram and the out of histogram selectivity (weight) must equal 1, we need to adjust the histogram weight values.  

We have two unique values in the histogram: C and N.  The revised histogram weight (<img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-14b463d0ecd5b350ced6cf1d6a12eef3_l3.png" class="ql-img-inline-formula quicklatex-auto-format" alt="&#104;" title="Rendered by QuickLaTeX.com" height="13" width="10" style="vertical-align: 0px;"/>) is the percentage of what is left for in histogram unique values:
<a name="id1772733528"></a><p class="ql-center-displayed-equation" style="line-height: 14px;"><span class="ql-right-eqno"> (3) </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-b9122c55774bf0b28ae511e70a450785_l3.png" height="14" width="84" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;&#32; &#104;&#32;&#61;&#32;&#49;&#45;&#110;&#115;&#32;&#92;&#093; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p class="ql-center-displayed-equation" style="line-height: 14px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-a87170c1dbc29d7b642af4eb84ab6139_l3.png" height="14" width="237" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#32;&#48;&#46;&#54;&#54;&#54;&#54;&#54;&#54;&#55;&#32;&#61;&#32;&#49;&#32;&#45;&#32;&#50;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#48;&#46;&#49;&#54;&#54;&#54;&#54;&#54;&#55;&#32;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>

As you can see, we have 0.6666667 left over.  If you remember from the histogram, the value C composes 1% of the table and the value N 99% of the table.  All we have to do is plug those in:
<p class="ql-center-displayed-equation" style="line-height: 14px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-8a7f130f771ba44f6f3888205eeb2f74_l3.png" height="14" width="229" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#32;&#48;&#46;&#48;&#48;&#54;&#54;&#54;&#54;&#55;&#32;&#61;&#32;&#48;&#46;&#48;&#49;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#48;&#46;&#54;&#54;&#54;&#54;&#54;&#54;&#55;&#32;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>
<p class="ql-center-displayed-equation" style="line-height: 13px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://froebe.net/blog/wp-content/ql-cache/quicklatex.com-07540982ebee462825fbe904b0eed5a6_l3.png" height="13" width="229" class="ql-img-displayed-equation quicklatex-auto-format" alt="&#92;&#091;&#32;&#48;&#46;&#54;&#54;&#48;&#48;&#48;&#48;&#48;&#32;&#61;&#32;&#48;&#46;&#57;&#57;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#48;&#46;&#54;&#54;&#54;&#54;&#54;&#54;&#55;&#32;&#92;&#093;" title="Rendered by QuickLaTeX.com"/></p>

So we get the results that we're looking for:
[text]'C' 0.0066667
'N' 0.6600000
'T' 0.1666667
'Y' 0.1666667
Share Button

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*

Facebook login by WP-FB-AutoConnect