SAP Sybase IQ Data Warehouse Index Advisor and awk

If you are running SAP's IQ data warehouse (SAP IQ) somewhere in your organization, there will come a time when you need to enable the Index Advisor (see SAP Sybase IQ: Index Advisor for a user) and parse the output. One problem is that the output goes to the IQ message log (IQ.iqmsg), which is already polluted with other information. So how can you easily pull out the recommendations and still keep your sanity?

A little AWK will do the majority of the work for you:

 awk 'BEGIN {COLSTART=6;COLEND=17}/Add |Rebuild /{ for (i=COLSTART;i<=COLEND;i++) printf("%s%s",$(i), i<COLEND ? OFS :"\n"); }' IQ.iqmsg | sort | uniq -c | sort -n |tail -10

produces the following “top 10” index advisories:

 8 Add a unique HG index to join key column SCHEMA_B.STUFF_SKILL_SYSTEM.STUFF_SKILL_ID
 8 Add unique LF index on SCHEMA_A.STUFF_SKILL_SYSTEM.STUFF_SKILL_ID
 9 Add a unique HG index to join key column SCHEMA_A.STUFF_SYSTEM.STUFF_ID
12 Add a HG index to join key column SCHEMA_B.STUFF_SYSTEM.SOURCE_STUFF_ID
12 Add a unique HG index to join key column SCHEMA_B.STUFF_SKILL_ID_MAP.TYPE_ID
12 Condition 1 Index Advisor: Add LF index on SCHEMA_B.STORAGE_SYSTEM.SYSTEM_ID
22 Condition 2 Index Advisor: Add LF index on SCHEMA_B.STORAGE_SYSTEM.STATUS_C
32 Add LF index on SCHEMA_A.MOON_TREE.MOON_TREE_CONTEXT_ID
34 Add unique LF index on SCHEMA_A.STUFF_OBTUSE_CALLER_MAP.STUFF_CALLER_OBTUSE_ID
34 Add unique LF index on SCHEMA_A.STUFF_OBTUSE_MAP.STUFF_OBTUSE_ID

That AWK code looks pretty ugly, doesn't it? Well, it is simpler than you think.

BEGIN {
    COLSTART=6;
    COLEND=17
}
/Add |Rebuild /{
    for (i=COLSTART; i<=COLEND; i++)
        printf("%s%s",$(i), i<COLEND ? OFS :"\n"); 
}

The BEGIN block runs before any data is parsed. We need to print columns 6 through 17 of each matching line, so we set COLSTART (the starting column) to 6 and COLEND (the ending column) to 17.

/Add |Rebuild /

Simply means if the line contains “Add ” or “Rebuild ” then do whatever is in the code block (between the curly braces).

for (i=COLSTART; i<=COLEND; i++)

For each column number from 6 through 17, execute the following line.

printf("%s%s", $(i), i<COLEND ? OFS : "\n");

Print the column followed by the output field separator (OFS). When we reach the last column (17), print the column followed by a newline instead.
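
To see the loop and the ternary in action, here is a quick sanity check on a made-up input line (columns 6 through 17 are the letters f through q):

 echo "a b c d e f g h i j k l m n o p q r" | awk 'BEGIN {COLSTART=6;COLEND=17}{ for (i=COLSTART;i<=COLEND;i++) printf("%s%s",$(i), i<COLEND ? OFS :"\n"); }'
 f g h i j k l m n o p q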

Next we sort the output of awk: sort

Now that we have sorted output, uniq -c collapses duplicate index recommendations, printing the # of duplicates in front of each unique line

Let’s sort on the number of duplicate index recommendations with sort -n

In this case, we only want the top 10 index recommendations: tail -10
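
To make those stages concrete, here is the same counting pipeline on a tiny, made-up input; the output should look roughly like this:

 printf "b\na\nb\nb\na\nc\n" | sort | uniq -c | sort -n | tail -10
    1 c
    2 a
    3 b

sort groups identical lines together, uniq -c collapses each group and prefixes it with a count, sort -n orders by that count, and tail -10 keeps the ten most frequent lines.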

This is based on the recommendation from SAP consultant Steve Bologna:

egrep -i "Add " IQ.iqmsg | awk '{ print $6 " " $7 " " $8 " " $9 " " $10 " " $11 " " $12 " " $13 " " $14 " " $15 " " $16 " " $17 }' | sort | uniq -c | sort -n

FW: A Faster Perl Runtime in Tiny Steps by Steffen Müller

Booking.com uses the Perl programming language heavily across its entire technical infrastructure. At the size of our infrastructure, even small performance improvements can translate into hefty savings, not to mention the exciting new features that we will implement with those spare CPU cycles. For this reason, we are very happy to announce that we are now funding Dave Mitchell, prominently known for his many years of high-quality contributions to the Perl language implementation, to improve the Perl runtime's performance one small step at a time.

Read more at A Faster Perl Runtime in Tiny Steps by Steffen Müller


How to enable disabled (by default) performance features in SAP Sybase ASE 15.x

Starting in SAP Sybase Adaptive Server Enterprise (ASE) version 15.0.3 ESD #2, performance-related optimizer changes are not enabled by default. Users will need to test the enhancements using specific set options. In addition, functionality has been added to switch on/off previous optimizer changes and features. This white paper discusses how and when to enable the changes.

Because the optimizer changes are not enabled by default, applications already running efficiently are not affected by, nor will they necessarily benefit from, these changes when upgrading to the latest ESD.

In 15.0.3 ESD #3 and above, we introduced support for global Optlevels. Enabling these changes increases query performance for many applications, though Sybase recommends additional performance testing.

In 15.5 ESD #3 and above, we introduced 15.5 global Optlevels that match the 15.5 ESD level. Prior to ESD #3, the 15.0.3 ESD Optlevels were mapped to 15.5.

Example:
set plan optlevel ase_current

Table 1-1: Available optlevels

Value                             Definition
ase_current                       enable all optimizer changes up through the current ESD
ase_default                       disable all optimizer changes since 15.0.3 ESD #1
ase1503esd2                       enable all optimizer changes up through 15.0.3 ESD #2
ase1503esd3                       enable all optimizer changes up through 15.0.3 ESD #3
ase1503esd4                       enable all optimizer changes up through 15.0.3 ESD #4
ase155esd3                        enable all optimizer changes up through 15.5 ESD #3
ase155esd4                        enable all optimizer changes up through 15.5 ESD #4
ase_current (on ASE 15.7 GA)      enable all optimizer changes up through 15.5 ESD #4
ase155esd5                        enable all optimizer changes up through 15.5 ESD #5
ase157esd1                        enable all optimizer changes up through 15.7 ESD #1
ase_current (on ASE 15.7 ESD #2)  enable all optimizer changes up through 15.7 ESD #1
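
A quick usage sketch (the table name is a placeholder, and set showplan on is just one way to see the effect on query plans):

set plan optlevel ase155esd4   -- enable all optimizer changes up through 15.5 ESD #4
go
set showplan on                -- display query plans so the before/after difference is visible
go
select count(*) from my_table  -- placeholder query to re-test under the new optlevel
go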

Read the complete White Paper (Technote 1080354) at SAP


HowTo: Comparing two tables and returning the # of rows that match FAST

The most intuitive way to write a comparison of two tables and spit out the # of rows that are the same involves joining the tables on every column:

select 'Investment' as TableName, count(*) as RowCount
from Investment_A a, Investment_B b
where
  a.col1 = b.col1
  AND a.col2 = b.col2
  AND a.col3 = b.col3
  AND a.col4 = b.col4

This returns the correct answer but is very slow. Is there a better way? Of course!

select 'Investment' as TableName, count(*) as RowCount
from
(
   select 1 as num
   FROM (
      select * from Investment_A
      UNION ALL
      select * from Investment_B
   ) tmp
   GROUP BY col1, col2, col3, col4
   HAVING COUNT(*) > 1
) tmp2

By pushing the comparison off into the GROUP BY, we leverage the DBMS engine far more efficiently: every group with a count greater than one is a row that appears in both tables (assuming neither table contains duplicate rows). There are two drawbacks:

  1. Readability of the SQL code suffers
  2. Far more temporary storage is used for the GROUP BY. There is a real risk of running out of temporary storage if the tables are large.
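
A tiny worked example, using hypothetical two-column versions of the tables, shows why the HAVING clause counts matching rows:

create table Investment_A (col1 int, col2 varchar(10))
create table Investment_B (col1 int, col2 varchar(10))

insert Investment_A values (1, 'bond')
insert Investment_A values (2, 'stock')
insert Investment_B values (1, 'bond')
insert Investment_B values (3, 'fund')

-- only (1, 'bond') appears in both tables, so this returns RowCount = 1
select 'Investment' as TableName, count(*) as RowCount
from
(
   select 1 as num
   FROM (
      select * from Investment_A
      UNION ALL
      select * from Investment_B
   ) tmp
   GROUP BY col1, col2
   HAVING COUNT(*) > 1
) tmp2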

Optimizing the Firefox SQLite databases

Most people who want to speed up Firefox know about the various settings in about:config, but many forget that Firefox uses SQLite databases that can become cluttered.

In your home directory ($HOME on *nix/Mac and %APPDATA% on Windows), run sqlite3 with vacuum, reindex and analyze on each of the *.sqlite databases in your Firefox profile. Don't have sqlite3? No problem, get it from the SQLite Download page or your distribution's package manager (e.g. apt-get install sqlite3)

#!/bin/bash

cd "${HOME}/.mozilla/firefox/" || exit 1

# run VACUUM, REINDEX and ANALYZE on every database in every default profile
for profile in *.default; do
    cd "${profile}"

    for db in *.sqlite; do
        echo "Processing ${db}"
        sqlite3 "${db}" <<EOF
.echo on
vacuum;
reindex;
analyze;
.exit
EOF
    done

    cd .. # return to the profiles directory
done

That wasn’t hard was it? 🙂


OpenVPN really really slow? This performance tip might help

I use the virtual private network software OpenVPN to connect from my laptops to my home every day. There are several things I've noticed:

  • Most offices and many coffee shops will block the default port 1194 (UDP). It is also a very popular port for naughty people trying to see what you have on your network. If you're not running a web server, set it to port 80 or 443 (TCP), as these ports are normally accessible. If those don't work, try others like 21 (TCP), which is normally used for an FTP server. You will likely see better throughput on some ports than on others due to 'traffic shaping', aka giving network priority to certain applications.
    • Comcast blocks ports 21, 80, and 443 for UDP but not for TCP
  • The network packets sent through the VPN tunnel can become fragmented: split into two or more pieces so they fit inside the VPN's own packets. Increase the size of the VPN packet to reduce fragmentation (see the sample config after this list):
    • tun-mtu 1500
    • mssfix 1400
  • Compression. This is a little more subjective than you would think. If most of your activity is based on data streams (e.g. watching video, listening to music), then compression may cause delays (think extra buffering / stuttering). My advice is to try with it on and try with it off: which seems more responsive to you?
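
Putting those tips together, a client config fragment might look like this (the hostname is made up; the directives themselves are standard OpenVPN options):

# run over TCP port 443 instead of the default UDP 1194
proto tcp
remote vpn.example.com 443

# larger tunnel MTU plus MSS clamping to reduce fragmentation
tun-mtu 1500
mssfix 1400

# compression: try with and without, keep whichever feels more responsive
comp-lzo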

Sybase ASE 15.0.2 MDA Poster!

Thanks go to Jeff Tallman for creating the Sybase ASE 15.0.2 MDA poster. Don't forget to check out the legend for the poster as well!


Sybase ASE: Helping indexes work better

Drew Montgomery wrote up an excellent description of how to use indexes more wisely:
There are times when it seems that the optimizer just doesn’t “get it”. In other words, it doesn’t use what seems like an obvious choice for an index. There are several possible reasons for that, and we are going to explore a couple of them here…

The first could be that the index in question is using old statistics. The statistics for an index are used when evaluating queries to help the optimizer determine the cost of using a particular index. Each index for a table is evaluated by the optimizer along with some other complex interactions to determine if an index (or set of indices if we are dealing with multiple tables) should be used. If the statistics for an index are out of date, old, tired, worn out, they may no longer accurately represent the distribution of the data. The optimizer, using the older statistics, may determine that another course of action is best. This is why we would want to periodically update the statistics. Of course, this isn’t a cure-all for query woes. Far from it! But it can be a cause of some optimizer confusion.
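
In ASE, refreshing the statistics is a one-liner per table; a minimal sketch, with a placeholder table name:

update index statistics my_table  -- refresh statistics for all columns of every index on the table
go
update statistics my_table        -- or just the leading column of each index
go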

Another reason could be that the query itself doesn't let the optimizer use an index. You may be asking yourself "what could I have done to make the optimizer mad at me?" It's not personal. It may just be the way you have rendered your query. One little-known fact is that if you apply a function to a column, the optimizer can't use that column from an index. In other words, if you have a query that looks like this:

select * from local_users
where upper(name) like "%SMITH%"

You have prevented the optimizer from using an index based on name. The act of upper casing the column prevents the optimizer from using it. The same would be true if you had used an arithmetic function or even an equation:

select * from accounts
where account_limit * 10 < expected_account_limit

In the above case, the problem is that the optimizer could use an index on expected_account_limit OR on account_limit, but in no case could it use both. You would have to rework the query to move the *10 to the other side of the comparison to allow the optimizer to use the account_limit column.

In the same vein, if there is a data type mismatch between the columns being used in a query, the optimizer has to make the column types "match" before it can complete the query. This means the optimizer applies an "implied" function (convert) to align the data types. Say you have the following query:

select p.name, p.address, j.title
from personnel p, jobs j
where p.j_id = j.j_id

If the column j_id in personnel allows nulls (for those people who are working under cover!) and j_id in jobs does not allow nulls, since it is the primary key, the optimizer will not use an index based on the j_id column in the jobs table. In order to "match" the data types, it has to convert the j_id column in the jobs table from "int not null" to "int null" to match the definition of the j_id column from the personnel table. Sometimes the difference in the columns can be as subtle as a varchar(10) vs. a char(10), or a varchar(20) vs. a varchar(21). In each case, the optimizer converts to the broadest common denominator to make the match, and in many cases that conversion will prevent the optimizer from using a given index.

The last possibility I will discuss is the case where the wrong index exists, or a better index could be created. There are two distinct cases here. The first is where an index has an advantage of some kind: it may be unique, or it may be clustered. That advantage can be enough to sway the optimizer toward a specific index. The second is that even though the query is using an index, another index may be more efficient. I recently ran across this specific case. A query was using four different columns in its where clause, and two of those columns in an order by clause. The query was trying to retrieve data on the basis of the date of the record, in descending order (i.e. they wanted the "youngest" record first), and in fact it had a restriction in the select to get the "top 1" record based on these criteria. A facsimile of the query is as follows:

select top 1 name, balance_date, amount
from bank_info
where name = @name
  and account_type = @type
  and branch_id = @branch
  and balance_date < @eval_date
order by name, balance_date desc

The query was using an index that covered two of the four columns involved. Although this was better than a table scan (the table had 82 million rows), it was not hugely efficient. The index did reduce the data set being sifted through, but the query was still taking 30,000 I/Os to complete. Part of the problem was that once the basic query had been satisfied, the result was being put into a sort table that was then interrogated for the youngest record.

This is a case for a new index. But the question is: what does the index go on? We could build an index against an id column and the date. That would help the sort, but would not help the where clause much. We could build an index on the four columns in the where clause, but the data found would still go into a sort table. Probably more efficient, but not as good as it gets.

For this particular query, three of the four columns are matched exactly, with the fourth being the date column's limitation on the cutoff date. If the cutoff date is midnight of March 11th, we are looking for the latest record from March 10th or earlier. The best solution is to create an index against the four columns in the where clause, with the date column last and (in the definition of the index) descending. That way, when we restrict the other three columns, we start with the newest record. The other half of this is perhaps a little less obvious: use all of the columns of the new index in the order by clause. The change would be as follows:

select top 1 name, balance_date, amount
from bank_info
where name = @name
  and account_type = @type
  and branch_id = @branch
  and balance_date < @eval_date
order by name, account_type, branch_id, balance_date desc

That way the optimizer can use the index for the where clause and the order by clause simultaneously. The bottom line is that we go from thousands of I/Os (with the old index and old order by), to hundreds, and finally to a much more reasonable 10-15 I/Os. The difference in performance is dramatic.

Bottom line: be aware of not only what indices exist, but how they are being used. Make sure to take care of them, and they will be like a good motor oil: they will improve the performance of your system while extending the life of the application.
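
For reference, a sketch of the winning index (the index name is made up; ASE lets you specify a per-column sort order in create index):

create index bank_info_latest_idx
    on bank_info (name, account_type, branch_id, balance_date desc)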
