

When T V S Murty asked on the sybase-l mailing list about Sybase ASE, multicore processors and Sybase licensing, the discussion quickly narrowed to whether multicore processors benefit Sybase ASE, and database software in general. Jeff Tallman, of Sybase fame, described in detail how Sybase ASE and multicore processors relate to each other.

From: Jeff Tallman <tallmanATsybaseZeDOTcom>
To: sybase-l@lists.isug.com
Subject: [sybase-l] – RE: Multicore processors and ASE

As always, a lot depends on the application profile. For any multicore processor, consider the following factors:

  1. The number of FPU units per chip (FPU = Floating Point Unit)
  2. The number and capacity (in IOPS) of IO processors per chip
  3. The type of chip multi-threading

With respect to #1, most DBMSs (at least the commercial ones) use statistics for query optimization. So while the actual query processing doesn’t use a lot of FPU instructions (assuming a minimum of float datatypes, etc.), each query requires a pretty good smack of FPU time to do the floating point math on the stats. The impact of this could be lessened by statement caching, fully prepared statements, or other means of reducing the optimizer load.
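A toy Python sketch shows why caching helps. Here a hypothetical `optimize()` function stands in for the FPU-heavy cost estimation over statistics, and a plain dictionary keyed on the SQL text plays the role of a statement cache. None of these names are ASE APIs; this only illustrates the idea, assuming the cache key is the exact statement text.

```python
from typing import Callable, Dict


def optimize(sql: str) -> float:
    """Hypothetical optimizer: stands in for floating point math over table stats."""
    rows, selectivity = 1_000_000.0, 0.0001  # made-up statistics
    return rows * selectivity * (1.0 + len(sql) / 100.0)  # toy cost estimate


class StatementCache:
    """Toy plan cache: optimize each distinct SQL text once, then reuse the plan."""

    def __init__(self, optimizer: Callable[[str], float]) -> None:
        self._optimizer = optimizer
        self._plans: Dict[str, float] = {}
        self.optimizer_calls = 0  # how often the FPU-heavy path actually ran

    def plan_for(self, sql: str) -> float:
        # Only a cache miss pays for optimization; hits reuse the stored plan.
        if sql not in self._plans:
            self.optimizer_calls += 1
            self._plans[sql] = self._optimizer(sql)
        return self._plans[sql]


# Parameterized text (like a prepared statement) keeps the cache key stable.
cache = StatementCache(optimize)
for customer_id in range(1000):
    cache.plan_for("select * from orders where customer_id = ?")

# Embedding literal values creates a new key, and a new optimization, every time.
literal = StatementCache(optimize)
for customer_id in range(1000):
    literal.plan_for(f"select * from orders where customer_id = {customer_id}")

print(cache.optimizer_calls, literal.optimizer_calls)  # 1 1000
```

The design choice that matters is the cache key: with a parameterized statement the text is identical on every call, so the optimizer runs once, while literal values defeat the cache.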

The second factor is one of capacity vs. bandwidth. All network and disk IO must go through the IO processor. With 4 dual-core chips, you usually have 4 IO processors.

With a single 8-core chip, you will likely have only a single IO processor, with all 8 cores making requests to it. The number of IO operations per second it can handle becomes a key factor in the box’s scalability.
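The arithmetic behind this is simple. Assuming, hypothetically, that every IO processor sustains the same 40,000 IOPS and that requests are spread evenly across the cores, the per-core share works out like this:

```python
def iops_per_core(io_processors: int, iops_per_processor: int, cores: int) -> float:
    """Average IOPS available to each core, assuming an even spread of requests."""
    return io_processors * iops_per_processor / cores


# Hypothetical capacity figure, not a measured number for any real chip.
IOPS_PER_PROCESSOR = 40_000

four_dual_core = iops_per_core(io_processors=4, iops_per_processor=IOPS_PER_PROCESSOR, cores=8)
one_eight_core = iops_per_core(io_processors=1, iops_per_processor=IOPS_PER_PROCESSOR, cores=8)
print(four_dual_core, one_eight_core)  # 20000.0 5000.0
```

Same core count, but each core gets a quarter of the IO headroom on the single 8-core chip.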

Chip multi-threading is an interesting issue, as there are roughly three different flavors today:

  1. Intel’s Hyper-Threading, or HT (no longer implemented on Xeon, and I don’t think it is implemented at all anymore)
  2. Sun’s Chip Multi-Threading (CMT)
  3. IBM’s SMT

Some instructions require multiple cycles to complete because they are waiting on a fetch from main memory or the like. The thread/process of execution typically blocks in these cases, leaving the core fairly idle. By making use of this idle time, CMT or SMT can increase overall throughput. (I’m ignoring HT here, as it was fairly ineffective at this and appears to have been dropped by Intel lately.)

The question that comes up is how you manage the threading. Do you use a form of timeslicing (i.e. when you suspend one process that is blocked on a call, do you let the one that replaced it run for a certain length of time, or until it blocks, before returning to the original)? Or do you use an interrupt-based/preemptive mechanism in which, when the blocked call returns, you suspend the other thread? Both have advantages and disadvantages, and both allow more engines than cores.

However, it may also mean tuning ASE to be more reactive, for example by reducing the ‘runnable process search count’ configuration parameter. You also need to be careful that engines running on CMT threads don’t get woken back up on another core (especially if the L2 cache is split between the cores), among other considerations.
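To make the timeslicing versus interrupt-based/preemptive question concrete, here is a toy single-core, two-thread simulation in Python. It is a hypothetical model, not how Sun CMT, IBM SMT or ASE actually schedule: work is counted in abstract ticks, memory stalls progress whether or not the thread holds the core, "timeslice" lets the replacing thread run for `slice_ticks` or until it blocks, and "preemptive" gives the core back to the first (primary) thread the moment its stall completes. The workload is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # compare threads by identity
class HwThread:
    name: str
    segments: List[Tuple[str, int]]  # ("run", ticks) or ("stall", ticks)
    finished_at: int = -1

    def done(self) -> bool:
        return not self.segments

    def ready(self) -> bool:
        return bool(self.segments) and self.segments[0][0] == "run"

    def advance(self, ran: bool) -> None:
        kind, left = self.segments[0]
        if kind == "stall" or ran:  # stalls progress off-core; runs need the core
            left -= 1
        if left == 0:
            self.segments.pop(0)
        else:
            self.segments[0] = (kind, left)


def workload() -> List[HwThread]:
    # Made-up workload: A alternates compute and memory stalls, B is CPU bound.
    a = HwThread("A", [("run", 2), ("stall", 3), ("run", 2), ("stall", 3), ("run", 2)])
    b = HwThread("B", [("run", 12)])
    return [a, b]


def simulate(threads: List[HwThread], policy: str, slice_ticks: int = 4) -> Tuple[Dict[str, int], float]:
    tick = busy = used = 0
    current: Optional[HwThread] = None
    while not all(th.done() for th in threads):
        ready = [th for th in threads if th.ready()]
        if policy == "preemptive":
            # The primary thread (first in the list) takes the core as soon as its stall ends.
            chosen = ready[0] if ready else None
        elif current in ready and used < slice_ticks:
            chosen = current  # timeslice: keep the current thread until its slice expires
        else:
            # Slice over or current thread blocked: switch to another ready thread if any.
            others = [th for th in ready if th is not current]
            chosen = others[0] if others else (current if current in ready else None)
            used = 0
        if chosen is not None:
            busy += 1
            used += 1
        for th in threads:
            if not th.done():
                th.advance(ran=th is chosen)
                if th.done():
                    th.finished_at = tick + 1
        current = chosen
        tick += 1
    return {th.name: th.finished_at for th in threads}, busy / tick


print(simulate(workload(), "preemptive"))  # ({'A': 12, 'B': 18}, 1.0)
print(simulate(workload(), "timeslice"))   # ({'A': 14, 'B': 18}, 1.0)
```

In this made-up run both policies keep the core busy on every tick, so overall throughput is the same. The difference is latency for the stalling thread: it finishes at tick 12 with preemption but at tick 14 under a 4-tick timeslice, because it sits ready while the other thread finishes its slice.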

A rule of thumb: if you have a multicore CPU that supports chip threading and a lengthy list of SPIDs in a ‘runnable’ state, enabling extra engines on the threads will likely help. If you don’t (i.e. you are IO bound), it probably won’t help.
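As a rough sketch, the rule of thumb above could be encoded in Python as follows. The input is assumed to be the per-SPID status strings you would see from something like sp_who; the cut-off for a "lengthy" run queue and the `io_bound` flag are hypothetical inputs you would have to decide on yourself, since the rule gives no numbers.

```python
from collections import Counter
from typing import Iterable

# Assumed cut-off for a "lengthy" run queue: runnable SPIDs per online engine.
# The rule of thumb gives no number; 2.0 is purely illustrative.
RUNNABLE_PER_ENGINE_THRESHOLD = 2.0


def extra_engines_likely_help(statuses: Iterable[str], online_engines: int, io_bound: bool) -> bool:
    """A long runnable queue suggests CPU pressure that engines on spare hardware
    threads could absorb; an IO-bound server gains little from more engines."""
    if online_engines <= 0:
        raise ValueError("online_engines must be positive")
    if io_bound:
        return False
    runnable = Counter(s.strip().lower() for s in statuses)["runnable"]
    return runnable / online_engines > RUNNABLE_PER_ENGINE_THRESHOLD


statuses = ["runnable"] * 10 + ["sleeping"] * 5 + ["running"] * 4
print(extra_engines_likely_help(statuses, online_engines=4, io_bound=False))  # True
print(extra_engines_likely_help(statuses, online_engines=4, io_bound=True))   # False
print(extra_engines_likely_help(statuses, online_engines=8, io_bound=False))  # False
```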

Currently, Sun uses a timeslicing mechanism that is more along the lines of ASE’s SPID management, and as a consequence it scales when the various tasks make a lot of blocking calls, such as fetches from main memory. It does have the detrimental effect of providing only a fraction of the CPU time to each ASE engine (e.g. 25% with 4 threads per core). The more parallelism your application uses, such as higher numbers of concurrent users in ASE, the more the work can be distributed across the engines.

You have to be careful with network engine affinity and short queries (e.g. DML), which can have a negative impact; this may be controllable using engine groups. Overall, a CPU-intensive (CPU-bound) application can benefit from the Sun CMT implementation. An IO-bound application does not.

As many of you know, I’ve been working on a free magazine about various database management systems (DBMSs) called My Databases. I hope to have multiple authors in future issues covering all sorts of open source and proprietary databases.

I should have the first issue done Sunday night.  I’m using OpenOffice, Scribus, Gimp, and Inkscape.

ASE implements a subset of SQL 92 and isn’t 100% compliant with the SQL 92 standard (no DBMS on the planet is, btw).

SQL99 compliance isn’t being seriously looked at by the major commercial DBMS vendors. Disregarding the fact that the SQL standards aren’t all they are cracked up to be, the vendors have too much invested in their own proprietary SQL variants (and other components) to be 100% compliant. If they were 100% compliant with the SQL92/99/whatever standard, wholesale migrations from one vendor to another would take place.

I believe that as time goes on, the open source DBMSs (PostgreSQL, MySQL, etc.) may become far more compliant with the standards than the commercial ones, as vendor lock-in doesn’t mean as much to them.

Look into what the vendors (Oracle, MS, IBM, etc.) say constitutes “compliance”. Ask each vendor which parts of the SQL99 standard they will be implementing and which parts they won’t. If any vendor says that they are 100% compliant with the SQL92 or SQL99 ANSI standard, then that particular person is lying to you. Granted, that person may have been told their DBMS was 100% compliant and believes it. An honest vendor says that they comply with features X, Y and Z of the SQL 92 or SQL 99 standards.

Personally, I have found no significant movement by any of the commercial DBMS vendors to implement the SQL99 standard. So far it has just been lip service IMHO.

In Chris Brown’s Virtualization and ASE blog post, he raises the question of whether Sybase’s ASE can be used in a virtual environment (VMware, Xen, etc.) but doesn’t answer it. I’ve been using various databases in virtual environments for several years; here is what I’ve found:

Running ASE, ASIQ, or SQL Anywhere under virtualization software such as Xen, VMware, Parallels, etc. is very useful in a number of situations:

  1. development of new applications – each developer group can have its own “db server” on the same machine
  2. testing new EBFs/releases with your applications
  3. reproducing problems either in the Sybase software or in the application code – a ‘virgin’ instance that can be duplicated at will
  4. trying out new operating systems (moving from Windows to Linux or Windows to Solaris x86?) without investing in new hardware

The main caveat is that the performance stinks: databases typically require high disk I/O, memory I/O and CPU responsiveness. The virtualization software currently available, even with hardware help (newer Intel and AMD chips), is not up to the task of running a *production* database. In a couple of years… possibly.

Most database monitoring systems don’t come from the database vendors, as you might think, but from a hodge-podge of third-party vendors that seem to want to charge more than I make in a lifetime for database monitoring software. Try finding low-cost monitoring software for DB2 on the mainframe.

They typically use standardized, and often deprecated, monitor counters that, when used by their product, interfere with any other monitoring products you might use. For example, if the Operations Department is using Nimbus to monitor the network, VoIP, hosts, tape archival systems, and the database servers to ensure that they are running, what happens when the DBAs want to use DBA Expert? The two products (keep in mind that I chose them for this example at random) will trip over each other, and neither will provide reliable metrics for the databases.

The front ends for the monitoring products always seem to show a fancy GUI full of bright colors, dials, graphs, and the latest and greatest designer kitchen sink. The vendors are very rarely willing to provide any documented API or mechanism for you to obtain the data from their product without a nasty NDA. The premise is that you will use their front end to display and analyze the monitoring metrics.

The database vendors themselves are largely to blame. The monitoring APIs that they offer assume that you will only be using a single monitoring system. For example, in Sybase’s ASE the new API is the MDA tables for obtaining performance metrics, but problems arise when monitoring software uses multiple methods to obtain additional information that may not be (easily) obtainable from the MDA tables. sp_sysmon will reset several monitoring counters unless you call it with the ‘noclear’ option. Unfortunately, ‘noclear’ is not widely known and is rarely used in monitoring software. Of course, this is just one example of multiple monitoring APIs from a single database vendor.
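A toy Python model shows how one tool’s reset corrupts another tool’s numbers. It is purely illustrative: `SharedCounters` and the "lock_requests" counter are made up, and `clear=False` plays the role of sp_sysmon’s ‘noclear’ option. An operations monitor computes a delta between two of its own samples, while a DBA tool takes a sample in between.

```python
from typing import Dict


class SharedCounters:
    """Hypothetical server-wide monitor counters that any tool can read or reset."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {"lock_requests": 0}

    def bump(self, name: str, n: int = 1) -> None:
        self.values[name] += n

    def sample(self, clear: bool) -> Dict[str, int]:
        snapshot = dict(self.values)
        if clear:  # like running sp_sysmon without 'noclear'
            for name in self.values:
                self.values[name] = 0
        return snapshot


def ops_monitor_delta(dba_tool_clears: bool) -> int:
    counters = SharedCounters()
    before = counters.sample(clear=False)["lock_requests"]  # ops monitor, sample 1
    counters.bump("lock_requests", 100)                     # real activity
    counters.sample(clear=dba_tool_clears)                  # DBA tool takes its sample
    counters.bump("lock_requests", 30)                      # more real activity
    after = counters.sample(clear=False)["lock_requests"]   # ops monitor, sample 2
    return after - before


print(ops_monitor_delta(dba_tool_clears=True))   # 30, although 130 lock requests happened
print(ops_monitor_delta(dba_tool_clears=False))  # 130
```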

You know what? I don’t care about the vendors’ fancy front ends. Give me a web service that I can query so that I can use the monitoring metrics in another application, on a PDA, etc. A few vendors have tried to offer an API, but these are often so damned complicated that you would have to have worked at the company to understand them. Don’t even get me started on vendors keeping their APIs updated.
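The kind of web service meant here doesn’t need to be complicated. Below is a minimal sketch using only Python’s standard library. `collect_metrics()` is a hypothetical placeholder returning made-up numbers (a real one would query the DBMS, for example the MDA tables), and the `/metrics` path and JSON shape are assumptions for illustration, not any vendor’s API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def collect_metrics() -> dict:
    # Hypothetical placeholder: made-up numbers instead of a real DBMS query.
    return {"server": "ASE_DEV", "cpu_busy_pct": 42.0, "runnable_spids": 3}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = json.dumps(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt: str, *args) -> None:
        pass  # silence per-request logging for the demo


def fetch_metrics_once() -> dict:
    """Start the service on a free local port, fetch /metrics, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with urlopen(f"http://127.0.0.1:{port}/metrics") as resp:
            return json.load(resp)
    finally:
        server.shutdown()
        server.server_close()


print(fetch_metrics_once())
```

Any client that can issue an HTTP GET and parse JSON, whether another application or a handheld, could consume the same endpoint.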

Update: Thanks go to Peter Dorfman of Sybase for helping clarify that the MDA tables in ASE ‘clear’ only on a per-connection basis. That means that if you look at monDeadlocks on connection #1 twice, the first select might show 5 rows and the second 0 rows. If you ran the select on connection #2 sometime later, you would see the 5 rows plus any other deadlocks that might have occurred since then. I wasn’t very clear on that, as I was (in my head) also including sp_sysmon and other monitoring options that would conflict with the MDA tables.
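The per-connection behavior can be modeled with a read position kept per connection. This is a hypothetical Python model, not how ASE implements it: `HistoricalMonTable` stands in for a historical MDA table such as monDeadlocks, the rows are dummy deadlock IDs, and it ignores any limit on how many rows the server actually retains.

```python
from typing import Dict, List


class HistoricalMonTable:
    """Shared history of rows; each connection sees only rows it has not selected yet."""

    def __init__(self) -> None:
        self._rows: List[int] = []
        self._next_unseen: Dict[int, int] = {}  # connection id -> index of first unseen row

    def record(self, row: int) -> None:
        self._rows.append(row)

    def select(self, conn: int) -> List[int]:
        start = self._next_unseen.get(conn, 0)  # a new connection starts from the beginning
        self._next_unseen[conn] = len(self._rows)
        return self._rows[start:]


deadlocks = HistoricalMonTable()
for deadlock_id in range(1, 6):
    deadlocks.record(deadlock_id)  # 5 deadlocks occur

print(len(deadlocks.select(conn=1)))  # 5
print(len(deadlocks.select(conn=1)))  # 0: connection #1 already saw them
deadlocks.record(6)                   # one more deadlock later on
print(len(deadlocks.select(conn=2)))  # 6: connection #2 sees all of them
print(len(deadlocks.select(conn=1)))  # 1: only the new one
```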