Multicore processors and Sybase ASE: Jeff Tallman

When T V S Murty asked on the sybase-l mailing list about Sybase ASE, multicores and Sybase licensing, the discussion quickly drilled down to whether or not multicores were beneficial to Sybase ASE and database software in general. Jeff Tallman, of Sybase fame, described in detail how Sybase ASE and multicore processors relate to each other.

From: Jeff Tallman <tallmanATsybaseZeDOTcom>
Subject: [sybase-l] – RE: Multicore processors and ASE

As always, a lot depends on the application profile. For any multicore processor, consider the following factors:

  1. The number of FPUs per chip (FPU = Floating Point Unit)
  2. The number and capacity (in IOPS) of IO processors per chip
  3. The type of chip multi-threading

With respect to #1, most DBMSs (at least the commercial ones) use statistics for query optimization. So while the actual query processing doesn’t use a lot of FPU instructions (assuming a minimum of float datatypes, etc.), each query requires a pretty good smack of FPU time to do the floating point math on the stats. The impact of this could be lessened by statement caching or fully prepared statements, or by other means of reducing the optimizer load.

The second problem is one of capacity vs. bandwidth. All network and disk IO obviously has to go through the IO processor. With 4 dual core chips, you usually have 4 IO processors.

With a single chip with 8 cores, it is likely that you will have only a single IO processor.   The single IO processor has 8 cores all making requests.  The number of IO operations per second it can handle becomes a real key factor in the box’s scalability.
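To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes every IO processor has the same capacity and that the cores sharing it split that capacity evenly; the function name and the 40,000 IOPS figure are made up for illustration and do not describe any real chip.

```python
def iops_per_core(io_processor_iops: float, cores_per_io_processor: int) -> float:
    """Average IOPS budget for each core sharing one IO processor.

    Assumes an even split between cores, which is a simplification.
    """
    if cores_per_io_processor <= 0:
        raise ValueError("cores_per_io_processor must be positive")
    return io_processor_iops / cores_per_io_processor


# Hypothetical capacity of one IO processor, chosen only for illustration.
ASSUMED_IOPS = 40_000

# Four dual-core chips: each IO processor serves 2 cores.
dual_core_share = iops_per_core(ASSUMED_IOPS, 2)

# One 8-core chip: a single IO processor serves all 8 cores.
eight_core_share = iops_per_core(ASSUMED_IOPS, 8)

print(f"dual-core chips: {dual_core_share:.0f} IOPS per core")
print(f"8-core chip:     {eight_core_share:.0f} IOPS per core")
```

Under these assumptions the per-core IO budget on the 8-core chip is a quarter of what it is on the dual-core chips, which is why the IO processor's IOPS rating matters so much.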

The chip multi-threading is an interesting issue as there are ~3 different flavors today:

  1. Intel’s Hyperthreading (no longer implemented on XEON and I don’t think implemented at all anymore)
  2. Sun’s Chip Multi-Threading (CMT)
  3. IBM’s SMT

Some instructions require multiple cycles to complete because they are waiting on a fetch from main memory or whatever. The thread/process of execution typically blocks in these cases, resulting in a fairly idle core. By making use of this idle time, CMT or SMT can increase overall throughput. I am ignoring HT here, as it was fairly ineffective at this and appears to have been dropped by Intel lately.

The question that comes up is how you manage the threading. Do you do a form of timeslicing (i.e. when you suspend one process that is blocked on a call, do you let the one that replaced it run for a certain length of time, or until it blocks, before returning to the original), or do you use an interrupt-based/preemptive mechanism in which, when the blocked call returns, you suspend the other thread? Both have advantages and disadvantages, and both allow more engines than cores.

However, it may also mean tuning ASE to be more reactive, such as by reducing the ‘runnable process search count’. You also need to be careful that engines running on CMTs don’t get woken back up on another core (especially if the L2 cache is split between the cores), among other considerations.

A rule of thumb: if you have a multi-core CPU that supports chip threading and you have a lengthy list of SPIDs in a ‘runnable’ state, enabling extra engines on the threads will likely help. If you don’t (i.e. you are IO bound), it probably won’t help.
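The rule of thumb can be written down as a small check. This is only a sketch: the function name is hypothetical, and treating "lengthy" as "more runnable SPIDs than some multiple of the online engines" (with a default factor of 2) is my assumption, not a Sybase recommendation.

```python
def extra_engines_likely_to_help(
    runnable_spids: int,
    online_engines: int,
    has_chip_threading: bool,
    backlog_factor: float = 2.0,  # assumed threshold, tune for your workload
) -> bool:
    """Return True when the runnable queue looks long enough to justify
    enabling extra engines on the hardware threads."""
    if not has_chip_threading:
        # No spare hardware threads to put extra engines on.
        return False
    # A long runnable queue means tasks are waiting for CPU, not for IO.
    return runnable_spids > backlog_factor * online_engines


# CPU bound: 30 runnable SPIDs waiting on 4 engines.
print(extra_engines_likely_to_help(30, 4, True))

# IO bound: hardly anything is runnable, so more engines would sit idle.
print(extra_engines_likely_to_help(2, 4, True))
```

The counts would come from your own monitoring of how many SPIDs are in a ‘runnable’ state; the sketch does not query ASE itself.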

Currently, Sun uses a timeslicing mechanism that is more along the lines of ASE’s SPID management and, as a consequence, it shows scalability when the various tasks do a lot of blocking calls such as fetches from main memory. It does have the detrimental effect of only providing a percentage of CPU time to each ASE engine (i.e. 25% with 4 threads per core). The more parallelism is used within your application, such as higher numbers of concurrent users in ASE, the more the work can be distributed across the engines.
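The trade-off can be sketched with a deliberately simple model. The 25% figure is just 1 divided by the number of threads per core. The utilization formula is my own simplification, not something Sun or Sybase publishes: it assumes each thread stalls on memory for a fixed fraction of its time and that the core switches to another thread at no cost.

```python
def engine_cpu_share(threads_per_core: int) -> float:
    """Share of a core's time each engine gets when every hardware
    thread on that core runs one engine and time is sliced evenly."""
    if threads_per_core <= 0:
        raise ValueError("threads_per_core must be positive")
    return 1.0 / threads_per_core


def core_utilization(threads_per_core: int, stall_fraction: float) -> float:
    """Fraction of time the core does useful work under the assumed model:
    each thread is runnable (1 - stall_fraction) of the time, and the core
    can never be more than 100% busy."""
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    return min(1.0, threads_per_core * (1.0 - stall_fraction))


# Each engine's share with 4 threads per core.
print(engine_cpu_share(4))

# Tasks that stall on memory 75% of the time: one thread leaves the core
# mostly idle, four threads can keep it busy.
print(core_utilization(1, 0.75), core_utilization(4, 0.75))

# Tasks that never stall: extra threads add nothing, and each engine
# still only gets its reduced share of the core.
print(core_utilization(1, 0.0), core_utilization(4, 0.0))
```

In this model the extra threads only pay off when the tasks spend a lot of their time blocked, which matches the behaviour described above.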

You have to be careful with net engine affinity and short queries (i.e. DML). They can have a negative impact, which may be controllable using engine groups. Overall, a CPU-intensive/CPU-bound application can benefit from the Sun CMT implementation. An IO-bound application does not.
