SAP Sybase IQ SA CR 728597 / Linux Kernel direct i/o bug & huge pages

Last year, from April through October, I asked whether IQ supports Huge Pages on Linux. The answer pointed to SA CR 728597 and Red Hat Bug 891857: a bug in the Linux kernel's handling of direct I/O while using transparent huge pages (a variant of Linux huge pages).

CR 728597:
This problem is related to a possible bug in the transparent huge pages (THP) feature introduced in these operating system versions. Red Hat bug 891857 has been created to track this issue.

The problem can be triggered by calling an external environment, xp_cmdshell, or other procedure that causes a fork while other I/O is occurring. A known limitation with the Linux kernel limits the use of fork while doing O_DIRECT I/O operations. Essentially what can happen is that the data can come from or go to the wrong process’ memory after the fork. SQL Anywhere performs O_DIRECT I/O operations according to the documented safe usage. However, THP appears to cause further problems and the O_DIRECT I/O data comprising database page reads/writes appears to get lost.
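Since the CR ties the problem to THP, it helps to know whether THP is active on a given host. The following is a minimal sketch, not something taken from the CR: it assumes the mainline sysfs path /sys/kernel/mm/transparent_hugepage/enabled (as far as I know, RHEL 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage/enabled instead), and the function name thp_active_mode is made up for illustration. The kernel marks the active mode with square brackets.

```shell
#!/bin/sh
# Extract the active THP mode from the contents of the sysfs "enabled" file.
# The kernel brackets the active mode, e.g. "always madvise [never]".
# thp_active_mode is a hypothetical helper name, not part of any tool.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Assumed path (mainline kernels). Override THP_FILE for RHEL 6, where the
# file is /sys/kernel/mm/redhat_transparent_hugepage/enabled.
THP_FILE=${THP_FILE:-/sys/kernel/mm/transparent_hugepage/enabled}

if [ -r "$THP_FILE" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$THP_FILE")")"
else
    echo "THP status file not found: $THP_FILE"
fi
```

A result of "always" or "madvise" means THP can be in play; "never" means it is disabled system-wide.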

http://scn.sap.com/thread/3338917 and http://froebe.net/blog/2013/06/17/does-anyone-have-any-details-on-redhat-linux-bug-891857/

Does anyone know the status of this ongoing FIVE-year-old issue?

http://scn.sap.com/thread/3505418

Does anyone have any details on RedHat Linux bug 891857?

Huge page support was disabled with SA CR 728597 due to a RedHat bug.

CR 728597: This problem is related to a possible bug in the transparent huge pages (THP) feature introduced in these operating system versions. RedHat bug 891857 has been created to track this issue.

There is a huge difference between a RedHat-specific bug (one that affects only RedHat Enterprise Linux) and a Linux (kernel?) bug that also affects other distributions. Remember that SuSE Enterprise Linux is fully supported by SAP/Sybase.

I don’t have a support contract with RedHat, so I can’t see what bug 891857 actually is.

UPDATE:

With regard to Sybase IQ bug CR728597 and RedHat bug 891857 involving memory corruption due to Direct I/O and Transparent Huge Memory Pages, I was able to reach the RedHat engineer.  Andrea Arcangeli reported the issue to the Linux kernel mailing list back in 2009.  The issue was never resolved because of the complexity of the fix. https://patchwork.kernel.org/patch/11174 includes both the patch to the kernel branch of the time and the ‘repro’ C example.

VMware 2.0.2 running on Win2k3 -> VMware ESX 4.0 .. done :)

I was able to move a VMware Server 2.0 (v7) vm to VMware ESX..  it was a *live* copy where I performed a Windows Volume Shadow copy of the vm files.  Everything worked for the most part but because the database, Sybase ASE 15.0.3, was running when the shadow copy was made, we had corruption in one database.  Restore from backup and all is good.

Now we need to get an updated license file from Sybase as the NIC MAC address has changed.  You can *not* use the MAC address from the VMware Server on ESX.  grr.

Twenty hours for the volume shadow copy to complete plus another 12 hours to scp the files to the esx box (esx console access is sloooow).   Keep in mind that the host VMware Server box was rebooting itself randomly so I really couldn’t leave it alone.  Then 3 hours to convert/clone the vmdk files and 2 hours to correct the database…  I’m tired.

It turned out to be an issue with allocating 3.75GB to a VM that was causing the rebooting.  Dropping it to 2 GB resolved the rebooting… who knew?  Nothing in Google and VMware Support wasn’t able to find anything on their side.

HOWTO: Fixing a raw device misconfiguration

If you were observant in the Mapping Linux LVM and Raw partitions blog post, you probably noticed that there are two raw devices pointing to the same logical volume:

raw -qa
/dev/raw/raw1:  bound to major 253, minor 7
/dev/raw/raw2:  bound to major 253, minor 8
/dev/raw/raw3:  bound to major 253, minor 9
/dev/raw/raw4:  bound to major 253, minor 10
/dev/raw/raw5:  bound to major 253, minor 12
/dev/raw/raw6:  bound to major 253, minor 13
/dev/raw/raw7:  bound to major 253, minor 15
/dev/raw/raw8:  bound to major 253, minor 15
/dev/raw/raw10: bound to major 253, minor 16
/dev/raw/raw11: bound to major 253, minor 17

Correcting this misconfiguration is easy but can be painful if the devices have been put to use (maybe as a database device).  Since we’ve already done the mapping (see Mapping Linux LVM and Raw partitions), we know the devices that they should be mapped to.

Let’s assume that no one has started using either raw device and fix it the easy way (as root):

raw /dev/raw/raw7 /dev/dbvg/rawdatavol07
/dev/raw/raw7:  bound to major 253, minor 14
raw /dev/raw/raw8 /dev/dbvg/rawdatavol08
/dev/raw/raw8:  bound to major 253, minor 15

We have one more step: update whatever script runs at startup to configure the raw devices, so that the mapping is retained after a reboot.

On RedHat and derived distributions, we modify /etc/sysconfig/rawdevices:

/dev/raw/raw1 /dev/dbvg/rawdatavol01
/dev/raw/raw2 /dev/dbvg/rawdatavol02
/dev/raw/raw3 /dev/dbvg/rawdatavol03
/dev/raw/raw4 /dev/dbvg/rawdatavol04
/dev/raw/raw5 /dev/dbvg/rawdatavol05
/dev/raw/raw6 /dev/dbvg/rawdatavol06
/dev/raw/raw7 /dev/dbvg/rawdatavol07
/dev/raw/raw8 /dev/dbvg/rawdatavol08
/dev/raw/raw10 /dev/dbvg/rawdatavol10
/dev/raw/raw11 /dev/dbvg/rawdatavol11

RedHat provides the script /etc/init.d/rawdevices, which reads /etc/sysconfig/rawdevices. We could use it to correct the raw device mappings, but it is my understanding that remapping raw devices that are in use may lose data at the instant the remapping takes place.  So, we avoid the whole situation and run the raw command on only the devices that are mismapped.
