Eloquence B.08.00
=================

Revision: beta10, 2008-06-06

Thank you for your interest in the Eloquence B.08.00 beta10 release.

This Eloquence beta release provides a development snapshot of the upcoming Eloquence B.08.00 version that has passed a limited QA process. By making the test versions available publicly we hope to encourage wider testing and additional feedback.

Please contact support@marxmeier.com to share your feedback or report a problem.

Please note: This release is available under the terms of the Eloquence Beta Test Agreement, which is specified in the file AGREEMENT:

  http://www.marxmeier.com/eloquence/download/beta/B0800/AGREEMENT

Downloading and installing the software indicates your agreement to the Beta Test terms and conditions.

This beta release does not meet the release criteria for quality or performance and is only intended for test usage. If it breaks, you get to keep the pieces.


Introduction
------------

This beta version currently includes a preliminary version of the Eloquence B.08.00 database server, client libraries and utilities.

The major Eloquence B.08.00 database goals include:

- scalability improvements
  - make use of available CPU resources (improved multi-threading)
  - improved scalability on large memory configurations
- replication
- release of a 64 bit version

The current beta version is expected to be functionally complete. Subsequent releases will mostly focus on fixing problems and enabling performance enhancements.


Requirements
------------

To use replication the following requirements must be met:

- Currently, only the HP-UX and Linux operating systems are supported.
  - HP-UX 11.11 (or newer) on PA-RISC 2.0 and HP-UX 11.23 (or newer) on IA64 based systems
  - Linux glibc 2.3 (or newer) on x86 and x86_64 based systems supporting NPTL based threading
- Installation of the most recent B.07.10 patches is required for close compatibility with B.07.10 if you intend to use a database environment with either 7.10 or 8.0 (for example, as a fallback).
- A B.08.00 license key is required to use this beta release. Please contact support@marxmeier.com.
- To use replication, a separate replication license key is required. Please contact support@marxmeier.com.


Installation
------------

The Eloquence B.08.00 beta releases are available for download from the following locations:

  HTTP protocol: http://eloquence.marxmeier.com/download/beta/B0800/
  FTP protocol:  ftp://ftp.marxmeier.com/eloq/beta/B0800/

Different versions of the Eloquence software are available. Please choose the version that corresponds to your hardware.

Additional details on installation are provided in the INSTALL document.


B.07.10 compatibility
---------------------

Eloquence B.08.00 is upwards compatible with previous Eloquence versions and binary compatible with recent B.07.10 patch levels, provided the server was shut down cleanly.

To revert to B.07.10 the following procedure is required:

1. If the 8.0 database server was not shut down cleanly, for example due to a server crash, first run the 8.0 version of dblogreset to re-apply (roll forward) any recently committed transactions.
2. Restart the server with the Eloquence 7.10 version.

To ensure a successful reversion to B.07.10, please make sure that the recent B.07.10 patches are installed. As of 14 May 2008 these are:

  PE71-0802120 - database server
  PE71-0801241 - dblogreset utility
  PE71-0803071 - dbrecover utility
  PE71-0803041 - fwaudit utility
  PE71-0802121 - dbfsck utility

The following B.07.10 patches (or superseding patches) are recommended:

  PE71-0801250 - database client library
  PE71-0801251 - image3k library
  PE71-0802150 - eloqcore

To use replication, the following B.07.10 patches (or superseding patches) must be installed:

  PE71-0705303 - dbrepl utility
  PE71-0705304 - chklic utility
  PE71-0705305 - dbvolextend utility
  PE71-0705306 - dbvolchange utility
  PE71-0804170 - dbcfix utility

Download location, installation instructions and release notes:

  http://eloquence.marxmeier.com/support/B0710/patch/B0710.html

The database client and image3k libraries are currently located in the B.07.10 beta patch directory:

  http://eloquence.marxmeier.com/download/B0710/patch/beta/


HP-UX Kernel Parameters
-----------------------

Eloquence B.08.00 may require the configuration of additional kernel parameters:

max_thread_proc - maximum number of concurrent threads allowed per process

  max_thread_proc limits the number of threads allowed per process on the system. It needs to be set to at least the highest number of configured database connections serviced by any B.08.00 database server (threads= config item) plus 10.

nkthread - number of threads allowed to run simultaneously

  The nkthread tunable controls the absolute number of threads allowed on the system at any given time. Increasing it allows more threads; lowering it restricts the number of threads.

  nkthread is too low if the message "kthread: table is full" appears in the message buffer, which can be read via dmesg or syslog. This message indicates that an application was unable to create a thread.
  nkthread needs to be set to at least the sum of all configured database connections serviced by all B.08.00 database servers on the system (threads= config item) plus 10 per server instance. nkthread must be greater than max_thread_proc.


Documentation
-------------

At this point no separate B.08.00 documentation is available. However, as this beta release is expected to be mostly functionally equivalent to the most recent B.07.10 patches (PE71-0803070 and related), the corresponding B.07.10 documentation applies as well.

Database replication
  This document describes the Eloquence replication functions.
  http://eloquence.marxmeier.com/support/B0710/doc/repl/index.html

Database Server statistics
  This document describes the Eloquence Database Server statistics.
  http://eloquence.marxmeier.com/support/B0710/doc/stats/index.html


Summary of enhancements (relative to the initial B.07.10 release)
-----------------------------------------------------------------

* All B.07.10 patches (as applicable) are merged into B.08.00.
* Add support for case insensitive indexes.
* Add the option to replicate database transactions to other database environments.
* The DBLOCK-COMPAT database property may be used to modify the locking policy for a database, for example to specify an IMAGE compatible lock behavior.
* A forward-log is now retained across a crash of the database server.
* After a DBFIND on an index, a DBGET mode 4 could not be used to position the current record inside the result.
* A DBFIND resets the current record number for a data set.
* The server was enhanced to support additional options for logging server or individual session performance information.
* The server was enhanced to support additional dbctl commands to allow dynamically changing the logging of performance information.
* Fixed compatibility with glibc 2.4 (and newer) on Linux.
* The lock scheduler was revised to enhance scalability and performance with a large number of competing locks.
* Improve bimport performance on master sets.
* The server process was enhanced to use a more efficient format to record index and meta data changes.
* The "conntime" session item was added to allow fwaudit to filter by the session connection timestamp (#3343).
* The dbrecover utility was enhanced to support incremental recovery.
* The dbrecover utility was enhanced to support recovery up to a specified point in time without previously switching the forward-log file.
* The dblogreset utility was enhanced to retain a forward-log in case of a recovery after an abnormal termination of the database server.
* Impose a limit on the maximum transaction size.


Known issues
------------

The following issues are known in the current beta version:

- Currently, only the HP-UX and Linux platforms are supported.
- Due to a higher internal locking overhead, the B.08.00 database server may have a higher CPU usage than the B.07.10 eloqdb6. This locking overhead is inherent in the preemptive multi-threaded architecture of the B.08.00 database server and cannot be avoided.
  As a consequence, a single database job may in some cases take longer and use more CPU with the B.08.00 database server than with B.07.10. However, multiple simultaneous database jobs should scale significantly better than with B.07.10 by making better use of the available system resources.


Recent Changes
--------------

User visible changes include:

beta10:

- Performance was improved by reducing the potential lock contention on a number of important objects.
- Concurrent sequential access to large data sets could result in a performance problem.
- Some objects used to lock the server statistics displayed on the HTTP statistics page turned out to be highly contended. These objects were removed.
- The database server could abort with a message like the one below while executing a rollback:
    Assertion failed: Tlog_GetMeta() failed: file is corrupt
    Aborting on internal failure, file voltxn.c, line 5480
- The consistency of the transaction journal has been improved. Before a data volume is written, the transaction journal is synchronized so that a startup recovery will execute successfully. In this context, the method used to determine whether the transaction journal needs to be synchronized was improved to be consistent in a multi-threaded environment.

beta9:

- Eloquence is installed in the /opt/eloquence/8.0 directory. Configuration files reside in the /etc/opt/eloquence/8.0 directory. The directory /var/opt/eloquence/8.0 is used for temporary storage.
- HP-UX: /sbin/init.d/eloq8 is used as the start/stop script. The startup configuration is maintained in the file /etc/rc.config.d/eloquence8.
- Linux: /etc/init.d/eloq8 is used as the start/stop script. The startup configuration is maintained in the file /etc/sysconfig/eloquence8.
- The database server program has been renamed to eloqdb32 (the 32 bit version) or eloqdb64 (the 64 bit version), respectively. This change is transparent when using the eloq8 start script.
- The default database server config file has been renamed to eloqdb.cfg and resides in the /etc/opt/eloquence/8.0 directory. Also see the INSTALL file for manual changes needed when updating from Eloquence 7 or switching between Eloquence 7.10 and 8.
- Eloquence 8 includes a set of updated database client libraries in /opt/eloquence/8.0/lib (or HP-UX specific subdirectories). To use these new library versions, user programs need to be either relinked or enabled to use the LD_LIBRARY_PATH or SHLIB_PATH (for PA-RISC programs on HP-UX) environment variables.
- A B.08.00 license key is required.
- Fixed a problem with DBUNLOCK which in rare cases could cause a panic with a message like the one below (#3551):
    Assertion failed: current_task->waiting_for == NULL
    Aborting on internal failure, file thread.c, line 1211
  This problem was caused by a race condition that could occur on concurrently executed DBLOCK and DBUNLOCK invocations. A fix for this problem was already integrated into the previous beta8 release but turned out to be incomplete.
- Under rare conditions a record number was not reused (#3573). Due to an internal race condition during DBDELETE, a deleted record number was sometimes not written into the list of free record numbers. The list of free record numbers can be fixed by running dbfsck in write mode.
- The database server could output a message like the one below although there were sufficient free buffers available in the buffer cache:
    bf_newbuf: Cache buffers mostly in use - trying harder
  The buffers are organized in multiple queues. If the current queue happened to be empty, this message was output, although there may have been buffers available in the other queues.
- Due to an internal race condition, the database server could under certain conditions output an internal debug message like the one below:
    poll() returned 1, processed 0
- Due to an internal race condition when a session is shut down, the database server could under certain conditions output a misleading warning message like the one below:
    BUG: Inconsistent fd mapping
- Fixed a problem affecting a replicated slave server, a startup recovery or a recovery with dbrecover which in rare cases could cause a panic with a message like the one below (#3568):
    Assertion failed: h->lower == data->lower
    Aborting on internal failure, file btree.c, line 1866
  The line number may differ. Besides the "lower" element, the failed assertion may also apply to the "prevpg", "nextpg", "upper" or "flags" elements.
  This could happen when replication was stopped and later resumed or when performing an incremental recovery with dbrecover. In theory it could also happen during a startup recovery when processing incremental btree recovery actions.
  If replication or recovery is restarted, it needs to continue at the exact point where it previously left off. The last checkpoint is recorded in the volume file. However, any changes beyond the last checkpoint need to be checked to determine whether they were already applied. If similar changes to the same btree page are found, replication or recovery could fail to correctly locate the point of resume in the forward log. This could happen, for example, on multiple DBDELETE / DBPUT sequences affecting an index in a way that the same btree page was modified identically multiple times.
  The implementation was changed to maintain additional information in the volume files on the replicated slave server or during recovery to correctly identify the last change applied.
  If this problem is encountered on a replicated slave server, the slave server must be rebuilt from the master server volume files. When encountered during dbrecover, recovery must be restarted from the last backup after installing a corrected dbrecover binary (restarting dbrecover will not work). If this problem is encountered during a startup recovery, the volume files should be restored from the last backup and a forward recovery should be applied using the dbrecover utility.
- Fixed a potential page leak in the log volume when replication was stopped and later resumed.
- In some cases a replication master server did not correctly update the recovery status in the root volume file. As a result, a subsequent dbrecover could assume a previously incomplete recovery and unnecessarily require a previous forward log file to be present.
  For example, suppose an on-line backup is performed. This starts a new forward log generation.
  If these volume files are later used in a recovery, dbrecover applies forward log files from the generation started with the backup. However, as the recovery status was not correctly updated, it would instead expect the previous generation (and just skip it).
  This could only happen if the database server is configured as a replication master server, not in standalone mode (the default).
- In rare cases DBFIND or DBGET on index items could return the status -804 (#3590). An internal index cursor invalidation could happen if an index traversal was interrupted by a concurrent commit or rollback on the same index. Under rare conditions this could result in an unexpected index cursor state, and status -804 was returned to the application.
- Fixed a problem with DBGET mode 5 or 6 on index items using packed decimal (P) or zoned decimal (Z) item types (#3574). In some cases, an improper end-of-chain condition could be returned to the application.
  With P or Z items, identical values may have different binary representations. For example, the values 42 (unsigned) and +42 (positive) have the same numeric value but differ in their binary representation. Eloquence compares these items by their value, regardless of the binary representation. For example, when a P or Z item is used as a search item, the values 42 (unsigned) and +42 (positive) use the same chain; they are considered identical.
  However, DBGET mode 5 or 6 with a P or Z index item behaved differently. Although the underlying index correctly locates the entry by its value, a binary comparison was used on the result, causing an improper end-of-chain condition if a key value did not match the search argument in the binary comparison.

beta8:

- Scalability was improved by reducing the lock contention on the internal VNode structure.
- Write performance was improved by significantly reducing the number of file system writes, thereby reducing the lock contention on the transaction journal.
- Fixed a problem with DBUNLOCK which in rare cases could cause a panic with a message like the one below (#3551):
    Assertion failed: current_task->waiting_for == NULL
    Aborting on internal failure, file thread.c, line 1211
  This problem was caused by a race condition that could occur on concurrently executed DBLOCK and DBUNLOCK invocations.
- A potential problem was fixed which could cause a btree to return wrong results due to a concurrently committed transaction.
- Fixed a problem affecting a replicated slave server or recovery with dbrecover which in rare cases could cause a panic with a message like the one below (#3552):
    Assertion failed: h->pgno == data->pgno
    Aborting on internal failure, file btree.c, line 1808
  This problem was caused by a defect in the btree recovery code that could result in a corrupted index page if a btree root page was split for the first time and the page was not previously in the buffer cache.
  If this problem is encountered on a replicated slave server, the slave server must be rebuilt from the master server volume files. When encountered during dbrecover, recovery must be restarted from the last backup after installing a corrected dbrecover binary (restarting dbrecover will not work).
- The builtin dbstore function (dbctl dbstore) was enhanced to create the store archive using more restrictive permissions (#3544). The builtin dbstore function now restricts access to the account running the database server process.
- Fixed a problem which could result in unexpected "protocol failure" and -700:-6 error messages in dbrepl. In some cases a replication slave server process prematurely closed the dbrepl utility's connection when encountering a problem. As a result, the actual error message could be lost and a generic error message was output instead.
- On a replication slave the forward-log file permissions were not correctly set and the [forwardlog] GroupReadAccess configuration option had no effect (#3548).
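The beta9 fix for DBGET mode 5 or 6 on P or Z index items (#3574) described above comes down to comparing keys by value instead of by byte pattern. The Python sketch below illustrates that difference for packed decimal only. It assumes the common packed-decimal layout (two BCD digits per byte, with the low nibble of the last byte holding the sign: C for positive, D for negative, F for unsigned); the helper names are hypothetical and do not describe how Eloquence stores or compares these items internally.

```python
def decode_packed(field: bytes) -> int:
    """Decode a packed-decimal (P) field into an integer.

    Hypothetical helper. Assumed layout: two BCD digits per byte, the low
    nibble of the last byte is the sign (0xC positive, 0xD negative,
    0xF unsigned).
    """
    nibbles = []
    for byte in field:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid BCD digit")
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign == 0xD:
        return -value
    if sign in (0xC, 0xF):
        return value
    raise ValueError(f"unknown sign nibble {sign:#x}")


def same_key_binary(a: bytes, b: bytes) -> bool:
    # Byte-wise comparison: the kind of check the notes name as the cause
    # of the improper end-of-chain condition
    return a == b


def same_key_by_value(a: bytes, b: bytes) -> bool:
    # Numeric comparison: 42 and +42 are treated as the same key
    return decode_packed(a) == decode_packed(b)


unsigned_42 = bytes([0x04, 0x2F])  # 42, sign nibble F (unsigned)
positive_42 = bytes([0x04, 0x2C])  # +42, sign nibble C (positive)

print(same_key_binary(unsigned_42, positive_42))    # False
print(same_key_by_value(unsigned_42, positive_42))  # True
```

Zoned decimal (Z) items raise the same issue with a different encoding, which this sketch does not model.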
beta7:

- Buffer cache locking was optimized to reduce lock contention on the internal LRU queue.
- Fixed an internal deadlock condition that could occur when a user transaction was internally serialized after repeatedly retrying to resolve a lock conflict with concurrent database sessions.
- Fixed a problem that caused the 32 bit database server to abort on startup if BufferCache is configured higher than 700 MB (#3485). The server terminated with a message like the one below:
    Unable to allocate buf_table(131071,8192,12)
    Memory allocation failed failed: Not enough space (errno 12)
    Unable to initialize buffer-cache subsystem.
- Fixed a problem where the database server could hang in an endless loop on startup if the listen socket could not be bound (#3528). In this case, a message like the one below was output to the server log:
    Unable to bind address. [226] Address already in use
- The dbctl killthread command did not have any effect if the thread to be killed was currently blocked waiting for a client request (#3530).
- A problem in DBDELETE was fixed that could in some cases result in a noticeable delay (#3524). This delay could affect concurrent database sessions during write operations (DBPUT/DBUPDATE/DBDELETE) on the same data set.
  During DBDELETE, if the first or last record in a data set was deleted, the data set meta information was updated to reflect the new first or last record in the set. However, if a significant number of deleted records had to be traversed to locate the new first or last record, this could take some time, subject to caching.
  The implementation was modified so that the first/last record meta information, as well as updating this information during DBDELETE, is no longer needed. This was achieved in a way that keeps the data backwards-compatible with previous database server versions.
  Please note: If an older dbfsck utility is used, it may report first/last record meta data inconsistencies.
  Please consider installing patch PE71-0802121, which provides a new dbfsck version where the first/last record check is relaxed according to the way the new database server implementation handles the first/last record meta information.
- The database server was enhanced to support case insensitive indexes (#1073). When specified for an index, key comparison on strings is performed in a case insensitive manner.
- Fixed a potential TurboIMAGE compatibility problem (#3502). A TurboIMAGE DBGET call that fails with a status code may return the current record number in the status array. The database server was modified to return the current record number in case a DBGET call fails with a status code.
- Fixed a problem during incremental recovery or replication (#3515). An incremental dbrecover or replication could in rare cases fail with a message like the one below:
    Assertion failed: offset == data->offset
    Aborting on internal failure, file btree.c, line 2342
  This was caused by an already modified btree page not being skipped during the synchronization phase of an incremental dbrecover or replication. If this problem is encountered, the incremental dbrecover or replication will correctly continue once this fix is installed.
- Fixed a corner case problem in btree error handling (#3052). If a btree page split fails due to lack of disk space, a cache page might not be properly released. This may result in a subsequent server panic with a message like the one below:
    buf_Sync: PIN LEAK detected. bhp=40329c58, node=#116
    Assertion failed: !(bhp->flags & BUF_PINNED)
    Aborting on internal failure, file mpool.c, line 926
- Fixed a problem in the volume file extension procedure that could cause a volume file to grow indefinitely until the disk space is exhausted (#3481). This could happen if the volume file size was limited below the current size by changing the VolumeFileSizeLimit config item to a value below the size of existing volume files.
- The database server was enhanced to support a new configuration option to enable read access on forward-log files for the group (GID) specified in the database server configuration file (#3475).
  By default the database server creates forward-log files with restrictive permissions that only allow the configured user (and the superuser) to access them. The new [forwardlog] GroupReadAccess configuration option may be used to grant the configured group read access to the forward-log files.
    # [forwardlog]
    # GroupReadAccess = 0|1
  If set to a nonzero value, forward-log files are created with a permission that allows group read access (group configured with the [Server] GID option). If set to zero, forward-log files are created with a permission that restricts access to the owner (configured with the [Server] UID option). The default value is 0, which permits owner access only.
- The STOP argument was added to the dbctl replication command on a replication slave (#3523):
    dbctl -u dba replication stop
  This disconnects the dbrepl process from the slave server.
- The dbctl replication status output was modified (#3523). On a replication slave, it now also shows whether replication is active (i.e., the dbrepl process is connected) or not.
  Please note: If you use external utilities that process the dbctl replication status output, it may be necessary to adapt them to the new output format.
- Fixed a problem where a configured StatFile was truncated during database server startup although StatFileFlags included the "a" (append) flag (#3536).
- The internal SCAN_REC function was enhanced to gracefully handle a BOF status and to output a more detailed log message in case of a failure.

beta6:

- Fixed a performance problem caused by lock contention related to cache buffer aging with a large number of concurrent sessions.
- Reduced system CPU utilization on lock contention.
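The beta7 [forwardlog] GroupReadAccess option (#3475) described above chooses between two permission sets for new forward-log files. These notes do not list the exact mode bits, so the Python sketch below assumes owner read/write (0600) for the default and adds group read (0640) for any nonzero value; the function name forward_log_mode is hypothetical and only models the choice of bits, not how the server creates the file.

```python
import stat


def forward_log_mode(group_read_access: int) -> int:
    """Return an assumed permission mode for a new forward-log file.

    Hypothetical helper. group_read_access mirrors [forwardlog]
    GroupReadAccess: 0 (the default) restricts access to the owner,
    any nonzero value adds group read access. The exact bits are an
    assumption, not taken from Eloquence.
    """
    mode = stat.S_IRUSR | stat.S_IWUSR  # owner ([Server] UID): read/write
    if group_read_access != 0:
        mode |= stat.S_IRGRP  # group ([Server] GID): read only
    return mode


print(oct(forward_log_mode(0)))  # 0o600
print(oct(forward_log_mode(1)))  # 0o640
```

Any nonzero value behaves like 1 here, matching the "nonzero value" wording of the option.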
beta5:

- Fixed a performance problem caused by lock contention related to cache buffer aging when a large number of processes repeatedly access the same cached page at a high frequency.
- Fixed an internal race condition where an index page is accessed while another thread modifies the same page (#3476). This could result in a crash of the server with a log message like the one below:
    Assertion failed: bhp->refcount == 1
- Fixed a problem with DBFIND modes 6 and 7 with arguments using wildcards (#3429). These DBFIND modes are used internally by the TurboIMAGE compatibility library (image3k library) to implement TPI and index access. This problem was introduced with the beta4 release.

beta4:

- Added support for the Linux platform.
- Fixed a performance problem on DBPUT/DBUPDATE/DBDELETE if too many threads simultaneously tried to acquire the same resource. In this case messages like the one below were output to the server log:
    dbput: excessive retries on tx conflict (CUSTOMERS )
  The new implementation solves this by detecting a retry due to a resource conflict and then serializing this access. This results in improved throughput with lower CPU utilization.
- Fixed a problem that could result in a crash of the server process during replication or forward-recovery of a database restructuring, with a log message like the one below (#3444):
    Assertion failed: meta->ulist_cache_used <= (int)node->node.ulist.num_pages
- A session could hang during cleanup due to an internal deadlock condition (#3443). As a consequence, the server could no longer shut down gracefully and had to be killed. On the replication slave server this could cause subsequent dbrepl invocations to fail with a -700:-6 status.
- A dbctl dbbexp invocation caused an internal deadlock after completion (#3442). As a consequence, the server could no longer shut down gracefully and had to be killed.
- A TurboIMAGE/SuperDex compatibility problem was solved (#3429).
  If the wildcard tokens (?, #, @) were used within a DBFIND search argument, using the "greater than (or equal)" or "less than (or equal)" conditions caused the server to return status 53 (bad argument). A wildcard search argument could not be used to evaluate a greater/less comparison.
  The implementation was modified to support "greater/less than (or equal)" conditions on search arguments if leading literal characters are present in the search argument. In this case, the leading literal part is used to evaluate the greater/less comparison.
- A TurboIMAGE/SuperDex compatibility problem was solved (#3000). After a DBFIND on an index using a "greater than" or "less than" condition, the DBGET modes 15 or 16 behaved as "greater than or equal" or "less than or equal", respectively.
- Fixed a problem that could result in a crash of the server process during dbutil database restructuring with a log message like the one below (#3414):
    Assertion failed: bhp->refcount == 1
- Fixed a problem that caused dbutil database restructuring to fail with an error message like the one below (#3412):
    *** Database in use [-2]
    FATAL: Fatal problem during schema upload - can't continue
  This problem was introduced as a side effect in the beta3 release (#3111).
- Enhanced the dbstore/dbrestore operations to support canceling the operation (#3421). If the dbctl session is terminated, the dbstore/dbrestore operation will now terminate as well. A message like the ones below is logged by the server:
    Session was terminated - database not stored
    Session was terminated - database not restored
- Database restructuring was changed to include additional progress information in the log file. A message is logged for each completed step. In addition, a progress message is logged every 10 minutes (subject to the LogFlags setting).
  Messages like the ones below are output to the server log:
    restructuring data set 'ARTIKEL': 49760 records processed
    rebuilding indexes for data set 'ARTIKEL': 76024 records processed
    relinking detail data set 'BUCHUNG' paths: 1094500 records processed
- Added the option to limit database transaction sizes (#3388).
  Two size limits are implemented: a configurable "softlimit" and an internal "hardlimit". The smaller of the two values defines the maximum size an uncommitted transaction may have.
  The internal "hardlimit" is calculated by taking half of the configured log space and subtracting the configured checkpoint size:
    configured log space / 2 - configured checkpt size
  The softlimit is configurable with the new TransactionSizeLimit config item. By default it is set to half the internal hardlimit. For example, assuming a size limit of 1 GB for the log volume (taking 1 GB as 1000 MB) and a checkpt size of 50 MB, the hardlimit would be 450 MB and the default softlimit would be 225 MB.
  Once the size of an uncommitted transaction reaches or exceeds the limit, status -801:28 is returned. The only valid options at this point are to commit or roll back the transaction. If status -801:28 is returned by the DBCOMMIT call, the only valid option is to roll back the transaction.
  A message like the one below is logged to the server log:
    Transaction size limit exceeded, size: xxx pages, limit: xxx pages
- The new [Config] TransactionSizeLimit config item may be used to configure a size limit for database transactions. It is defined as follows:
    This configuration item may be used to limit the maximum size of a database transaction in MB. If set to zero, the transaction size is not limited. If set to -1 (the default), the size limit is set to a default value which depends on the configured log volume space.
    The default value is -1.
- Fixed a problem that could result in a crash of the server process if the server log flags were set to output debug messages and replication was active.
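The beta4 transaction size limits (#3388) described above reduce to a small calculation. The Python sketch below reproduces the example from the text, treating 1 GB as 1000 MB to match its round figures. The function names are hypothetical, sizes are in MB rather than the pages the server log reports, and a TransactionSizeLimit of 0 (no limit) is deliberately not modelled.

```python
def transaction_limits_mb(log_space_mb: float, checkpt_mb: float,
                          size_limit_mb: float = -1) -> tuple:
    """Return (hardlimit, softlimit, effective limit) in MB.

    Hypothetical helper. size_limit_mb mirrors [Config] TransactionSizeLimit:
    -1 (the default) selects half the hardlimit, a positive value is used
    as-is. The value 0 (no limit) is not modelled in this sketch.
    """
    if size_limit_mb != -1 and size_limit_mb <= 0:
        raise ValueError("only -1 or a positive TransactionSizeLimit is modelled")
    hardlimit = log_space_mb / 2 - checkpt_mb
    softlimit = hardlimit / 2 if size_limit_mb == -1 else size_limit_mb
    # The smaller of the two values is the effective limit
    return hardlimit, softlimit, min(hardlimit, softlimit)


def exceeds_limit(tx_size_mb: float, effective_mb: float) -> bool:
    # "reaches or exceeds the limit" -> status -801:28 would be returned
    return tx_size_mb >= effective_mb


print(transaction_limits_mb(1000, 50))       # (450.0, 225.0, 225.0)
print(transaction_limits_mb(1000, 50, 600))  # (450.0, 600, 450.0)
```

With 1 GB taken as 1024 MB, the same formula yields a hardlimit of 462 MB and a default softlimit of 231 MB. The second call shows that a configured softlimit above the hardlimit has no effect, since the smaller value wins.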
beta3:

- Fixed an internal deadlock condition that could happen in some cases when a record that spans a page boundary was accessed concurrently by multiple threads while I/O was pending (#3386).
- Fixed a race condition that could cause a server abort in some cases when an index page was accessed while a concurrent session modified the same index page during commit.
- Fixed a problem where a database could be purged after it was renamed although it was still open in another session (#3111).
- The item format flags in the node schema audit record were enhanced to indicate the role of an item as follows:
    Bit 16 (0x10000) is set if the item is a search item.
    Bit 18 (0x40000) is set if the item is a unique key. Currently, this indicates a master search item.
    Bit 19 (0x80000) is set if the item is a sort item.

beta2:

- Fixed an internal cache corruption problem that could happen on DBDELETE of master records and subsequently cause various problems (#3390).
- Fixed an internal hang in the buffer cache if a page with a pending disk write was shadowed.
- On the replication slave server, a DBCLOSE could wrongly resume replication after a database was closed (#3383).
- If forward-logging is disabled, the server now immediately stops writing to the forward-log (#3089).
- Disabling forward-logging on the replication slave server could cause an internal inconsistency.

beta1:

- The order of actions in the forward-log file was not always consistent if entries were added by multiple sessions simultaneously. This could cause a crash during forward-recovery or replication.
- The server could crash if concurrent sessions accessed the same btree page.
- Added the DBLOCK-COMPAT database property.
- Improved commit concurrency by removing an avoidable lock.
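The beta3 change to the node schema audit record defines three role bits in the item format flags. The Python sketch below decodes only those bits, for example when post-processing audit records; the constant and function names are hypothetical, and all other bits in the flags word are ignored because these notes do not describe them.

```python
SEARCH_ITEM = 1 << 16  # 0x10000: item is a search item
UNIQUE_KEY = 1 << 18   # 0x40000: unique key (currently a master search item)
SORT_ITEM = 1 << 19    # 0x80000: item is a sort item

ROLE_BITS = (
    (SEARCH_ITEM, "search item"),
    (UNIQUE_KEY, "unique key"),
    (SORT_ITEM, "sort item"),
)


def item_roles(format_flags: int) -> list:
    """List the roles encoded in an item's format flags (other bits ignored)."""
    return [name for bit, name in ROLE_BITS if format_flags & bit]


print(item_roles(0x50000))  # ['search item', 'unique key']
print(item_roles(0x80000))  # ['sort item']
```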