Eloquence B.08.00
=================
Revision: beta8, 2008-03-31

Thank you for your interest in the Eloquence B.08.00 beta8 release.

This Eloquence beta release provides a development snapshot of the
upcoming Eloquence B.08.00 version that has passed some limited QA
process. By making the test versions available publicly we hope to
encourage wider testing and additional feedback.

Please contact support@marxmeier.com to share your feedback or
report a problem.

Please note:
This release is available under the terms of the Eloquence Beta Test
Agreement which is specified in the file AGREEMENT.

http://www.marxmeier.com/eloquence/download/beta/B0800/AGREEMENT

Downloading and installing the software indicates your agreement
to the Beta Test terms and conditions.

This beta release does not meet the release criteria for quality or
performance and is only intended for test usage. If it breaks you
get to keep the pieces.


Introduction
------------

This beta version currently includes a preliminary version of the
Eloquence B.08.00 database server.

The major Eloquence B.08.00 database goals include:

- scalability improvements
  - make use of available CPU resources (improved multi-threading)
  - improved scalability on large memory configurations
- replication
- release 64 bit version

The current beta version is expected to be functionally complete.
Subsequent releases will mostly focus on fixing problems and
enabling performance enhancements.


Requirements
------------

To use replication the following requirements must be met:

- Eloquence B.07.10 must be installed
- Installation of the most recent B.07.10 patches is required
  for full functionality.
- Currently, only the HP-UX and Linux operating systems are supported.
  - HP-UX 11.11 (or newer) on PA-RISC 2.0 and
    HP-UX 11.23 (or newer) on IA64 based systems
  - Linux glibc2.3 (or newer) on x86 and x86_64 based systems
    supporting the NPTL based threading
- To use replication, a separate replication license key is required.
  Please contact support@marxmeier.com.


Installation
------------

The Eloquence B.08.00 beta releases are available for download from
the following location:

HTTP protocol: http://www.marxmeier.com/download/beta/B0800/
FTP protocol:  ftp://ftp.marxmeier.com/eloq/beta/B0800/

To install, please extract the compressed tar file in the directory
/opt/eloquence6.

  gzcat /tmp/B0800-beta8-hpux-pa20.tar.gz | tar xvf -

This will replace the following binaries in the B.07.10 installation:

  bin/eloqdb6
  bin/dblogreset
  bin/dbrecover
  bin/fwaudit

The following B.07.10 patches (or superseding) are suggested to be
installed:

  PE71-0801250 - db client library
  PE71-0801251 - image3k library
  PE71-0802150 - eloqcore
  PE71-0802121 - dbfsck utility

Download location:
  http://www.hp-eloquence.com/download/B0710/patch/
  http://www.hp-eloquence.com/download/B0710/patch/beta/

To use replication, the following B.07.10 patches (or superseding)
MUST be installed:

  PE71-0705303 - dbrepl utility
  PE71-0705304 - chklic utility
  PE71-0705305 - dbvolextend utility
  PE71-0705306 - dbvolchange utility
  PE71-0801245 - dbcfix utility
  PE71-0802121 - dbfsck utility

Download location:
  http://www.hp-eloquence.com/download/B0710/patch/beta/


B.07.10 compatibility
---------------------

Eloquence B.08.00 is upwards compatible with previous Eloquence
versions and binary compatible with recent B.07.10 patch levels.
However, a crashed eloqdb6 cannot be recovered by B.07.10.

To revert to B.07.10 the following procedure is required:

1. If the database server was not shut down cleanly, please run
   dblogreset to re-apply (roll forward) any recently committed
   transactions.

2. Re-install the B.07.10 binaries that were replaced by the
   B.08.00 beta release (see above). To do so, simply install the
   B.07.10 patches providing the latest version of that binary.
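
Before extracting the archive named under Installation, it can be
useful to check which of the listed binaries it actually contains.
The following is a minimal sketch only. It assumes that the archive
member names are relative to /opt/eloquence6 (for example bin/eloqdb6,
optionally prefixed with "./"), which is not confirmed by these notes.
The function name binaries_in_archive is hypothetical.

```python
import tarfile

# Binaries that, according to the Installation section, are replaced
# in the B.07.10 installation.
REPLACED_BINARIES = {
    "bin/eloqdb6",
    "bin/dblogreset",
    "bin/dbrecover",
    "bin/fwaudit",
}


def binaries_in_archive(archive_path):
    """Return the sorted archive members that match the replaced binaries.

    Assumption: member names are relative paths such as "bin/eloqdb6",
    possibly with a leading "./".
    """
    # "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(archive_path, "r:*") as tar:
        names = {
            name[2:] if name.startswith("./") else name
            for name in tar.getnames()
        }
    return sorted(names & REPLACED_BINARIES)
```

For example, binaries_in_archive("/tmp/B0800-beta8-hpux-pa20.tar.gz")
would list the members that overwrite existing B.07.10 binaries, which
are the ones to re-install from B.07.10 patches when reverting.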

HP-UX Kernel Parameters
-----------------------

Eloquence B.08.00 may require the configuration of additional
kernel parameters:

max_thread_proc - defines the maximum number of concurrent threads
                  allowed per process.

  max_thread_proc limits the maximum number of threads allowed per
  process on the system.

  max_thread_proc needs to be set to at least the highest number
  of configured database connections serviced by any B.08.00
  database server (threads= config item) plus 10.

nkthread - number of threads allowed to run simultaneously

  The nkthread tunable controls the absolute number of threads
  allowed on a system at any given time. Increasing it will allow
  more threads, and lowering it will restrict the number of threads.

  It can be determined that nkthread is too low when the
  "kthread: table is full" message is seen in the message buffer.
  The message can be read via dmesg or syslog. This message
  indicates that an application was unable to create a thread.

  nkthread needs to be set to at least the sum of all configured
  database connections serviced by a B.08.00 database server
  (threads= config item, system wide) plus 10 per instance.

  nkthread must be greater than max_thread_proc.


Documentation
-------------

At this point no separate B.08.00 documentation is available.
However, as this beta release is expected to be mostly functionally
equivalent to the most recent B.07.10 patches (PE71-0803070 and
related) the corresponding B.07.10 documentation applies as well.

Database replication
  This document describes the Eloquence replication functions.
  http://www.hp-eloquence.com/support/B0710/doc/repl/index.html

Database Server statistics
  This document describes the Eloquence Database Server statistics.
  http://www.hp-eloquence.com/support/B0710/doc/stats/index.html


Summary of enhancements (relative to the initial B.07.10 release)
-----------------------------------------------------------------

* All B.07.10 patches (as applicable) are merged to B.08.00
* Add support for case insensitive indexes
* Add the option to replicate database transactions to other
  database environments.
* The DBLOCK-COMPAT database property may be used to modify the
  locking policy for a database. This may be used to specify an
  IMAGE compatible lock behavior for a database.
* A forward-log is now retained across a crash of the database server.
* After a DBFIND on an index, a DBGET mode 4 could not be used to
  position the current record inside the result.
* A DBFIND resets the current record number for a data set.
* The server was enhanced to support additional options for logging
  server or individual session performance information.
* The server was enhanced to support additional dbctl commands to
  allow dynamically changing the logging of performance information.
* Fixed compatibility with the glibc2.4 (and newer) on Linux.
* The lock scheduler was revised to enhance scalability and
  performance with a large number of competing locks.
* Improve bimport performance on master sets.
* The server process was enhanced to use a more efficient format
  to record index and meta data changes.
* The "conntime" session item was added to allow fwaudit to filter
  by the session connection timestamp (#3343).
* The dbrecover utility was enhanced to support incremental recovery.
* The dbrecover utility was enhanced to support recovery up to a
  specified point in time without previously switching the
  forward-log file.
* The dblogreset utility was enhanced to retain a forward-log in
  case of a recovery after an abnormal termination of the database
  server.
* Impose a limit on the max.
  transaction size


Known issues
------------

The following issues are known in the current beta version:

- Requires installed B.07.10 release (along with recent patches).

- Currently, only the HP-UX and Linux platforms are supported.

- Due to a higher internal locking overhead, the B.08.00 eloqdb6 may
  have a higher CPU usage than the B.07.10 eloqdb6. This locking
  overhead is determined by the preemptive multi-threaded architecture
  of the B.08.00 eloqdb6 and cannot be avoided.

  As a consequence, a single database job may in some cases take
  longer with the B.08.00 eloqdb6 as compared to B.07.10 and have a
  higher CPU usage. However, multiple simultaneous database jobs
  should scale significantly better than B.07.10 by making better use
  of the available system resources.


Recent Changes
--------------

User visible changes include:

beta8:

- Scalability was improved by reducing the lock contention on the
  internal VNode structure.

- Write performance was improved by significantly reducing the number
  of file system writes and this way reducing the lock contention on
  the transaction journal.

- Fixed a problem with DBUNLOCK which in rare cases could cause a
  panic with a message like below (#3551):

    Assertion failed: current_task->waiting_for == NULL
    Aborting on internal failure, file thread.c, line 1211

  This problem was caused by a race condition that could occur on
  concurrently executed DBLOCK and DBUNLOCK invocations.

- A potential problem was fixed which could cause a btree to return
  wrong results due to a concurrently committed transaction.

- Fixed a problem affecting a replicated slave server or recovery
  with dbrecover which in rare cases could cause a panic with a
  message like below (#3552):

    Assertion failed: h->pgno == data->pgno
    Aborting on internal failure, file btree.c, line 1808

  This problem was caused by a defect in the btree recovery code that
  could result in a corrupted index page if a btree root page was
  split for the first time and the page was previously not in the
  buffer cache.

  If this problem is encountered on a replicated slave server the
  slave server must be rebuilt from the master server volume files.
  When encountered during dbrecover, recovery must be restarted from
  the last backup after installing a corrected dbrecover binary
  (restarting dbrecover will not work).

- The builtin dbstore function (dbctl dbstore) was enhanced to create
  the store archive using more restrictive permissions (#3544).
  The builtin dbstore function now restricts access to the account
  running the db server process.

- Fixed a problem which could result in unexpected "protocol failure"
  and -700:-6 error messages in dbrepl. In some cases a replication
  slave server process prematurely closed the connection of the
  dbrepl utility when encountering a problem. This could have the
  effect that the actual error message was lost and a generic error
  message was output.

- On a replication slave the forward-log file permissions were not
  correctly set and the [forwardlog] GroupReadAccess configuration
  option had no effect (#3548).

beta7:

- Buffer cache locking was optimized to reduce lock contention on
  the internal LRU queue.

- Fixed an internal deadlock condition that could occur when a user
  transaction was internally serialized after repeatedly retrying to
  resolve a lock conflict with concurrent database sessions.

- Fixed a problem that caused the 32 bit eloqdb6 to abort on startup
  if BufferCache is configured higher than 700 MB (#3485).
  The server terminated with a message like below:

    Unable to allocate buf_table(131071,8192,12)
    Memory allocation failed failed: Not enough space (errno 12)
    Unable to initialize buffer-cache subsystem.

- Fixed a problem where the database server could hang in an endless
  loop on startup if the listen socket could not be bound (#3528).
  In this case, a message like below was output to the server log:

    Unable to bind address. [226] Address already in use

- The dbctl killthread command did not have any effect if the thread
  to be killed is currently blocked waiting for a client request
  (#3530).

- A problem in DBDELETE was fixed that could in some cases result in
  a noticeable delay (#3524). This delay could affect concurrent
  database sessions during write operations (DBPUT/DBUPDATE/DBDELETE)
  on the same data set.

  During DBDELETE, if the first or last record is deleted in a data
  set, the data set meta information was updated to reflect the new
  first or last record in the set. However, if there is a significant
  number of deleted records in the set that must be traversed to
  locate the new first or last record, this might take some time,
  subject to caching.

  The implementation was modified so that the first/last record meta
  information, as well as updating this information during DBDELETE,
  is no longer needed. This was achieved in a way that the data
  remains backwards-compatible with previous eloqdb6 versions.

  Please note: If a previous dbfsck utility is used it may report
  first/last record meta data inconsistencies. Please consider
  installing patch PE71-0802121 which provides a new dbfsck version
  where the first/last record check is relaxed, according to the way
  the new database server implementation handles the first/last
  record meta information.

- The database server was enhanced to support case insensitive
  indexes (#1073). When specified for an index, key comparison on
  strings is performed in a case insensitive manner.

- Fixed a potential TurboIMAGE compatibility problem (#3502).
  A TurboIMAGE DBGET call that fails with a status code may return
  the current record number in the status array. The database server
  was modified to return the current record number in case a DBGET
  call fails with a status code.

- Fixed a problem during incremental recovery or replication (#3515).
  An incremental dbrecover or replication could in rare cases fail
  with a message like below:

    Assertion failed: offset == data->offset
    Aborting on internal failure, file btree.c, line 2342

  This was caused by an already modified btree page not being skipped
  during the synchronization phase of an incremental dbrecover or
  replication. In case this problem is encountered, the incremental
  dbrecover or replication will correctly continue after this patch
  is installed.

- Fixed a corner case problem in btree error handling (#3052).
  If a btree page split fails due to lack of disk space a cache page
  might not be properly released. This may result in a subsequent
  server panic with a message like below:

    buf_Sync: PIN LEAK detected. bhp=40329c58, node=#116
    Assertion failed: !(bhp->flags & BUF_PINNED)
    Aborting on internal failure, file mpool.c, line 926

- Fixed a problem in the volume file extension procedure that could
  result in growing a volume file infinitely until the disk space is
  exhausted (#3481). This could happen if the volume file size is
  limited below the current size by changing the VolumeFileSizeLimit
  config item to a value below the size of existing volume files.

- The database server was enhanced to support a new configuration
  option to enable read access on forward-log files for the group
  (GID) specified in the database server configuration file (#3475).

  By default the database server creates any forward-log files with
  restrictive permissions that only allow the configured user (and
  the superuser) to access the forward-log files.
  The new [forwardlog] GroupReadAccess configuration option may be
  used to specify read access for the configured group to the
  forward-log files.

    # [forwardlog]
    # GroupReadAccess = 0|1

  If set to a nonzero value forward-log files are created with a
  permission that allows group read access (configured with the
  [Server] GID option). If set to zero forward-log files are created
  with a permission to restrict access to the owner (configured with
  the [Server] UID option).

  The default value is 0 to permit owner access only.

- The STOP argument was added to the dbctl replication command on a
  replication slave (#3523):

    dbctl -u dba replication stop

  This disconnects the dbrepl process from the slave server.

- The dbctl replication status output was modified (#3523).
  On a replication slave, it now additionally shows whether the
  replication is active (i.e., the dbrepl process is connected)
  or not.

  Please note: If you use external utilities that process the dbctl
  replication status output it may be necessary to adapt them to the
  new output format.

- Fixed a problem where a configured StatFile was truncated during
  database server startup although StatFileFlags included the "a"
  (append) flag (#3536).

- The internal SCAN_REC function was enhanced to gracefully handle a
  BOF status and to output a more detailed log message in case of a
  failure.

beta6:

- Fixed a performance problem caused by lock contention related to
  cache buffer aging with a large number of concurrent sessions.

- Reduced system CPU utilization on lock contention.

beta5:

- Fixed a performance problem caused by lock contention related to
  cache buffer aging when a large number of processes repeatedly
  access the same cached page at a high frequency.

- Fixed an internal race condition where an index page is accessed
  while another thread modifies the same page (#3476).
  This could result in a crash of the server with a log message like
  below:

    Assertion failed: bhp->refcount == 1

- Fixed a problem with DBFIND modes 6 and 7 with arguments using
  wildcards (#3429). These DBFIND modes are used internally by the
  TurboIMAGE compatibility library (image3k library) to implement
  TPI and index access. This problem was introduced with the beta4
  release.

beta4:

- Added support for the Linux platform.

- Fixed a performance problem on DBPUT/DBUPDATE/DBDELETE if too many
  threads simultaneously try to acquire the same resource. In this
  case messages like below were output to the server log:

    dbput: excessive retries on tx conflict (CUSTOMERS )

  The new implementation solves this by detecting a retry due to a
  resource conflict and then serializing this access. This results in
  improved throughput with lower CPU utilization.

- Fixed a problem that could result in a crash of the server process
  during replication or forward-recovery of a database restructuring
  with a log message like below (#3444):

    Assertion failed: meta->ulist_cache_used <= (int)node->node.ulist.num_pages

- A session could hang during cleanup due to an internal deadlock
  condition (#3443). As a consequence, the server could no longer
  shut down gracefully and had to be killed. On the replication slave
  server this could cause a situation where subsequent dbrepl
  invocations fail with a -700:-6 status.

- A dbctl dbbexp invocation caused an internal deadlock after
  completion (#3442). As a consequence, the server could no longer
  shut down gracefully and had to be killed.

- A TurboIMAGE/SuperDex compatibility problem was solved (#3429).
  If the wildcard tokens (?, #, @) were used within a DBFIND search
  argument, using the "greater than (or equal)" or "less than (or
  equal)" conditions caused the server to return a status 53 (bad
  argument). A wildcard search argument could not be used to evaluate
  a greater/less comparison.
  The implementation was modified to support "greater/less than (or
  equal)" conditions on search arguments if leading literal
  characters are present in the search argument. In this case, the
  leading literal part is used to evaluate a greater/less comparison.

- A TurboIMAGE/SuperDex compatibility problem was solved (#3000).
  After a DBFIND on an index using a "greater than" or "less than"
  condition, the DBGET modes 15 or 16 behaved as "greater than or
  equal" or "less than or equal", respectively.

- Fixed a problem that could result in a crash of the server process
  during dbutil database restructuring with a log message like below
  (#3414):

    Assertion failed: bhp->refcount == 1

- Fixed a problem that caused dbutil database restructuring to fail
  with an error message like below (#3412):

    *** Database in use [-2]
    FATAL: Fatal problem during schema upload - can't continue

  This problem was introduced as a side effect with the beta3 release
  (#3111).

- Enhanced the dbstore/dbrestore operations to support canceling the
  operation (#3421). If the dbctl session is terminated the
  dbstore/dbrestore operation will now terminate as well. A message
  as below is logged by the server:

    Session was terminated - database not stored
    Session was terminated - database not restored

- Database restructuring was changed to include additional progress
  information in the log file. For each completed step a message is
  logged. In addition a progress message is logged every 10 minutes
  (subject to the LogFlags setting). Messages like below are output
  to the server log:

    restructuring data set 'ARTIKEL': 49760 records processed
    rebuilding indexes for data set 'ARTIKEL': 76024 records processed
    relinking detail data set 'BUCHUNG' paths: 1094500 records processed

- Added the option to limit database transaction sizes (#3388).

  Two size limits are implemented: A configurable "softlimit" and an
  internal "hardlimit". The minimum of either value defines the max.
  size an uncommitted transaction may have.
  The internal "hardlimit" is half of the configured log space minus
  the configured checkpoint size:

    configured log space / 2 - configured checkpt size

  The softlimit is configurable with the new TransactionSizeLimit
  config item. By default it is set to half the size of the internal
  hardlimit.

  For example, assuming a size limit of 1 GB for the log volume and a
  checkpt size of 50 MB the hardlimit would be 450 MB and the default
  softlimit would be 225 MB.

  Once the size of an uncommitted transaction reaches or exceeds the
  limit a status -801:28 is returned. The only valid options at this
  point are to commit or roll back the transaction. If the status
  -801:28 is returned by the DBCOMMIT call the only valid option is
  to roll back the transaction.

  A message like below is logged to the server log:

    Transaction size limit exceeded, size: xxx pages, limit: xxx pages

- The new [Config] TransactionSizeLimit config item may be used to
  configure a size limit for database transactions. It is defined as
  below:

    This configuration item may be used to limit the max. size of a
    database transaction in MB. If set to zero, the transaction size
    is not limited. If set to -1 (the default), the size limit is set
    to a default value which depends on the configured log volume
    space.

    The default value is -1.

- Fixed a problem that could result in a crash of the server process
  if the server log flags were set to output debug and replication
  was active.

beta3:

- Fixed an internal deadlock condition that could happen in some
  cases when a record that spans a page boundary was accessed
  concurrently by multiple threads while I/O was pending (#3386).

- Fixed a race condition that could cause a server abort in some
  cases when an index page was accessed while a concurrent session
  modified the same index page during commit.

- Fixed a problem where a database could be purged after it was
  renamed although it was still opened by another session (#3111).

- The item format flags in the node schema audit record were enhanced
  to indicate the role of an item as below:

    Bit 16 (0x10000) is set if the item is a search item.
    Bit 18 (0x40000) is set if the item is a unique key.
                     Currently, this indicates it is a master
                     search item.
    Bit 19 (0x80000) is set if the item is a sort item.

beta2:

- Fixed an internal cache corruption problem that could happen on
  DBDELETE of master records and subsequently cause various problems
  (#3390).

- Fixed an internal hangup in the buffer cache if a page having a
  pending disk write was shadowed.

- On the replication slave server, a DBCLOSE could wrongly resume the
  replication after a database is closed (#3383).

- If forward-logging is disabled, the server now immediately stops
  writing to the forward-log (#3089).

- Disabling forward-logging on the replication slave server could
  cause an internal inconsistency.

beta1:

- The order of actions in the forward-log file was not always
  consistent if entries were added by multiple sessions
  simultaneously. This could cause a crash during forward-recovery
  or replication.

- The server could crash if concurrent sessions accessed the same
  btree page.

- Added the DBLOCK-COMPAT database property.

- Improved commit concurrency by removing an avoidable lock.
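
Worked example: transaction size limits

The transaction size limits described in the beta4 notes above can be
reproduced with a few lines of arithmetic. This is a sketch under
stated assumptions, not the server's implementation: the function
name transaction_size_limits is hypothetical, all values are in MB
(the server log reports pages), and the 1 GB log volume of the
example is taken as 1000 MB because that is what yields the quoted
450 MB. The notes do not say whether the internal hardlimit still
applies when TransactionSizeLimit is 0, so that case is modelled as
"no limit".

```python
def transaction_size_limits(log_space_mb, checkpoint_mb,
                            transaction_size_limit=-1):
    """Return (hardlimit, softlimit, effective_limit) in MB.

    transaction_size_limit mirrors the TransactionSizeLimit config item:
      -1  use the default softlimit (half of the hardlimit)
       0  not limited; returned as None (assumption, see lead-in)
      >0  explicit softlimit in MB
    """
    # hardlimit = configured log space / 2 - configured checkpt size
    hardlimit = log_space_mb / 2 - checkpoint_mb
    if transaction_size_limit == 0:
        return hardlimit, None, None
    if transaction_size_limit < 0:
        softlimit = hardlimit / 2
    else:
        softlimit = transaction_size_limit
    # The minimum of both values is the max. uncommitted transaction size.
    return hardlimit, softlimit, min(softlimit, hardlimit)


# Example from the notes: 1 GB log volume, 50 MB checkpoint size.
print(transaction_size_limits(1000, 50))  # (450.0, 225.0, 225.0)
```

A configured softlimit above the hardlimit has no effect in this
model, since the smaller of the two values always wins.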
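
Worked example: item format flag bits

The item role bits listed in the beta3 notes above can be tested with
simple bit masks. The sketch below only uses the three bit values
quoted there. How the flags value is obtained from the node schema
audit record is not covered by these notes, so the function (a
hypothetical helper named item_roles) takes the flags as a plain
integer.

```python
# Bit masks quoted in the beta3 notes for the item format flags.
SEARCH_ITEM = 0x10000  # bit 16: item is a search item
UNIQUE_KEY = 0x40000   # bit 18: item is a unique key (master search item)
SORT_ITEM = 0x80000    # bit 19: item is a sort item


def item_roles(format_flags):
    """Return the names of the roles whose bits are set in format_flags."""
    roles = []
    if format_flags & SEARCH_ITEM:
        roles.append("search item")
    if format_flags & UNIQUE_KEY:
        roles.append("unique key")
    if format_flags & SORT_ITEM:
        roles.append("sort item")
    return roles


# A master search item would have bits 16 and 18 set.
print(item_roles(0x50000))  # ['search item', 'unique key']
```

Any other bits in the flags value are ignored by this helper, so it
can be applied to a complete flags word without masking it first.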