Eloquence Replication
=====================

Revision: BETA4, 2006-06-26

Introduction
------------

This patch adds the option to Eloquence B.07.10 to replicate database
transactions to other database environments (local or remote).

The Eloquence replication functionality is not specific to a database
but performs a replication of all changes to a database environment.
Replication is unidirectional and performed asynchronously but typically
is close to real-time if allowed by server performance and connection
bandwidth.

On the master server, committed transactions are saved in Eloquence
forward log files. The dbrepl replication utility is used to transfer
and apply committed transactions from the master to any replicated
servers ("slave servers").

The Eloquence replication function does not perform regular database
calls but replicates transaction results. This improves performance and
allows covering any changes that are described by internal transactions,
including administrative changes such as creation of databases,
structural changes and restoring single databases.

Technically, the replication function performs a continuous incremental
forward recovery of committed transactions on the slave servers.
Consequently, any information that is not maintained by the database
server is outside the scope of the Eloquence replication functions.

Expected uses of this function include load sharing (e.g. using a
replicated environment for reporting), hot standby (allowing a switch to
a running server instance in case of a problem) and making replicated
data available in branch offices that are connected through WAN lines.

This patch is intended for testing purposes to verify the correct
operation of the replication functionality and to collect feedback for
further enhancements. Some planned functions are currently not fully
implemented (see section on Limitations).

To use the Eloquence replication function, an additional "repl" license
key is required.
Please contact the Eloquence support team at support@marxmeier.com with
any questions.

Requirements
------------

To use replication the following requirements must be met:

- Eloquence B.07.10 must be installed.
- Currently, only the HP-UX and Linux platforms are supported.
- The patch enabling replication must be installed on both the master
  and the slave system. The patch mostly adds new functionality executed
  on the slave server but also implements limited changes to the code
  base executed on the master side.
- The slave function is only enabled if a specific (repl) license key is
  present.
- The master system needs sufficient disk space to hold any forward log
  files that are not yet entirely replicated to the slave servers.
- Replication master and slave must share the same architecture.

Limitations
-----------

Any modification to a database environment that is not described through
transactions will cause the replication to get out of sync, and any
slave servers need to be re-synchronized.

This may be the result of using Eloquence off-line support utilities
that could affect the database content. For example:

- dbfsck (when used to perform database low level repairs)
- dbcfix (when used to repair database chain linkage)

Disabling forward logging on the master server for any reason (e.g. due
to lack of disk space or for administrative purposes) will have the
effect that any slave servers need to be re-synchronized.

The Eloquence bimport function (for data migration) only partially uses
transactions (only meta data are covered by transactions for performance
reasons). Consequently, the bimport function may not be used on a master
server (this function is disabled when configured as a master server)
and its use would require re-synchronizing any slave servers.

Any configuration changes affecting the database volume files (e.g.
adding a volume file or changing the volume file size limits) require
the same change to be applied to any replicated database environment.
Future versions of the replication function should be able to detect
this condition and stop replication unless the slave server is
configured equivalently. However, in the current release the slave
server will abort.

In case the master server aborts (e.g. due to an internal problem or due
to a system crash when SyncMode is enabled) the replication should
usually be able to continue once the master server has been restarted
and has recovered successfully. During startup the master server (or a
dblogreset) attempts to recover any possibly missing records from the
transaction journal. If this is successful, the replication should be
able to continue.

Modification of database content on a replicated server (slave server)
is not allowed in any case. Using Eloquence off-line maintenance
utilities (such as dbfsck in write mode) to modify the volume file
content will either corrupt the slave environment or cause the
replication to stop (and require re-synchronization).

A recovery of the master server from backup (restoring the volume files
from the backup and running dbrecover) could corrupt a replicated server
state or cause replication to fail unless the forward recovery continues
beyond the point the slave server was most recently synchronized with.
For example, restoring a backup and running dbrecover on a partial set
of forward log files and then starting the master server would have this
effect.

Known limitations
-----------------

The following known limitations are present in the current version and
are expected to be solved with subsequent releases:

1. When a slave server is shut down while in on-line backup mode, any
   modified information temporarily saved in the log volume is discarded
   on the next server startup and needs to be re-submitted. The dbrepl
   utility is able to resynchronize a slave server in this case.

2. A slave server may have incomplete information after a system crash
   or an internal failure, but the volume state may be reported as
   consistent. A slave server currently is not fully protected against
   partially written transactions on a system crash or server process
   abort. The dbrepl utility should be able to resynchronize a slave
   server in this case.

3. There is currently no procedure provided to automatically start or
   stop the replication function. The dbrepl utility needs to be started
   or stopped manually.

4. Shutting down or restarting the slave server will cause the dbrepl
   utility to fail and require a manual restart.

5. The replication function currently does not detect conditions where
   the volume files are configured differently on a master server and a
   slave server (e.g. adding a volume file or changing volume file
   limits). This will currently result in a failure on the slave server.

6. There is currently no procedure available to switch server roles
   besides changing the configuration files and restarting the server
   process.

7. Replication functions are currently not available for Windows.

Recent Changes
--------------

The following changes were made since the beta3 release.
* Fixed a problem with slave server forward log files. In some cases the
  slave server forward log file could contain repeated data.
* Restarting the recovery was improved. The previous version caused the
  slave server to re-read the most recent fw-log file segment to
  synchronize. This is no longer necessary.
* The dbrepl utility was modified to remove a repeated warning message.

The following changes were made since the beta2 release.
* Support for forward log files was added to the slave server.
* Fixed a potential buffer overflow when redirecting a writing dbopen
  from a slave server.
* Fixed a compatibility problem with older forward log files.
* Fixed a problem where cached meta information on the slave server was
  possibly not updated.

The following changes were made since the beta1 release.
* Fixed a bug where a dbrestore would corrupt the configuration.
* Fixed a bug where a dbopen mode 4 on a slave was not bounced to the
  master.
* Changed the btree logging format to improve incremental updates in
  some corner cases.
* Added initial support for detecting conflicting database use on the
  slave server. The access requirements for replication are recorded by
  the master server and applied by the replication process. The
  following conditions are currently recognized:

  - The replication requesting exclusive access to a database (e.g.
    erasing a database, or an application on the master requesting
    exclusive access to the database)
  - An application opening a database on the slave server in mode 8
    (deny concurrent write)

  If a database is currently in use by an application, the replication
  is temporarily suspended until the database is available. If a
  database is currently accessed by the replication, an attempt by an
  application to open the database in a conflicting mode will fail with
  a database status.

Configuration
-------------

The Eloquence replication functions add new configuration options to the
server configuration file (eloqdb6.cfg):

   [Replication]
   Role = Standalone|Master|Slave
   RedirectWrite = server:service
   TmpDir = /tmp

The Replication.Role configuration item defines the role of this server.
If this configuration item is not present it defaults to Standalone. If
set to Master this specifies a master server. If set to Slave this
specifies a slave server.

Setting Role to Master has the effect that some operations that would
cause replicated servers to become unsynchronized (such as bimport) are
disallowed. In addition it causes the master server to output additional
information on opened databases that allows the slave server to detect
conflicts between replication and concurrent use.

Role must be set to Slave for a replicated server. If the server is
configured as a slave server, any write attempt is rejected and
replication is enabled.
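
The defaulting rule for Replication.Role described above can be
illustrated with a short Python sketch. It is not part of Eloquence. It
assumes that the server configuration can be read as a plain INI file by
Python's configparser module, which may not hold for every eloqdb6.cfg,
and that role names must match exactly. The function name
replication_role is hypothetical.

```python
import configparser

VALID_ROLES = ("Standalone", "Master", "Slave")


def replication_role(cfg_text):
    """Return the Replication.Role found in INI-style configuration text.

    Falls back to "Standalone" when the [Replication] section or the
    Role item is missing, mirroring the documented default.
    """
    # interpolation=None keeps '%' literal, so values such as
    # "FwLog = /fwlog/fw-%N" are not treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    role = parser.get("Replication", "Role", fallback="Standalone").strip()
    # Assumption of this sketch: role names are matched exactly.
    if role not in VALID_ROLES:
        raise ValueError("unknown Replication.Role: %r" % role)
    return role


if __name__ == "__main__":
    example = "[Replication]\nRole = Master\n\n[ForwardLog]\nFwLog = /fwlog/fw-%N\n"
    print(replication_role(example))  # prints "Master"
```

A script that starts dbrepl could use such a check to refuse to run
against a configuration file that does not describe a master server.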
The RedirectWrite configuration item is only used on a slave server and
may be used to specify the corresponding master server for a slave
server. It specifies the server name or IP address and service name or
port number, separated by a colon. If RedirectWrite is defined on a
slave server, some DBOPEN modes (modes 1, 3, 4) are transparently
redirected to the specified server (this also adds a note to the log
file). Otherwise, any attempt to open the slave server for writing will
fail.

The TmpDir configuration item may be used on a slave server to specify a
temporary directory that is used as scratch storage for collecting and
processing partial transaction information. It needs to provide
sufficient disk space to hold the largest transaction. It defaults to
the /tmp directory.

Example configuration on a master server:

   [Replication]
   Role = Master

   [ForwardLog]
   FwLog = /fwlog/fw-%N

A master server needs to enable forward logging to files (managed by the
server process) and should define Role = Master.

Example configuration on a slave server:

   [Replication]
   Role = Slave
   RedirectWrite = 194.64.71.28:8202

A slave server needs to define Role = Slave and may define RedirectWrite
to bounce a DBOPEN in write mode to the master server.

Using forward log with a slave server
-------------------------------------

In the eloqdb6 configuration the only relevant configuration parameter
is FwLog in the [ForwardLog] section. For slave server forward-logging
it must be configured for automatic file management, that is, it must
refer to a directory with sufficient disk space available and have the
%N token in the filename part. All other configuration parameters
located in the [ForwardLog] section are ignored (FwRecovery,
FwOnFailure, FwMaxSize, EnableAudit and AuditOnly).

On the slave server, the sequence of the forward-log files equals the
file sequence on the master server.
In other words, all the information contained in the master forward-log
files is copied to files with the same generation and sequence numbers
on the slave server. If a replication is suspended and later resumed,
the slave server locates the last checkpoint in the existing forward-log
files and continues to append new replication actions at that point.
Therefore, in the end the slave server forward-log files will always
equal the master server files.

The dbctl forwardlog interface allows forward-logging to be enabled or
disabled manually and the current forward-log status to be queried.

If forward-logging is disabled, the slave server immediately stops
writing to its forward-log files. Any information that is replicated
after forward-logging was disabled will never be written to the slave
server forward-log files (i.e., the forward-log is interrupted at that
point). The disabled state is retained until forward-logging is manually
enabled or the slave server is restarted.

When forward-logging is manually enabled, the slave server forward-log
will typically not be written immediately. Instead, the slave server
starts to write the forward-log when a new replication segment begins
(i.e., when the master server begins a new forward-log segment).

The dbrepl utility
------------------

The Eloquence dbrepl utility is used to replicate committed transactions
from a master server to a slave server. This utility needs to be run
with dba privileges.

The dbrepl utility reads the master server config file to obtain the
location and naming convention of the forward log files. It then
contacts the specified slave server and obtains the most recently
synchronized checkpoint on the slave server. With this information the
master server forward log files are searched to locate a synchronization
point. dbrepl then submits any enqueued transactions from this point to
the slave server.
Once the slave server is up to date, any subsequently committed
transactions should be replicated close to real-time, subject to
communication bandwidth. Replication should only place minor load on the
master and slave server once synchronization has been achieved.

As the dbrepl utility reads the master server forward log files, it
should be run on the same system as the master server and should have
appropriate access rights to the forward log files of the master server
(read-only access is required).

   Usage: dbrepl [options] [slave_server_addr]
   options:
     -help    - show usage (this list)
     -c cfg   - configuration file name (master)
     -v       - verbose, display progress
     -u name  - user name (defaults to dba)
     -p pswd  - password
     -S       - synchronize on existing log, then exit

The -c option is used to specify the master server config file. If not
present, it defaults to the default config file on the local system
(/etc/opt/eloquence6/eloqdb6.cfg for HP-UX and Linux).

The slave_server_addr command line option specifies the slave server
host name or IP address and service name or port number, separated by a
colon (e.g. 194.64.71.28:8202). The host name or IP address may be
omitted and defaults to localhost (127.0.0.1).

The EQ_DBSERVER environment variable may be used to specify the slave
server address. The EQ_DBUSER and EQ_DBPASSWORD environment variables
may be used as an alternative to the -u / -p options.

By default, dbrepl synchronizes all enqueued changes and then closely
follows any on-going changes on the master server. If the -S option is
present, dbrepl exits once all enqueued changes are synchronized.

For example:

   # dbrepl -c /etc/opt/eloquence6/eloqdb6.cfg -v -S :8604
   R1: processing forward-log file: /data/fwlog/1-1
   R1: found synchronization point with slave server
   ...
   R1: processing forward-log file: /data/fwlog/12-1
   R1: slave server is up-to-date until 2006-04-18 18:54:17

To temporarily stop synchronization of a slave server, it is sufficient
to stop the dbrepl utility.
On the next start, dbrepl will continue from the previous point.

Replication is temporarily suspended if a database is used on the slave
server in a conflicting mode (e.g. if a database is erased on the master
server while a report is printed using the same database on the slave
server).

Database locks are not replicated but some exclusion may be achieved by
using DBOPEN mode 8 (deny concurrent writes). As described above, this
will temporarily suspend replication.

Starting (or re-synchronizing) a Slave server
---------------------------------------------

To set up a slave server, forward logging needs to be configured on the
master server. Setting Replication.Role = Master is also encouraged.

Then a backup copy of the volume files of the master server is
transferred to the slave environment. This could also be a previous
backup if the forward log files since this backup are present on the
master server.

On the slave server, the server config file needs to specify
Replication.Role = Slave. The slave server needs to be started.

On the master server, run the dbrepl utility to start replicating any
committed changes to the slave server. After some time the slave server
should become current and follow the master server closely.

Changing roles of master and slave servers
------------------------------------------

The following procedure should be used to switch the roles of replicated
servers:

- Stop the master server.
- Make sure replication has synced the most recent transaction. If the
  dbrepl utility is not active you may want to consider running dbrepl
  with the -S option.
- Shut down the slave server.
- Change the master and slave configuration files and restart the server
  processes. Please make sure the previous slave server has forward
  logging enabled.

This procedure ensures an orderly handover and allows changes to be
replicated from the new master server. In all other cases the new slave
server must be started from a backup copy of the master server (see the
section above).
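
The "make sure replication has synced" step of the procedure above could
be scripted along the following lines. This is only a sketch in Python,
not an Eloquence tool: the helper names dbrepl_argv and final_sync are
hypothetical, only the dbrepl options documented in this file are used,
and treating a non-zero exit status as a failed synchronization is an
assumption that should be verified against the actual utility.

```python
import subprocess

# Default master config file for HP-UX and Linux, as documented above.
DEFAULT_CFG = "/etc/opt/eloquence6/eloqdb6.cfg"


def dbrepl_argv(slave_addr, cfg=DEFAULT_CFG, sync_only=False, verbose=False):
    """Build a dbrepl command line from the documented options."""
    argv = ["dbrepl", "-c", cfg]
    if verbose:
        argv.append("-v")   # display progress
    if sync_only:
        argv.append("-S")   # synchronize on existing log, then exit
    argv.append(slave_addr)  # host:service; the host part may be omitted
    return argv


def final_sync(slave_addr, cfg=DEFAULT_CFG):
    """Run one 'synchronize, then exit' pass before switching roles.

    Assumption: dbrepl signals failure through a non-zero exit status.
    check=True then raises subprocess.CalledProcessError.
    """
    subprocess.run(
        dbrepl_argv(slave_addr, cfg, sync_only=True, verbose=True),
        check=True,
    )


if __name__ == "__main__":
    # Only show the command here; call final_sync() on the master system.
    print(" ".join(dbrepl_argv(":8604", sync_only=True, verbose=True)))
```

Credentials are not placed on the command line in this sketch. The child
process inherits the environment, so EQ_DBUSER and EQ_DBPASSWORD can be
used as described in the section on the dbrepl utility.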