Eloquence Replication
=====================

Revision: 2006-04-18

Introduction
------------

This patch adds the option to Eloquence B.07.10 to replicate database
transactions to other database environments (local or remote).

The Eloquence replication functionality is not specific to a database but
replicates a database environment as a whole. Any change to the database
environment is performed by a master server, and committed transactions are
then transferred to and applied on replicated servers ("slave servers").

Replication is unidirectional and performed asynchronously, but it is
typically close to real-time if server performance and connection bandwidth
allow.

On the master server, committed transactions are saved in Eloquence forward
log files. A utility is run to transfer the committed transactions to a slave
server and apply them there. Technically, the replication function performs a
continuous incremental forward recovery of committed transactions on the
slave servers.

Expected uses of this function include load sharing (e.g. using a replicated
environment for reporting), hot standby (switching to a running server
instance in case of a problem) and making replicated data available in branch
offices that are connected through WAN lines.

This patch is intended for testing purposes, to verify the correct operation
of the replication functionality and to collect feedback for further
enhancements. Some planned functions are currently not fully implemented (see
below).

Requirements
------------

To use replication, the following requirements must be met:

- Eloquence B.07.10 must be installed.
- Currently, only the HP-UX and Linux platforms are supported.
- The patch enabling replication must be installed on both the master and the
  slave system. The patch mostly adds new functionality executed on the slave
  server but also implements limited changes to the code base executed on the
  master side.
- The slave function is only enabled if a specific (repl) license key is
  present.
- The master system needs sufficient disk space to hold any forward log files
  that have not yet been completely replicated to the slave servers.
- Replication master and slave must share the same architecture.

Limitations
-----------

Any modification to a database environment that is not covered by
transactions causes the replication to get out of sync, and any slave servers
then need to be re-synchronized. This may be the result of using Eloquence
off-line support utilities that could affect the database content, for
example:

- dbfsck (when used to perform low-level database repairs)
- dbcfix (when used to repair database chain linkage)

Disabling forward logging on the master server for any reason (e.g. due to
lack of disk space or for administrative purposes) means that any slave
servers need to be re-synchronized.

The Eloquence bimport function (for data migration) only partially uses
transactions: for performance reasons, only meta data are covered by
transactions. Consequently, bimport may not be used on a master server (the
function is disabled when the server is configured as a master server), and
using it would require re-synchronizing any slave servers.

Any configuration change affecting the database volume files (e.g. adding a
volume file or changing the volume file size limits) requires the same change
to be applied to every replicated database environment. Future versions of
the replication function should be able to detect this condition and stop
replication unless the slave server is configured equivalently. However, in
the current release this results in an abort of the slave server.

In case the master server aborts (e.g. due to an internal problem, or due to a
system crash when SyncMode is enabled), replication should usually be able to
continue once the master server has been restarted and recovered
successfully.
During startup, the master server (or dblogreset) attempts to recover any
possibly missing records from the transaction journal. In this case
replication should be able to continue.

Modification of database content on a replicated server (slave server) is
not allowed under any circumstances. Using Eloquence off-line maintenance
utilities (such as dbfsck in write mode) to modify the content of the volume
files will either corrupt the slave environment or cause the replication to
stop (requiring re-synchronization).

A recovery of the master server from backup (e.g. restoring the volume files
from backup and running dbrecover) could corrupt the state of a replicated
server or cause replication to fail, unless the forward recovery continues at
least up to the last synchronization point of the slave server.

Known limitations
-----------------

The following known limitations are present in the current version and are
expected to be solved in subsequent releases:

1. A slave server currently does not support forward log files. Forward log
   files for recovery or auditing purposes are currently not supported on a
   slave server. If configured, this function is ignored.

2. When a slave server is shut down while in on-line backup mode, any
   modified information temporarily saved in the log volume is discarded on
   the next server startup and needs to be re-submitted. The dbrepl utility
   is able to resynchronize a slave server in this case.

3. After a system crash or an internal failure, a slave server may have
   incomplete information although the volume state is reported as
   consistent. A slave server currently is not fully protected against
   partially written transactions on a system crash or server process abort.
   The dbrepl utility should be able to resynchronize a slave server in this
   case.

4. There is currently no procedure provided to automatically start or stop
   the replication function. The dbrepl utility needs to be started or
   stopped manually.

5. Shutting down or restarting the slave server will cause the dbrepl utility
   to fail and require a manual restart.

6. The replication function currently does not detect conditions where the
   volume files are configured differently on a master server and a slave
   server (e.g. adding a volume file or changing volume file limits). This
   currently results in a failure on the slave server.

7. There is currently no procedure available to switch server roles besides
   changing the configuration files and restarting the server processes.

8. The current version does not correctly handle cases where a database is
   accessed in a conflicting mode on the slave server (e.g. a database is
   purged on the master server while a report is printed on the slave
   server). Replication progress should be temporarily suspended in this
   case.

9. Exclusion through DBOPEN modes on the slave server is currently not
   honored. A DBOPEN mode 8 (deny concurrent write) should suspend
   replication.

10. Replication functions are currently only available on HP-UX and Linux.

Configuration
-------------

The Eloquence replication functions add new configuration options to the
server configuration file (eloqdb6.cfg):

   [Replication]
   Role = Standalone|Master|Slave
   RedirectWrite = server:service
   TmpDir = /tmp

The Replication.Role configuration item defines the role of this server. If
this configuration item is not present, it defaults to Standalone. If set to
Master, it specifies a master server. If set to Slave, it specifies a slave
server.

Setting Role to Master has the effect that operations that would cause
replicated servers to become unsynchronized (such as bimport) are disallowed.
In addition, it causes the master server to output additional information on
opened databases.

Role must be set to Slave for a replicated server. A slave server rejects any
write attempt and enables the replication functions.

The RedirectWrite configuration item may be used to specify the master server
for a slave server.
It specifies the server name or IP address and the service name or port
number, separated by a colon. When operating as a slave server with
RedirectWrite defined, some DBOPEN modes (modes 1, 3 and 6) are transparently
redirected to the specified server (this also adds a note to the log file).
Otherwise, any attempt to perform a modification will fail.

The TmpDir configuration item may be used on a slave server to specify a
temporary directory that is used as scratch storage for collecting and
processing partial transaction information. It needs to provide sufficient
disk space to hold the largest transaction. It defaults to the /tmp
directory.

Example configuration on a master server:

   [Replication]
   Role = Master

   [ForwardLog]
   FwLog = /fwlog/fw-%N

A master server needs to enable forward logging to files (managed by the
server process) and should define Role = Master.

Example configuration on a slave server:

   [Replication]
   Role = Slave
   RedirectWrite = 194.64.71.28:8202

A slave server needs to define Role = Slave and may define RedirectWrite to
redirect DBOPEN calls in write mode to the master server.

The dbrepl utility
------------------

The new Eloquence dbrepl utility is used to replicate committed transactions
from a master server to a slave server. This utility needs to be run with dba
privileges.

As the dbrepl utility reads the forward log files of the master server, it
should be run on the same system and needs appropriate access rights to those
forward log files (read-only access is required).
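Before starting dbrepl, it may help to check the master-side prerequisites described above: forward logging to files enabled, Role = Master, and read access to the forward log directory. The following Python sketch is a hypothetical pre-flight check, not part of Eloquence. It assumes that eloqdb6.cfg can be parsed as an INI file by Python's configparser and that the forward log files live in the directory part of the FwLog pattern; the function name check_master_config is made up for illustration.

```python
import configparser
import os
import tempfile


def check_master_config(cfg_text):
    """Return a list of problems found in a master server configuration.

    Hypothetical helper: assumes eloqdb6.cfg is INI-like and readable by
    configparser. It only inspects the configuration, it does not contact
    any server.
    """
    # interpolation=None: values such as "/fwlog/fw-%N" contain a literal '%'
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    problems = []

    # Role defaults to Standalone when not configured
    role = cfg.get("Replication", "Role", fallback="Standalone")
    if role.lower() != "master":
        problems.append(f"Replication.Role is {role!r}; Master is recommended")

    fwlog = cfg.get("ForwardLog", "FwLog", fallback=None)
    if not fwlog:
        problems.append("ForwardLog.FwLog is not set; replication needs forward log files")
    else:
        # Assumption: the log files are created in the directory part of FwLog.
        # Read permission lists the files, execute (search) permission opens them.
        fw_dir = os.path.dirname(fwlog) or "."
        if not os.access(fw_dir, os.R_OK | os.X_OK):
            problems.append(f"forward log directory {fw_dir!r} is not readable")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        example = f"[Replication]\nRole = Master\n\n[ForwardLog]\nFwLog = {log_dir}/fw-%N\n"
        for problem in check_master_config(example):
            print("warning:", problem)
```

Note that the default configparser interpolation would reject the %N placeholder, which is why interpolation is disabled here.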
Usage: dbrepl [options] [slave_server]

options:
   -help    - show usage (this list)
   -c cfg   - master server configuration file
   -v       - verbose, display progress
   -u name  - user name (defaults to dba)
   -p pswd  - password
   -S       - synchronize on existing log, then exit

The slave_server command line argument specifies the host name or IP address
and the service name or port number, separated by a colon (e.g.
194.64.71.28:8202).

The EQ_DBUSER and EQ_DBPASSWORD environment variables may be used as an
alternative to the -u / -p options. The EQ_DBSERVER environment variable may
be used to specify the slave server.

The dbrepl utility asks the slave server for its most recently synchronized
checkpoint. It then locates this checkpoint in the forward log files of the
master server and submits any queued transactions that are not yet present on
the slave server. Once the slave server is up to date, any subsequently
committed transactions should be replicated close to real-time, subject to
communication bandwidth. Replication should only place a minor load on the
slave server.

For example:

   # dbrepl -c /etc/opt/eloquence6/eloqdb6.cfg -v -S :8604
   R1: processing forward-log file: /data/fwlog/1-1
   R1: found synchronization point with slave server
   ...
   R1: processing forward-log file: /data/fwlog/12-1
   R1: slave server is up-to-date until 2006-04-18 18:54:17

To temporarily stop synchronization of a slave server, it is sufficient to
stop the dbrepl utility. On the next start, it continues from the previous
point.

Future versions of the slave server will also temporarily suspend replication
in some conflicting cases where a database is in use on the slave server
(e.g. if a database is erased on the master server while a report is being
printed from the same database on the slave server). Database locks are not
replicated, but a future version may achieve some exclusion through DBOPEN
modes (deny concurrent writes).
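The catch-up step described above (find the slave's last checkpoint in the master's forward log, then submit everything committed after it) can be modelled in a few lines. This Python sketch is a conceptual model only: the LogRecord type, the record layout and the helper names resolve_slave and pending_transactions are hypothetical and do not reflect the actual forward log format or the dbrepl implementation. It also assumes that a slave_server argument on the command line takes precedence over EQ_DBSERVER and that an empty host part (as in :8604) refers to the local system.

```python
import os
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical, simplified forward-log entry."""
    kind: str   # "checkpoint" or "txn"
    ident: int  # checkpoint id or transaction id


def resolve_slave(arg, env=os.environ):
    """Return (host, service) for the slave server.

    Assumption: the command line argument takes precedence over EQ_DBSERVER,
    and an empty host part (e.g. ":8604") means the local system.
    """
    spec = arg or env.get("EQ_DBSERVER")
    if not spec or ":" not in spec:
        raise ValueError("slave server must be given as host:service")
    host, service = spec.rsplit(":", 1)
    return host or "localhost", service


def pending_transactions(log, slave_checkpoint):
    """Return ids of transactions logged after the slave's last checkpoint.

    Raises LookupError if the checkpoint is no longer in the log; in this
    model the slave then has to be re-synchronized.
    """
    for pos, rec in enumerate(log):
        if rec.kind == "checkpoint" and rec.ident == slave_checkpoint:
            # Everything after the synchronization point still has to be sent
            return [r.ident for r in log[pos + 1:] if r.kind == "txn"]
    raise LookupError(f"checkpoint {slave_checkpoint} not found; re-synchronize the slave")


if __name__ == "__main__":
    log = [LogRecord("txn", 1), LogRecord("checkpoint", 10), LogRecord("txn", 2)]
    print(resolve_slave(":8604"), pending_transactions(log, 10))
    # prints: ('localhost', '8604') [2]
```

In this model, a checkpoint that is no longer present in the retained log files leaves nothing to continue from, which matches the case where a slave server has to be started again from a backup.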
Starting (or re-synchronizing) a Slave server
---------------------------------------------

To set up a slave server, forward logging needs to be configured on the
master server. Setting Replication.Role = Master on the master server is also
encouraged.

Then a backup copy of the volume files of the master server is transferred to
the slave environment. This may also be an older backup, provided the forward
log files written since this backup are still present on the master server.

On the slave server, the server configuration file needs to specify
Replication.Role = Slave. Then the slave server is started.

Next, run the dbrepl utility on the master server to start replicating any
committed changes to the slave server. After some time, the slave server
should become current and then closely follow the master server.

Changing roles of master and slave servers
------------------------------------------

The following procedure should be used to switch the roles of replicated
servers:

- Stop the master server.
- Make sure replication has transferred the most recent transaction. If the
  dbrepl utility is not active, consider running dbrepl with the -S option.
- Shut down the slave server.
- Change the master and slave configuration files and restart the server
  processes. Please make sure the previous slave server has forward logging
  enabled.

This procedure ensures an orderly handover and allows changes from the new
master server to be replicated. In all other cases, the new slave server must
be set up again from the new master server (see the section above).