Replication Module Online Help

Contents
· A 'Simple' Replication Module
· Custom Conditions
· Appendix
A 'Simple' Replication Module
------------------------------------------------------------------
Requirements for Using this Tool
1. For development and working with the supplied project source, you will need IB Objects version 4.x or higher.

2. Whether you are using the compiled version of the tool or the source, you will need the special UDF library BlobCompare. Instructions for obtaining and installing this library for Windows and Linux servers are in the Appendix.

Note also that Novell servers do not support UDFs.
The tool exists mainly to demonstrate one of two specialized IB Objects components which have been developed to enable replication from one database to another. However, you can also use it as a convenient "desktop" in which to set up and experiment with IBO replication.
Currently, replication is "one-way": it replicates DML changes from tables in the source database to tables in the target database according to the rules you set up for it. Two-way replication may become possible in a future revision.
I call this a simple replication module, but it does allow quite a large degree of flexibility:
· You don't have to replicate every column. Limiting both the rows and the columns to be replicated is straightforward.
· It gracefully handles key changes.
· Changes can be queued for performing row-by-row replication during idle CPU time on the server.
· A full resync can be done with or without queuing.
The tool is "simple" insofar as it currently does not attempt to alter any metadata other than those it creates itself for implementing replication. It is assumed that the developer will do any necessary setting up in the target database to accommodate the impact of the changes and the initial loading. For example, you may need to run a one-time or regular script to de-activate indexes and triggers at load-time or during full resyncs.
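Such a script might be as simple as this (a minimal sketch only; the trigger and index names are hypothetical):

ALTER TRIGGER MEMBERS_AUDIT INACTIVE;   /* hypothetical audit trigger */
ALTER INDEX IDX_MEMBERS_NAME INACTIVE;  /* hypothetical non-key index */
COMMIT;
/* ...run the load or full resync, then reverse the above with ACTIVE */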
Automating some of the really tricky stuff, to make your replication strategy generic across all of your databases, is your job, as the programmer. All of the functionality you need for implementing your automation plan is available via components. You are assisted by a useful macro engine for handling parameters and chunks of sometimes complex code in generic ways.
Think of the tool you are using merely as a sample application written using the components. The application's form has only about 500-600 lines of code. The rest is encapsulated in the components.
Replication rules are stored in the target database
Storing the replication rules in the replication (target) database makes the programming task concise and portable. A system table in the target database stores the replication index definitions, names of source and target tables, key definitions and miscellaneous other information. A difference (DIFF) table is also maintained there for logging changes.
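As a rough illustration only (the components create and manage the actual system table, and its layout may differ), the stored definitions amount to something like this, using the configuration field names described later in this document:

CREATE TABLE RPL$_IND (
  IND_NME VARCHAR( 31 ) NOT NULL,       /* replication index name */
  SRC_TBL_NME VARCHAR( 31 ) NOT NULL,   /* source table name */
  SRC_KEY_CLS VARCHAR( 200 ) NOT NULL,  /* source key column list */
  DST_TBL_NME VARCHAR( 31 ) NOT NULL,   /* target table name */
  DST_KEY_CLS VARCHAR( 200 ) NOT NULL,  /* target key column list */
  DST_TBL_CREATE CHAR( 1 ) NOT NULL,    /* 'T' if the tool created the table */
  CONSTRAINT RPL$_IND_PK PRIMARY KEY ( IND_NME )
);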
Queuing
Having stored procedures for sending and receiving replication notifications does not mean that databases have to be on-line all the time. The source-side procedures log DML changes ("replication actions") into a queue table in the source database. Records wait there until your service application processes them into the target database. This "item-by-item" resynchronization is the normal mode of replication.
Full (Re)synchronization
Full (re)synchronization of the target database with the source is an occasional requirement. It occurs at initial loading and periodically afterwards for housekeeping and verification. A full resync does not use the data in the queue table but reconciles the current state of the source data with that of the target.
Blobs and Arrays
The queue table carries flags which will cause a BLOB column to be ignored in a queue resync if no changes have impacted it. By this means, item-by-item replication of rows will not pass an unchanged blob over to the target database.
A full resync screens blobs on the source and target sides and passes blob data across only if a segment-by-segment compare indicates a change.
Arrays currently cannot be replicated with this tool.
Components
IBO supports this style of replication with two components.
TIB_RPL_Sync does the work of queuing the changes from the source database and synchronizing the two databases. As mentioned previously, in its simple form, IBO replication only synchronizes the target using input from the source. There is already keen interest in making it work both ways.

TIB_RPL_Meta, a descendant of TIB_RPL_Sync, encapsulates the creation and management of metadata in the target database, drawing on metadata attributes from the source database.
Notifications Between Databases
Stored procedures are used on both source and target to send and receive replication notifications. Because notifications can be both item-by-item and in a mode that effects a full resynchronization, there are four procedures altogether.
Think of this system as having two data pipelines between source and target. One pipeline is for a full resync. The other is for item-by-item synchronization from a queue. Both are bottled up into the TIB_RPL_Sync component.
The initial set-up checks for any data already in the target table(s). If none is found, it populates the table(s) directly using INSERT statements. If data is already there, it simply does a full resync.
Some other replication models have every table log to another table, with column-level entries for every change that occurs; at sync time they "walk the entries" to eliminate all but the most recent for each changed row. This model logs only the old and new key values.
At sync time, like the data-logging model, it begins by eliminating superseded inserts and edits from the queue. Then, instead of using logged data to construct DML statements for the target database, it uses the latest logged key to query the most recent version of the source row directly. As well as guaranteeing snapshot synchronization, this strategy deals very elegantly with BLOB and ARRAY data, which are problematic for the data-logging model.
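To make the contrast concrete, the lookup the key-logging model performs at sync time has this shape (a sketch only; the MEMBERS columns shown are illustrative):

/* The queue row supplies only the latest logged key; the current row
   data, blobs included, is fetched fresh from the source table. */
SELECT s.LASTNAME, s.FIRSTNAME, s.SPECIALSKILLS
FROM MEMBERS s
WHERE s.NAMECODE = :NEW_KEY;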
A full resync causes changed rows to be logged to the DIFF table. After a full resync has been done, the new DIFF rows can be inspected. Item-by-item syncs are not logged.
Replication is not duplication
While mirroring is possible, it is not usually necessary. It is possible to maintain fully customized data structures on the target because stored procedures are performing the DML on both sides. The source can be limited to a subset of rows and/or a subset of columns.
About This Module
This application provides a working demo implementing the IBO replication components. It will be useful also as a tool for setting up replication, possibly as a starting point for your own, more sophisticated service application.
It provides the means to supply the source and target databases each with its own connection. Each connection has its own utility bar, bringing many of the common IBO utilities to the tool's desktop, including the scripting tool and the new TIB_DDL_Extract component to make it simple to duplicate the source metadata in a new target database.
Bomb-proofing
Because the accuracy and timeliness of replication are crucial, your service application should incorporate plenty of bomb-proofing. This particular service application handles connections that are temporarily lost. All replication actions are performed within a single, two-phase commit transaction spanning both databases, making it a very secure model.
As you will discover, the tool steps you through the tasks in a logical order, enabling each step upon successful completion of the previous one.
Following are the basic steps for configuring one table in one source database for a simple replication service. Trace through these steps one at a time to begin with, to have the module do its default, automatic stuff.
1 - Prepare the Source and Target Databases
The Source Database
It is recommended that you practise first using a copy of your database, to make it easy to bury your mistakes and start afresh! This tool favors the situation where you want to create the destination tables at setup and configuration time. In theory, replication with already populated target tables should work. It has been tested to some degree already but it has not been drilled really hard.
IMPORTANT :: An essential preliminary step is to add the library containing the User-Defined Function (UDF) FN_BLOBS_EQUAL to your database. Please refer to the Appendix for details of how to get and install this library and declare it to the database.
The Target Database
Before you start, create the new, empty replication target database.
Connecting
When you are ready, connect to the two databases.
If your source database uses UDFs and/or domains, you can use its Extract tool to get these pieces from its metadata and paste them into the Script Editor window of the target database. Execute the script, commit it, and you are ready to start with replication.
With both databases on line, you can proceed to create and load the new system table: click the button.
2 - Configuring and Loading Indexes for Replication
The configuration page appears:
To start simply, focus on the index data entry panel at the left and the top panel of the configuration display. You can ignore the other panels (expressions, etc.) for the time being. They are for refining either rows or columns.
Add an index name for one table in your source database and type in the source table name.
The name you enter in this field is tokenized as <<SRC_TBL_NME>>.
The purpose of the Create Table flag is to have the application recognize objects it has created itself, for use later in case metadata are to be dropped. Leave it checked, unless the destination table already exists in the target.
Click the Post button...
Observe that the tool has populated the other three fields:
· Source Key Columns is a comma-separated list of the columns that form the primary key of the source table. It is tokenized as <<SRC_KEY_CLS>>.
· Target Table Name is the name that will be given to the replication table in the target database if Create Table was checked True. It is tokenized as <<DST_TBL_NME>>.
· Target Key Columns is a comma-separated list of the columns that form, or will form, the primary key of the target table. It is tokenized as <<DST_KEY_CLS>>.
3 - Load the Metadata
Now, to load the metadata for this table, click the button...
4 - Activate the Index
Now, just click the next button to activate the index and it is done.
As you define a replication index, various domains, tables, triggers, stored procedures and generators are created. They all start with the prefix RPL$ to distinguish them from your own metadata objects. (The Browser of the IB_WISQL tool gives you the option to filter them out.)
Note that metadata are not replicated. Just data.
On the Target Side
A full sync is done initially, while configuring your replication strategy and getting the tactics set up. It takes some time to complete. In production, you would only do a full sync as a housekeeping chore, when you want to verify the consistency of the data across both databases.
On the target table, a system column (RPL$SYNC_ID) has been included in the table structure for recording the serial number of a full resync. A new serial number is generated each time a full resync starts, and target rows which are updated by the resync get "stamped" with this new serial number. On completion, any rows remaining in the target table with a Sync_ID lower than the new serial number will be deleted, unless another rule exists determining that such rows must be retained (see Custom Conditions: Conditions for Deletion).
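The end-of-resync sweep therefore boils down to a statement of this shape (a sketch; NME is the target table from Sample 2 later in this document, and :SYNC_ID carries the new serial number):

DELETE FROM NME n
WHERE n.RPL$SYNC_ID < :SYNC_ID;
/* ...subject to any retention rule configured in DST_DEL_EXP */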
On the Source Side
The Queue
The source-side procedures log replication actions into a queue table (RPL$R_MEMBERS$Q in the case of this particular table). Records wait there until your service application processes them into the target database.
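Its layout follows the generated pattern shown in the Appendix. For a MEMBERS table keyed on NAMECODE, it would look something like this (column widths illustrative):

CREATE TABLE RPL$R_MEMBERS$Q (
  RPL$Q_KEY INTEGER NOT NULL,           /* unique id of the queue item */
  RPL$Q_TYP CHAR( 1 ) NOT NULL,         /* action type: I, U or D */
  OLD_NAMECODE VARCHAR( 15 ) NOT NULL,  /* key value before the change */
  NAMECODE VARCHAR( 15 ) NOT NULL,      /* key value after the change */
  CONSTRAINT RPL$R_MEMBERS$Q_PK PRIMARY KEY ( RPL$Q_KEY )
);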
Macro Processing
This tool introduces an effective macro processing engine to reduce the amount of construction you have to do on a table-by-table basis. A non-definitive list of macros appears near the end of this document.
You have plugged in all the parameters. Pushing the button caused the components to execute all of the necessary DDL statements to create tables, triggers, procedures, generators, etc.
Once those were all in place, pushing the next button populated them with data. Once this population and activation step has taken place, replication will occur if you are using either the TIB_RPL_Sync or TIB_RPL_Meta component. (Once everything is configured, your client service app won't need the extra overhead of TIB_RPL_Meta.)
Use the Tools
Use the tools in this utility to watch what happens as you configure and test your replication indexes.
During your configuration steps, use the Browse dialogs on the source and target panels to view the metadata and data as they are created. Keep the Sync Log (third tab) in view.
It is worthwhile opening the SQL Monitor dialog as well, to keep an eye on the statements that are being passed.
As you make changes in the source and refresh the target you will be able to observe that it is indeed replicated.
Testing and Checking
Try making some changes in the replicated data in order to corrupt it. Then, try pushing the Resync Index button on the Sync Log tab. You will notice that the corruptions are corrected.
Notice that when a full resync is done, changes are recorded in the SYNC_DIFF table. Use the target's Browser dialog to look at the table RPL$SYNC_DIFF.
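A quick way to review those changes is a query like the one below. The table here is the one generated for Sample 2 later in this document; in general, the name follows the pattern RPL$<index name>$SYNC_DIFF:

SELECT RPL$TYP_ID, RPL$ACT_ID, RPL$SYNC_ID, NME_ID, NME_CODE, NME_TXT
FROM RPL$NME$SYNC_DIFF
ORDER BY RPL$SYNC_ID;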
On the Sync Events Log tab, you can tell it to use event alerters, or you can click a button to force it to check. It looks in the Q table on the source database for entries and processes them. As each item is processed, the item in the Q is deleted. Everything is done in a single two-phase commit transaction, making things very secure.
The first time the data is loaded after the metadata is loaded, a check is done to see whether there is already data at the target.
· If there is no data, the target table will be directly populated from the source using an INSERT statement.
· If the table has some data in it already, a standard total resync is done and all differences are logged in the DIFF table created on the target. This is where you can always look to see the changes that were needed when a total resync has been done.
Full (Re)synchronization
The full resync after initial loading is done so that you can verify that nothing got corrupted in the replication process. For example, you may want to sample the target table's data to ensure that no data got changed or deleted unexpectedly or that no errors showed up in any custom processing you added.
The full resync uses a special column added to the target table called RPL$SYNC_ID. Its purpose is to serialize the synchronizations from a generator which is incremented each time a full resync starts.
It fetches the entire dataset from the source, as it should appear on the target. As each row comes from the source procedure, it is processed through a procedure on the target side. If a row doesn't exist, it is added and logged. If it does exist and there is a difference, it is updated and the old values are logged. A matched row is given the new Sync_id even if its data is unchanged.
Once it has gone through all the rows, it will delete and log each item whose sync_id was not updated to the new value that was generated when the sync process started. Those rows which have matches in the source dataset get the updated sync_id; those which don't get matched keep their old sync_id and get bumped out.
A faster approach to a full sync would have been possible if it could be guaranteed that both the source and the target datasets would have exactly the same ordering. Because I don't want replication to enforce restrictions that would not otherwise be necessary, I chose to compromise slightly on efficiency in favor of more flexibility. I justify this for InterBase/Firebird because its excellent transaction support will tend to make frequent full resyncing unnecessary.
Item-by-Item Synchronization
As changes take place in the source table, triggers fire that populate the queue table with notifications. A notification consists of an action type (insert, update or delete), a unique id for the queue item, as well as the old and new values of the key columns (because I wanted to allow for changes in key values).
When a notification is inserted into the queue table an event is triggered. The sync client, which is listening for those events, picks them up and processes the queue till it is empty.
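The event plumbing is generated for you, but conceptually it amounts to a trigger of this shape on the queue table (a sketch; the actual generated trigger and event names may differ):

CREATE TRIGGER RPL$NME$Q_EVT FOR RPL$NME$Q
AFTER INSERT POSITION 1
AS
BEGIN
  POST_EVENT 'RPL$NME';  /* wakes any sync client listening for it */
END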
The client does not fetch from the queue table directly. In order to provide plenty of options for building custom datasets to send to the target, it uses a SELECT procedure that derives an output set from the queue table.
As you work through this document, you will see how powerful this approach can be in designing for the most complex replication requirements.
Custom Conditions
If your requirements necessitate setting special conditions to limit the columns or rows that are selected, updated and/or retained in the target database, the replication engine allows plenty of flexibility.
Several more fields can be configured for more complex, customized situations where you want something more refined than simply duplicating rows and columns from the source table to the target table.
This configuration takes place on the tab labelled 'Configure and Load Replication Indexes', which becomes visible after pressing the button on the initial (Connection) tab. The fields concerned are distributed across a set of tabs within this page.
Macro Tokens
By now, you will have figured out that the entries in the configuration fields are really just snippets of SQL. They are tokenized as constants enclosed in double angle brackets, e.g. <<SRC_TBL_NME>>. As you will learn, these tokens can often be embedded inside other snippets. At various stages in the construction of DDL and DML by methods of the components, these tokens are passed to a macro engine. The engine, a dedicated parser routine, uses the corresponding snippets in the transformations that result in complete DDL scripts for objects and procedures or, at run-time, in SQL parameters for the procedures.
In the Appendix is the DDL that was generated by this module from the configurations demonstrated in the two sample topics (Replication Sample 1 and Sample 2).
TIP --> Use a Text Editor!
Obviously, entering complex expressions directly into these fields is not very convenient. It is recommended that you construct your entries to your satisfaction using your favourite text editor and simply paste them into the configuration fields. As this tool evolves, the interface will be made more usable.
This tab is found on the Configure and Load Replication Indexes page:
Source Data Columns
If you don't want to duplicate every column from the source table into the target table, enter an SQL SELECT statement without the SELECT keyword. GROUP BY and HAVING can be included if you want to aggregate into a target column. WHERE and ORDER BY clauses cannot be included, e.g.

NameCode, FirstName || ' ' || Surname AS ConcatName, DateOfBirth
Target Data Columns
If the output columns of the SELECT from the source table will not match the columns in the target table, name all of the target table columns here in a comma-separated list, from left to right in the target table column order. The string should look like the column list portion of a CREATE TABLE statement, e.g.

NameCode varchar(15), FullName varchar(45), BirthDate Timestamp
Target to Source Map
If there are entries in Source Data Columns and Target Data Columns, then map the corresponding columns here, with the target table column names on the left.
Target Table Source - <<DST_TBL_SRC>>
If you want to limit the columns which are replicated from the source table, you can optionally supply the portion of the CREATE TABLE statement that defines the columns for the target table, in the Target Table Source (DST_TBL_SRC) field.
The syntax is the same as for defining the fields in a CREATE TABLE statement. For example, if you were doing a phone list and wanted just name and phone number, you would put this in the field:

LAST_NAME VARCHAR( 30 ) NOT NULL,
FIRST_NAME VARCHAR( 30 ) NOT NULL,
PHONE VARCHAR( 13 )
This tab is found on the Configure and Load Replication Indexes page:
Source Inclusion Expression - <<SRC_ADD_EXP>>
The Source Inclusion Expression field (SRC_ADD_EXP) can be left blank if you want replication of all rows from the source database. For limiting replication to just certain rows, this field is where you put the expression that sets the restricting conditions for selecting those rows. The conditions must be true for a record to be moved to the target database.
For example, my customer has a database listing all trade names, whether active or inactive. We want to search only active registrations. These are replicated to a separate database so that the Full Text Search stuff can be put on it without bloating the original database. Thus there is a rule to replicate only active records.
To accomplish that, I put this into the Source Inclusion Expression field:

<<EXP>>NIR_STATUS = 'A'

When the <<EXP>> macro token is embedded before the SRC_ADD_EXP snippet, it tells the macro expander engine to swap in NEW. and OLD. as appropriate when generating the DDL for the trigger.
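You can see the effect in the generated AFTER UPDATE trigger (compare RPL$NME$Q_AU in the Appendix), where the one configured expression is expanded twice. The branch bodies are elided here for brevity:

IF ( NEW.NIR_STATUS = 'A' ) THEN
BEGIN
  /* row is (still) in the replication set: queue an insert or update */
END
ELSE
IF ( OLD.NIR_STATUS = 'A' ) THEN
BEGIN
  /* row has left the replication set: queue a delete */
END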
Refer to the Macro Substitutions topic for macro tokens that can be placed in other fields.
Source Updated Expression - <<SRC_UPD_EXP>>
To set up conditions so that replication happens only when one of certain columns changes, while changes to other columns are ignored, supply an expression in the Source Updated Expr (SRC_UPD_EXP) field that detects only the relevant changes.
The following code excerpt, matching the syntax of the conditions within a WHERE clause, is pasted into the field:
( new.emp_no <> old.emp_no ) or
( new.last_name <> old.last_name ) or
( new.first_name <> old.first_name ) or
( new.phone <> old.phone ) or
(( new.phone is null ) and ( old.phone is not null )) or
(( new.phone is not null ) and ( old.phone is null ))
The effect of this snippet is to exclude from the replication set any updated source row that meets none of these conditions, even if other columns have changed.
Key Changes
Regardless of the update inclusion/exclusion condition, rows where key changes occur will always be included, unless they were subsequently deleted.
Target Exclusion Expression - <<DST_DEL_EXP>>
The default Delete behavior during replication is to delete target rows that match deletions in the source table. During a full resync, any target rows that have no corresponding source rows will be removed.
In some situations, you may want the non-matching rows retained. For example, you may have more than one source database writing to the same target.
To override this default deletion behavior, you can supply conditions to restrict the rows affected by the automatic deletion, in the Target Exclusion Expr (DST_DEL_EXP) field.
To elucidate the concept of retaining rows because their persistence in the target database is required for some other purpose, the target table in the second example is a mix from two separate sources. An external stored procedure synchronizes another database with my target via a custom TCP/IP messaging protocol and maintains its records in the same target table.
A condition is set so that the WHERE clause of the DELETE statement will operate only on target rows that have a certain code value which is present in a reference table (NIR_TYP). The expression is an existence check, of which only the final line is shown here:

WHERE nt.NIR_CODE = <<EXP>>NME_CODE )

The rows that "belong" to the other system are selected by an equivalent existence check on a different reference table:

WHERE nt.NME_CODE = n.NME_CODE ))
This tab is found on the Configure and Load Replication Indexes page. Here is where you define the triggers for the source table that will cause rows to be added to the Send Queue.
Custom Declare Vars
This is for declaring any custom variables that will be used in your triggers. The same variables will be used in all of the triggers.
Custom Trigger Code
Write triggers as usual. Macro tokens can be used wherever applicable. This will be particularly useful where trigger code for the different DML operations is very similar. You may wish to use macro tokens to make your set of triggers quite generic, so that the set can be reused for other tables.
Custom Variables and Procedures
This tab is found on the Configure and Load Replication Indexes page. Here is where you configure the stored procedures that will be generated in the source database to create the SQL streams for performing synchronization.
Here is where you can define procedures to perform any custom transformation, queueing and synching you need your replication service to do.
The first field (SRC_PRC_DEC_VAR) is for declaring any custom variables that will be wanted by your source-side procedures.
The same variables are available to both procedures. The syntax is the same as any DECLARE VARIABLE statement in a stored procedure, e.g. from Sample 1,
DECLARE VARIABLE tmpLASTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpFIRSTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpMIDDLENAME VARCHAR( 60 );
The replication components can access and expand your declared source procedure variables through the macro <<SRC_PRC_DEC_VAR>>.
The next field (SRC_PRC_INIT_VAR) is for entering procedure language statements to initialize the source procedure variables:
Sample 1 has these statements to do that initialization (they also appear in the expanded procedures in the Appendix):

tmpLASTNAME = NULL;
tmpFIRSTNAME = NULL;
tmpMIDDLENAME = NULL;
The replication components can access your initialization statements through the macro <<SRC_PRC_INIT_VAR>>.
The 'Custom Send-Q Procedure' (SRC_PRC_Q_SRC) can be a simple or complex procedure chunk. It can perform data transformations, query other tables for parameters, or do whatever else is needed for your custom SQL to process the send queue, select rows for replication and pass exactly the right replication data across to the target.
In this field you define the processing for one row to be replicated. At configuration time, you don't know the criteria for the item-by-item selection, since they will depend on what is in the queue table. The template will take care of embedding your chunk inside the appropriate FOR..SELECT loop for you, so you should not include the loop in the procedure.
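For reference, the wrapping that the template generates looks like this (taken from the expanded RPL$NME$SEND_Q procedure in the Appendix); your chunk lands between the USER DEFINED SOURCE markers:

FOR SELECT q.RPL$Q_KEY, q.RPL$Q_TYP
    FROM RPL$NME$Q q
    ORDER BY q.RPL$Q_KEY
    INTO :RPL$Q_KEY, :RPL$Q_TYP DO
BEGIN
  /* BEGIN USER DEFINED SOURCE */
  /* ...your SRC_PRC_Q_SRC chunk is expanded here... */
  /* END USER DEFINED SOURCE */
  IF ( SEND_ITEM = 'T' ) THEN
    SUSPEND;
  ELSE
    DELETE FROM RPL$NME$Q q
    WHERE q.RPL$Q_KEY = :RPL$Q_KEY;
END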
You can embed macros inside your procedures and make them sufficiently generic to apply to many of your replications without a lot of extra code. The custom send-queue procedure itself is tokenized as the macro <<SRC_PRC_Q_SRC>>.
This simple example from Sample 1 shows a macro token embedded in a procedure chunk:

FOR SELECT n.AGENT_ID, n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM <<SRC_TBL_NME>> n

The parser routine MacroSubstitute() takes the content of the token and replaces it with the value configured in the Source Table Name field in the first panel.
Likewise, in the 'Custom Send Full Procedure' box you can define a simple or complex custom source-side selection procedure chunk for the full resync (SRC_PRC_SRC_FULL).
For this procedure chunk, you know what the conditions will be for a full selection, and you need to include the enclosing FOR..SELECT..DO..SUSPEND loop in order to process the full table. This procedure chunk is tokenized as the macro <<SRC_PRC_SRC_FULL>>.
Both of the following samples have examples of custom send-queue and full-resync selection procedures.
Replication Sample 1
The following is the configuration of a replication index for an application I did for a customer.
Name of the replication index (IND_NME): AGENT
Source table name (SRC_TBL_NME): AGENT
Source table key column(s) (SRC_KEY_CLS): AGENT_ID
Target table name (DST_TBL_NME): AGENT (to be created)
Target key column (DST_KEY_CLS): AGENT_ID (to be created)
Create Table flag (DST_TBL_CREATE): Check this true

Checking the Create Table flag indicates that the target table was created along with the replication index. Flagging it False would prevent the table from being dropped if the replication index is dropped.
Target Table Source (DST_TBL_SRC):

This is the column creation code for the target table, used to create columns in both the target and the DIFF tables. The syntax is identical to the column definition portion of a CREATE TABLE statement:

AGENT_ID SYS_PK NOT NULL,
AGENT_NAME VARCHAR( 200 ) NOT NULL
Source Updated Expression (SRC_UPD_EXP):

I give the source table some logic to determine the conditions for replicating an updated row to the target. My requirements are to replicate only if an agent's name is altered in any way or if the primary key changes:

( new.AGENT_ID <> old.AGENT_ID ) or
( new.LASTNAME <> old.LASTNAME ) or
( new.FIRSTNAME <> old.FIRSTNAME ) or
( new.MIDDLENAME <> old.MIDDLENAME )

...so, if the address is changed in the source table, the row won't be replicated.
A Data Transformation
In the process of replicating data, you may want to transform the source data and store it differently in the target. Transformation happens in the source-side procedures.
Custom Declare Vars (SRC_PRC_DEC_VAR):

If I want some variables in the procedures, I declare them here. Instead of duplicating the three name columns from the source table, I want to make the agent's name a single column in the target. I declare some variables to use for the concatenation:

DECLARE VARIABLE tmpLASTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpFIRSTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpMIDDLENAME VARCHAR( 60 );

I could do this directly in an SQL statement but, for demo purposes, I'm using variables.
Custom Init Vars (SRC_PRC_INIT_VAR):

Initializing these custom variables for the procedures that will construct the record streams to send to the target:

tmpLASTNAME = NULL;
tmpFIRSTNAME = NULL;
tmpMIDDLENAME = NULL;
Custom Send-Q Procedure (SRC_PRC_Q_SRC):

Because AGENT_NAME in the target table has no corresponding column in the source table, it cannot be populated automatically. I need an explicit procedure to populate it. The procedure constructs a statement to extract the values for the variables from the source table, using parameters (embedded in the <<DST_KEY_WHERE>> macro) from the queue table. For the send-Q procedure, I omit the FOR...SELECT loop because the parser will include it for me. The chunk is shown here in its expanded form, as it appears in the Appendix:

SELECT n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
WHERE n.AGENT_ID = :AGENT_ID
INTO :tmpLASTNAME, :tmpFIRSTNAME, :tmpMIDDLENAME;
AGENT_NAME = tmpLASTNAME || ', ' ||
             tmpFIRSTNAME || ' ' ||
             tmpMIDDLENAME;
Custom Send Full Procedure (SRC_PRC_SRC_FULL):

In the procedure to return the entire dataset to be replicated, I place this SELECT statement in a FOR...DO...SUSPEND loop (shown here in its expanded form, as in the Appendix):

FOR SELECT n.AGENT_ID, n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
ORDER BY n.AGENT_ID
INTO :AGENT_ID, :tmpLASTNAME, :tmpFIRSTNAME, :tmpMIDDLENAME DO
BEGIN
  AGENT_NAME = tmpLASTNAME || ', ' ||
               tmpFIRSTNAME || ' ' ||
               tmpMIDDLENAME;
  SUSPEND;
END
Replication Sample 2
This is a definition for a case where there was already an existing target table. The target table's surrogate key is not present in the source table. There is no place where you explicitly say which columns on the target are populated from the source table, so I flag the key as a COMPUTED column in the ColumnAttributes property of the IB_Connection object on the first (Connection) tab.
Columns flagged COMPUTED are ignored for inserts, updates and deletes, even if they are not in fact computed.
Here is the configuration tab all set up to load the metadata:
Name of the replication index (IND_NME): NME
Source table name (SRC_TBL_NME): NIR
Source table key columns (SRC_KEY_CLS): FILE_ID, NIR_SUFFIX
Target table name (DST_TBL_NME): NME
Create Table flag (DST_TBL_CREATE): F (i.e. unchecked)

A value of F(alse) here indicates that the target table already exists. Flagging it False prevents the table from being dropped if the replication index is dropped.

Target key columns (DST_KEY_CLS): NME_ID, NME_CODE

These are the target columns which will be used to identify each replicated row uniquely.

DST_TBL_SRC is left blank because the table already exists.
Expression for adding a replication row, Source Inclusion Expr (SRC_ADD_EXP):

<<EXP>>NIR_STATUS = 'A'

SRC_ADD_EXP is an expression of the conditions used on the source to determine whether a replication row is to be inserted in response to a row being inserted in the source table. Notice that its syntax matches an element of a WHERE clause, similar to an SQLWhereItem in an IB_Dataset.

Expression for updating a replication row, Source Updated Expr (SRC_UPD_EXP):

( new.FILE_ID <> old.FILE_ID ) or
( new.NIR_SUFFIX <> old.NIR_SUFFIX ) or
( new.NIR_NAME <> old.NIR_NAME )

SRC_UPD_EXP is an expression of the conditions used on the source to determine whether a replication row is to be updated in response to an update to the source table row. It is used in cases where only certain columns are being replicated, so that changes to the others can be ignored.

Expression for deleting or retaining a replication row, Target Exclusion Expr (DST_DEL_EXP):

WHERE nt.NIR_CODE = <<EXP>>NME_CODE )

DST_DEL_EXP is an expression used on the target table to establish whether a row arrived in the table through replication or whether it came from another source unrelated to replication. Rows originating from other sources will be left alone when a full resync is performed. Otherwise, they would be deleted from the target table, since they have no matching records in the source table.
The next sections are used to construct the data stream that is sent to the target database. One is for sending the records from the queue. The other is used to send all records for a full resync of the whole index.
Variable for stored procedures, Custom Declare Vars (SRC_PRC_DEC_VAR):

DECLARE VARIABLE tmpNIR_SUFFIX INTEGER;

SRC_PRC_DEC_VAR declares a custom variable which will be placed in the two stored procedures.

Initial value for the custom variable, Custom Init Vars (SRC_PRC_INIT_VAR):

tmpNIR_SUFFIX = NULL;

SRC_PRC_INIT_VAR provides an initial value for the custom variable(s) defined by SRC_PRC_DEC_VAR, for both procedures.
Procedure chunk (SRC_PRC_Q_SRC)
Here we define a procedure chunk to supply the values from the queue for the SQL stream that will be used to replicate the queue across to the target. Columns that have names in common between the source and the target have their names and values automatically filled in for you. You must supply names and values for any target columns that have no match on the source.
In the present example, except for the system columns from the queue table, there are no columns that match by name. It is up to this procedure to supply them.
The key from the queue table enables me to get at the old and new key columns that are logged into the queue table. From these I can get the rest of the information that I need to populate the replication row sent to the target.
It is possible to include fairly complex queries or transformations in the procedure chunk. In the present example, the source has a numerical code for SUFFIX but the target uses a character-based lookup. I use the SRC_PRC_Q_SRC chunk to do this transformation (shown here in its expanded form, as in the Appendix):

SELECT q.FILE_ID
, q.OLD_FILE_ID
, ( SELECT nt.NIR_CODE
    FROM NIR_TYP nt
    WHERE nt.NIR_SUFFIX = q.NIR_SUFFIX )
, ( SELECT nt.NIR_CODE
    FROM NIR_TYP nt
    WHERE nt.NIR_SUFFIX = q.OLD_NIR_SUFFIX )
, q.NIR_SUFFIX
FROM RPL$NME$Q q
WHERE q.RPL$Q_KEY = :RPL$Q_KEY
INTO :NME_ID, :OLD_NME_ID
, :NME_CODE, :OLD_NME_CODE
, :tmpNIR_SUFFIX;
SELECT n.NIR_NAME
FROM NIR n
WHERE n.FILE_ID = :NME_ID
AND n.NIR_SUFFIX = :tmpNIR_SUFFIX
INTO :NME_TXT;
Use of Macros
IBO replication incorporates an extendable macro language for substituting the parameters and expressions you have configured into the procedures. In the excerpt from the procedure above:

· <<SRC_TBL_NME>> is a macro token which will fetch the variable you configured earlier and fit it into the procedure in a syntactically correct manner.
Your procedure chunk for SRC_PRC_SRC_FULL needs to spell out the whole record selection loop. Therefore, you define the exact image of the dataset that needs to be replicated. You can use macros here to make the procedure generic.
You may find it convenient to write your selection loop in its full, expanded form first, in a text editor, before configuring the contents of the expression fields. Then, just copy and paste from your expanded procedure language to the parameter and expression fields, replacing the copied blocks with the appropriate macro calls.
Here is the Sample 2 selection loop in its expanded form, as in the Appendix:

FOR SELECT n.FILE_ID
, ( SELECT nt.NIR_CODE
    FROM NIR_TYP nt
    WHERE nt.NIR_SUFFIX = n.NIR_SUFFIX )
, n.NIR_NAME
FROM NIR n
WHERE n.NIR_STATUS = 'A'
ORDER BY n.FILE_ID, n.NIR_SUFFIX
INTO :NME_ID, :NME_CODE, :NME_TXT DO
BEGIN
  SUSPEND;
END
Target Exclusion Example
Actually, the target table in my example is a mix from two separate sources. I needed to change an external stored procedure that maintains other records in the same target table. Its function is to synchronize with another database via a custom messaging protocol written directly to TCP/IP.
Notice that its exclusion expression is the opposite of the one for the replication index discussed above:
CREATE PROCEDURE NME_SYNC_DROP( SYNC_ID INTEGER )
AS
DECLARE VARIABLE tmpSYNC_ID INTEGER;
DECLARE VARIABLE tmpNME_ID INTEGER;
DECLARE VARIABLE tmpNME_CODE VARCHAR( 4 );
DECLARE VARIABLE tmpNME_TXT VARCHAR( 300 );
BEGIN
  FOR SELECT n.RPL$SYNC_ID, n.NME_ID, n.NME_CODE, n.NME_TXT
      FROM NME n
      WHERE n.RPL$SYNC_ID < :SYNC_ID
        AND ( EXISTS ( SELECT nt.NME_CODE
              FROM /* the other system's reference table */ nt
              WHERE nt.NME_CODE = n.NME_CODE ))
      INTO :tmpSYNC_ID, :tmpNME_ID, :tmpNME_CODE, :tmpNME_TXT DO
  BEGIN
    INSERT INTO RPL$NME$SYNC_DIFF (
      RPL$TYP_ID, RPL$ACT_ID, RPL$SYNC_ID, NME_ID, NME_CODE, NME_TXT )
    VALUES (
      'D', :SYNC_ID, :tmpSYNC_ID, :tmpNME_ID, :tmpNME_CODE, :tmpNME_TXT );
    DELETE FROM NME n
    WHERE n.NME_ID = :tmpNME_ID
      AND n.NME_CODE = :tmpNME_CODE;
  END
END
Appendix
For comparing BLOB columns before deciding whether to replicate them, the replication module needs a special UDF library to be installed on the server. It contains the function FN_BLOBS_EQUAL for comparing the contents of two blobs. Custom-written by Ann Harrison (IBPhoenix), it is available for both Windows and Linux servers.
Where to Get the Library
The compiled BlobCompare.dll (for Windows servers) and BlobCompare.so (for Linux) are included with this help kit. Registered IBO users can download kits containing the libraries and C source code from the normal Registration Site URL.
Where to Install the Library
Use a decompression utility such as WinZip to extract the library from the archive. Where to place it depends on your version of InterBase.
For v. 5.x, copy the library to the ..\lib folder of your InterBase installation.
For v. 6.x and Firebird, copy it to the ..\udf folder.
Make sure ib_util.dll is also present, in the ..\bin folder.
How to Declare FN_BLOBS_EQUAL to Your Databases
Declare the UDF, which takes two BLOB arguments and returns an integer, with a DDL snippet along these lines, via DSQL or a script:

/* Returns 1 if two blobs are equal, 0 if not */
DECLARE EXTERNAL FUNCTION FN_BLOBS_EQUAL
  BLOB, BLOB
  RETURNS INTEGER BY VALUE
  ENTRY_POINT 'fn_blobs_equal'
  MODULE_NAME 'BlobCompare';
Remember to COMMIT it !!
You may even need to stop and start your database server.
To test it interactively, try it out on a blob column in any test table, with a query along these lines:

SELECT FN_BLOBS_EQUAL(
  (select specialskills from members where namecode='SMITHJOH'),
  (select specialskills from members where namecode='JONESDAI') )
FROM RDB$DATABASE;
Excerpt from the DDL script that was created from the configurations in Samples 1 and 2
...
CREATE TABLE RPL$NME$Q (
RPL$Q_KEY INTEGER NOT NULL
, RPL$Q_TYP CHAR( 1 ) NOT NULL
, OLD_FILE_ID INTEGER NOT NULL
, OLD_NIR_SUFFIX INTEGER NOT NULL
, FILE_ID INTEGER NOT NULL
, NIR_SUFFIX INTEGER NOT NULL
, CONSTRAINT RPL$NME$Q_PK
PRIMARY KEY ( RPL$Q_KEY )
)
CREATE TABLE RPL$AGENT_NAME$Q (
RPL$Q_KEY INTEGER NOT NULL
, RPL$Q_TYP CHAR( 1 ) NOT NULL
, OLD_AGENT_ID INTEGER NOT NULL
, AGENT_ID INTEGER NOT NULL
, CONSTRAINT RPL$AGENT_NAME$Q_PK
PRIMARY KEY ( RPL$Q_KEY )
)
CREATE TABLE RPL$AGENT$Q (
RPL$Q_KEY INTEGER NOT NULL
, RPL$Q_TYP CHAR( 1 ) NOT NULL
, OLD_AGENT_ID INTEGER NOT NULL
, AGENT_ID INTEGER NOT NULL
, CONSTRAINT RPL$AGENT$Q_PK
PRIMARY KEY ( RPL$Q_KEY )
)
CREATE TABLE AGENT (
AGENT_ID INTEGER NOT NULL
, LASTNAME VARCHAR( 60 ) NOT NULL
, FIRSTNAME VARCHAR( 60 ) NOT NULL /* Defaulted */
, MIDDLENAME VARCHAR( 60 ) NOT NULL /* Defaulted */
, ADDR1 VARCHAR( 60 ) NOT NULL
, ADDR2 VARCHAR( 60 ) NOT NULL /* Defaulted */
, CITY VARCHAR( 30 ) NOT NULL
, STATE VARCHAR( 2 ) NOT NULL
, ZIP VARCHAR( 10 ) NOT NULL
, PHONE VARCHAR( 13 )
, ADD_DATE DATE NOT NULL
, ADD_USER VARCHAR( 32 ) NOT NULL
, CHG_DATE DATE NOT NULL
, CHG_USER VARCHAR( 32 ) NOT NULL
, CONSTRAINT PK_AGENT
PRIMARY KEY ( AGENT_ID )
)
ALTER TRIGGER RPL$AGENT$Q_AD
AFTER DELETE
POSITION 0
AS
BEGIN
IF ( 1=1 ) THEN
INSERT INTO RPL$AGENT$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'D', OLD.AGENT_ID
, OLD.AGENT_ID );
END
ALTER TRIGGER RPL$AGENT$Q_AI
AFTER INSERT
POSITION 0
AS
BEGIN
IF ( 1=1 ) THEN
INSERT INTO RPL$AGENT$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'I', NEW.AGENT_ID
, NEW.AGENT_ID );
END
ALTER TRIGGER RPL$AGENT$Q_AU
AFTER UPDATE
POSITION 0
AS
BEGIN
IF ( 1=1 ) THEN
BEGIN
IF ( NOT ( 1=1 )) THEN
INSERT INTO RPL$AGENT$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'I', OLD.AGENT_ID
, NEW.AGENT_ID );
ELSE
IF ( ( new.AGENT_ID <> old.AGENT_ID ) or
( new.LASTNAME <> old.LASTNAME ) or
( new.FIRSTNAME <> old.FIRSTNAME ) or
( new.MIDDLENAME <> old.MIDDLENAME ) ) THEN
INSERT INTO RPL$AGENT$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'U', OLD.AGENT_ID
, NEW.AGENT_ID );
END
ELSE
IF ( 1=1 ) THEN
INSERT INTO RPL$AGENT$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'D', OLD.AGENT_ID
, NEW.AGENT_ID );
END
ALTER TRIGGER RPL$AGENT_NAME$Q_AD
AFTER DELETE
POSITION 0
AS
BEGIN
IF ( 1=1 ) THEN
INSERT INTO RPL$AGENT_NAME$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'D', OLD.AGENT_ID
, OLD.AGENT_ID );
END
ALTER TRIGGER RPL$AGENT_NAME$Q_AU
AFTER UPDATE
POSITION 0
AS
BEGIN
IF ( 1=1 ) THEN
BEGIN
IF ( NOT ( 1=1 )) THEN
INSERT INTO RPL$AGENT_NAME$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'I', OLD.AGENT_ID
, NEW.AGENT_ID );
ELSE
IF ( ( new.AGENT_ID <> old.AGENT_ID ) or
( new.LASTNAME <> old.LASTNAME ) or
( new.FIRSTNAME <> old.FIRSTNAME ) or
( new.MIDDLENAME <> old.MIDDLENAME ) ) THEN
INSERT INTO RPL$AGENT_NAME$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'U', OLD.AGENT_ID
, NEW.AGENT_ID );
END
ELSE
IF ( 1=1 ) THEN
INSERT INTO RPL$AGENT_NAME$Q(
RPL$Q_TYP, OLD_AGENT_ID
, AGENT_ID )
VALUES (
'D', OLD.AGENT_ID
, NEW.AGENT_ID );
END
CREATE TABLE NIR (
NIR_ID INTEGER NOT NULL
, FILE_ID INTEGER NOT NULL
, NIR_SUFFIX INTEGER NOT NULL
, NIR_STATUS CHAR( 1 ) NOT NULL /* Defaulted */
, NIR_NAME VARCHAR( 300 ) NOT NULL
, ADDR1 VARCHAR( 60 ) NOT NULL /* Defaulted */
, ADDR2 VARCHAR( 60 ) NOT NULL /* Defaulted */
, CITY VARCHAR( 30 ) NOT NULL /* Defaulted */
, STATE VARCHAR( 2 ) NOT NULL /* Defaulted */
, ZIP VARCHAR( 10 ) NOT NULL /* Defaulted */
, PHONE VARCHAR( 13 )
, BUSINESSTYPE VARCHAR( 60 )
, OWNER_TYPE VARCHAR( 16 )
, OWNER_FILE_ID VARCHAR( 30 )
, OWNER_FILE_TYPE VARCHAR( 30 )
, FOR_BEGINDATE DATE
, STATEOFORIGIN VARCHAR( 2 )
, DOM_BEGINDATE DATE
, ADD_DATE DATE NOT NULL
, ADD_USER VARCHAR( 32 ) NOT NULL
, CHG_DATE DATE NOT NULL
, CHG_USER VARCHAR( 32 ) NOT NULL
, REGISTEREDDATE DATE
, CONSTRAINT PK_NIR
PRIMARY KEY ( NIR_ID )
)
ALTER TRIGGER RPL$NME$Q_AD
AFTER DELETE
POSITION 0
AS
BEGIN
IF ( OLD.NIR_STATUS = 'A' ) THEN
INSERT INTO RPL$NME$Q(
RPL$Q_TYP, OLD_FILE_ID, OLD_NIR_SUFFIX
, FILE_ID, NIR_SUFFIX )
VALUES (
'D', OLD.FILE_ID, OLD.NIR_SUFFIX
, OLD.FILE_ID, OLD.NIR_SUFFIX );
END
ALTER TRIGGER RPL$NME$Q_AU
AFTER UPDATE
POSITION 0
AS
BEGIN
IF ( NEW.NIR_STATUS = 'A' ) THEN
BEGIN
IF ( NOT ( OLD.NIR_STATUS = 'A' )) THEN
INSERT INTO RPL$NME$Q(
RPL$Q_TYP, OLD_FILE_ID, OLD_NIR_SUFFIX
, FILE_ID, NIR_SUFFIX )
VALUES (
'I', OLD.FILE_ID, OLD.NIR_SUFFIX
, NEW.FILE_ID, NEW.NIR_SUFFIX );
ELSE
IF ( ( new.FILE_ID <> old.FILE_ID ) or
( new.NIR_SUFFIX <> old.NIR_SUFFIX ) or
( new.NIR_NAME <> old.NIR_NAME ) ) THEN
INSERT INTO RPL$NME$Q(
RPL$Q_TYP, OLD_FILE_ID, OLD_NIR_SUFFIX
, FILE_ID, NIR_SUFFIX )
VALUES (
'U', OLD.FILE_ID, OLD.NIR_SUFFIX
, NEW.FILE_ID, NEW.NIR_SUFFIX );
END
ELSE
IF ( OLD.NIR_STATUS = 'A' ) THEN
INSERT INTO RPL$NME$Q(
RPL$Q_TYP, OLD_FILE_ID, OLD_NIR_SUFFIX
, FILE_ID, NIR_SUFFIX )
VALUES (
'D', OLD.FILE_ID, OLD.NIR_SUFFIX
, NEW.FILE_ID, NEW.NIR_SUFFIX );
END
ALTER PROCEDURE RPL$NME$SEND_FULL
RETURNS ( NME_ID VARCHAR( 11 )
, NME_CODE VARCHAR( 4 )
, NME_TXT VARCHAR( 300 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
DECLARE VARIABLE tmpNIR_SUFFIX INTEGER;
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
tmpNIR_SUFFIX = NULL;
/* END USER DEFINED VARIABLE INITIALIZATION */
/* BEGIN USER DEFINED SOURCE */
FOR SELECT n.FILE_ID
, ( SELECT nt.NIR_CODE
FROM NIR_TYP nt
WHERE nt.NIR_SUFFIX = n.NIR_SUFFIX )
, n.NIR_NAME
FROM NIR n
WHERE n.NIR_STATUS = 'A'
ORDER BY n.FILE_ID, n.NIR_SUFFIX
INTO :NME_ID, :NME_CODE, :NME_TXT DO
BEGIN
SUSPEND;
END
/* END USER DEFINED SOURCE */
END
ALTER PROCEDURE RPL$NME$SEND_Q
RETURNS ( RPL$Q_KEY INTEGER
, RPL$Q_TYP CHAR( 1 )
, OLD_NME_ID VARCHAR( 11 )
, OLD_NME_CODE VARCHAR( 4 )
, NME_ID VARCHAR( 11 )
, NME_CODE VARCHAR( 4 )
, NME_TXT VARCHAR( 300 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
DECLARE VARIABLE tmpNIR_SUFFIX INTEGER;
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
tmpNIR_SUFFIX = NULL;
/* END USER DEFINED VARIABLE INITIALIZATION */
FOR SELECT q.RPL$Q_KEY, q.RPL$Q_TYP
FROM RPL$NME$Q q
ORDER BY q.RPL$Q_KEY
INTO :RPL$Q_KEY, :RPL$Q_TYP DO
BEGIN
/* BEGIN USER DEFINED SOURCE */
SELECT q.FILE_ID
, q.OLD_FILE_ID
, ( SELECT nt.NIR_CODE
FROM NIR_TYP nt
WHERE nt.NIR_SUFFIX = q.NIR_SUFFIX )
, ( SELECT nt.NIR_CODE
FROM NIR_TYP nt
WHERE nt.NIR_SUFFIX = q.OLD_NIR_SUFFIX )
, q.NIR_SUFFIX
FROM RPL$NME$Q q
WHERE q.RPL$Q_KEY = :RPL$Q_KEY
INTO :NME_ID, :OLD_NME_ID
, :NME_CODE, :OLD_NME_CODE
, :tmpNIR_SUFFIX;
SELECT n.NIR_NAME
FROM NIR n
WHERE n.FILE_ID = :NME_ID
AND n.NIR_SUFFIX = :tmpNIR_SUFFIX
INTO :NME_TXT;
/* END USER DEFINED SOURCE */
IF ( SEND_ITEM = 'T' ) THEN
SUSPEND;
ELSE
DELETE FROM RPL$NME$Q q
WHERE q.RPL$Q_KEY = :RPL$Q_KEY;
END
END
ALTER PROCEDURE RPL$AGENT_NAME$SEND_Q
RETURNS ( RPL$Q_KEY INTEGER
, RPL$Q_TYP CHAR( 1 )
, OLD_AGENT_ID INTEGER
, AGENT_ID INTEGER
, LASTNAME VARCHAR( 60 )
, FIRSTNAME VARCHAR( 60 )
, MIDDLENAME VARCHAR( 60 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
/* END USER DEFINED VARIABLE INITIALIZATION */
FOR SELECT q.RPL$Q_KEY,
q.RPL$Q_TYP,
q.OLD_AGENT_ID,
q.AGENT_ID
FROM RPL$AGENT_NAME$Q q
ORDER BY q.RPL$Q_KEY
INTO :RPL$Q_KEY,
:RPL$Q_TYP,
:OLD_AGENT_ID,
:AGENT_ID DO
BEGIN
/* BEGIN USER DEFINED SOURCE */
SELECT n.AGENT_ID, n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
WHERE n.AGENT_ID = :AGENT_ID
INTO :AGENT_ID, :LASTNAME, :FIRSTNAME, :MIDDLENAME;
/* END USER DEFINED SOURCE */
IF ( SEND_ITEM = 'T' ) THEN
SUSPEND;
ELSE
DELETE FROM RPL$AGENT_NAME$Q q WHERE q.RPL$Q_KEY = :RPL$Q_KEY;
END
END
ALTER PROCEDURE RPL$AGENT_NAME$SEND_FULL
RETURNS ( AGENT_ID INTEGER
, LASTNAME VARCHAR( 60 )
, FIRSTNAME VARCHAR( 60 )
, MIDDLENAME VARCHAR( 60 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
/* END USER DEFINED VARIABLE INITIALIZATION */
/* BEGIN USER DEFINED SOURCE */
FOR SELECT n.AGENT_ID, n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
WHERE (1=1)
ORDER BY n.AGENT_ID
INTO :AGENT_ID, :LASTNAME, :FIRSTNAME, :MIDDLENAME DO
BEGIN
SUSPEND;
END
/* END USER DEFINED SOURCE */
END
ALTER PROCEDURE RPL$AGENT$SEND_Q
RETURNS ( RPL$Q_KEY INTEGER
, RPL$Q_TYP CHAR( 1 )
, OLD_AGENT_ID INTEGER
, AGENT_ID INTEGER
, AGENT_NAME VARCHAR( 200 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
DECLARE VARIABLE tmpLASTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpFIRSTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpMIDDLENAME VARCHAR( 60 );
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
tmpLASTNAME = NULL;
tmpFIRSTNAME = NULL;
tmpMIDDLENAME = NULL;
/* END USER DEFINED VARIABLE INITIALIZATION */
FOR SELECT q.RPL$Q_KEY,
q.RPL$Q_TYP,
q.OLD_AGENT_ID,
q.AGENT_ID
FROM RPL$AGENT$Q q
ORDER BY q.RPL$Q_KEY
INTO :RPL$Q_KEY,
:RPL$Q_TYP,
:OLD_AGENT_ID,
:AGENT_ID DO
BEGIN
/* BEGIN USER DEFINED SOURCE */
SELECT n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
WHERE n.AGENT_ID = :AGENT_ID
INTO :tmpLASTNAME, :tmpFIRSTNAME, :tmpMIDDLENAME;
AGENT_NAME = tmpLASTNAME || ', ' ||
tmpFIRSTNAME || ' ' ||
tmpMIDDLENAME;
/* END USER DEFINED SOURCE */
IF ( SEND_ITEM = 'T' ) THEN
SUSPEND;
ELSE
DELETE FROM RPL$AGENT$Q q WHERE q.RPL$Q_KEY = :RPL$Q_KEY;
END
END
ALTER PROCEDURE RPL$AGENT$SEND_FULL
RETURNS ( AGENT_ID INTEGER
, AGENT_NAME VARCHAR( 200 ) )
AS
DECLARE VARIABLE SEND_ITEM CHAR( 1 );
/* BEGIN USER DEFINED VARIABLES */
DECLARE VARIABLE tmpLASTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpFIRSTNAME VARCHAR( 60 );
DECLARE VARIABLE tmpMIDDLENAME VARCHAR( 60 );
/* END USER DEFINED VARIABLES */
BEGIN
SEND_ITEM = 'T';
/* BEGIN USER DEFINED VARIABLE INITIALIZATION */
tmpLASTNAME = NULL;
tmpFIRSTNAME = NULL;
tmpMIDDLENAME = NULL;
/* END USER DEFINED VARIABLE INITIALIZATION */
/* BEGIN USER DEFINED SOURCE */
FOR SELECT n.AGENT_ID, n.LASTNAME, n.FIRSTNAME, n.MIDDLENAME
FROM AGENT n
ORDER BY n.AGENT_ID
INTO :AGENT_ID, :tmpLASTNAME, :tmpFIRSTNAME, :tmpMIDDLENAME DO
BEGIN
AGENT_NAME = tmpLASTNAME || ', ' ||
tmpFIRSTNAME || ' ' ||
tmpMIDDLENAME;
SUSPEND;
END
/* END USER DEFINED SOURCE */
END
Macro Substitutions
To assist in the task of automating your replication service, the parameters and code blocks which are set up at configuration time become the content of macro tokens which you can use to make your procedure blocks more generic. The TIB_RPL_Base class contains a parser procedure (MacroSubstitute) for expanding the macros into the SQL templates used by this module to generate the configuration, metadata and replication streams.
Here is the declaration of the protected virtual method:

procedure TIB_RPL_Base.MacroSubstitute( const ATextBlock: string;
                                        var ATextResult: string );
ATextBlock is the constant that is enclosed between the macro token markers, e.g. <<SRC_PRC_DEC_VAR>>. The var ATextResult will be a chunk of code, a declaration, a parameter or an empty string. Your MacroSubstitute procedure "knows" what transformations to do in order to pass back the var properly expanded to slot into the SQL template.
You are not locked into the templates and parser supplied by the replication classes. You can override the virtual MacroSubstitute method to use your own parsing rules and templates.
Since this macro language will continue to evolve, the following key to the macros is not definitive.
<<DST_DEL_EXP>> - Expression that would follow 'WHERE' in a WHERE clause, setting the conditions for excluding the deletion of a row in the target table if the corresponding source row has been deleted.

<<DST_CLS>> - Code defining columns that are to be created in the target table. Syntax is like the definition of columns in a CREATE TABLE statement, e.g. AGENT_ID SYS_PK NOT NULL, AGENT_NAME VARCHAR( 200 ) NOT NULL

<<DST_CLS_DEC_VARS>> - Chunk of code containing declarations for any custom variables that your target-side procedures will use. Syntax is the same as in a stored procedure, e.g. DECLARE VARIABLE tmpNIR_SUFFIX INTEGER;

<<DST_CLS_SET_NULL>> - Comma-separated list of columns which are to be set null by a procedure or DSQL command.

<<DST_KEY_CLS>> - Comma-separated list of the target table's key column names.

<<DST_SRC_CLS>> - Comma-separated list of columns that are to be read from the source table and duplicated to the target table.

<<DST_SRC_KEY_CLS>> - Comma-separated list of key columns that are to be duplicated from the source table to the target table.

<<EXP>> - Marks the following block as an expression which is to be transformed in a particular way, dependent on the macro token in which it is embedded.

<<FLD_NME>> - Name of the current field (column), as a procedure iterates through a field list.

<<PK_DST_TBL_NME>> - Name of a PRIMARY KEY constraint (to be) created in the target table. The constraint name will be prefixed with 'PK_'.

<<SET_EXP_N>> - 'n.' as in 'n.ColumnName' (where 'n' is a table name alias).

<<SRC_ADD_EXP>> - Expression that would follow 'WHERE' in a WHERE clause, setting the conditions for including a row in the target table, e.g. STATUS = 'A'

<<SRC_KEY_CLS>> - Comma-separated list of the source table's key column names.

<<SRC_PRC_DEC_VAR>> - Chunk of code containing declarations for any custom variables that your source-side procedures will use in constructing the SQL stream for the queue and/or full replication. Syntax is the same as in a stored procedure, e.g. DECLARE VARIABLE tmpNIR_SUFFIX INTEGER;

<<SRC_PRC_SRC_FULL>> - A procedure chunk to supply the parameter values from the source table for the SQL stream that will be used for a full replication across to the target.

<<SRC_PRC_SRC_Q>> - A procedure chunk to supply the values from the queue table on the source side for the SQL stream that will be used to replicate the queue across to the target.

<<SRC_PRC_VAR_INIT>> - Chunk of code containing initializations for the variables declared above, e.g. tmpNIR_SUFFIX = NULL;

<<SRC_UPD_EXP>> - Expression that would follow 'WHERE' in a WHERE clause, setting the conditions for including a source row's changes in the replication queue, e.g. OLD.Description <> NEW.Description

<<UK_DST_TBL_NME>> - Name of a UNIQUE constraint (to be) created in the target table. The constraint name will be prefixed with 'UK_'.
Future Plans
Include capability to replicate ARRAY columns
Currently, I can't find a way to replicate ARRAY columns. I am continuing to explore this one.
Improve automation of index maintenance
It seems as though we may need to build a pooling mechanism, so that an array of 10 TIB_RPL_Sync components can service the maintenance of a hundred or so RPL indexes. It would be way too much overhead to have a prepared RPL index for every table being replicated. One TIB_RPL_Sync component could handle up to 8 RPL indexes: it uses an event alerter internally, which allows up to 16 event registrations, and since there are two events for each index, 8 is the maximum it could hold.
Using the TIB_RPL_Meta component, it should be possible to automate the process of batching up multiple RPL index definitions. You would insert data directly into the RPL$_IND table and then use a TIB_RPL_Meta component to walk each entry in the RPL$_IND table and move it to the next level, until they are all activated.
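In other words, a batching front end could do little more than this per table (a sketch; the RPL$_IND column names are the same illustrative ones used near the top of this document, with values from Sample 1):

INSERT INTO RPL$_IND ( IND_NME, SRC_TBL_NME, SRC_KEY_CLS,
                       DST_TBL_NME, DST_KEY_CLS, DST_TBL_CREATE )
VALUES ( 'AGENT', 'AGENT', 'AGENT_ID', 'AGENT', 'AGENT_ID', 'T' );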
It could be a separate tool that would allow people to pick tables, line up columns, etc. Most of the hard work would be handled by the TIB_RPL_Meta component's extensive macro processing capabilities.
Jason L. Wharton
CPS, Mesa, AZ
Copyright 2001