One of my reader had requested to write post on this topic. I have gone through some posts and prepared a step by step process to demonstrate how we can configure Database Replication in SQL.
I will be covering the topic in couple of posts, this is first post in this series.
Today I will be covering basic theory behind the topic and then move to practical approach in my next post.
Brief extract of Topic
Database replication can be done in at least four different ways:
- Snapshot replication: Data on one server is simply copied to another server, or to another database on the same server.
- Merging replication: Data from two or more databases is combined into a single database.
- Transactional replication: Users receive full initial copies of the database and then receive periodic updates as data changes.
- Peer-to-Peer publication: Peer-Peer publication enables multi-master replication. The publisher streams transactions to all the peers in the topology. All peer nodes can read and write changes and the changes are propagated to all the nodes in the topology.
A distributed database management system ensures that changes, additions, and deletions performed on the data at any given location are automatically reflected in the data stored at all the other locations. Therefore, every user always sees data that is consistent with the data seen by all the other users.
SQL Server replication is based on the “Publish and Subscribe” metaphor. Let us look at each of the individual components in detail.
- It is a source database where replication starts. It makes data available for replication.
- Publishers define what they publish through a publication.
- Articles are the actual database objects included in replication like tables, views, indexes, etc.
- An article can be filtered when sent to the subscriber.
- A group of articles is called publication.
- An article can’t be distributed individually. Hence publication is required.
- It is intermediary between publisher and subscriber.
- It receives published transactions or snapshots and then stores and forwards these publications to the subscriber.
- It has 6 system databases including distribution.
- It is the destination database where replication ends.
- It can subscribe to multiple publications from multiple publishers.
- It can send data back to publisher or publish data to other subscribers.
- It is a request by a subscriber to receive a publication.
- We have two types of subscriptions – push and pull.
- With this subscription, the publisher is responsible for updating all the changes to the subscriber without the subscriber asking those changes.
- Push subscriptions are created at the Publisher server.
Pull Subscriptions –
- With this subscription the subscriber initiates the replication instead of the publisher.
- The subscriptions are created at the Subscriber server.
Detailed Description on Types of Replication
Snapshot replication simply takes a “snapshot” of the data on one server and moves that data to another server (or another database on the same server). After the initial synchronization snapshot, replication can refresh data in published tables periodically—based on the schedule you specify. Although snapshot replication is the easiest type to set up and maintain, it requires copying all data each time a table is refreshed.
Between scheduled refreshes, data on the publisher might be very different from the data on subscriber. In short, snapshot replication isn’t very different from emptying out the destination table(s) and using a DTS package to import data from the source.
Transactional replication involves copying data from the publisher to the subscriber(s) once and then delivering transactions to the subscriber(s) as they occur on the publisher. The initial copy of the data is transported by using the same mechanism as with snapshot replication: SQL Server takes a snapshot of data on the publisher and moves it to the subscriber(s). As database users insert, update, or delete records on the publisher, transactions are forwarded to the subscriber(s).
To make sure that SQL Server synchronizes your transactions as quickly as possible, you can make a simple configuration change: Tell it to deliver transactions continuously. Alternatively, you can run synchronization tasks periodically. Transactional replication is most useful in environments that have a dependable dedicated network line between database servers participating in replication. Typically, database servers subscribing to transactional publications do not modify data; they use data strictly for read-only purposes. However, SQL Server does support transactional replication that allows data changes on subscribers as well.
Merge replication combines data from multiple sources into a single central database. Much like transactional replication, merge replication uses initial synchronization by taking the snapshot of data on the publisher and moving it to subscribers. Unlike transactional replication, merge replication allows changes of the same data on publishers and subscribers, even when subscribers are not connected to the network. When subscribers connect to the network, replication will detect and combine changes from all subscribers and change data on the publisher accordingly. Merge replication is useful when you have a need to modify data on remote computers and when subscribers are not guaranteed to have a continuous connection to the network.
Replication process works in the background with the help of jobs.
These jobs are also called as agents. These jobs internally uses respective .exe files present in …………….. 110COM folder.
All the agents’ information is present in Distribution db in the following tables.
- It is an executable file that prepares snapshot files containing schema and data of published tables and db objects.
- It stores the files in the snapshot folder, and records synchronization jobs in the distribution database.
- It is used with snapshot and transactional replication.
- It applies the initial snapshot to the Subscriber and moves transactions held in the Distribution db to Subscribers.
- It runs at either the Distributor for push subscriptions or at the Subscriber for pull subscriptions.
Log Reader Agent
- It is used with transactional replication, which moves transactions marked for replication from the transaction log on the publisher to the distribution db.
- Each db has its own Log Reader Agent that runs on the Distributor and connects to the Publisher.
- It is used with merge replication.
- It applies the initial snapshot to the Subscriber and moves incremental data changes that occur.
- Each merge subscription has its own Merge Agent that connects to both the Publisher and the Subscriber and updates both.
- It captures changes using triggers.
Queue Reader Agent
- It is used with transactional replication with the queued updating option.
- It runs at the Distributor and moves changes made at the Subscriber back to the Publisher.
- Unlike Distribution Agent and Merge Agent, only one instance of the Queue Reader Agent exists to service all Publishers and publications for a given distribution db.
We will continue on same topic in my next post.