Forum Discussion

sunderb's avatar
sunderb
Copper Contributor
Oct 22, 2024

Data Archival in Merge replication

We are working on purging data in 3.5 TB database which is acting as a publisher in Merge replication with 2 subscribers which are hosted on EC2 machines in Europe and US regions respectively. We are looking for a Data Archival Strategy to optimize the data sync between publisher and subscribers in the best possible way as the volume of records that are going to be purged in publisher will be in millions.

 

This huge data purge in publisher will choke the subscribers and might delay the data sync from days to weeks. Hence we are looking for some recommendations on the archival strategy.

  • Mks_1973's avatar
    Mks_1973
    Iron Contributor

    First Assess Data to Purge:
    Identify the tables and records to archive based on business rules, such as dates or other criteria. Ensure you’re only purging data that doesn’t need to be replicated.
    Instead of deleting millions of records at once, batch the purge process. This minimizes the load on the replication process and reduces potential lock contention
    Create appropriate indexes on the columns used for identifying data to purge to optimize query performance

    If possible, consider moving archived data to tables that are outside of the replication scope, especially if the data no longer needs to be synchronized across servers
    If the tables are partitioned (for example, by date), you can easily switch entire partitions to a separate archival database. This will allow you to archive or purge large data blocks without deleting individual rows, which is more efficient.
    Move historical data to a separate archival database. You can use techniques like ETL (Extract, Transform, Load) or BCP (Bulk Copy Program) to transfer data to the archival database.

    Schedule the purge operations during off-peak hours to minimize impact on active replication.
    Implement pauses between purges or use smaller batches, such as purging in increments of a few thousand rows per minute, to avoid overwhelming replication
    Regularly monitor the Merge Replication latency during purging and adjust batch sizes if necessary.

    For massive data purges, consider reinitializing the subscription after the purge. This allows subscribers to synchronize with the publisher from a clean state

    Implement row-level filters in replication articles to replicate only the essential data, reducing the overall data volume transferred.

    Increase the values for -UploadGenerationsPerBatch and -DownloadGenerationsPerBatch in the Merge Agent profile. (These parameters control the number of changes processed per batch, which can help to sync large volumes of data faster)


    Snapshot Replication as an Alternative
    If the subscribers are primarily read-only or don’t need to synchronize with the publisher in near-real-time, consider switching from Merge Replication to Snapshot Replication, Snapshot Replication is more bandwidth-intensive, it may be more efficient for subscribers that don’t need immediate updates.

    If feasible, define logical partitions so that older data doesn’t synchronize to subscribers. This way, archival data won’t sync to subscribers at all, keeping only relevant, current data in sync.



    Sample Archival Workflow Using T-SQL

    DECLARE @BatchSize INT = 1000
    DECLARE @RowsDeleted INT

    SET @RowsDeleted = 1

    WHILE @RowsDeleted > 0
    BEGIN
        BEGIN TRANSACTION

        DELETE TOP (@BatchSize) FROM YourTable
        WHERE YourConditionForArchival -- e.g., Date < '2020-01-01'

        SET @RowsDeleted = @@ROWCOUNT

        COMMIT TRANSACTION

        -- Wait between batches
        WAITFOR DELAY '00:00:10' -- 10 seconds delay (adjust as necessary)
    END

    Batch the Purge Process and execute it during off-peak hours with monitoring in place.
    Archive Data Outside Replication Scope to prevent it from affecting Merge Replication performance.
    Consider Partition Switching if your database tables are partitioned for efficient large data moves.
    Reinitialize Replication if necessary, to bring subscribers back in sync after a massive purge.
    Tune Replication Settings such as batch sizes for Merge Agent to help handle larger data volumes.






Resources