CDC: how to read history records of master->detail tables?

Copper Contributor

I have 2 tables T1(master), T2(detail). CDC works on these both tables.
I'd like to restore history tables to th new 2 tables T11->T21 using stored procedure.
The question is: how to process the history tables correctly?
If I get and process all T1 history records and after it all T2 there is can be referential integrity conflict.
For example: my SP processes all master rows: D, I, D, I and then details from the start to end of the history.
What is the best way?

4 Replies

Hi @Oleg2604,

 

If you're getting referential integrity errors then you're trying to add detail records that have no associated master record.  I would guess that your CDC data tables either do not contain all of the data you require to rebuild the tables due to: -

  1. Not all master records exist with the CDC table.
  2. The records have been removed due to the cleanup job which by default removes anything over 3 days old.

I would write a simply query to prove this point against the source CDC tables e.g.

 

SELECT T2.* FROM T2

LEFT JOIN T1 ON T1.PK=T2.FK

WHERE T1.PK IS NULL;

 

PK = Primary Key

FK = Foreign Key

 

Obviously this only confirms the error you are getting and where you go from here is purely dependent on your use case and what you are trying to achieve.

 

Regards,

Andy

The CDC history tables master and detail contains all records but the question is how to apply them to the target table? for example: first of all I read N history master records and apply them, hey contain I, D, I, D, after it I read all history detail records. Note: some of master records already deleted from the target table and when I apply history detail records from the begin I receive referential integrity errors: I replay detail history and add detail rows but master row already does not exist. Because I have applied full master history. That is the question.
How to apply master+ detail history: master row by row and for nth row apply corresponding detail subset? How to define what is the detail subset for the master row? May be there is another way?

Hi @Oleg2604,

 

OK I'll summarise what we do for our use case.

 

We firstly extract a list of all CDC enabled tables with the following query: -

SELECT
	T.name AS TableName,
	C.name AS ColumnName
FROM sys.tables AS T
INNER JOIN sys.indexes AS I ON I.object_id = T.object_id
INNER JOIN sys.index_columns AS IC ON IC.object_id=I.object_id AND IC.index_id = I.index_id
INNER JOIN sys.columns AS C ON C.object_id=IC.object_id AND C.column_id = IC.column_id
WHERE
    I.is_primary_key = 1 AND
    T.is_tracked_by_cdc = 1
ORDER BY
    T.name;

 

Next we extract some control variables with the following query: -

SELECT IsProcessing, ProcessingStarted, ProcessingCompleted, LastLsn, LastSequence, LastCount FROM __ChangeDataCaptureState;

 

We then create a result set using a union of all of the tables returned by the above query ensuring that each select has the following WHERE clause with last_lsn = LastLsn and last_sequence=LastSequence extract by previous query .

WHERE __$start_lsn >= @last_lsn AND (__$start_lsn <> @last_lsn OR (__$seqval > @last_sequence))

 

The unioned queries should then have the following order by clause to ensure that all of the transactions are returned in the sequence they were processed.

ORDER BY __$start_lsn, __$seqval;

 

I asked Microsoft the question about the ordering to ensure it would yield the results in the exact order they were processed as there was conflicting information online that contradicted itself. Unfortunately Microsoft couldn't answer that question however all of the tests we have conducted have proved to by work as expected and the solution has been in place for over 8 months processing 100,000's of records without issue.

 

I hope that provides some help!

Hi @AndyHB70,
thank you very much for the answer.
I also cannot find an answer on my question.
I'll try to apply your method.
O.