First published on MSDN on Nov 11, 2011
Keith and I continue to field the question as to why the Post-Load Data Cleanup appears to take a long time (hours) and can cause SQL Server to use large amounts of CPU.
Notes from Keith:
"What that step does it try to correlate stmt-level events with the batch in which they ran, and show plans with the statement. If you capture starting events then all of this can be done at import time (not via the query) because ReadTrace caches the previous batch/rpc starting event and previous sp:stmt starting event and uses those sequence numbers to fill in on the values on the completed events/showplan events during the import itself.
I’ve tried to optimize this post load import query about half a dozen times now, and invariably it gets better for certain types of scenarios and worse for others. The best solution is to capture the starting events and avoid ragged trace starting point.
The other option is to use the ReadTrace command line parameter
which skips ALL of that processing (including things like indexing), but some of it could be done by manually calling a subset of the same procedures that ReadTrace uses in this step, skipping the one to the Postload fixups.
Even then, some of the reporting features may not work correctly because the event association would not be set correctly.