SOLVED

DeDuplication in Content Search

Contributor

I'm having to do more eDiscovery, and in my latest one I wanted to ensure that email was deduplicated before exporting it to a PST.

The first step (after getting the query built and run) is to generate the report. Selecting the Deduplication box led me to believe that the process would do exactly that: dedupe. When the report process was complete, the report showed the number of items and the size of the extract, and both were the same for the original query and after deduplication.

So then I went ahead and generated an export (again selecting deduplication). It created my PST files. After bringing them into Outlook and taking a very simple look at the first 100 emails, I found 18 duplicates.

Is there something missing in the process of running reports and exports that would eliminate duplicate messages (or any other object)?

When you are dealing with 250,000 emails, 20% is a large number of messages that I would not have to review.

4 Replies

Duplicates should still be listed in the report file. Are you running the query against a single mailbox or multiple ones?

Vasil, I have run this against both a single mailbox and multiple mailboxes. During report generation you are given the option to deduplicate, and I select it. The generated report contains two rows: mailbox object count and size, and mailbox object count and size after deduplication. In every report I run, these two rows always show the same results.

When exporting, again using the same deduplication option, I open the PST and perform a simple check for duplicates, and I always find duplicate emails and calendar objects.

Using a third-party tool, I typically get a 15-25% reduction in objects when I process an export that has already been deduped.
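
For anyone who wants to reproduce the "simple check for duplicates" without a third-party tool, here is a minimal sketch in Python. It assumes the exported messages have already been read out of the PST into plain dictionaries; the field names (`sender`, `subject`, `sent`, `body`) and both function names are hypothetical, and this is not how Content Search itself decides what counts as a duplicate. It only counts items whose chosen fields match exactly.

```python
import hashlib


def fingerprint(msg):
    """Build a key from fields that exact copies of a message would share.

    The field names are assumptions for illustration only.
    """
    parts = [
        msg.get("sender", "").strip().lower(),
        msg.get("subject", "").strip(),
        msg.get("sent", "").strip(),
        msg.get("body", "").strip(),
    ]
    # Join with a separator that is unlikely to appear in the fields.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def count_duplicates(messages):
    """Return (unique_count, duplicate_count) for a list of message dicts."""
    seen = set()
    duplicates = 0
    for msg in messages:
        key = fingerprint(msg)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return len(seen), duplicates


if __name__ == "__main__":
    sample = [
        {"sender": "a@example.com", "subject": "Q1", "sent": "2016-01-05", "body": "Hi"},
        {"sender": "A@example.com", "subject": "Q1", "sent": "2016-01-05", "body": "Hi"},
        {"sender": "b@example.com", "subject": "Q2", "sent": "2016-01-06", "body": "Hello"},
    ]
    unique, dupes = count_duplicates(sample)
    print(f"{unique} unique, {dupes} duplicates")
```

Comparing the duplicate count from a check like this against the "after deduplication" row in the report would at least show whether the export contains exact copies that the option should have caught.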

Best Response confirmed by Bill Diekmann (Contributor)
Solution

Well, I gave it a quick try and I seem to be able to confirm similar behavior. I need to take a more in-depth look to be sure, but I'd certainly recommend opening a support ticket for this.

Hey, did you ever get a response from Microsoft on this one?