First published on TECHNET on Jun 18, 2009
The Content Classifier in the File Classification Infrastructure ( http://blogs.technet.com/filecab/archive/2009/05/11/windows-server-2008-r2-file-classification-infrastructure-managing-data-based-on-business-value.aspx ) extracts text from files using the IFilter mechanism, the same mechanism that the Search Indexer relies on. The following file types have a corresponding IFilter on a Windows Server 2008 R2 installation with no other software installed:
| Filter Name | Extension |
| HTML Filter | .ascx .asp .aspx .css .hhc .hta .htm .html .htt .htw .htx .odc .shtm .shtml .sor .srf .stm |
| Microsoft Office Filter | .doc .dot .pot .pps .ppt .xlb .xlc .xls .xlt |
| MIME Filter | .mht .mhtml .p7m |
| Plain text Filter | .a .ans .asc .asm .asx .bas .bat .bcp .c .cc .cls .cmd .cpp .cs .csa .csv .cxx .dbs .def .dic .dos .dsp .dsw .ext .faq .fky .h .hpp .hxx .i .ibq .ics .idl .idq .inc .inf .ini .inl .inx .jav .java .js .kci .lgn .lst .m3u .mak .mk .odh .odl .pl .prc .rc .rct .reg .rgs .rul .s .scc .sol .sql .tab .tdl .tlh .tli .trg .txt .udf .udt .usr .vbs .viw .vspscc .vsscc .vssscc .wri .wtx |
| RTF Filter | .rtf |
| Wordpad Filter | .docx .odt |
| XML Filter | .csproj .user .vbproj .vcproj .xml .xsd .xsl .xslt |
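As a rough illustration of how the table above could be used, the following Python sketch maps a file name to the default filter listed for its extension. The extension lists are copied from the table; the names `DEFAULT_IFILTERS`, `EXTENSION_TO_FILTER`, and `default_filter_for` are hypothetical helpers made up for this example and are not part of Windows or of the File Classification Infrastructure. It only consults the table, so it says nothing about filters that were added later to a particular server.

```python
from pathlib import PureWindowsPath

# Extension lists copied from the table above (default Windows Server 2008 R2 install).
DEFAULT_IFILTERS = {
    "HTML Filter": ".ascx .asp .aspx .css .hhc .hta .htm .html .htt .htw .htx "
                   ".odc .shtm .shtml .sor .srf .stm",
    "Microsoft Office Filter": ".doc .dot .pot .pps .ppt .xlb .xlc .xls .xlt",
    "MIME Filter": ".mht .mhtml .p7m",
    "Plain text Filter": ".a .ans .asc .asm .asx .bas .bat .bcp .c .cc .cls .cmd "
                         ".cpp .cs .csa .csv .cxx .dbs .def .dic .dos .dsp .dsw "
                         ".ext .faq .fky .h .hpp .hxx .i .ibq .ics .idl .idq .inc "
                         ".inf .ini .inl .inx .jav .java .js .kci .lgn .lst .m3u "
                         ".mak .mk .odh .odl .pl .prc .rc .rct .reg .rgs .rul .s "
                         ".scc .sol .sql .tab .tdl .tlh .tli .trg .txt .udf .udt "
                         ".usr .vbs .viw .vspscc .vsscc .vssscc .wri .wtx",
    "RTF Filter": ".rtf",
    "Wordpad Filter": ".docx .odt",
    "XML Filter": ".csproj .user .vbproj .vcproj .xml .xsd .xsl .xslt",
}

# Invert the table so each extension points at its filter name.
EXTENSION_TO_FILTER = {
    ext: name
    for name, extensions in DEFAULT_IFILTERS.items()
    for ext in extensions.split()
}


def default_filter_for(filename):
    """Return the default filter name for a file, or None if the table lists none."""
    # PureWindowsPath parses Windows-style paths on any platform.
    extension = PureWindowsPath(filename).suffix.lower()
    return EXTENSION_TO_FILTER.get(extension)


if __name__ == "__main__":
    for name in ("C:\\share\\Report.DOCX", "notes.txt", "scan.tif"):
        print(name, "->", default_filter_for(name))
```

A `None` result only means that the default table has no entry for that extension; an add-on IFilter may still cover the file type.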
Additionally, a few Microsoft IFilters can easily be added at no extra cost:
| Filter Name | Extension | Reference |
| Windows TIFF IFilter | .tif | Server Manager -> Add Feature -> Windows TIFF IFilter |
| Microsoft Filter Pack for Office 2007 | .docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb, .vdx, .vsd, .vss, .vst, .vsx, .vtx, .one, .zip | http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en |
| Microsoft Office 2010 Filter Packs | .doc, .ppt, .xls, .xlsm, .xlsb, .docx, .docm, .pptx, .pptm, .xlsx, .zip, .one, .vsd, .vsx, .vss, .vst, .vdx, .vtx, .pub, .odt, .ods, .odp | http://www.microsoft.com/downloads/en/details.aspx?FamilyID=5cd4dcd7-d3e6-4970-875e-aba93459fbee |
Other free and commercial IFilters also exist. You can start at http://ifilter.org/ to find more IFilters.
However, the existence of an IFilter for a file type does not mean that it will extract data from those files: some IFilters do not retrieve any text for the Content Classifier to scan. For a complete list of the file types that the default IFilters can extract text from, and what data is extracted from each, see the official list of included filters at http://www.live.com/docs/toolbarts.aspx?t=MSNTbar_CONC_SearchableFileTypes.htm
To find out which IFilters are installed on one of your servers, you can use the FiltReg ( http://msdn.microsoft.com/en-us/library/ms692537(VS.85).aspx ) tool.
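If you only need to check a single extension from a script, the registry can be queried directly. The Python sketch below is not how FiltReg itself is implemented. It assumes the commonly documented registration layout, in which `HKEY_CLASSES_ROOT\<extension>\PersistentHandler` names a persistent handler and `HKEY_CLASSES_ROOT\CLSID\<handler>\PersistentAddinsRegistered\<IID_IFilter>` names the filter CLSID. Extensions whose handler is registered indirectly (through a ProgID or a document class CLSID) are not followed, so a `None` result is not proof that no IFilter exists. The function names `find_ifilter_clsid` and `read_hkcr_default` are hypothetical.

```python
# Interface ID of IFilter, used as a key name under PersistentAddinsRegistered.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def find_ifilter_clsid(extension, read_default):
    """Follow extension -> persistent handler -> IFilter CLSID.

    read_default(subkey) must return the default value of
    HKEY_CLASSES_ROOT\\subkey, or None if the key or value is missing.
    Only the direct registration chain is followed (an assumption of
    this sketch); indirect ProgID-based registrations are ignored.
    """
    handler = read_default(extension + "\\PersistentHandler")
    if not handler:
        return None
    filter_key = (
        "CLSID\\" + handler + "\\PersistentAddinsRegistered\\" + IID_IFILTER
    )
    return read_default(filter_key) or None


def read_hkcr_default(subkey):
    """Windows-only reader for the default value of a key under HKEY_CLASSES_ROOT."""
    import winreg  # imported lazily so the module still loads on other platforms

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except OSError:
        # Missing key or missing default value.
        return None
```

On a Windows server, `find_ifilter_clsid(".txt", read_hkcr_default)` would return the CLSID string of the registered filter, or `None` if the direct chain is not present. Passing the reader in as a parameter keeps the lookup logic testable without a registry.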
The official IFilter blog ( http://blogs.msdn.com/ifilter ) has a lot more information, as well as links to additional IFilters that extend the reach of the Content Classifier (and the Search Indexer) to more file types.
Update: Fixed formatting issues with the table
Update 2: Added entry for Office 2010 filter pack
Updated Apr 10, 2019
Version 2.0
FileCAB-Team
Storage at Microsoft