During the initial planning phases of Windows Server 2008 R2, we had the opportunity to interview several IT professionals about the pain points of managing data on file servers. We wanted to understand how we could improve file server data management and what success looks like for these IT professionals.
Very early in the discussion it became apparent that there are key problems that we should try to help find solutions for:
· High growth of data results in higher spending on storage and data management
· New regulations impose a need to better handle sensitive information such as personal information and financial documents
· Leakage of both business critical and personal information is a big problem
We then looked at the existing solutions available in the market. There are indeed great solutions for security (e.g. Data Leakage Prevention …) and data management (e.g. Backup, Archival, HSM …), but these solutions do not interoperate, and they mostly work based on where the file is located (its folder) rather than on the business value of the file to the organization.
A folder-based approach makes this a harder problem for the human beings who need to figure out where to store their data based on complex company policies. “Does high business impact data with personally identifiable information go here? Or there?” Not to mention the challenges around dealing with documents that don’t end up in the right folder.
What we heard from our customers is that they would like to gain insight into their data so that they can manage data more effectively, reduce cost and mitigate risk.
This realization has led us down the path of creating the File Classification Infrastructure that enables organizations to classify their files (assign properties to files) and then use Windows mechanisms as well as partner solutions to apply actions to files based on the file classification.
The File Classification Infrastructure includes the ability to define classification properties, to automatically classify files based on location and content, to apply file management tasks (such as file expiration and custom commands) based on classification, and to produce reports that show the distribution of a classification property on the file server.
In addition to the functionality delivered in Windows, we also aimed at building an extensible infrastructure in order to help provide integration points for different partner offerings. It does this by enabling classification solutions to plug into Windows to classify files, and by persisting the file classification so that data management products can query it to apply the appropriate policy or action. For example, if a data leakage prevention product classifies files as containing personal information, then a backup product can back them up to an encrypted store instead of the regular store.
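The backup example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the File Classification Infrastructure API: the `ClassifiedFile` type, the `PersonalInformation` property name, the store names and the paths are all hypothetical, and the sketch assumes that an unclassified file goes to the regular store.

```python
from dataclasses import dataclass, field


@dataclass
class ClassifiedFile:
    """A file path plus the classification properties persisted for it (hypothetical model)."""
    path: str
    properties: dict = field(default_factory=dict)


def choose_backup_store(file: ClassifiedFile) -> str:
    """Pick a store from the file's classification, not from its folder."""
    if file.properties.get("PersonalInformation") == "yes":
        return "encrypted-store"
    # Assumption of this sketch: unclassified files use the regular store.
    return "regular-store"


files = [
    ClassifiedFile("/shares/hr/reviews.docx", {"PersonalInformation": "yes"}),
    ClassifiedFile("/shares/public/menu.txt", {"PersonalInformation": "no"}),
    ClassifiedFile("/shares/temp/scratch.tmp"),  # never classified
]

# Map every file to the store the (hypothetical) backup product would use.
plan = {f.path: choose_backup_store(f) for f in files}
```

The point of the design is that the backup product never needs to know which product set the property; it only reads the persisted classification.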
Using this paradigm, IT organizations can now define policy that spans across the organization and can better translate business requirements to IT actions. For example: The organization might have a policy to expire files that are 10 years old and are not critical to the business. This policy can be translated to use the new file management tasks to expire files across file servers. Furthermore, when new data directories are added, there is no need to change the file management tasks since the action is taken as per the business criticality of the files regardless of their location.
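To make the expiration policy concrete, here is a minimal Python sketch of the selection logic. It is not how the file management tasks are implemented; the `FileRecord` type and the paths are hypothetical, and the sketch assumes that "10 years old" is measured from the last-modified date and that business criticality is a simple yes/no value.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    path: str
    last_modified: date
    business_critical: bool  # hypothetical classification property


def is_expired(record: FileRecord, today: date, max_age_years: int = 10) -> bool:
    """True if the file is at least max_age_years old and not business critical."""
    try:
        cutoff = today.replace(year=today.year - max_age_years)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - max_age_years, day=28)
    return record.last_modified <= cutoff and not record.business_critical


today = date(2009, 6, 1)
records = [
    FileRecord("/shares/finance/ledger-1997.xls", date(1997, 3, 1), True),
    FileRecord("/shares/projects/notes-1998.doc", date(1998, 5, 20), False),
    FileRecord("/shares/new-volume/draft-1996.txt", date(1996, 1, 15), False),
    FileRecord("/shares/projects/plan-2008.doc", date(2008, 11, 2), False),
]

# The policy never inspects the folder, so files on a newly added
# volume are covered without changing the rule.
to_expire = [r.path for r in records if is_expired(r, today)]
```

Note that the old ledger is kept because it is business critical, while the old file on the new volume is selected even though the rule was written before that volume existed.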
I would like to briefly touch on classification. Many people I talk to raise their eyebrows when I start discussing this subject. I tend to agree with them: classification is hard. It is hard to determine which organization-wide properties to assign to files, and it is also hard to actually classify the files.
The process that seems to work for determining the organization properties is to have a discussion that includes both the business and IT people and determine how they would like to manage their data and what classification properties should be assigned to files in order to easily manage them. What I found is that this usually amounts to just a few properties such as a mix of the below:
· Personal information (yes/no)
· Business criticality
· Retention period
Once you have defined which properties should be assigned to files, the next challenge is actually classifying the files. There is no magic formula here, but the File Classification Infrastructure takes you a long way with automatic classification rules for the large number of files residing on your file servers, an extensibility mechanism that allows plug-ins and, last but not least, the ability to recognize manual classification of Office files. The classification methods that we observed across the IT organizations we worked with include:
· Manual classification
· Line Of Business application classification (e.g.: When an HR application saves a file to the file server, it can also set the “Personal Information” property to “yes”)
· Automatic classification based on:
    · Location of files
    · File owner
    · File content
    · Other (e.g.: file size, file extension …)
All these methods might be used to classify files and the File Classification Infrastructure extensibility supports multiple classification mechanisms that can run in tandem to determine the file classification.
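As a rough illustration of several mechanisms running in tandem, the Python sketch below chains three classifiers and merges their results. Everything in it is hypothetical: the classifier functions, the property names and values, and in particular the merge rule (here "yes" overrides "no" and the higher criticality wins), which is an assumption of the sketch rather than a statement of how the File Classification Infrastructure resolves conflicts.

```python
import os

# Hypothetical ordering used to resolve conflicting criticality values.
CRITICALITY_ORDER = ["low", "medium", "high"]


def classify_by_location(path, content):
    """Location rule: anything under an hr folder holds personal information."""
    return {"PersonalInformation": "yes"} if "/hr/" in path.lower() else {}


def classify_by_extension(path, content):
    """Extension rule: temporary files are low criticality."""
    if os.path.splitext(path)[1].lower() == ".tmp":
        return {"BusinessCriticality": "low"}
    return {}


def classify_by_content(path, content):
    """Content rule: a simple text match stands in for real content inspection."""
    if "social security number" in content.lower():
        return {"PersonalInformation": "yes", "BusinessCriticality": "high"}
    return {}


def merge(current, new):
    """Combine two results; the conflict rules are assumptions of this sketch."""
    merged = dict(current)
    for name, value in new.items():
        if name not in merged:
            merged[name] = value
        elif name == "PersonalInformation":
            if value == "yes":
                merged[name] = "yes"
        elif name == "BusinessCriticality":
            merged[name] = max(merged[name], value, key=CRITICALITY_ORDER.index)
    return merged


def classify(path, content, classifiers):
    """Run every classifier and fold the results into one classification."""
    result = {}
    for classifier in classifiers:
        result = merge(result, classifier(path, content))
    return result


ALL_CLASSIFIERS = [classify_by_location, classify_by_extension, classify_by_content]
example = classify(
    "/shares/scratch/export.tmp",
    "Employee social security number: (redacted)",
    ALL_CLASSIFIERS,
)
```

In this example the extension rule alone would rate the file as low criticality, but the content rule raises it to high and marks it as personal information, which is the kind of outcome that a single folder-based rule would miss.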
With classification in place, data management scenarios light up and become easier to accomplish. Here are a few examples of scenarios that can be automated using the Windows Server 2008 R2 inbox functionality with no additional code, and of scenarios that can be enabled by writing PowerShell scripts or by using partner solutions that leverage the File Classification Infrastructure APIs.
These additional blog posts provide deep dives into how to leverage the File Classification Infrastructure (FCI) in your IT environment and how to develop solutions that plug into and enhance FCI: