Terry_Hebert - apologies for length of time to answer your 6/20 question and thanks for the reminders 😉
Rephrased: "How do we address the DFARs 7012(e) requirement regarding image retention and packet capture?"
This was initially an area of concern for exactly the point you brought up. Cloud data architecture for hyperscale providers typically cannot fulfill this requirement _as written_ in any commercially feasible manner. That said, the objective of the requirement can be met. This has often been a 'fundamentals' topic of discussion, as it supports many discussions beyond just this DFARs rule. So let's walk through the details so that y'all can have this discussion and understand both why & how this is achieved differently than written.
Data Architecture Challenge: Cloud data architecture often involves various forms of 'chunking' data into small portions and distributing them across a wide array of infrastructure. This actually results in improved performance and resiliency. It has the added benefit of providing another layer of abstraction against attackers, because the data 'chunks' are often encrypted and therefore require decryption and indexing for any meaningful reconstruction. An unintended consequence is that a single blob object, such as a file of interest from a CUI/DFARs/CDI perspective, may be stored across dozens or hundreds of physical devices depending on the size of the document. Multiply this by the number of documents possibly impacted in the investigation of an incident and it can result in thousands of physical &/or virtual devices being 'in scope' under a literal reading of this requirement. So let's keep that in mind as we discuss the next point.
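For those who find the scale problem easier to see in code, here is a minimal sketch of the chunking idea. It is an illustration only and not how any particular provider places data: the hash-based placement rule, the 4 KiB chunk size, the 10,000-device pool and the function names are all assumptions made up for the example.

```python
import hashlib


def place_chunks(blob: bytes, chunk_size: int, device_count: int) -> list[int]:
    """Return the (hypothetical) device index assigned to each chunk of a blob."""
    placements = []
    for offset in range(0, len(blob), chunk_size):
        chunk = blob[offset:offset + chunk_size]
        # Assumed placement rule: hash the chunk's offset and content,
        # then map the digest onto the pool of devices.
        digest = hashlib.sha256(offset.to_bytes(8, "big") + chunk).digest()
        placements.append(int.from_bytes(digest[:8], "big") % device_count)
    return placements


def devices_in_scope(documents: list[bytes], chunk_size: int, device_count: int) -> set[int]:
    """Collect every device that holds at least one chunk of any listed document."""
    scope: set[int] = set()
    for document in documents:
        scope.update(place_chunks(document, chunk_size, device_count))
    return scope


# Twenty made-up documents of 50 chunks each, spread over an assumed pool of devices.
documents = [bytes([i]) * (4096 * 50) for i in range(20)]
scope = devices_in_scope(documents, chunk_size=4096, device_count=10_000)
print(f"{len(documents)} documents touch {len(scope)} distinct devices")
```

Even with only twenty modest documents, the set of devices holding at least one chunk grows far beyond what anyone could reasonably 'image', which is the point of the paragraph above.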
Forensic/Investigation scope of analysis Challenge: Another 'fundamentals' issue is a clear understanding of how the scope of responsibility in investigating an incident differs between the tenant and the service provider. Pulling back a bit, the first important concept to embrace when assessing scope of responsibility is the scope of management responsibility (think section 9 of an SSP, where the accreditation boundary is defined by the entity). This understanding allows a bright line to be accepted between tenant management of the service and service provider management of the underlying systems and infrastructure. Applied to investigation, this naturally means the tenant is responsible for analyzing all application-level log data regarding configuration and use, and the service provider analyzes all system-level data regarding management of the underlying infrastructure. Each party defines the context surrounding its data, and that context is what makes it possible to recognize an anomaly or deviation within its own scope of responsibility. This is important to embrace because it means a tenant that experiences an event or incident is equipped to investigate it independently of the service provider, i.e. the tenant owns the data necessary to identify anomalous, unauthorized or otherwise concerning behavior from the application telemetry it should already be analyzing under its scope of control responsibility. This is also true of the service provider. And both parties possess a unique context of comprehension that cannot be easily shared, i.e. even if a service provider shared all the underlying system log data (often inappropriate for multi-tenant systems), the tenant lacks the architectural understanding of service construction and deployment to accurately assess issues.
The same is true for the service provider, as they cannot identify what the tenant has (or has not) authorized for appropriate configuration, exception, deviation, use patterns for groups or specific users, etc. The primary difference is that the likelihood of a tenant incident requiring notification of the service provider is much smaller than the converse. The intent of most service providers is to ensure as much tenant autonomy as possible and to commit to notifying tenants if they are impacted by an incident within the scope of the service provider's responsibility.
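To make the 'bright line' a little more concrete, here is a small sketch of how an IR team might record which party analyzes which kind of log data. The layer names, the `LogSource` type and the example sources are hypothetical labels chosen for illustration; a real boundary comes from the SSP and the service's own documentation.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    TENANT = "tenant"
    SERVICE_PROVIDER = "service provider"


# Hypothetical layer labels. The split follows the description above:
# application-level data belongs to the tenant, system-level data to the provider.
LAYER_OWNER = {
    "application_configuration": Party.TENANT,
    "application_usage": Party.TENANT,
    "platform_system": Party.SERVICE_PROVIDER,
    "infrastructure": Party.SERVICE_PROVIDER,
}


@dataclass(frozen=True)
class LogSource:
    name: str
    layer: str


def investigating_party(source: LogSource) -> Party:
    """Return the party responsible for analyzing this log source."""
    try:
        return LAYER_OWNER[source.layer]
    except KeyError:
        # An unclassified layer is a gap in the boundary definition, so fail loudly.
        raise ValueError(f"unclassified layer: {source.layer!r}") from None


def split_by_party(sources: list[LogSource]) -> dict[Party, list[str]]:
    """Group log source names under the party that investigates them."""
    grouped: dict[Party, list[str]] = {party: [] for party in Party}
    for source in sources:
        grouped[investigating_party(source)].append(source.name)
    return grouped


# Made-up example sources.
sources = [
    LogSource("admin configuration changes", "application_configuration"),
    LogSource("user sign-in activity", "application_usage"),
    LogSource("host operating system events", "platform_system"),
]
for party, names in split_by_party(sources).items():
    print(f"{party.value}: {names}")
```

Failing on an unclassified layer is a deliberate choice in this sketch: a log source nobody has claimed is exactly the kind of gap that is better found in a rehearsal than in a real incident.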
So now, taking just these two challenges into consideration, let's assess the objective of subparagraph (e). The intent is to ensure defensible data that facilitates accurate and defensible investigation, implying that such defensibility can withstand scrutiny if the service provider or tenant needs to provide such data in support of more formal investigations under regulatory requirements. As we saw in the first challenge regarding data architecture, there is no commercially feasible method of retaining 'images' of systems, as it would require a complete replication of the underlying service to provide the indexing and decryption capabilities for all impacted systems. Furthermore, it violates our principles to perform any packet capture of tenant actions. A service provider might do that for the behavior of their own staff in support of the service, but it should be generally unacceptable for a service provider to perform packet capture and inspection of tenant traffic (unless of course that's part of a feature, i.e. security features designed to analyze traffic &/or message content). Is an image the only defensible method to amass the data necessary to validate what occurred in an incident? No. Our position is that this is exactly why the AU control family exists, and that across the service provider's and tenant's scopes of responsibility each party already has the _forensically relevant_ data necessary to conduct their respective investigations. This position also aligns nicely with the previous scope of responsibility point, because any packet capture or other retention of relevant data should be constrained to the appropriate scope of control of each party to ensure respect of boundaries.
What is "forensically relevant" then?
This term "forensically relevant" is important, as it's the term I chose to adopt when constructing our attestation memorandum on DFARs (available for both GCC & GCCH, as they rely on the same underlying controls from 800-53). In reading our attestation memorandum, it can be noted that we've attempted to be clear that our intent is to support the requirement's objective by ensuring retention of all "forensically relevant" information related to an incident. In this way my intent is to demonstrate adherence to the spirit of the requirement rather than the letter, as the letter is simply not commercially feasible in most cloud solutions at scale.
What does this mean for you the tenant? Probably three things:
1) You will want to ensure you have a copy of the attestation memorandum for the service you reside in stored along with your other accreditation artifacts, and that you are comfortable speaking to (e), or that your organization knows that if questions arise about the service provider's stance, we remain available to help address them with auditors etc. Note: In my experience many auditors have become much more technically competent on these issues in the past 5-7 years, but our industry is not renowned for agility when it comes to adapting to change on regulatory issues. So feel free to reach out as needed.
2) You will want to ensure that your IR (Incident Response) processes reflect this understanding, i.e. that IR processes regarding cloud service consumption are clear on the scope of data available to the tenant's IR team(s) for analysis in contrast to the service provider's scope of responsibility. I often recommend rehearsing this at minimum *prior* to cloud adoption, but annual rehearsal supports control requirements and is a pragmatic (affordable) way to ensure that the first time a tenant encounters a major incident, they are not confused about which party owns which data, what the scope of investigation is, and why. Attempting to grasp the impact of these issues for the first time in a real-world incident tends to be less than ideal.
3) Healthy internal feedback between operations and investigations teams. This last point sounds like common sense, but it is easy to overlook the implied issues that arise out of discussions like this, i.e. the value of rehearsal and of operational input on the log data being harvested for analysis. Tenants can often err on either side (too much or too little data, retained for too long or not long enough, etc.). Healthy internal checkpoints help ensure that the data being harvested from logs remains relevant to ongoing investigations, is retained as needed, &/or is formally modified over time as needed (even if through a lightweight governance process that generates defensible evidence of ongoing management of the data necessary to support investigations).
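As one illustration of what a lightweight checkpoint for point 3 could look like, the sketch below compares how long each log category is actually retained against the window the operations and investigations teams have agreed on, and produces a dated record of the review. The categories, day counts and field names are made-up placeholders, not recommended values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    log_category: str
    retained_days: int  # what is configured today
    minimum_days: int   # shortest period the investigations team says it needs
    maximum_days: int   # longest period operations has agreed to keep the data


def review_retention(rules: list[RetentionRule], reviewed_on: date) -> list[dict]:
    """Return one dated finding per rule, usable as evidence of ongoing review."""
    findings = []
    for rule in rules:
        if rule.retained_days < rule.minimum_days:
            status = "too short"
        elif rule.retained_days > rule.maximum_days:
            status = "too long"
        else:
            status = "ok"
        findings.append({
            "reviewed_on": reviewed_on.isoformat(),
            "log_category": rule.log_category,
            "retained_days": rule.retained_days,
            "status": status,
        })
    return findings


# Placeholder rules for illustration only.
rules = [
    RetentionRule("sign-in activity", retained_days=30, minimum_days=90, maximum_days=365),
    RetentionRule("admin configuration changes", retained_days=180, minimum_days=90, maximum_days=365),
    RetentionRule("verbose debug traces", retained_days=400, minimum_days=7, maximum_days=30),
]
for finding in review_retention(rules, reviewed_on=date(2024, 1, 15)):
    print(finding)
```

Keeping the output of each review alongside your other accreditation artifacts is one simple way to show that retention decisions are being managed over time rather than set once and forgotten.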
I know this response was lengthy, but I hope it helped provide deeper context (for those of you that have read the attestation memos) on why the language of subparagraph (e) (or elsewhere) might deviate from the boilerplate type of language many of us grew up with managing our on-prem systems. Thanks again for your patience and polite reminders of your open question, Terry. Always appreciate the keen inquiries, as they help many others gain insight into the thinking behind 'why things are different'.