Forum Discussion
Future Support for Configurable Sampling in Purview Classification
We have a question regarding the sampling method used in Microsoft Purview for classification.
Based on the documentation, we understand that for tabular data sources (e.g., SQL databases), Purview samples only the top 128 rows for classification.
However, our client has tables with millions of rows, and this small sample size may not be representative of the actual data. Is there any plan to allow users to configure the number of sampled rows in future updates? This would greatly improve classification accuracy for large datasets.
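To illustrate the concern, here is a minimal sketch in Python using synthetic data. It is not Purview's actual sampling or classification logic; the column contents, the email pattern, and the `match_ratio` helper are all assumptions made up for this example. It only shows how a top-128 sample can miss what a random sample of the same size would catch.

```python
import random
import re

# Simplified email pattern, assumed for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def match_ratio(values):
    """Return the fraction of non-empty values that look like an email address."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    return sum(bool(EMAIL.match(v)) for v in non_empty) / len(non_empty)

# Hypothetical column: the first 1,000 rows hold legacy placeholders,
# the remaining 99,000 rows hold real-looking email addresses.
column = ["N/A"] * 1_000 + [f"user{i}@example.com" for i in range(99_000)]

# Top-N sampling: only the first 128 rows are inspected.
top_sample = column[:128]

# Random sampling of the same size, seeded so the run is repeatable.
rng = random.Random(0)
random_sample = rng.sample(column, 128)

top_ratio = match_ratio(top_sample)
random_ratio = match_ratio(random_sample)

print(f"top-128 match ratio:    {top_ratio:.2f}")
print(f"random-128 match ratio: {random_ratio:.2f}")
```

In this made-up case the top-128 sample contains no email addresses at all, so a pattern-based classifier would likely skip the column, while a random sample of the same size is dominated by matching values.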
Thanks in advance for your insights!
1 Reply
- milgo
Microsoft
Thanks for reaching out. The current limit is indeed 128 rows, as documented. This is a great feature request, and you make a good case for this customer’s scenario. I checked the existing feature requests and found one already logged: "Increase the sample size for prerequisite condition to select column for classification" (Community).
Please upvote that feedback item to help prioritize the request.