
Shane_Lambert
Copper Contributor
Aug 10, 2023

Files getting classified in Syntex training files but not in SharePoint library

Hi,

 

I'm having trouble with some files in SharePoint not getting classified, and some appear not to go through the classification process at all. The files are scanned in batches, uploaded, run through Adobe's OCR, and then classified. However, random files never end up classified, in one of two ways: either they are never processed (they have no classification date whatsoever), or they get "classified" but remain as the default content type ('Document'). In either case, if I then copy the file into the training files for the model that should be classifying it (there are multiple models running on the same library in our case), the model classifies it perfectly as a positive. I can then save/sync the model and re-upload the file, and the same problem persists. Does anyone know what may be causing this?

 

To be clear, this has happened across multiple files, which are uploaded in batches, with 80-90% of the files in a batch being classified just fine and the rest refusing to do so. Files 1, 2, and 3 in a batch can work fine and then file 4 suddenly fails. There is no difference between the files in terms of scanning, OCR, or the upload process, so I have tentatively ruled those out as the cause. I'm assuming it's a Syntex issue, at least for the second category, because those files are run through it but not classified properly. Any help would be much appreciated!
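The two failure modes described above can be told apart mechanically once the library metadata is exported. The following is a minimal sketch, assuming each file is represented by a hypothetical `LibraryFile` record; the field names `content_type` and `classification_date` are illustrative and are not actual SharePoint or Syntex column names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: 'Document' is the library's default content type, as in the post.
DEFAULT_CONTENT_TYPE = "Document"


@dataclass
class LibraryFile:
    # Hypothetical record for one exported library item.
    name: str
    content_type: str
    classification_date: Optional[datetime]  # None if no classification date exists


def triage(f: LibraryFile) -> str:
    """Sort a file into one of the two failure modes, or 'classified'."""
    if f.classification_date is None:
        # Failure mode 1: the file has no classification date at all.
        return "never_processed"
    if f.content_type == DEFAULT_CONTENT_TYPE:
        # Failure mode 2: processed, but left as the default content type.
        return "processed_but_default"
    return "classified"


def summarize(batch: list[LibraryFile]) -> dict[str, int]:
    """Count how many files in a batch fall into each category."""
    counts = {"never_processed": 0, "processed_but_default": 0, "classified": 0}
    for f in batch:
        counts[triage(f)] += 1
    return counts
```

Running `summarize` over each uploaded batch would show whether the failures are mostly files that were skipped entirely or files that were processed but not matched, which may help narrow down where the problem sits.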

  • LarsG_1
    Copper Contributor

    Hi Shane_Lambert,
    have you found a solution to the problem? I am currently facing a similar issue.
    One SharePoint document library, two classification models, which classify incoming documents into two groups. However, sometimes classification happens after a long delay (several days), and sometimes no classification happens at all.

    Unfortunately, disabling both models on the library and enabling them again did not help.

    Two weeks ago everything worked flawlessly.

    Any advice or help is highly appreciated.

    • Shane_Lambert
      Copper Contributor

      LarsG_1 

       

      After running into this issue hundreds of times while working with Syntex, I've found that Syntex models seem to run their own OCR on files when you train them, and that OCR seems to be a lot better than the other OCRs I've tried on my current project. However, as far as I can tell, models don't run their own OCR when classifying files in SharePoint libraries. This leads to models classifying files differently in training than in a SharePoint library.

       

      That being said, I have no clue how to fix this. I would love to have Syntex apply its own OCR to all files, regardless of location, but I haven't found a good way to do that. At this point, I would also take the reverse (having Syntex apply no OCR and just use ours), but I haven't found a way to do that either.

    • Mario_Fulan
      Iron Contributor
      I've had similar issues and I'm not sure of the cause. You've tried all the tricks I tried and then some, including republishing the models. Some files just seem to refuse to be classified even though the models identify them flawlessly.
      • LarsG_1
        Copper Contributor

        Thank you for the quick reply! That is really unfortunate...
        We are working around this issue for now by adding another "custom" column so the document type can be set by hand, plus a rule that overrides the "custom" column whenever a classification date exists. Otherwise, the document can still be classified by hand.
        However, in my opinion this issue is a deal-breaker for using Syntex as of now.
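
The workaround described above boils down to a simple precedence rule. Here is a rough sketch of that rule only, not of how it is wired into SharePoint; the function name and the parameters `manual_type`, `model_type`, and `classification_date` are hypothetical stand-ins for the "custom" column, the model's result, and the classification date.

```python
from typing import Optional


def effective_document_type(
    manual_type: Optional[str],
    model_type: Optional[str],
    classification_date: Optional[str],
) -> Optional[str]:
    """Pick the document type to show, following the rule from the thread.

    If a classification date exists, the model's result overrides the
    hand-entered "custom" value; otherwise the hand-entered value stands.
    """
    if classification_date is not None:
        return model_type
    return manual_type


# A file the model never processed keeps its hand-entered type.
unprocessed = effective_document_type("Invoice", None, None)

# A file the model did process takes the model's type.
processed = effective_document_type("Invoice", "Contract", "2023-08-10")
```

Taken literally, the rule would also replace a hand-entered value for files that have a classification date but stayed on the default 'Document' content type, so an extra check for that case might be worth considering.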
