Taxonomy terms, synonyms and explanations in Syntex Extractor

We are working with this document understanding feature: 
We had to create taxonomy term groups and terms inside the Content Center, because we are NOT SharePoint admins and don't have (nor want) access to the global term store (yet).
Expected behaviour:
1. We create an extractor and map it to the newly created Managed Metadata column --> check
2. We label the sample documents --> check
3. We add an explanation. Question here: do we need to duplicate the terms/phrases that we already have in the taxonomy? We add a few here.
4. Train the model --> check
5. Test the model: the synonyms in the term store are not recognized
6. When we go back to the term store, the items from the explanation have been added as new terms?!
We expected the synonyms to be recognized (as in the demo, see link above), not new terms to be created.
Although it is nice that labelling/training can enrich our term store, we would have expected those new terms to be offered as synonyms (I understand it is a challenge to pick the term for which the labelled token is a synonym)...
Obviously we are new at this, so any tips are welcome here.
We really like this aggregation level, where we can reach a higher abstraction level of information extraction than just the 'covered text'.
5 Replies
best response confirmed by mpjjonker (Brass Contributor)
You had at least 2 questions:
1. Do we have to repeat the terms in our explanations? YES, for now. I could see this becoming a feature some day, where the model handles it automatically, but not today.
2. Terms are added instead of the preferred term and synonyms being used; is this expected? If I understand your points, then YES. It should match the term or any of its synonyms, but it won't create synonyms automatically. If you don't want new terms created, you have to use a CLOSED termset or add the new names as synonyms manually. Modify your termset to be closed and new terms won't be added, but synonyms will be used as in the sample. See the NOTE at the bottom of the tech doc:
If the term set is open, then any extracted values that do not match a preferred term or synonym value will be added as a new term to the root of the term set. These new terms can be moved, merged, or made synonyms in the term store where the term set resides.
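The rule in that note can be modelled with a short Python sketch. This is only an illustration of the behaviour as quoted, not how Syntex is implemented: the `TermSet` class, the `resolve` function and the sample terms are all hypothetical, and exact (case-sensitive) string matching is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TermSet:
    """Hypothetical in-memory stand-in for a term set (not a SharePoint API)."""
    is_open: bool
    # Maps each preferred term to its set of synonyms.
    terms: Dict[str, Set[str]] = field(default_factory=dict)


def resolve(term_set: TermSet, extracted: str) -> Optional[str]:
    """Map an extracted value to a preferred term, following the quoted note."""
    for preferred, synonyms in term_set.terms.items():
        # Assumption: matching is exact, including letter case.
        if extracted == preferred or extracted in synonyms:
            return preferred
    if term_set.is_open:
        # Open term set: an unmatched value becomes a new root-level term.
        term_set.terms[extracted] = set()
        return extracted
    # Closed term set: no new term is created.
    return None


closed_set = TermSet(is_open=False, terms={"Invoice": {"Bill"}})
open_set = TermSet(is_open=True, terms={"Invoice": {"Bill"}})

print(resolve(closed_set, "Bill"))     # synonym resolves to "Invoice"
print(resolve(closed_set, "Receipt"))  # no match, nothing created
print(resolve(open_set, "Receipt"))    # no match, "Receipt" added as a new term
```

In this model, a synonym match behaves the same for open and closed term sets; the only difference is what happens to values that match nothing.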
@Mario Fulan thanks for the reply.
The answer for question 1 is clear.
For question 2, I am pretty sure that our synonyms are populated with the same values as we have in the explanations and in the content. Do I understand correctly that the behavior is different for a CLOSED termset than for an OPEN termset?
CLOSED termset: try to match a synonym
OPEN termset: try to match a term; if there is no match, create a new term (without looking at the synonyms)?
If you have matching terms in the synonyms, then it should work for open or closed. I do recall having an issue in some of my testing, but it has been a while. The difference between closed and open is whether it creates new terms for non-matching items. If you believe that you have synonyms for all the terms, then something is not working as expected. I think I had an issue with mixed case, where it had to match exactly, but don't quote me on that one as my memory may be failing me.
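One way to check the mixed-case suspicion before reproducing it in Syntex is to compare the phrases used in the explanations against the synonyms and flag the ones that match only when case is ignored. A minimal Python sketch, assuming both lists have been copied out by hand (the function name and sample values are hypothetical, and whether Syntex really matches case-sensitively is unconfirmed):

```python
def case_only_mismatches(synonyms, phrases):
    """Return (phrase, synonym) pairs that are equal only when case is ignored."""
    exact = set(synonyms)
    # Index synonyms by their case-folded form; keep the first spelling seen.
    by_folded = {}
    for synonym in synonyms:
        by_folded.setdefault(synonym.casefold(), synonym)

    mismatches = []
    for phrase in phrases:
        if phrase in exact:
            continue  # exact match, nothing to report
        match = by_folded.get(phrase.casefold())
        if match is not None:
            mismatches.append((phrase, match))
    return mismatches


synonyms = ["Purchase Order", "PO"]
phrases = ["purchase order", "PO", "Quote"]
print(case_only_mismatches(synonyms, phrases))
# [('purchase order', 'Purchase Order')]
```

Any pair reported here is a candidate for the exact-match problem; phrases that match nothing at all (such as "Quote" above) are the ones an open term set would add as new terms.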
Thanks Mario, when I have another moment I will try to reproduce this and report back.
In case you missed it, a new feature was announced for Syntex at MS Ignite last week that now directly applies terms from a termset to the appropriate model if the MMS column is used in the content type. This should be a huge help for your issue. They didn't give specific dates for availability of this feature, but it is on the roadmap.