Hacking the RegEx entity within LUIS
Published Jan 10 2020 07:53 PM

By Jaya Mathew and Mithun Prasad, PhD


In our previous blog, we gave a brief introduction to machine translation and explored topics such as identifying the language of a text and translating or transliterating spoken or typed text using Microsoft’s Translator Text API. We also discussed how translated or transliterated text can be integrated within a LUIS app. In this blog, we highlight new language support coming to LUIS and provide tips on improving app performance when using languages that are in the preview phase.


At the time of writing the previous blog, Hindi was not natively supported in the LUIS portal. The portal now supports additional languages in preview, including Hindi in its native script. A user can therefore create a new app with the culture set to ‘Hindi Indian (Preview)’, as shown in Figure-1, and type Hindi utterances directly into the new app.




Figure-1: Creating an app with Hindi language


In the preview phase, however, some pre-built entities, such as URL, are not supported for text in the native language, so the user might run into issues when trying to tag URLs, as shown in Figure-2:




Figure-2: URLs in Hindi native script


One way to work around this is to create a RegEx (Regular Expression) entity manually as shown in Figure-3:




Figure-3: RegEx workaround for pre-built entities


Generic entities such as phone numbers and URLs follow standard patterns, so they can be extracted using regular expressions. Examples of URLs rendered in the native language are as follows:






It is important to note that, to cover the full range of characters, we use the Unicode range \u0900-\u097F (the Devanagari block) in the expression rather than typing native characters.
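
As a rough illustration of why the escaped range matters, here is a minimal Python sketch. The URL pattern, the `find_urls` helper and the sample sentence are our own assumptions for demonstration and are not the expression shown in Figure-3. Python's `re` module only stands in for the LUIS RegEx entity here, whose matching behaviour may differ.

```python
import re

# Devanagari Unicode block (U+0900 to U+097F), written as escapes
# rather than as literal native characters.
DEVANAGARI = r"\u0900-\u097F"

# Hypothetical URL pattern (an assumption, not the pattern from Figure-3):
# optional scheme, dot-separated labels made of ASCII letters, digits,
# hyphens or Devanagari characters, then an optional path.
LABEL = rf"[A-Za-z0-9{DEVANAGARI}-]+"
URL_PATTERN = re.compile(rf"(?:https?://)?(?:{LABEL}\.)+{LABEL}(?:/\S*)?")


def find_urls(text):
    """Return every substring of `text` that matches URL_PATTERN."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# A class typed from native characters, here the consonants from क to ह,
# misses vowel signs such as ा (U+093E), so it cannot match भारत.
consonants_only = re.compile(r"[क-ह]+")
full_block = re.compile(rf"[{DEVANAGARI}]+")

print(consonants_only.fullmatch("भारत"))         # None
print(full_block.fullmatch("भारत") is not None)  # True
print(find_urls("कृपया www.उदाहरण.भारत देखें"))
```

The hand-typed class fails on an ordinary word because vowel signs fall outside the consonant range, whereas the escaped block covers letters, vowel signs and digits in a single span.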











The RegEx entity workaround adds a lot of flexibility when building a LUIS model in languages that are in preview and do not yet support the various pre-built entities.








Last update: Jan 10 2020 07:56 PM