The NLP Recipes Team
Natural Language Processing (NLP) systems ease interactions between computers and humans through natural language. They are used in a variety of scenarios and industries, from personal assistants like Cortana, to language translation applications, to call centers responding to specific users’ requests. In recent years, NLP has seen significant growth in both quality and usability. Through new deep learning methods and state-of-the-art (SOTA) Deep Neural Network (DNN) algorithms, businesses are able to adopt Artificial Intelligence solutions that meet their customers’ needs. Unfortunately, finding the right algorithm for a given scenario and language remains a challenge. To help researchers and data scientists find the best fit for the problem at hand, Microsoft is open-sourcing the Microsoft NLP Recipes repository, which contains best practices for building and evaluating NLP systems across multiple tasks and languages.
Specifically, our goal is to provide examples and utilities for anyone who wants to build, adapt, and evaluate NLP systems for their own scenarios.
Several models have emerged over the years as the NLP community has shifted toward neural network architectures for language modeling, moving away from more traditional approaches such as conditional random fields (CRFs) and Hidden Markov Models (HMMs). Since 2017, Transformer-based neural network architectures, such as BERT, GPT-2, XLNet, and RoBERTa, have emerged as the dominant choice within the NLP community. These architectures top multi-task benchmarks such as GLUE as well as single-task benchmarks (e.g. text classification and named entity recognition), as they make it possible to take a pre-trained language model and adapt it to different downstream tasks. In addition, these pre-trained models come with support for 100+ languages out of the box. The following table lists the models currently implemented in the repository, across different tasks and languages.
Category | Applications | Methods | Languages |
---|---|---|---|
Text Classification | Topic Classification | BERT, XLNet, RoBERTa, DistilBERT | en, hi, ar |
Named Entity Recognition | Wikipedia NER | BERT | en |
Entailment | MultiNLI Natural Language Inference | BERT, XLNet | en |
Question Answering | SQuAD | BiDAF, BERT, XLNet, DistilBERT | en |
Sentence Similarity | STS Benchmark | BERT, GenSen | en |
Embeddings | Custom Embeddings Training | Word2Vec, fastText, GloVe | en |
Annotation | Text Annotation | Doccano | en |
Model Explainability | DNN Layer Explanation | DUUDNM (Guan et al.) | en |
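At the core of the Transformer architectures listed above is scaled dot-product attention: each token's representation is recomputed as a weighted average of all token representations, with weights derived from query–key similarity. The following is a minimal, dependency-free sketch of that operation for illustration only; it is a toy with plain Python lists, not the repository's implementation, and all function and variable names here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Toy scaled dot-product attention over lists of vectors.

    Each query's output is a weighted average of the value vectors,
    with weights = softmax(q . k / sqrt(d)) over all keys k.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Self-attention: the same three 2-d "token" vectors serve as
# queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

In a real Transformer the queries, keys, and values are learned linear projections of the token embeddings, and many attention heads run in parallel; this sketch only shows the weighting mechanism itself.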
The examples and utilities in the Microsoft NLP Recipes repository are designed to address these challenges. The repository is meant to be accessible to anyone interested in easily building NLP solutions across a wide range of tasks and languages, and contributions from the community are always welcome to keep it up to date with the latest state-of-the-art methods.
- Utilize the GitHub repository for your own NLP systems
- Try out an example of text classification using Transformer models
- Try out an example of question answering using BiDAF and Azure Machine Learning
- Learn more about the Azure Machine Learning service
- Get started with a free trial of the Azure Machine Learning service