Build Your Own Text Classification Model

Former Employee

Mar 24, 2019

First published on MSDN on May 24, 2018
In a support center, text needs to be quickly routed to the correct person. Chatbots need to turn text into different scenarios to be addressed. Sentiment analysis determines whether text is positive, negative, or neutral. In all three of these examples, machine learning models can help.

The Text Classification solution on the Azure AI Gallery solves these multi-class text classification problems using SQL Server ML Services. Both R and Python solutions are included. The 20 newsgroups dataset is used (with some modification) to demonstrate the model building process, which can easily be generalized to other problems such as support ticket classification, chatbot, or sentiment analysis data.

The data schema for training the model is simple: items with one or more pieces of text (for example, a title and longer description) and the label representing the desired classification. This solution uses the Multiclass Logistic Regression model in the MicrosoftML package to train the model. The trained model is evaluated and then deployed to SQL Stored procedures. It is straightforward to swap in your own data with the same schema and rerun the code in the solution to build your own customized model.

Once you've built your model, open the provided Power BI file to examine the results of your model on your test data and your new unclassified items. Use the Power BI Refresh button in the toolbar to replace the example with your own data and model. (If you use a different database name or table names, first go to the PowerBI Data Source Settings and/or the Edit Queries section to supply the correct names, then use Refresh.)