#2 Smart Text Categorization in Businesses – Pega’s 3 Models

Pega offers various text categorization features that enable you to analyze the content of an email and assign it to categories. By using the text categorization feature, you can efficiently analyze large volumes of data and assign them to predefined categories.

To analyze email content and automatically assign categories, Pega provides various text classification functions. This allows even large volumes of data to be efficiently analyzed and assigned to predefined categories.

There are three different models to choose from.

Sentiment Detection

Intent Detection

Topic Detection

All three models run independently and in parallel, supporting categorization through machine learning. Topic Detection also supports keyword-based text categorization.

In keyword-based text categorization, the text is scanned for topic-specific keywords and assigned to the topic whose keywords are recognized. This approach is used when the machine learning model has not yet been fully developed and therefore does not yet deliver satisfactory results.
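The keyword scan described above can be sketched in a few lines of Python. This is a minimal illustration, not Pega's implementation: the topics, the keyword lists, and the function name `categorize_by_keywords` are assumptions made for this example.

```python
import re

# Hypothetical topics and keywords, chosen only for illustration.
TOPIC_KEYWORDS = {
    "Support": {"error", "crash", "bug", "help"},
    "Billing": {"invoice", "payment", "refund"},
}

def categorize_by_keywords(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often in the text, or None."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z]+", text.lower())
    # Count keyword hits per topic.
    scores = {
        topic: sum(1 for word in words if word in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best_topic = max(scores, key=scores.get)
    # If no keyword was recognized, no topic is assigned.
    return best_topic if scores[best_topic] > 0 else None

print(categorize_by_keywords("I need a refund for my last invoice."))  # prints "Billing"
```

A text without any recognized keyword returns `None`, which mirrors the case in which the keyword approach cannot assign a topic.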

In text categorization using machine learning, the model learns to categorize text on its own by analyzing previously classified texts, from which it identifies patterns for topic detection. To improve the accuracy of topic detection in production environments, feedback can also be provided to the machine learning models. Topic detection through machine learning is particularly useful when previous customer messages and their corresponding categories are available, or when other relevant training data can be provided to the model.

 

 


Sentiment Detection

Sentiment detection involves identifying the emotional tone of the text being analyzed. Using machine learning and natural language processing methods, an email bot can, for example, detect negative emotions in an email. The analyzed text is then assigned a category such as positive, neutral, or negative, which enables an efficient and timely response to critical issues.

Intent Detection

The second model is intent detection. The goal is to identify the purpose of the analyzed text, that is, the author’s intent. Identifying the intent behind a text unit enables a more precise interpretation of user communication and supports effective responses and actions on the part of the company.


Topic Detection

Topic detection involves identifying the overarching topic of a single text unit or an entire document so that an incoming customer inquiry can be processed efficiently. For example, inquiries regarding support or service can be identified and handled with the appropriate action. This enables improved service quality and seamless customer interaction.

In Topic Detection, three algorithms are available for building a model. By default, the model is built using all of them; after the build, you can select a specific algorithm, ideally the one with the highest F-score. (The F-score is a metric of model performance that combines precision and recall into a single weighted value.)
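To make the selection criterion concrete, the following sketch computes an F-score from counts of correct and incorrect predictions and then picks the algorithm with the highest value. It assumes the balanced F1 variant (beta = 1); the source does not state which variant Pega reports. The counts are hypothetical and reduce the multi-class case to a single category for brevity.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    beta_squared = beta ** 2
    return (1 + beta_squared) * precision * recall / (beta_squared * precision + recall)

# Hypothetical evaluation counts per algorithm: (TP, FP, FN). Not real results.
evaluation_counts = {
    "Maximum Entropy": (80, 20, 20),
    "Naive Bayes": (70, 30, 20),
    "Support Vector Machine": (85, 15, 15),
}

scores = {name: f_score(*counts) for name, counts in evaluation_counts.items()}
best_algorithm = max(scores, key=scores.get)
```

With these made-up counts, `best_algorithm` is the entry with the highest F-score; in practice the values come from the model build itself.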

Maximum Entropy

The Maximum Entropy (MaxEnt) model is based on the principle of maximum entropy: among all probability distributions that satisfy the constraints given by the training data, it selects the one with the highest conditional entropy, which makes its predictions robust and versatile. The model uses feature functions, each weighted by a Lagrange multiplier.
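How weighted feature functions turn into category probabilities can be sketched as follows. The example assumes the standard MaxEnt form, in which the probability of a category is proportional to the exponential of the summed feature weights. The weights, the categories, and the simple "word occurs together with category" features are hypothetical; a real model learns the weights from training data.

```python
import math

# Hypothetical weights (the Lagrange multipliers), one per (word, category) feature.
WEIGHTS = {
    ("refund", "Billing"): 1.5,
    ("refund", "Support"): 0.2,
    ("crash", "Support"): 1.8,
    ("crash", "Billing"): -0.5,
}
CATEGORIES = ["Billing", "Support"]

def maxent_probabilities(words, weights=WEIGHTS, categories=CATEGORIES):
    """Return P(category | words) as a normalized exponential of weighted features."""
    # Sum the weights of all features that fire for each category.
    scores = {
        category: sum(weights.get((word, category), 0.0) for word in words)
        for category in categories
    }
    # Normalize so that the probabilities add up to 1.
    normalizer = sum(math.exp(score) for score in scores.values())
    return {category: math.exp(score) / normalizer for category, score in scores.items()}

probabilities = maxent_probabilities(["refund", "please"])
```

Words without a weight, such as "please" here, contribute nothing to either category.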

Naive Bayes

Naive Bayes is a probabilistic model based on Bayes' theorem. It assumes that all variables (here, the words in an email) are independent of one another given the category. The algorithm is efficient to train: it uses the prior probability of a category and the probabilities of the words appearing in an email to calculate the probability of that category.
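The calculation can be illustrated with a small, self-contained sketch. It is not Pega's implementation: the training examples are invented, and add-one (Laplace) smoothing is an assumption added here so that unseen words do not produce a probability of zero.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count categories and words from (list_of_words, category) pairs."""
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, category in examples:
        category_counts[category] += 1
        word_counts[category].update(words)
        vocabulary.update(words)
    return category_counts, word_counts, vocabulary

def predict_naive_bayes(words, model):
    """Return the category with the highest prior times word likelihoods."""
    category_counts, word_counts, vocabulary = model
    total_examples = sum(category_counts.values())
    best_category, best_log_prob = None, -math.inf
    for category, count in category_counts.items():
        # Start with the log of the prior probability of the category.
        log_prob = math.log(count / total_examples)
        total_words = sum(word_counts[category].values())
        for word in words:
            # Add-one smoothing (an assumption of this sketch).
            likelihood = (word_counts[category][word] + 1) / (total_words + len(vocabulary))
            log_prob += math.log(likelihood)
        if log_prob > best_log_prob:
            best_category, best_log_prob = category, log_prob
    return best_category

# Hypothetical training examples.
model = train_naive_bayes([
    (["refund", "invoice"], "Billing"),
    (["payment", "refund"], "Billing"),
    (["crash", "error"], "Support"),
])
```

Working with logarithms avoids numerical underflow when many small probabilities are multiplied.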

Support Vector Machine (SVM)

SVM is a linear classifier that seeks the hyperplane that separates the data points with the widest margin. It can also handle nonlinear decision boundaries through the use of kernels. Because an SVM is a binary classifier, it is extended to multi-class problems using the “one-vs-rest” or “one-against-one” approach, with the choice depending on the dataset and specific requirements.
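The one-vs-rest idea can be sketched as follows: one linear classifier per category scores the text, and the category with the highest decision value wins. The weights and biases are hand-picked, hypothetical values; a real SVM learns them from training data, and the training step is omitted here.

```python
# Hypothetical linear classifiers: category -> (word weights, bias).
LINEAR_CLASSIFIERS = {
    "Billing": ({"refund": 1.0, "invoice": 0.8}, -0.5),
    "Support": ({"crash": 1.2, "error": 0.9}, -0.5),
    "Sales": ({"offer": 1.1, "price": 0.7}, -0.5),
}

def decision_value(words, weights, bias):
    """Signed score of the text relative to one category's hyperplane."""
    return sum(weights.get(word, 0.0) for word in words) + bias

def predict_one_vs_rest(words, classifiers=LINEAR_CLASSIFIERS):
    """Pick the category whose 'this category vs. the rest' classifier scores highest."""
    return max(
        classifiers,
        key=lambda category: decision_value(words, *classifiers[category]),
    )

predicted = predict_one_vs_rest(["my", "app", "shows", "an", "error"])
```

In the one-against-one variant, a separate classifier would instead be trained for every pair of categories, and the category that wins the most pairwise comparisons would be chosen.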

The choice of the best algorithm should be influenced by the problem’s requirements, the size of the dataset, and the desired classification accuracy. Each algorithm has its strengths and weaknesses, and careful consideration of these factors is crucial for selecting the optimal approach.

Training

To obtain accurate predictions from the machine learning models in Pega, it is crucial to carefully prepare the training data. For the topic detection models, CSV, XLS, or XLSX file formats are used, which must meet specific criteria.

The topic detection model requires a file with three columns: “Content,” “Result,” and “Type.” The “Content” column contains the email data, while the “Result” column specifies the desired result or topic. In this case, the topic begins with the word “Action,” followed by the detail category, which is written with a hyphen instead of an underscore. The “Type” column indicates whether a row is training or test data.

Content             Result                       Type
[Email or text]     Action > [DetailCategory]
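A file with this three-column layout could be produced, for example, with Python's standard `csv` module. The email texts, the detail categories, and the “Type” values (“Training” and “Test”) in this sketch are assumptions made for illustration; the exact values Pega expects are not given here and should be taken from the Pega documentation.

```python
import csv
import io

COLUMNS = ["Content", "Result", "Type"]

def build_training_csv(rows):
    """Serialize rows with the columns Content, Result, and Type as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical example rows; category names and Type values are assumptions.
example_rows = [
    {"Content": "My app crashes when I log in.", "Result": "Action > Support", "Type": "Training"},
    {"Content": "Please send me last month's invoice.", "Result": "Action > Billing", "Type": "Test"},
]

csv_text = build_training_csv(example_rows)
```

Writing through `csv.DictWriter` ensures that email texts containing commas or quotation marks are escaped correctly.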
 
 

The fine details of training and model selection

When training the models, you can choose whether to overwrite or supplement existing data. It is possible to integrate data from various sources, such as information provided by the channel. The ratio of training to test data can also be specified; by default, 70% of the data is used for training and 30% for testing.
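The default 70/30 division can be illustrated with a short sketch. It assumes a random split with a fixed seed for reproducibility; how Pega actually assigns records to the two sets is not described here, and the function name is hypothetical.

```python
import random

def split_training_and_test(records, training_share=0.7, seed=42):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cutoff = round(len(shuffled) * training_share)
    return shuffled[:cutoff], shuffled[cutoff:]

training_records, test_records = split_training_and_test(range(10))
```

With ten records, seven end up in the training set and three in the test set; changing `training_share` adjusts the ratio.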

For the topic model, there are three different algorithms to choose from: Maximum Entropy, Naive Bayes, and Support Vector Machine. All three models can be created simultaneously, and the selection is based on the highest F-score, which represents the model’s performance.

The precise structuring and preparation of the training data plays a crucial role in the success of machine learning models in Pega. By taking into account the specific requirements of each model, optimal performance and prediction accuracy are ensured.
