GenAI in Customer Service: More Efficient Service Requests with RAG

Want to answer customer inquiries faster and more accurately - without the manual effort? A company-specific AI makes it possible. Retrieval-augmented generation (RAG) enhances responses by combining a trained AI model with real-time company data, ensuring relevant, context-aware answers.

In this article, we’ll explore how we successfully integrated CustomGPT.ai into a publishing company’s service request process. Our goal? To automate request handling, boost efficiency, and enhance customer satisfaction.

27.05.2026

Bilal Güclü

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a method for improving the output of large language models (LLMs). Before generating a response, the system consults external knowledge sources beyond the model’s original training data. This allows an LLM to be applied to specialized topics or an organization’s internal data without retraining the model. Because responses are grounded in sources the organization maintains, they tend to be more accurate and relevant, and companies can adapt their AI systems cost-effectively instead of relying on expensive training processes.

Why is RAG important?

LLMs are a key technology in the field of artificial intelligence. However, their output is not fully predictable: a model may produce answers that are factually inaccurate or completely fabricated, a behavior commonly described as “hallucination.” Furthermore, the training data for LLMs reflects a fixed state of knowledge at a specific point in time, which means that it remains static and does not reflect current developments.

Key challenges:

Generating inaccurate or factually incorrect information when no answers are available.
Generating answers based on outdated data.
Generating answers based on unreliable sources.

This is where RAG comes in, offering an approach to address some of these challenges. It works in three steps. In the retrieval step, the prompt is converted into a vector representation and matched against a vector database built from the external knowledge base, and the entries most relevant to the prompt are extracted. In the enrichment step, the prompt is extended with this context. In the generation step, the enriched prompt is passed to the LLM, which uses its generative capabilities to produce a response from the provided information. By combining retrieval, enrichment, and generation, RAG helps ensure that the response is both informative and directly tailored to the query.
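The three steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the knowledge-base entries are invented, the "embedding" is a simple word-count vector standing in for a real embedding model, and the LLM is any callable that takes a prompt. It is not the implementation used in the project.

```python
import math
import re
from collections import Counter
from typing import Callable, List

# Hypothetical knowledge base; a real system would index company documents.
KNOWLEDGE_BASE = [
    "Subscriptions can be cancelled with one month's notice.",
    "Invoices are sent by email on the first working day of each month.",
    "Delivery of printed issues takes three to five working days.",
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> List[str]:
    """Retrieval: rank knowledge-base entries by similarity to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def enrich(query: str, context: List[str]) -> str:
    """Enrichment: put the retrieved context in front of the question."""
    bullet_list = "\n".join(f"- {entry}" for entry in context)
    return f"Answer using only this context:\n{bullet_list}\n\nQuestion: {query}"


def answer(query: str, llm: Callable[[str], str]) -> str:
    """Generation: pass the enriched prompt to the LLM (any callable here)."""
    return llm(enrich(query, retrieve(query)))


# Stub LLM so the sketch runs offline; it simply echoes the prompt it received.
print(answer("How long does delivery of printed issues take?", llm=lambda prompt: prompt))
```

Passing the LLM in as a callable keeps the retrieval and enrichment logic testable without network access; in production the stub would be replaced by a call to the actual model.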

Advantages of RAG

More accurate answers
RAG pulls from reliable external sources to generate precise and relevant responses. This reduces the risk of misinformation or "hallucinations" by relying on verified data instead of the model’s internal knowledge alone.
Up-to-date information
With RAG, answers can reflect the latest information without retraining the model. The external knowledge sources are maintained separately, so updating them is enough to keep responses current.
Expanded applications
RAG extends the capabilities of large language models (LLMs), allowing access to specialized expertise or internal knowledge bases without retraining.
Trusted responses
By sourcing information from credible references, RAG increases confidence in its answers - ensuring reliability and accuracy.

What is CustomGPT.ai and how is it used?

CustomGPT.ai is a solution for creating customized language models that are tailored to a company’s requirements and data. As part of our project, we used CustomGPT.ai in combination with a RAG system to improve efficiency in handling service requests. The RAG system grounds the generated responses in current and relevant company data, which helps keep the suggested responses accurate and contextually appropriate.

The primary task of CustomGPT.ai in this project was to automatically generate a response suggestion based on incoming customer emails. By implementing this system, the workload of the Customer Service Representatives (CSRs) was significantly reduced. Instead of manually drafting each response, they could now rely on precise, AI-powered suggestions.

Overview of the Service Request Process

The service request process follows a structured workflow consisting of several phases designed to ensure efficient handling of requests.

Categorization: At the outset, the request is assigned to a category by the CSRs. Common categories include “Complaint,” “Consultation,” and “Support,” each of which has specific subcategories that address more detailed inquiries.

Eligibility check for automated response generation: After categorization, a decision table determines whether the interface for automated response generation is invoked. For categories such as “Consultation,” which contain frequently recurring inquiries, a response suggestion is generated automatically. Other categories, such as “Complaint” or “Support,” may require manual processing. For eligible requests, the RAG system searches the database for information that matches the incoming email and generates a contextually appropriate response suggestion from relevant and up-to-date company data, enabling fast and efficient processing.
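As a rough illustration of the category check, the following Python sketch models a decision table as an ordered list of rows in which the first match wins. In the project this logic lives in a Pega decision table, not in Python; the row values, the `subcategory` wildcard, and the function name are assumptions chosen for the example.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    category: str
    subcategory: Optional[str]  # None acts as a wildcard for any subcategory
    generate: bool              # True: request an AI response suggestion


# Illustrative rows only; the real table is maintained in Pega.
DECISION_TABLE: List[Row] = [
    Row("Consultation", None, True),
    Row("Complaint", None, False),
    Row("Support", None, False),
]


def should_generate_suggestion(category: str, subcategory: Optional[str] = None) -> bool:
    """Return the outcome of the first matching row; default to manual processing."""
    for row in DECISION_TABLE:
        if row.category == category and row.subcategory in (None, subcategory):
            return row.generate
    return False  # unknown categories are handled manually


for name in ("Consultation", "Complaint", "Billing"):
    print(name, should_generate_suggestion(name))
```

Defaulting to manual processing for unknown categories is a deliberately conservative choice in this sketch: a new category only receives automated suggestions once someone adds a row for it.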

Suggested Responses: The suggested response is displayed to CSRs in the Pega interface. CSRs then have the option to send the response, edit the suggestion, or reject it.

Interaction Processing: When the response is sent, various options for closing the case can be selected. One option is to close the service request after sending the response, so that the process is marked as complete and the request can be forwarded. If multiple requests need to be processed in a single email, it is also possible to create additional service requests.

Flexibility in Processing: If CSRs wish to customize the automatically generated response, an editor is available to adapt it to the specific needs of the inquiry. If the suggested response is rejected, the CSR must provide a reason and compose their own response. This process ensures efficient handling of service requests by offering both automation and flexibility to address the diverse needs of customers.
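The three review options described above (send, edit, reject) and the rule that a rejection requires a reason can be sketched as follows. The class and field names are hypothetical and only model the behavior described in this article; the actual handling is part of the Pega user interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send"
    EDIT = "edit"
    REJECT = "reject"


@dataclass
class Review:
    action: Action
    edited_text: Optional[str] = None    # required for EDIT
    reject_reason: Optional[str] = None  # required for REJECT
    own_response: Optional[str] = None   # required for REJECT


def final_response(suggestion: str, review: Review) -> str:
    """Return the text that is sent to the customer after the review."""
    if review.action is Action.SEND:
        return suggestion
    if review.action is Action.EDIT:
        if not review.edited_text:
            raise ValueError("An edited response must contain text.")
        return review.edited_text
    # REJECT: both a reason and a manually written response are mandatory.
    if not review.reject_reason or not review.own_response:
        raise ValueError("A rejection needs a reason and a manually written response.")
    return review.own_response


print(final_response("Suggested text", Review(Action.SEND)))
print(final_response("Suggested text", Review(Action.EDIT, edited_text="Edited text")))
```

Recording the rejection reason as a mandatory field also produces feedback that can later be used to improve the knowledge base or the prompts.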

Technical implementation of the integration

To seamlessly integrate CustomGPT.ai into customer service, it was connected to Pega via a REST API. In this case, Pega was deployed at the customer’s site to automatically process service requests and generate context-aware response suggestions.

Relevant data—such as the date received, subject line, and email text—is transmitted to CustomGPT.ai via an interface. The response is generated using RAG and fed directly back into the Pega system.
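To make the data exchange more concrete, here is a small sketch of how such a request could be assembled using only the Python standard library. The endpoint URL and the JSON field names are placeholders invented for this example; the actual CustomGPT.ai API and the Pega connector configuration are not described in this article. The sketch builds the request but does not send it.

```python
import json
import urllib.request
from dataclasses import dataclass
from datetime import date

# Placeholder endpoint (assumption); not the real CustomGPT.ai URL.
SUGGESTION_URL = "https://example.invalid/api/response-suggestions"


@dataclass
class IncomingEmail:
    received: date
    subject: str
    body: str


def build_request(email: IncomingEmail) -> urllib.request.Request:
    """Package date received, subject line, and email text as a JSON POST request."""
    payload = {
        "date_received": email.received.isoformat(),  # hypothetical field names
        "subject": email.subject,
        "email_text": email.body,
    }
    return urllib.request.Request(
        SUGGESTION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    IncomingEmail(date(2026, 5, 27), "Question about my subscription", "Hello, how can I change my plan?")
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```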

The integration works in stages. It begins with a check to see whether an automated response already exists for a request. If not, the relevant information is transmitted to the AI. A decision table determines, based on the request’s category, whether a response suggestion is generated automatically or the request is handled manually by a Customer Service Representative (CSR).
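The first stage, checking for an existing response before calling the AI, might look like the following sketch. The in-memory dictionary stands in for wherever the case data is actually stored, and `call_ai` is a placeholder for the interface call; both are assumptions for illustration.

```python
from typing import Callable, Dict


def get_or_create_suggestion(
    case_id: str,
    email_text: str,
    store: Dict[str, str],
    call_ai: Callable[[str], str],
) -> str:
    """Reuse an existing suggestion for the case; call the AI only if none exists."""
    existing = store.get(case_id)
    if existing is not None:
        return existing
    suggestion = call_ai(email_text)
    store[case_id] = suggestion  # remember it so repeated processing does not call the AI again
    return suggestion


suggestions: Dict[str, str] = {}
fake_ai = lambda text: f"Suggested reply to: {text}"  # stub standing in for the real interface
print(get_or_create_suggestion("SR-1", "Where is my invoice?", suggestions, fake_ai))
print(get_or_create_suggestion("SR-1", "Where is my invoice?", suggestions, fake_ai))
```

The check keeps the call idempotent: reopening or reprocessing a case returns the stored suggestion instead of triggering another request.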

To ensure stability and robust error handling, retry logic was integrated: if an API call fails, it is automatically re-executed via a queue processor. In the user interface, CSRs receive a transparent status display as well as the ability to edit, reject, or directly send response suggestions.
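The retry behavior can be approximated with a simple queue in Python. In the project this is handled by a Pega queue processor; the attempt limit, the exception type, and the function names below are assumptions, and a real setup would typically also wait between attempts.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

MAX_ATTEMPTS = 3  # assumed limit; the article does not state the configured value


def process_queue(
    queue: Deque[Tuple[str, int]],
    call_api: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Process (case_id, attempts_so_far) items; requeue failures until the limit is hit."""
    results: Dict[str, str] = {}
    failed: List[str] = []
    while queue:
        case_id, attempts = queue.popleft()
        try:
            results[case_id] = call_api(case_id)
        except ConnectionError:
            if attempts + 1 < MAX_ATTEMPTS:
                queue.append((case_id, attempts + 1))  # retry later
            else:
                failed.append(case_id)  # give up; needs manual handling
    return results, failed


# Stub API: "SR-1" fails twice and then succeeds, "SR-2" always fails.
attempt_counter: Dict[str, int] = {}


def flaky_api(case_id: str) -> str:
    attempt_counter[case_id] = attempt_counter.get(case_id, 0) + 1
    if case_id == "SR-2" or attempt_counter[case_id] < 3:
        raise ConnectionError("simulated outage")
    return f"suggestion for {case_id}"


print(process_queue(deque([("SR-1", 0), ("SR-2", 0)]), flaky_api))
```

Cases that exhaust their attempts end up in the `failed` list, which corresponds to the situation where the status display would show the CSR that no suggestion is available.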

Benefits of Implementation

Integrating CustomGPT.ai into the service request process has yielded numerous benefits:

Automation and increased efficiency: Routine inquiries were automated, and CSRs no longer had to spend time drafting standard responses. This not only saved time but also significantly sped up the process.
Error reduction and precision: Because CustomGPT.ai’s tailored model bases its generated responses on current and relevant company data, errors in responding to inquiries were reduced.
Improved customer satisfaction: Faster and more precise responses led to a better customer experience, as wait times were reduced and communication became more efficient.
Scalability: The solution is flexible and scalable, allowing it to be easily adapted to growing requirements and new inquiry categories in the future.

Conclusion

Integrating a company-specific AI into the publishing company’s service request process demonstrates how targeted automation can make customer service more efficient and responsive.

Combining the generative capabilities of LLMs with the RAG approach grounds each response in up-to-date and relevant company data. The resulting precise, context-aware responses streamline the work of CSRs, increase efficiency, and improve the quality of customer communication.

By connecting to Pega, the solution could be integrated seamlessly into existing processes. As part of the platform, it enables the flexible automation of recurring tasks without removing the human element from the process. Service staff can spend less time on standard requests and focus on more complex and personalized inquiries. Faster response times, reduced manual effort, and improved scalability are just a few of the benefits. Most importantly, the technology remains adaptable: it grows with the company’s needs and can be continuously optimized.

With this approach, the publishing company has not only improved its internal processes but also increased customer satisfaction. It is a good example of how smart digitalization can make everyday life easier for everyone involved, and that is exactly what matters.