Professional article

Document Classification with RAG: Limits and Potential

In many digital mailrooms, AI systems are expected to understand and classify documents automatically. The reality is often different: misclassifications, inaccurate data extraction and a high manual review effort reduce the efficiency of the entire process chain. For service providers that process hundreds or thousands of documents every day, this quickly becomes a cost trap.

Key results from the Arcplace study 

Johannes Egli's master's thesis investigates whether automatic document classification can be improved by combining generative AI models with retrieval-augmented generation (RAG). To this end, a prototype was developed and tested with real documents from Arcplace's Digital Mailroom. The work followed the Design Science Research methodology.
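To make the idea concrete, the following is a minimal sketch of how RAG-based classification can work in principle. It is not the prototype from the thesis. The knowledge base entries, the label names and the `generate` callable (a stand-in for any generative model) are hypothetical, and a simple bag-of-words similarity replaces a real embedding model.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical, hand-made knowledge base of (document text, label) pairs.
KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("Premium invoice for your health insurance policy, amount due", "insurance_invoice"),
    ("Benefit statement: reimbursement of medical costs by your health insurer", "benefit_statement"),
    ("Reminder: your tax return is due at the end of the month", "tax_notice"),
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Retrieval step: return the k knowledge base entries most similar to the query."""
    q = tokenize(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda item: cosine(q, tokenize(item[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Augmentation step: place the retrieved examples in front of the new document."""
    lines = ["Classify the document using one of the labels from the examples.", ""]
    for text, label in examples:
        lines.append(f"Example: {text}\nLabel: {label}\n")
    lines.append(f"Document: {query}\nLabel:")
    return "\n".join(lines)


def classify(query: str, generate: Callable[[str], str], k: int = 2) -> str:
    """Generation step: `generate` is a hypothetical callable wrapping a generative model."""
    return generate(build_prompt(query, retrieve(query, k))).strip()
```

In this sketch the quality of the answer depends directly on which examples the retrieval step returns, which is one reason the composition of the knowledge base matters.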

Key results

  • A standard RAG application with a randomly assembled knowledge base could not keep up with classic machine learning methods.
  • In certain cases, however, RAG increased classification accuracy, for example with very similar documents from different health insurance companies.
  • RAG's potential lies in optimizing its individual components and integrating future technologies.

Technical Challenges

  • Knowledge base: must be up to date, curated and purposefully structured.
  • Infrastructure: requires stability, computing power and assured regulatory compliance.
  • Integration: smooth interfaces are crucial for practical use.
  • Transparency: indispensable for detecting quality fluctuations at an early stage.
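As an illustration of the transparency point, the sketch below tracks classification accuracy over a rolling window of manually verified documents and raises a flag when it drops. The class name, the window size and the threshold are assumptions chosen for the example, not values from the study.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Tracks accuracy over the most recent manually verified classifications."""

    def __init__(self, window: int = 100, threshold: float = 0.9) -> None:
        # Only the last `window` results are kept; older ones are dropped automatically.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted: str, verified: str) -> None:
        """Store whether the predicted label matched the manually verified label."""
        self.results.append(predicted == verified)

    def accuracy(self) -> Optional[float]:
        """Share of correct predictions in the current window, or None if empty."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self) -> bool:
        """True when the rolling accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold
```

A monitor like this could be fed from the manual review step, so that a drop in quality for a given document type becomes visible after a few dozen documents rather than at the end of the month.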

Conclusion

RAG is not yet a replacement for existing technologies. It does show, however, where efficiency in the digital mailroom can be increased in a targeted manner and which development steps will be decisive in the future, from tuning individual components to multimodal systems.
