Machine Learning & AI

The power of deep learning and artificial intelligence excites companies all over the world. Our AI and machine learning experts help you evaluate the potential of intelligent, algorithm-based platforms. We help you design digital applications and services in line with the most recent developments in data handling, language processing, recognition and much more.

Project Case
Invoice Processing

Extracting custom structured information from images or PDF documents

More and more unstructured or only partly structured text data is produced every day. Receipts, forms, descriptions, contracts, order requests and technical documents are only a few examples. The desired information often has to be extracted manually, which takes a lot of time and human resources. The value of the information hidden in stored documents is often underestimated. As a consequence, a lot of valuable information never reaches the business flow. Automate entity extraction from various document types to enhance your business workflow.

Idea

A machine learning model is trained to extract custom named entities from unstructured text data. Entities can be, for example, names, dates, numbers, descriptions or prices.
We extract the information that is needed from unstructured text using AI.

We automatically extract your custom-defined entities, tailored to your domain, to shorten your customers' waiting times.

Solution

3. Named Entity Recognition (NER)

An AI model is trained to extract custom-defined entities. First, a labeled dataset has to be created. To do so, the text is extracted from the training documents via OCR. The labeling can then be performed in a tool that Catalysts developed specifically for labeling texts and training NER models. With the final dataset, the model is trained and can then be used for future predictions.
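To make the idea of a labeled dataset more concrete, here is a minimal sketch of how character-level annotations could be turned into per-token tags for training. It assumes the widely used BIO tagging scheme and simple whitespace tokenization. The function name `to_bio`, the label names `INVOICE_NO` and `DATE`, and the sample text are purely illustrative; the actual data format of the labeling tool is not described here.

```python
import re

def to_bio(text, spans):
    """Convert character-level annotations into token-level BIO tags.

    spans: list of (start, end, label) character offsets, end exclusive.
    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            # A token belongs to an entity if it lies fully inside the span.
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged

# Illustrative example with assumed label names.
text = "Invoice 4711 dated 12 March 2020"
spans = [(8, 12, "INVOICE_NO"), (19, 32, "DATE")]
for token, tag in to_bio(text, spans):
    print(f"{token}\t{tag}")
```

Each token ends up with exactly one tag, which is the shape most sequence-labeling models expect as training input.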

2. Optical Character Recognition (OCR)

The task of OCR is to extract the text from an image. For each word, a score is calculated that represents the probability that the word was extracted correctly. Additionally, handwriting detection is applied to find documents with additional notes on them. Handwritten numbers in predefined fields are detected and recognized.
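The per-word score can be used, for example, to decide which words are trusted automatically and which are flagged for a manual check. The following is a minimal sketch under assumptions: the `OcrWord` structure, the `split_by_confidence` helper, the 0.8 threshold and the sample words are all hypothetical and not taken from a specific OCR engine.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed range: 0.0 (uncertain) to 1.0 (certain)

def split_by_confidence(words, threshold=0.8):
    """Separate words into accepted ones and ones that need review.

    The threshold is an illustrative value and would be tuned per document type.
    """
    accepted = [w for w in words if w.confidence >= threshold]
    review = [w for w in words if w.confidence < threshold]
    return accepted, review

# Illustrative input, as it might come out of an OCR step.
words = [
    OcrWord("Total", 0.98),
    OcrWord("1O4.50", 0.41),  # letter O instead of zero, low score
    OcrWord("EUR", 0.93),
]
accepted, review = split_by_confidence(words)
print("accepted:", [w.text for w in accepted])
print("review:", [w.text for w in review])
```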

1. Preprocessing

The preprocessing is adjusted to the type of document being processed. Photographs of documents need a different kind of preprocessing than scanned documents. Typical preprocessing tasks include rotating and deskewing the image, improving the contrast and removing noise.
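Two of the tasks mentioned above, contrast improvement and noise removal, can be sketched in a few lines. This is a simplified illustration using only the Python standard library on a grayscale image stored as a list of rows; the function names are hypothetical, and a production pipeline would typically rely on an image processing library and also handle rotation and deskewing.

```python
from statistics import median

def stretch_contrast(img):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Uniform image: nothing to stretch, return a copy.
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]

def median_filter(img):
    """Apply a 3x3 median filter; border pixels use only the neighbours that exist."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out

# A single bright speck in a flat area is treated as noise and removed.
noisy = [
    [100, 100, 100],
    [100, 200, 100],
    [100, 100, 100],
]
print(median_filter(noisy))
print(stretch_contrast([[50, 100], [150, 200]]))
```

The median filter is chosen here because it removes isolated specks while keeping edges of characters sharper than a plain average would.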

0. Paper

Documents are scanned, so they no longer have to be sorted by hand and typed into a system manually.

Do you have to process documents and extract certain information?
