Entity Extraction

The goal of Entity Extraction is to find the positions in a text that contain entity information. In other words, it looks
for occurrences of person names, locations, VAT numbers, and similar items.

The overall process is split into submodules that run one after another as a pipeline.
The system performs the following extraction steps:

Matching Module

Matched entities

Name Variant Matching

Matches name variants from the name variant database
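
As an illustration only, the lookup could be sketched as below. The dictionary `NAME_VARIANTS` and the function `match_name_variants` are hypothetical stand-ins; the actual name variant database and its interface are not described here.

```python
import re

# Hypothetical stand-in for the name variant database:
# each known variant maps to one canonical name.
NAME_VARIANTS = {
    "J. Smith": "John Smith",
    "John Smith": "John Smith",
    "Smith, John": "John Smith",
}


def match_name_variants(text, variants=NAME_VARIANTS):
    """Return (start, end, variant, canonical) for every known variant in text."""
    # Longer variants come first so they win over shorter overlapping ones.
    alternatives = sorted(variants, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in alternatives))
    return [
        (m.start(), m.end(), m.group(), variants[m.group()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in match_name_variants("J. Smith met John Smith."):
        print(hit)
```

Each hit carries the character offsets of the match, so later steps can refer back to the exact position in the text.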

Geo Matching

Matches geographic locations (countries, regions, cities)

Regular Expression Matching

  • Matches built-in types such as VAT number, email address, URL, IP address, credit card number, date, phone number, zip code, personal ID

  • Custom user-defined types based on regular expressions
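
A minimal sketch of how built-in and user-defined patterns might be combined is shown below. The patterns are deliberately simplified and are not the system's real definitions; `BUILT_IN_PATTERNS`, `match_patterns`, and the `invoice_id` type are assumptions made for this example.

```python
import re

# Simplified illustrative patterns, not the system's actual built-in types.
BUILT_IN_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def match_patterns(text, custom_patterns=None):
    """Return sorted (start, end, entity_type, value) tuples for all matches.

    custom_patterns maps a user-defined type name to a regular expression string.
    """
    patterns = dict(BUILT_IN_PATTERNS)
    for name, expression in (custom_patterns or {}).items():
        patterns[name] = re.compile(expression)

    matches = []
    for entity_type, pattern in patterns.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), entity_type, m.group()))
    return sorted(matches)


if __name__ == "__main__":
    custom = {"invoice_id": r"\bINV-\d{4}\b"}  # hypothetical user-defined type
    text = "Mail bob@example.com from 10.0.0.1 about INV-2041"
    for hit in match_patterns(text, custom):
        print(hit)
```

Keeping custom types in the same name-to-pattern mapping as the built-in ones means both kinds of match are reported in the same shape.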

Entity Guessing

Guesses further entities according to built-in rules

Entity Normalisation

Combines similar name variants into a single entity profile and assigns unique IDs to entities across the document set
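
The normalisation step can be pictured roughly as follows. This is a sketch under stated assumptions: the function `normalise_entities`, the `E1`, `E2`, ... ID scheme, and the profile layout are invented for illustration, and similarity is reduced to a simple variant-to-canonical lookup.

```python
import itertools


def normalise_entities(mentions, variant_to_canonical):
    """Group mentions into entity profiles with IDs unique across all documents.

    mentions: iterable of (doc_id, surface_form) pairs.
    variant_to_canonical: maps a known variant to its canonical name;
    unknown surface forms are treated as their own canonical name.
    """
    counter = itertools.count(1)
    profiles = {}
    for doc_id, surface in mentions:
        canonical = variant_to_canonical.get(surface, surface)
        profile = profiles.get(canonical)
        if profile is None:
            # First time this entity is seen anywhere in the document set.
            profile = {"id": f"E{next(counter)}", "variants": set(), "documents": set()}
            profiles[canonical] = profile
        profile["variants"].add(surface)
        profile["documents"].add(doc_id)
    return profiles


if __name__ == "__main__":
    mentions = [("doc1", "J. Smith"), ("doc2", "John Smith"), ("doc2", "Acme Ltd")]
    profiles = normalise_entities(mentions, {"J. Smith": "John Smith"})
    for name, profile in profiles.items():
        print(name, profile["id"], sorted(profile["variants"]), sorted(profile["documents"]))
```

Because the ID is created once per canonical name, the same entity keeps one ID no matter which document or variant it appears in.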