Editing the current Name Variant Database
The EMM OSINT Suite uses a large database of named entities containing mainly persons and organizations (see the Name Variant Matching concept for more information).
The suite allows editing its Name Variant Database in order to add new entities or modify existing ones, keeping the current keys assigned for each entity.
The Name Variant Database in OSINT is composed of four columns as follows:
-
KEY. The import process ignores the values in this column because it assigns its own primary key to each new entity. Nevertheless, this column must be present as the first column of the TSV file, even though its values are discarded.
-
PID (Profile Identification). The identification value of an entity. This numerical value is very important: all variants (different names) of the same entity must share the same PID. The first occurrence found in the TSV file is treated as the canonical entity (the original name form of the entity), and the following ones as variants of this canonical form (see the example below).
-
TYPE. It is the type of the entity. OSINT accepts four main entity types:
-
o, for organizations
-
p, for persons
-
t, for toponyms (locations)
-
u, for unknown types of entities
-
VARIANT. The form or name of the entity, written exactly as the process should match it in the documents.
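The four columns described above can be modelled as a small record type. The following is only a minimal sketch, not part of the suite: the names `VariantRow` and `parse_row` are hypothetical, and it assumes that data rows carry no header line, that type codes are lowercase, and that the PID is a plain non-negative integer.

```python
from dataclasses import dataclass

# Entity type codes: organization, person, toponym, unknown.
VALID_TYPES = {"o", "p", "t", "u"}


@dataclass(frozen=True)
class VariantRow:
    key: str      # ignored on import, but the column must be present
    pid: int      # shared by all variants of one entity
    type: str     # one of VALID_TYPES
    variant: str  # name exactly as it should be matched in documents


def parse_row(line: str) -> VariantRow:
    """Split one tab-separated line into the four columns and check each one."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated columns, got {len(fields)}")
    key, pid, entity_type, variant = fields
    if not (pid.isascii() and pid.isdigit()):
        raise ValueError(f"PID must be numeric, got {pid!r}")
    if entity_type not in VALID_TYPES:
        raise ValueError(f"TYPE must be one of o, p, t, u; got {entity_type!r}")
    if not variant.strip():
        raise ValueError("VARIANT must not be empty")
    return VariantRow(key, int(pid), entity_type, variant)


row = parse_row("5\t21\to\tChad Calvin Christian\n")
print(row.pid, row.type, row.variant)  # 21 o Chad Calvin Christian
```

The KEY field is kept as an unvalidated string here because its value is discarded on import anyway.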
The following is an example excerpt from the Name Variant Database in OSINT:
| key | pid | type | variant |
| 2 | 11 | p | Aaron Albert |
| 3 | 11 | p | A. Albert |
| 4 | 11 | p | A. M. Albert |
| 5 | 21 | o | Chad Calvin Christian |
| 6 | 21 | o | CCC |
| 7 | 21 | o | C.C. Christian |
| 8 | 41 | t | Milano |
| 9 | 61 | u | Harold Hugh |
| 10 | 61 | u | Henry Hugh |
In this example, the entity Aaron Albert (a person) has a PID value of 11. The first occurrence is the canonical (original) form for that entity, whereas the following occurrences with the same PID (A. Albert, A. M. Albert) are variants of that canonical form. All of these occurrences represent the same real-world entity (the person Aaron Albert). Another example in the table is the organization Chad Calvin Christian (PID 21), which has one canonical form and two variants (CCC, C.C. Christian). Finally, the entity Milano (PID 41) has only one form (the canonical one), and one entity of unknown type (Harold Hugh, PID 61) has two forms: the canonical one and the variant Henry Hugh.
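The canonical/variant rule can be illustrated with a few lines of code. This is a sketch of the rule as described in this section, not the suite's actual import logic; the function name `group_by_pid` is hypothetical, and the rows are the ones from the example table.

```python
def group_by_pid(rows):
    """Return {pid: {"type", "canonical", "variants"}}, keeping file order.

    The first row seen for a PID supplies the canonical name; later rows
    with the same PID are collected as variants.
    """
    entities = {}
    for _key, pid, entity_type, name in rows:
        entry = entities.get(pid)
        if entry is None:
            entities[pid] = {"type": entity_type, "canonical": name, "variants": []}
        else:
            entry["variants"].append(name)
    return entities


# (key, pid, type, variant) rows taken from the example table.
ROWS = [
    ("2", "11", "p", "Aaron Albert"),
    ("3", "11", "p", "A. Albert"),
    ("4", "11", "p", "A. M. Albert"),
    ("5", "21", "o", "Chad Calvin Christian"),
    ("6", "21", "o", "CCC"),
    ("7", "21", "o", "C.C. Christian"),
    ("8", "41", "t", "Milano"),
    ("9", "61", "u", "Harold Hugh"),
    ("10", "61", "u", "Henry Hugh"),
]

entities = group_by_pid(ROWS)
for pid, entry in entities.items():
    print(pid, entry["type"], entry["canonical"], entry["variants"])
```

Because the rule depends on row order, swapping the first two rows for PID 11 would make A. Albert the canonical form.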
Note that the file must be a UTF-8 flat file in TSV (Tab-Separated Values) format.
The procedure for editing the current Name Variant Database consists of three steps:
-
Export the current Name Variant Database to a flat file
-
Open the database file and add new entities or modify existing ones. The database file must be saved in UTF-8 format and must always keep the four-column structure, with the columns separated by the tab character.
-
Import the updated database file into OSINT
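As an alternative to editing by hand, the second step could be scripted. The sketch below appends a new entity to the text of an exported file. Everything in it is an assumption made for illustration: the function names, the placeholder KEY value `0` (chosen only because KEY values are discarded on import), the choice of "largest existing PID plus one" as the new PID, and the sample entity name `Example Org`.

```python
VALID_TYPES = {"o", "p", "t", "u"}


def next_free_pid(tsv_text: str) -> int:
    """Return one more than the largest numeric PID found in the second column."""
    pids = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) == 4 and fields[1].isascii() and fields[1].isdigit():
            pids.append(int(fields[1]))
    return max(pids, default=0) + 1


def add_entity(tsv_text: str, entity_type: str, names: list) -> str:
    """Append a new entity: names[0] is the canonical form, the rest are variants."""
    if entity_type not in VALID_TYPES:
        raise ValueError(f"unknown entity type {entity_type!r}")
    if not names or any("\t" in n or "\n" in n or not n.strip() for n in names):
        raise ValueError("names must be non-empty and free of tabs and newlines")
    pid = next_free_pid(tsv_text)
    eol = "\r\n" if "\r\n" in tsv_text else "\n"  # keep the file's line endings
    if tsv_text and not tsv_text.endswith("\n"):
        tsv_text += eol
    # Placeholder KEY "0": the import assigns its own keys and ignores this column.
    new_lines = [f"0\t{pid}\t{entity_type}\t{name}" for name in names]
    return tsv_text + eol.join(new_lines) + eol


def add_entity_to_file(path: str, entity_type: str, names: list) -> None:
    """Read the exported file as UTF-8, append the entity, and save it as UTF-8."""
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(add_entity(text, entity_type, names))


exported = "8\t41\tt\tMilano\n9\t61\tu\tHarold Hugh\n"
print(add_entity(exported, "o", ["Example Org", "EO"]), end="")
```

All names passed in one call receive the same PID, so the first becomes the canonical form and the others its variants.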
Exporting the current Name Variant Database to a flat file
-
Open EMM OSINT Suite and, in the main menu, click File > Export > Entity Extraction > Export Name Variant Database.
-
Click Next and then Browse to select the file on your computer to which the current Name Variant Database will be exported. You can use any name for this file.
-
Finally, click Finish. A progress bar is shown at the bottom right of the window. The export might take a few seconds, depending on the size of the current database. The export can also be followed in the Progress view.
The export finishes when this progress bar disappears.
Opening the database file and adding new entities or modifying existing ones
The newly generated export file is a flat file composed of four columns separated by the tab character (see above).
Any text editor can be used to modify the database file (TextPad for Windows, WordPad, ...).
Once entities have been added or modified as explained above, the database file must be saved in UTF-8 format and must keep the four-column structure.
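Before importing, the saved file can be checked against these two requirements. The following is a hedged sketch of such a check, not a tool provided by the suite: the function name `check_tsv` is hypothetical, and it assumes the file has no header row and that blank lines are harmless.

```python
VALID_TYPES = {"o", "p", "t", "u"}


def check_tsv(data: bytes) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 (byte offset {exc.start})"]
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line:
            continue  # tolerate blank lines, e.g. after the final newline
        fields = line.split("\t")
        if len(fields) != 4:
            problems.append(f"line {number}: {len(fields)} columns instead of 4")
            continue
        _key, pid, entity_type, variant = fields
        if not (pid.isascii() and pid.isdigit()):
            problems.append(f"line {number}: PID {pid!r} is not numeric")
        if entity_type not in VALID_TYPES:
            problems.append(f"line {number}: unknown type {entity_type!r}")
        if not variant.strip():
            problems.append(f"line {number}: empty variant")
    return problems


good = "2\t11\tp\tAaron Albert\n3\t11\tp\tA. Albert\n".encode("utf-8")
# Line 2 uses spaces instead of tabs; line 3 has an invalid type code.
bad = "2\t11\tp\tAaron Albert\n3 11 p A. Albert\n4\t11\tx\tA. M. Albert\n".encode("utf-8")

print(check_tsv(good))
for problem in check_tsv(bad):
    print(problem)
```

To check a file on disk, its raw bytes can be passed in, for example `check_tsv(open(path, "rb").read())`, so that an encoding problem is reported rather than hidden by the editor or the reader.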
Importing the updated database file into OSINT
See Importing a new Name Variant Database to import the updated database file.