Text Extraction

Text extraction is the process of extracting raw text from input files in multiple formats. The Text Extraction module of EMM OSINT Suite is based on the open-source project Apache Tika.

Currently, the module supports the following input file formats:

In addition to extracting the text, the module identifies the language of the text and stores it as metadata.
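
The shape of the result (raw text plus a language tag kept as metadata) can be illustrated with a small, self-contained sketch. This is not the module's implementation and does not call Apache Tika: it handles HTML input only, and the stopword lists and the function names `extract_text`, `guess_language`, and `process` are hypothetical stand-ins chosen for illustration. A real language identifier relies on much richer statistical models.

```python
from html.parser import HTMLParser

# Hypothetical, tiny stopword lists used only for this illustration.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


class _TextCollector(HTMLParser):
    """Collects character data between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())


def guess_language(text):
    """Return the code whose stopwords occur most often, or 'unknown'."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def process(html):
    """Extract the text and attach the guessed language as metadata."""
    text = extract_text(html)
    return {"text": text, "metadata": {"language": guess_language(text)}}
```

Calling `process` on an HTML string returns a dictionary with the extracted text under `text` and the language code under `metadata["language"]`, which mirrors the text-plus-metadata output described above.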