Downloading of Bookmarks
Prerequisites: Creating a Case Project and Performing an Internet Search
See also: Setting HTTP Proxy Information
Bookmark files contain URLs pointing to resources on the web. These resources are mostly web pages or files (such as .pdf files). To analyse these resources locally, they need to be downloaded to the Case Project in the workspace. Once all the search result links (bookmarks) have been stored under the Bookmarks folder, the next step is to download the result pages as text files:
- Right-click the desired bookmark folder in the Workspace Navigator view and click Download Bookmarks. The Progress View shows the progress of the download. The downloaded web pages and files (for example, PDF files) are stored in the predefined Documents folder of the case project.
- After the download has finished, the system automatically extracts the raw text from the downloaded web pages and files and detects the language of the text. Raw text can be extracted from a variety of file formats (such as PDF, MS Office formats and others). A new folder with the same name as the selected Bookmarks folder appears under the Documents folder. It contains a text file with the raw content for every link stored in the Bookmarks folder.
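The folder layout and text extraction described above can be pictured with a small Python sketch. It is an illustration only and not the product's implementation: the function names, the `result_001.txt` file naming and the simple HTML-to-text step are assumptions made for this example, and the real system also handles PDF, MS Office formats and language detection, which the sketch leaves out.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_raw_text(html):
    """Return the plain text of an HTML page (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def target_folder(workspace, bookmark_folder_name):
    """Assumed layout: <workspace>/Documents/<bookmark folder name>."""
    return Path(workspace) / "Documents" / bookmark_folder_name


def store_result(workspace, bookmark_folder_name, index, html):
    """Write the extracted text of one downloaded page to a text file."""
    folder = target_folder(workspace, bookmark_folder_name)
    folder.mkdir(parents=True, exist_ok=True)
    out = folder / f"result_{index:03d}.txt"  # file naming is an assumption
    out.write_text(extract_raw_text(html), encoding="utf-8")
    return out
```

Calling `store_result(workspace, "MySearch", 1, page_html)` in this sketch would create `Documents/MySearch/result_001.txt` in the given workspace, mirroring how the Documents subfolder takes its name from the selected Bookmarks folder.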
Reviewing result pages and files
Double-click one of the downloaded files (HTML files have a small “HTML” icon in front of them) to open it in the Document editor view. The Document editor view shows the extracted text. Because entity extraction has not yet run, only the plain extracted text is shown, without any mark-up for entities.