Introduction

Creation: 15/10/2018
Last update: 26/10/2018
Registration: Not yet registered
Status: Comments Review
Deleted: No
DG.Unit: JRC.I.3
Controller: KING Matthew
Delegate: MACMILLAN Charles
DPC: NUNEZ BAREZ Laura, SPICAR Premysl
Keywords:
Corporate: No
Language: English
Model: no Model
EDPS opinion (prior check): No
Target Population: Citizens
DPC Notes: -

Processing

Name of the processing

Monitoring of news and social media

Description

The Joint Research Centre (JRC) Text and Data Mining Unit performs automatic monitoring of online and social media in order to provide media monitoring services to staff across the EU institutions and to the general public. A moderated list of online media sources is monitored (e.g. BBC News, Le Monde, El Mundo) and the text of new articles is downloaded and processed. Social media sources including Twitter, Facebook, YouTube, Reddit and blogs are also monitored by downloading and processing new posts. The processing includes categorisation, identification of places, organisations and persons mentioned in the text, indexing and presentation of the results through websites, mobile applications and newsletters.

EMM processing chain:

The EMM processing chain consists of several modules. The main modules are as follows:

  1. Scraper: The Scraper module visits a pre-defined list of websites searching for new content. The list comprises online news sites, blogs and forum sites (e.g. reddit). The focus lies on processing publicly available material. Wires from news agencies or content from other sources can also be monitored.
  2. Grabber: The Grabber module retrieves new content to build a feed of news items.
  3. Entity Recognition: The entity recognition module detects probable mentions of persons and organisations within the text storing them in a cache. If an individual or organisation is detected in multiple articles from multiple different sources, that individual or organisation is automatically added to a database of entities. A human moderation step is included allowing for amendments or deletion of entities. Further information from Wikipedia can be used to enrich the entities e.g. with pictures of persons of public interest.
  4. Entity Matcher: The entity matcher module identifies known entities (persons, organisations) in the news items using the entity database. These entities have previously been recognized by the entity recognition module mentioned above.
  5. Geolocation: The geolocation module matches known geolocations in the text and aims at identifying the most relevant geolocation for each news item.
  6. Categorization: The categorization module relies on lists of user-defined keywords including Boolean expressions.
  7. Filtering: The filter module allows for filtering for language, country (country in which the article was published or country mentioned in the text), sources or combinations of categories.
  8. Deduplication: The deduplication module identifies news items that are mostly identical to previously retrieved news items.

In addition, Twitter data are retrieved either via a sample (e.g. 1 % or 10 % sample of all published tweets) or via queries to select individual Twitter users (e.g. @WHO), hashtags (e.g. #ebola) or keywords (e.g. chikungunya). Only publicly available tweets are processed. On a given topic (e.g. Xylella fastidiosa), information from news and social media can be jointly displayed to allow for further analysis, e.g. word clouds can be produced, thereby linking news and social media.
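The promotion rule described for the Entity Recognition module (item 3 above) can be illustrated with a minimal sketch. It is not the EMM implementation: the class name `EntityCache`, its methods and the two thresholds are hypothetical, since the record only says "multiple articles from multiple different sources" without giving numbers.

```python
from collections import defaultdict

# Assumed thresholds: the record does not state how many articles or
# sources are required, so these values are illustrative only.
MIN_ARTICLES = 2
MIN_SOURCES = 2


class EntityCache:
    """Holds candidate mentions until they qualify for the entity database."""

    def __init__(self, min_articles=MIN_ARTICLES, min_sources=MIN_SOURCES):
        self.min_articles = min_articles
        self.min_sources = min_sources
        self._articles = defaultdict(set)  # candidate name -> article ids
        self._sources = defaultdict(set)   # candidate name -> source names
        self.database = set()              # promoted entities

    def record_mention(self, name, article_id, source):
        """Record one detected mention; return True if this call promotes the name."""
        if name in self.database:
            return False
        self._articles[name].add(article_id)
        self._sources[name].add(source)
        if (len(self._articles[name]) >= self.min_articles
                and len(self._sources[name]) >= self.min_sources):
            self.database.add(name)
            return True
        return False

    def delete_entity(self, name):
        """Moderation step: remove an entity together with its cached mentions."""
        self.database.discard(name)
        self._articles.pop(name, None)
        self._sources.pop(name, None)
```

In this sketch a name seen in several articles from a single source stays in the cache, and the `delete_entity` method stands in for the human moderation step, which in practice also allows amendments.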

List of JRC systems/services:

There are several websites that use the results generated by the above-mentioned EMM processing chain, comprising:

  1. EMM (Europe Media Monitor) – Fully automatic system for analyzing both traditional and social media
  2. EMM Newsbrief – Public frontend which groups related items, categorises them into thousands of categories, extracts information, produces statistics, detects breaking news and sends out alerts.
  3. EMM NewsExplorer – Public frontend which displays the main news items per calendar day separately for 21 languages showing which persons, organisations and countries were mentioned most in the news.
  4. EMM Newsdesk – Moderation tool for creating and publishing newsletters as well as sending alerts via email or SMS using data from EMM.
  5. EMM MIA (Media Impact Analysis) – Tool for media impact analysis using data from EMM.
  6. EMM TIA (Trend Impact Analysis) – Tool for analysing media trends using data from EMM.
  7. EMM MyNews – Customisable frontend using data from EMM.
  8. EMM Big Screen Map – A map view of EMM data highlighting the latest news on a world map.
  9. MEDISYS (Medical Intelligence System) – Media monitoring system for event-based surveillance to rapidly identify potential public health threats from media reports.
  10. Mobile applications for iOS and Android.

Processors

-

Automated / Manual operations

The processing operations are performed automatically, except for:

  • Selection of articles and their publication in newsletters distributed through email or on websites;
  • Moderation of persons identified by title, first and last name detected automatically by the system in online news;
  • Creation of keyword categories for classifying news and social media items, and linking news items from different publishers together and to social media items on the same topic.
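The keyword categories in the last item, which the Categorization module evaluates as lists of keywords including Boolean expressions, could be represented along the following lines. This is a sketch under stated assumptions: the nested-tuple syntax, the function names and the example category names are hypothetical, and only single-word keywords are handled.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def matches(expr, tokens):
    """Evaluate a Boolean keyword expression against a set of tokens.

    An expression is either a single keyword (str) or a tuple whose first
    element is "AND", "OR" or "NOT" followed by sub-expressions.
    """
    if isinstance(expr, str):
        return expr.lower() in tokens
    op, *args = expr
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op!r}")


def categorise(text, categories):
    """Return the sorted names of all categories whose expression matches."""
    tokens = tokenize(text)
    return sorted(name for name, expr in categories.items()
                  if matches(expr, tokens))


# Hypothetical moderator-defined categories.
CATEGORIES = {
    "public_health": ("OR", "ebola", "chikungunya"),
    "plant_health": ("AND", "xylella", ("NOT", "ebola")),
}
```

For example, `categorise("Xylella fastidiosa found in olive trees", CATEGORIES)` returns `["plant_health"]`.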

Storage

The data gathered and generated by the system is stored electronically in files and in a database allowing searches of the full text, on servers at the Joint Research Centre.

Comments

-

Purpose & legal basis

Purposes

The purpose of the processing is to:

  • provide multilingual media and social media monitoring services to EU Institutions to support spokespersons, communications officials, policy officers and decision makers with near real time information on current affairs, emerging topics of current interest and the volume and tonality of articles and posts in predefined categories corresponding to policy areas, high level EU officials etc.;
  • allow searching for historical news articles and support the analysis of trends in reporting over many years; and
  • perform research in text mining, natural language processing and computational linguistics including the identification of places, persons and organisations in text, detection of sentiment and tonality, approaches for clustering and deduplicating related texts and approaches to detect hate speech and related phenomena.
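As an illustration of the deduplication approaches mentioned in the last item, one common technique for finding texts that are "mostly identical" is Jaccard similarity over word shingles. The record does not say which method EMM uses, so the sketch below is an assumption, and the function names, the shingle size and the 0.8 threshold are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(new_text, seen_texts, threshold=0.8):
    """True if new_text is mostly identical to any previously seen text."""
    new = shingles(new_text)
    return any(jaccard(new, shingles(old)) >= threshold for old in seen_texts)
```

Comparing every new item against every stored item does not scale to a large archive; a production system would more likely use hashing-based sketches such as MinHash, which is outside the scope of this sketch.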

Legal basis and Lawfulness

The processing is lawful under Article 5(a) of Regulation 45/2001, which states that personal data may be processed only if "processing is necessary for the performance of a task carried out in the public interest on the basis of the Treaties establishing the European Communities or other legal instruments adopted on the basis thereof or in the legitimate exercise of official authority vested in the Community institution or body or in a third party to whom the data are disclosed", and, for registered users, under Article 5(d) of Regulation 45/2001, which states that personal data may be processed only if "the data subject has unambiguously given his or her consent".

Relevant legal instruments:

  • Art. 1 of Commission Decision 96/282/Euratom, which entrusts the JRC with a role to "carry out the Community's research programmes and other tasks entrusted to it by the Commission".
  • Regulation (EU) No 1291/2013 of the European Parliament and of the Council of 11 December 2013 establishing Horizon 2020 - the Framework Programme for Research and Innovation (2014-2020), which states that "the Joint Research Centre (JRC) shall contribute to the general objective and priorities of Horizon 2020 with the specific objective of providing customer-driven scientific and technical support to Union policies".
  • Council Decision of 3 December 2013 establishing the specific programme implementing Horizon 2020 - the Framework Programme for Research and Innovation (2014-2020).
  • Successive Commission Implementing Decisions on the adoption of multi-annual work programmes under Council Decision 2013/743/EU and Council Regulation (Euratom) No 1314/2013, to be carried out by means of direct actions by the Joint Research Centre, including C(2017) 1288 for the period 2017-2018 and C(2018) 1386 for the period 2018-2019.
  • COM(2018)236, Tackling online disinformation: a European Approach, Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions. The "communication presents a comprehensive approach that aims at responding to those serious threats by promoting digital ecosystems based on transparency and privileging high-quality information, empowering citizens against disinformation, and protecting our democracies and policy-making processes." The communication also points out that "the Commission will continue its work in this area".
  • Successive specific Administrative Arrangements concluded with DG Communications of the European Commission, with the European Parliament and with the Council for media monitoring services and research.

Data subjects and Data Fields

Data subjects

All persons mentioned in online news media: politicians, journalists, persons of public interest, persons considered newsworthy, etc.

All internet users whose social media posts are analysed.

Data fields / Category

Any personal information contained in online news reporting from the sources monitored will be collected, indexed and made available for searching by the system, in a similar way to internet search engines. While this could in principle include information covered by Article 10.1 of Regulation 45/2001 (special categories of data), no specific processing is applied to this information if present.

Data fields:

  • Title, first name and family name of recognised public figures, data retrieved from Wikipedia
  • Quotes from a person and quotes about a person, as extracted from news items
  • Entities that are co-mentioned in news items
  • Organisations which can be identified in the text are specifically processed
  • Social media user identifiers, e.g. in MEDISYS top Twitter users, MyNews Twitter analysis

Rights of Data Subject

Mandatory Information

See attached privacy statement.

Procedure to grant rights

Data subjects can directly access any information held by the controller, through the search functionality of the websites. A link will be provided allowing them to request deletion of their information. Before deletion, the controller will verify that the information refers to the data subject making the request, as references to individuals in news reporting are often ambiguous. Rectification of the information will not be provided.

Retention

  • News article text – Data retention time is unlimited for research purposes to study long-term trends of topics of relevance to the European Union, unless the data subjects mentioned in the text request deletion of the record.
  • Entity entries – Data retention time is unlimited for research purposes to study long-term trends of topics of relevance to the European Union, unless the data subject requests deletion of the record.
  • Social media post ids – Data retention time is unlimited for research purposes to study long-term trends of topics of relevance to the European Union, unless the data subjects mentioned in the text request deletion of the record.

Time limit

The controller will reply to all queries from data subjects within 15 working days.

Historical purposes

-

Recipients

Recipients

Data are available to the general public through internet websites.

Transfer out of EU/EEA

Not applicable.

Security measures

Technical and organizational measures

Processing and data storage are performed on servers located in secure data centres to which physical access requires specific authorisation. Access to the data is available only to authorised users, checked through login and password.

Complementary information

The media monitoring system described above is widely used across the EU Institutions, and at a recent workshop held for media analysts, "The European Commission, European Parliament and European Council participants confirmed that the JRC Europe Media Monitor (EMM) system is crucial for them to ensure the media monitoring activities required by their leaders. Research and development by the JRC is needed for them to meet new challenges, such as monitoring of social media and fake news."

Main users of the media monitoring systems are:

  • General public
  • European Commission: DG COMM, DG HOME, DG GROW, DG SANTE, DG RESEARCH
  • European External Action Service
  • European Council
  • European Parliament
  • EU agencies: ECDC, EFSA, EMCDDA, FRONTEX
  • United Nations (UN), World Health Organisation (WHO)
  • EU member states and G7 countries