Project Title: SSIX Social Sentiment analysis financial IndeXes
Project Description: Social Sentiment Indices powered by X-Scores (SSIX) aims to provide European SMEs with a collection of easy-to-interpret tools to analyse and understand social media users’ attitudes towards any given subject; these sentiment characteristics can be exploited to help SMEs operate more efficiently, resulting in increased revenues. Social media data represents a collective barometer of thoughts and ideas touching every facet of society. SSIX will search and index conversations taking place on social network services such as Twitter, StockTwits and Facebook, as well as the most reliable and authoritative newswires, online newspapers, trade publications and blogs. SSIX will classify and score content using a framework of qualitative and quantitative parameters called X-Scores, regardless of language, locale or data architecture.
The SSIX project took a comprehensive and diligent approach to ethics throughout the course of the project. The team sought the involvement of an ethicist at the beginning of the project and continued that collaboration throughout. The ethical issues within the project are laid out and dealt with comprehensively here.
In this case study, we focus on a solution that the SSIX project team devised to tackle one of the most common problems facing social media data researchers.
Social media data, such as Twitter and Facebook data, is widely used by researchers. The data is publicly available, and this is made clear to users when they agree to Twitter’s and Facebook’s terms of service. However, the terms of service documents are long, detailed and dense, and it is unlikely that users actually read them before agreeing. In addition, the terms of service are updated and altered on a continual basis. These facts pose a dilemma for many researchers working with Twitter and/or Facebook data.
Key question: It is legal to use people’s Twitter and/or Facebook data for the purposes of research, but is it ethical?
On examining this question, the SSIX team concluded the following: it is legal provided we follow the terms and conditions of such companies, but it is not necessarily ethical, because the public data may contain personal data such as a username or other identifier. The forthcoming General Data Protection Regulation (GDPR) provides clearer guidelines under Article 9(2)(e), “processing relates to personal data which are manifestly made public by the data subject”, so one could interpret social media data as public data containing some personal information, which makes its processing legal.
More importantly, the GDPR gives clearer guidelines to researchers here:
Art. 89 GDPR Safeguards and derogations relating to processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes
Processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, shall be subject to appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject. Those safeguards shall ensure that technical and organisational measures are in place in particular in order to ensure respect for the principle of data minimisation. Those measures may include pseudonymisation provided that those purposes can be fulfilled in that manner.
Safeguards in SSIX
The SSIX ethics board determined that the following personally identifiable information will not be used: name, address, age, gender, photos and date of birth. SSIX will remove these details from collected data, but will need to keep account IDs for spam filtering. With regard to Twitter data (the vast majority of SSIX’s source data), the account name (real or pseudonym) will need to be removed. In some cases a birthday (not a date of birth) may be present, and it will be removed. No pictures will be collected. Twitter does allow users to add their location, but not an address. There is no profile setting for gender.
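As an illustration only, the safeguard described above could be sketched as a filter that drops the excluded fields from each collected record and keeps the account ID. The field names (`account_id`, `account_name`, `birthday` and so on) and the `strip_pii` helper are hypothetical; the source does not describe the actual SSIX data schema or implementation.

```python
# Hypothetical field names: the real SSIX record schema is not given in the source.
EXCLUDED_FIELDS = {
    "name",
    "account_name",
    "address",
    "age",
    "gender",
    "photos",
    "date_of_birth",
    "birthday",
}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without any of the excluded fields.

    The account ID is deliberately not in EXCLUDED_FIELDS, because the text
    says it must be kept for spam filtering.
    """
    return {key: value for key, value in record.items() if key not in EXCLUDED_FIELDS}


raw_record = {
    "account_id": "12345",
    "account_name": "@example_user",
    "name": "Example User",
    "birthday": "01-01",
    "text": "Feeling positive about $XYZ today",
}

cleaned_record = strip_pii(raw_record)
```

Filtering on a list of excluded fields mirrors the wording of the safeguard; a stricter variant would list only the fields allowed to remain, so that unexpected new fields are dropped by default.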
Key question: Can ticking a terms of service box really be classified as informed consent?
One could argue that it cannot, if the request is ambiguous. See Recital 32 of the GDPR (https://gdpr-info.eu/recitals/no-32/):
Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. Consent should cover all processing activities carried out for the same purpose or purposes. When the processing has multiple purposes, consent should be given for all of them. If the data subject’s consent is to be given following a request by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided.
Key question: If researchers want to use Twitter and/or Facebook data for research purposes and they are concerned about the consent issue, how do they address this problem?
Seeking further consent from individual Twitter and Facebook users seems an impossible task for researchers: the volume of users and data makes it prohibitive.
SSIX researchers have devised a plausible solution to this conundrum and used it throughout the project.
A consent manager
A more manageable way to give Twitter and Facebook users agency over the use of their data in SSIX research was a consent manager, through which any Twitter or Facebook user can opt out of having their data collected and used in the research. The existence of the consent manager was publicised in the project literature and disseminated as widely as possible. The ethics expert advised against contacting every Twitter user by email, as this would be considered spamming.
If a Twitter or Facebook user contacted the consent manager and opted out of the process, their data would be automatically discarded from all future datasets and excluded from the research. The consent manager is available to see here.
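A minimal sketch of how such an opt-out could be applied when a dataset is built, assuming each record carries an `account_id` field and the consent manager exposes the set of opted-out account IDs. Both the field name and the `exclude_opted_out` helper are hypothetical; the source does not describe how the SSIX consent manager is implemented.

```python
from typing import Iterable, Iterator, Set


def exclude_opted_out(records: Iterable[dict], opted_out_ids: Set[str]) -> Iterator[dict]:
    """Yield only the records whose account has not opted out."""
    for record in records:
        if record["account_id"] not in opted_out_ids:
            yield record


# Hypothetical opt-out list, as it might be exported from a consent manager.
opted_out_ids = {"222"}

collected = [
    {"account_id": "111", "text": "first message"},
    {"account_id": "222", "text": "second message"},
    {"account_id": "333", "text": "third message"},
]

# Only records from accounts 111 and 333 remain in the dataset.
dataset = list(exclude_opted_out(collected, opted_out_ids))
```

Applying the filter each time a dataset is assembled matches the behaviour described above, in which an opted-out user's data is left out of all future datasets.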