Six key steps for using personal and community data safely

Author: Sam Wozniak, Technical Data Researcher
Development Initiatives

18 November 2020

What is community data?

Community-driven data, or ‘community data’, must meet three criteria: it is collected by a civil society group, it is about a group, and it is collected in the interest of the community it describes (this may be the group collecting the data, or simply the community the data is about). This could be, for example, data which groups people by religion, caste, gender, age, politics, profession or disability. Many different kinds of civil society organisations collect this type of data, from charities, trade unions and social movements to grassroots organisations, online networks and communities, and faith groups.

What is personal data?

Information is classified as ‘personal data’ if it can be used to identify an individual. Individuals can be identified directly by their name, date of birth, ID documents, address, health records, qualifications, employment details and ‘online identifiers’, such as their IP addresses. Data that does not identify anyone on its own is also categorised as personal data if it can be combined with other data to single out a person.

What are the privacy risks?

The main aim of community data will often be to shine a light on an identifiable community. However, members of that community may face a collective risk that the data is used in ways they do not want or, worse still, in ways that cause them harm. Community data may also contain personal data, which can be misused in similar ways.

In the context of international development, vulnerabilities are increased because data often includes sensitive information about communities, or about individuals’ (and their kin’s) gender, location, housing, assets, income, occupation, education, health, receipt of government services, etc.

How to protect personal and community data

Here are some basic guidelines for organisations to consider at every stage of the data’s life cycle to ensure that risks are minimised.

Step 1: Preparation

Ensure data privacy laws in the jurisdiction in which data will be collected are fully understood.
Consider the potential harm that could be done by focusing on particular communities or aspects of them.
Be aware of other available data where samples overlap, e.g. other data on the same village or the same group of individuals, and check whether any individuals could be identified by cross-referencing those datasets.
Consider the sensitivity of the data that will be collected while preparing questionnaires. Only attempt to collect essential information and refrain from asking questions on a topic if it is deemed that the risk posed is too great.
Don’t include unnecessary detail in the design of the questionnaire. Don’t ask for a person’s date of birth if knowing their age is sufficient. Don’t ask for a person’s exact address if knowing their village is sufficient.

Step 2: Collection

Ensure the team doing the data collection understands the importance of privacy matters before collecting any personal data. Inform them of any relevant data protection protocol that your organisation has.
Obtain participants’ informed consent to gather their personal information. Inform them about the group and/or organisation doing the data collection, the purpose of the interview, the type of questions to be asked, how participants’ data will be used, stored, shared and destroyed, and how they can withdraw consent.
Stick to the script. Do not extend your questioning beyond the agreed questionnaire.

Step 3: Storage

Practise pseudonymisation by keeping directly identifiable information (e.g. names, phone numbers, etc.) separate from the rest of the data. To do this, create a key containing a list of respondent names with randomly assigned household identification numbers, and store it separately and securely.
Wherever possible, avoid keeping phone numbers, emails or detailed locations.
Maintain strict control over who has access to identifiable data.
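
The key-and-identifier approach described in this step can be sketched in a few lines of Python. This is a minimal illustration only, not a prescribed method: the field names (name, phone), the "HH-" identifier format and the function name pseudonymise are all assumptions made for the example.

```python
import secrets


def pseudonymise(records, identifying_fields=("name", "phone")):
    """Split records into a key table and a de-identified dataset.

    The field names and the "HH-" ID format are illustrative assumptions.
    """
    key = []    # links IDs to identities; store separately and securely
    data = []   # working dataset with direct identifiers removed
    used_ids = set()
    for record in records:
        # Draw a random ID and retry in the unlikely event of a collision.
        while True:
            household_id = "HH-" + secrets.token_hex(4)
            if household_id not in used_ids:
                used_ids.add(household_id)
                break
        identifiers = {f: record[f] for f in identifying_fields if f in record}
        remainder = {k: v for k, v in record.items()
                     if k not in identifying_fields}
        key.append({"household_id": household_id, **identifiers})
        data.append({"household_id": household_id, **remainder})
    return key, data
```

Only the second returned table would be shared with analysts. The first is the key, and access to it should be restricted to the few people who genuinely need to re-identify respondents.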

Step 4: Processing

Practise anonymisation by removing enough detail from a dataset that individuals (or, in some cases, communities) can no longer be identified by any means reasonably likely to be used. To do this, group data into broader categories (e.g. village not street, age range not age, etc.).
Remove specific records if they are distinctive enough to single someone out, such as households with an unusually large number of children.
Maintain and monitor a record of all copies of datasets and transfers of data.
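
As a rough illustration of the grouping and removal described in this step, the Python sketch below replaces exact ages with age ranges, drops street-level detail and filters out distinctive records. The field names (age, street, children), the ten-year band width and the threshold passed to remove_distinctive are hypothetical; suitable values depend on the dataset and its context.

```python
def age_band(age, width=10):
    """Return a coarse age range such as '30-39' instead of the exact age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise(record):
    """Drop street-level detail and replace exact age with an age range."""
    out = {k: v for k, v in record.items() if k != "street"}
    out["age_band"] = age_band(out.pop("age"))
    return out


def remove_distinctive(records, field, max_value):
    """Keep only records whose value for `field` does not exceed max_value."""
    return [r for r in records if r[field] <= max_value]
```

For example, a record with age 34 and a street address would come out with an age_band of "30-39" and no street field, while remove_distinctive(records, "children", 6) would drop any household reporting more than six children.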

Step 5: Publication

Ensure it is not possible to identify individuals.
When presenting the data, consider carefully which disaggregations (e.g. by age, sex, gender, location, religion, community, etc.) are absolutely necessary for your purposes.
Only publish data that participants have consented to being published.
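
One way to check whether a disaggregation is too fine is to count how many respondents fall into each combination of categories and withhold any count that is very small. The guidance above does not prescribe this technique; the sketch below is an assumption-laden illustration, and the minimum cell size of 5 and the function name disaggregated_counts are hypothetical choices.

```python
from collections import Counter


def disaggregated_counts(records, fields, min_cell_size=5):
    """Count records per combination of `fields`, suppressing small cells.

    Cells with fewer than min_cell_size records are returned as None
    so that they are not published as exact counts.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {cell: (n if n >= min_cell_size else None)
            for cell, n in counts.items()}
```

If many cells come back as None, that suggests the chosen breakdown is more detailed than the sample can safely support, and a coarser disaggregation may be worth considering.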

Step 6: Retention and destruction

Ensure that all copies and versions of any personal data are destroyed within the time limit that participants consented to.
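
If a register of dataset copies is kept, a short script can flag copies that have passed their consented retention period. The sketch below assumes a hypothetical register in which each entry records a dataset name, a collection date and a retention period in days; none of these field names come from the guidance itself.

```python
from datetime import date, timedelta


def overdue_for_destruction(register, today):
    """Return names of datasets whose consented retention period has passed.

    Each register entry is assumed to be a dict with the keys
    'dataset', 'collected_on' (a date) and 'retention_days' (an int).
    """
    overdue = []
    for entry in register:
        deadline = entry["collected_on"] + timedelta(days=entry["retention_days"])
        if deadline < today:
            overdue.append(entry["dataset"])
    return overdue
```

Running a check like this on a regular schedule, for example with today set to date.today(), would only be useful if every copy and version of the data is listed in the register.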

Making Voices Heard and Count
