I had an engaging conversation yesterday with my colleague Lokman Tsui from Harvard’s Berkman Center. Lokman was interested in learning more about Ushahidi. The conversation touched on the topic of data validation, an ongoing challenge in the field of conflict early warning/response (and indeed in many fields that involve information collection).
I have heard many criticisms about the perceived lack of rigorous data validation in crowdsourcing exercises like Ushahidi’s. I am starting to find the repetitive nature of these criticisms somewhat amusing. (Note, Lokman himself was not criticizing, but rather asking perceptive and informed questions, which is far more conducive to a fruitful conversation).
What I find amusing about the repeated nature of criticisms vis-a-vis Ushahidi and data validation is how elementary they tend to be. The critics seem to assume that the good folks at Ushahidi haven’t given any thought to the challenges of data quality control. I’d really like to know what the basis for such assumptions is.
When I first got in touch with the team in early January 2008, it was perfectly clear that they were already thinking about data validation. They were equally serious about learning as much as they could from the field of conflict early warning/response in order to improve their future efforts in this regard.
This hasn’t stopped the critics from repeating their disapproving statements about Ushahidi’s approach. The purpose of this post is to try and move the conversation on data validation forward, in the hope that the discourse on this topic will cease to sound like a broken record.
A former professor of mine at Columbia University encouraged us to address problems by formulating the following questions: (1) What is the question? (2) Compared to what? (3) According to whom?
What is the question?
How does Ushahidi carry out data validation? When Ushahidi receives a new alert, the team can validate the information with any available reports from the news media. The team can also contact the individual who reported the alert to ask for further details on the event.
Let’s face it, if someone sends in an alert and they subsequently get a call from Ushahidi, determining whether that person is fabricating information isn’t impossible. A few questions asking for specific details will make it apparent whether or not the person is lying.
Compared to what?
How do other initiatives carry out data validation? Take, for example, the Conflict Early Warning and Response Network (CEWARN), a regional inter-governmental initiative in the Horn of Africa that has a hierarchical two-step validation protocol.
Incident reports (alerts) are submitted to Country Coordinators (CCs) by Field Monitors (FMs). If a CC finds a report questionable, they will ask the FM for more information to validate the report. The CC then submits the report to CEWARN headquarters. If analysts at HQ have concerns about the validity of a report, they communicate them to the CC, who will in turn request further information from the FM.
Here’s the catch, though: Field Monitors are based in rather remote and rural locations, while Country Coordinators are based in capital cities. Both are only employed part-time. HQ analysts are based in Addis Ababa. The system makes use of neither SMS nor GPS coordinates. Having worked on the implementation of CEWARN, I can confirm that the reporting and data validation process took between two and four weeks.
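To make the escalation chain concrete, here is a minimal sketch of the two-step flow described above. All class, function, and status names are hypothetical, chosen for illustration; they are not CEWARN’s actual terminology or software (CEWARN’s process was largely manual).

```python
from dataclasses import dataclass, field

# Hypothetical sketch of CEWARN's two-step validation chain:
# Field Monitor -> Country Coordinator -> HQ, with queries looping
# back down the chain whenever a report looks questionable.

@dataclass
class IncidentReport:
    description: str
    status: str = "submitted"          # submitted -> forwarded -> accepted
    queries: list = field(default_factory=list)

def field_monitor_clarify(report: IncidentReport, answer: str) -> None:
    """The Field Monitor responds to a query passed down the chain."""
    report.queries.append(answer)

def country_coordinator_review(report: IncidentReport, questionable: bool) -> None:
    """Step 1: the Country Coordinator validates, querying the FM if needed."""
    if questionable:
        field_monitor_clarify(report, "additional detail from the field")
    report.status = "forwarded"        # sent on to CEWARN headquarters

def hq_review(report: IncidentReport, concerns: bool) -> None:
    """Step 2: HQ analysts review; concerns loop back through the CC to the FM."""
    if concerns:
        field_monitor_clarify(report, "further detail requested via the CC")
    report.status = "accepted"

report = IncidentReport("cattle raid reported near border village")
country_coordinator_review(report, questionable=True)
hq_review(report, concerns=False)
```

Each loop back down the chain is a round trip between a part-time monitor in a remote location and a coordinator in the capital, which is why the end-to-end process stretched into weeks.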
Surely Ushahidi’s approach is more efficient given the direct link between the person who submitted the alert and the Ushahidi team.
Another example of data validation is that of the mainstream media. Professional journalists are required to have credible sources and to crosscheck their sources before publishing a news article. Is Ushahidi’s approach that different?
According to whom?
Who is asking the question? Or rather, who is doing the criticizing? Those questioning the data quality of Ushahidi alerts are more often than not academics and/or Westerners. These individuals expect a level of data quality that matches what they have in the US and Europe, where data validation processes are more institutionalized (given that they’ve been around for longer).
These critics assume that they are the intended users of the Ushahidi platform. Isn’t that a little egocentric? The purpose of crowdsourcing crisis information à la Ushahidi is to increase the situational awareness of those who find themselves facing escalating social tensions and violent conflict, so they can make better decisions about how to get out of harm’s way. (Please see my previous post on Ushahidi-DRC for context).
Let’s place ourselves in their shoes. If an armed rebel group were moving towards our small rural village in Eastern DRC, wouldn’t we want to know, even if the information was unconfirmed? Wouldn’t we want to know so we could take some precautionary measures, or at least think about how to determine whether the alert was credible? I think we would.
Incidentally, Ushahidi should tag their alerts as either “credible” or “unconfirmed” so that end users subscribed to the alerts can at least get a sense of how reliable the alerts are.
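Such tagging could be quite simple. The sketch below is purely illustrative; the field names and checks are my assumptions, not Ushahidi’s actual data model. It marks an alert “credible” if it was corroborated by media reports or verified through a follow-up call, and “unconfirmed” otherwise:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: tagging incoming alerts so subscribers can see
# how reliable each one is. Field names are illustrative assumptions.

@dataclass
class Alert:
    text: str
    corroborated_by_media: bool = False
    reporter_verified: bool = False    # e.g. confirmed via a follow-up call
    tag: Optional[str] = None

def tag_alert(alert: Alert) -> Alert:
    """Tag an alert 'credible' if any validation check passed, else 'unconfirmed'."""
    if alert.corroborated_by_media or alert.reporter_verified:
        alert.tag = "credible"
    else:
        alert.tag = "unconfirmed"
    return alert

a = tag_alert(Alert("armed group sighted near town", reporter_verified=True))
# a.tag == "credible"
```

Even this coarse two-level tag would let subscribers weigh an unconfirmed alert differently from a corroborated one without withholding the information altogether.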