I have worked with several NGOs, as well as in hospital, clinics and academia. In many of these places people are collecting AMAZING data from their programmes and projects. Data that I would love to get my hands on to analyse. Unfortunately, in most of these places, they are also collecting these amazing data either on paper or on a spreadsheet (e.g. Excel).
This is frustrating for me for a number of reasons. First, in many cases there is a lot of wasted time and effort, as much of this data is never analyzed. Second, people often don’t realize how precious this data is, and the risks entailed by collecting them on a spreadsheet or on paper. Third, I am one of those people on the other side of the equation, receiving that data for analysis and usually having to deal with the data collection shortfalls.
Let’s be clear here. I’m not waging a war against Excel or any other spreadsheet for that matter. They are highly useful. I use them for a number of things, but they are just not meant to be used for data entry.
So let’s talk about why collecting your data on paper or on a spreadsheet is a bad idea and what better tools you could be using instead.
Data captured and not analysed is a waste of time and effort. Data captured on paper, if it is to be analyzed at all, and thus become useful, will need to be computed manually or be typed into a computer. Both options are inefficient, time-consuming and repetitive, so you want to minimize the time your team spends doing this. I’ve talked with many people in charge of data capturing and they are rarely if ever captivated by they work (shocking!), they just see it as tedious and meaningless.
If you are using paper questionnaires or a spreadsheet with open text cells (that is, cells that allow any type of answer), people collecting the data can fill out their responses any way they want. And trust me, they will do it differently every day, and differently than their colleague. Some will write the date of birth as mm/dd/yyyy while others will use the format dd/mm/yyyy. Distinguishing between the two afterwards can become tricky.
In other cells or paper forms, if you don’t restrict the possible answers (which you CAN do in Excel), some will write “New York”, others “NY”, others “new york” and yet others “ny”. In a clinic, you may have three different names for the same diagnosis. Some will leave key questions blank.
Whenever variability is allowed, it will flourish. The problem with this is that some data may become unusable (if you can’t distinguish the correct date for instance), or will take a longer time to clean up. Result: precious time and effort wasted (and maybe some grumbling from your data analyst).
When data is captured by hand, someone has to pass it to a computer. This is a very common source of errors. When I analyse databases it can sometimes be easy to spot these types of errors: Entries duplicated or mixed up, impossible dates of birth (e.g first of january 2035), etc. A good aim in data collection is always to try to minimize your sources of error.
To understand the difference, let’s first talk about how a data-entry system works.
In programmes designed for data-entry, you capture the data in a “questionnaire” screen, where you only see the questions for the particular person or household for which you are entering data. The questions have a restricted set of answers to choose from (e.g. “New York” but not “NY” for instance). The data you capture in this way is saved in a “table”, that will look similar to Excel, but which not everyone can access.
In contrast, in Excel or similar spreadsheet solutions, you get to see the whole table of data, and you input the data directly into the cell you want. In many cases, you can write the answer in any way you like inside the cell. Thus, using a database instead of Excel results in “cleaner” data (e.g. just one name and not three for the same diagnosis), and it protects your data from mistakes, as access to the table data is restricted.
One of the worst things that I regularly see happening with spreadsheets, unfortunately cannot be fixed. Excel allows you to move/reorder columns or rows independently, which means that you could accidentally reorder the “date of birth” column, while forgetting to do so for the “name” column, resulting in names and dates of birth that don’t match.The problem is that many of these errors cannot be undone, as there is no way to trace them, so the integrity of the data can come into question when errors are evident at the time of analysis.
In most databases created by a data-entry system, the information in a single row, which normally pertains to the same individual (e.g. name, age, sex, etc), is linked together, so that this type of error is not possible.
By now, I hope you are wondering what you can use instead.
There are various user-friendly options out there for data-entry software these days. And many are free. Data collected in them can later be analyzed with your favorite statistical package (or even with Excel if that's your thing!)
If your organization has Office packages, you might want to try Microsoft Access.Two other options that I particularly like, and are widely used in health projects are EpiInfo and EpiData, which are freely available online and not very hard to set up.
For non-routine, small-scale data collection exercises you may want to try out Google forms or Surveymonkey. I’d be hard-pressed to find a more user-friendly option than these two. They are also free for small-scale questionnaires and you can send them out by email. A disadvantage though is that they are online-based, so you need an internet connection.
So what solution are YOU using? And are you happy with it? Let me know in the comments.
Contact me: email@example.com