Processing of Data
Geir Øvensen
Cleaning a data set involves identifying individual entries, or combinations of entries, that seem dubious, and deciding what to do with them. In many projects based on household interviews, this process has started only after field work was completed. By using SPSS-DE in the local field offices in Gaza and Ramallah, the FAFO living conditions survey managed to integrate data collection and data cleaning. This approach had several advantages. First, data quality improved because errors could be corrected while the survey team was still working in the field. Second, integrating data cleaning into the field-work procedures shortened the time between the end of field work and the completion of the report.
SPSS-DE offered two main ways of identifying questionable entries. The simpler method was to check whether the values entered for individual variables lay within their legal ranges. A more elaborate cleaning procedure was to check whether combinations of variable values were consistent.
Valid (Individual) Entries
During data punching, an audio-visual warning appeared if the puncher entered an "illegal" value outside the specified range: following a "beep", the screen message "Value out of range" told the puncher that a mistake had been made. (The programme did not, however, technically force the puncher to correct the error.) Range violations were checked both automatically, whenever data were entered or changed, and on specific instructions from the office staff (see the discussion of "cleaning passes" below).
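The range check described above can be sketched in a few lines of Python. This is not SPSS-DE's own mechanism, only an illustration under stated assumptions: the variable names and limits in `LEGAL_RANGES` are hypothetical, and `check_range` and `enter_value` are illustrative helpers. As in the text, an out-of-range value triggers a warning but is still stored, so correcting it is left to the puncher.

```python
# Hypothetical legal ranges (low, high) per variable; not the survey's codebook.
LEGAL_RANGES = {
    "age": (0, 110),
    "sex": (1, 2),
    "household_size": (1, 40),
}


def check_range(variable, value, ranges=LEGAL_RANGES):
    """Return a warning message if value is outside its legal range, else None."""
    low, high = ranges[variable]
    if low <= value <= high:
        return None
    return f"{variable}={value}: Value out of range ({low}-{high})"


def enter_value(record, variable, value, warn=print):
    """Store a punched value, warning (but not blocking) on a range violation."""
    message = check_range(variable, value)
    if message is not None:
        warn("\a" + message)  # "\a" rings the terminal bell, standing in for the beep
    record[variable] = value  # kept even if out of range: correction is not forced
    return message


record = {}
enter_value(record, "age", 34)  # within range: no warning
enter_value(record, "sex", 3)   # warns, but the value is still stored
```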
The rules for checking combinations of entries were of two kinds. Some were logical in the strict sense, i.e. they must always be observed (for example, "a son must be younger than his father"). Others would hold true in 95% of cases, based on an evaluation of behavioural patterns in Palestinian society (e.g. a husband who encourages his wife to appear in public without a head scarf is also likely to accept that women are allowed to vote).
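Both kinds of rule can be represented as predicates over a whole case, each tagged as strict or not. The minimal sketch below assumes hypothetical variable names and a hypothetical `Rule` type. It reports a violated strict rule as an "error", while a violated behavioural rule is only flagged as a "query" for the office staff to review, since roughly one case in twenty may legitimately break it. These severity labels are an illustrative assumption, not part of SPSS-DE.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # True when the case is consistent with the rule
    strict: bool                   # True: must always hold; False: holds in most cases


# Hypothetical variable names, illustrating the two examples in the text.
RULES = [
    Rule("son_younger_than_father",
         lambda c: c["son_age"] < c["father_age"],
         strict=True),
    Rule("no_headscarf_implies_vote",
         # "A implies B" is written as "not A or B"
         lambda c: not c["encourages_no_headscarf"] or c["accepts_women_voting"],
         strict=False),
]


def evaluate(case, rules=RULES):
    """List (rule name, severity) for every rule the case violates.

    Severity labels are an assumption: "error" for strict rules,
    "query" for behavioural rules that merely deserve a second look.
    """
    return [(r.name, "error" if r.strict else "query")
            for r in rules if not r.check(case)]


case = {"son_age": 50, "father_age": 40,
        "encourages_no_headscarf": True, "accepts_women_voting": False}
print(evaluate(case))
# [('son_younger_than_father', 'error'), ('no_headscarf_implies_vote', 'query')]
```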
Unlike range checks, cleaning rules could not be checked continuously: a rule involving two or more variables in different parts of the questionnaire could not be checked until values for all the variables involved had been entered. Instead, the office staff checked the cleaning specifications through so-called "cleaning passes". A cleaning pass checked every entry in a file against all the ranges and rules defined for the variables in that file. The field office staff could present the results in several ways, listing either the ranges and rules violated by each case or the cases that violated each range and rule. With the consistency checks offered by the data entry programme, computerized data quality checks equivalent to hours of manual control could be performed in a few seconds.
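A cleaning pass, and the two ways of presenting its results, might look like the following Python sketch. It is an illustration, not SPSS-DE's implementation: the ranges, the single rule, the helper name `cleaning_pass` and the case data are all hypothetical. The function collects the same violations twice, indexed once by case and once by check, which correspond to the two views described above.

```python
from collections import defaultdict

# Hypothetical ranges and cross-variable rule; names are illustrative only.
RANGES = {"son_age": (0, 110), "father_age": (15, 110)}
RULES = {
    "son_younger_than_father": lambda c: c["son_age"] < c["father_age"],
}


def cleaning_pass(cases, ranges=RANGES, rules=RULES):
    """Check every case against every range and rule.

    Returns two views of the same violations:
    by_case maps case id -> list of violated checks,
    by_check maps check name -> list of offending case ids.
    """
    by_case = defaultdict(list)
    by_check = defaultdict(list)
    for case_id, case in cases.items():
        violated = [f"range:{var}" for var, (lo, hi) in ranges.items()
                    if not lo <= case[var] <= hi]
        violated += [f"rule:{name}" for name, ok in rules.items() if not ok(case)]
        for check in violated:
            by_case[case_id].append(check)
            by_check[check].append(case_id)
    return dict(by_case), dict(by_check)


cases = {
    101: {"son_age": 12, "father_age": 40},
    102: {"son_age": 45, "father_age": 30},   # son older than father
    103: {"son_age": 130, "father_age": 60},  # age out of range, and rule broken
}
by_case, by_check = cleaning_pass(cases)
print(by_case)
print(by_check)
```

Computing both indexes in a single pass keeps the checking logic in one place; whichever view the staff prefer is then only a matter of which dictionary to print.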
Correction of Wrong or Questionable Entries