The Research Value of Data

Data can take a number of forms. The terms "raw" or "primary" data refer to information available in the form in which it was originally recorded: for example, a registration record, a daily count of new hospital cases, or a list of secondary schools. In contrast, statistics are numbers produced to describe or summarize patterns of primary data: number of families registered in a given Field, average hospital admissions per month, percentage of districts containing secondary schools. Statistics are often easier to access and manipulate than primary data, but because they are essentially summaries, they always imply a loss of original information. This may limit their usefulness for some types of research.

Another important dimension is the unit of analysis or type of population that the data describe. Different units of analysis are appropriate for different types of data. For example, the implied unit of analysis for age or gender data is the individual; for measures of income, it may be the household or family; and for infrastructure or public health measures, the village. Units of analysis can of course be combined to different levels of aggregation, such as the Area, the Field, or various socio-economic groupings. The unit of analysis may also be relevant to technical discussions about linking different data bases, such as individual birth records and family-level registration files.
As a general rule, primary data and statistics at low levels of aggregation offer the researcher a wider range of options for analysis. Medical data aggregated to the Field level, for example, cannot be used to compare urban versus rural patterns in the incidence of disease; however, data at the clinic or hospital level can in principle always be aggregated up to the Field. This assumes, of course, that units can be grouped reliably into higher levels: in the example above, all clinics must file disease reports consistently, using the same definitions and criteria. Concerns about confidentiality and personal privacy may also arise with disaggregated data. However, these can in principle be resolved with proper safeguards, because few surveys require names of individuals or exact addresses of households.
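
The asymmetry described above can be illustrated with a minimal sketch. The clinic names, Field labels, urban/rural classification, and case counts below are hypothetical values invented for illustration; they are not UNRWA data. The sketch assumes that every clinic reports under the same definitions, as the text requires.

```python
from collections import defaultdict

# Hypothetical clinic-level case counts (illustrative values, not UNRWA data).
clinic_reports = [
    {"clinic": "A", "field": "Field 1", "setting": "urban", "cases": 12},
    {"clinic": "B", "field": "Field 1", "setting": "rural", "cases": 7},
    {"clinic": "C", "field": "Field 2", "setting": "urban", "cases": 5},
    {"clinic": "D", "field": "Field 2", "setting": "rural", "cases": 9},
]


def aggregate(records, key):
    """Sum case counts over all records that share the same value of `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["cases"]
    return dict(totals)


# Clinic-level records can be grouped upward in more than one way.
by_field = aggregate(clinic_reports, "field")
by_setting = aggregate(clinic_reports, "setting")

print(by_field)
print(by_setting)
```

Once only the Field totals are kept, the urban versus rural breakdown can no longer be recovered, because the clinic-level detail that distinguishes the two settings has been summed away.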

As we will see in the sections that follow, the UNRWA data currently available vary in all these dimensions. Some data sources, such as the registration records and Special Hardship Cases data base, are maintained as primary data; others, particularly in the areas of health and education, consist of statistics at various levels of aggregation. Depending on the source and subject matter, underlying units of analysis may include the individual, the family, the clinic, or the school.
With these basic principles in mind, we turn to a discussion of other attributes that affect the suitability of data for research. The research value of data can be said to be a product of four factors: the data's relevance, their scope or coverage, their quality, and their accessibility. A researcher will try to maximize each factor as well as their product, although concessions will sometimes be necessary. For example, if the quality of the data is high, the researcher may want to use relevant data even if they are of limited scope or difficult to access. If the coverage is good, the data may be used even if the quality is not perfect; in this case the researcher will try to assess the possible errors introduced into the estimates. On the other hand, if there are serious problems with any one of the four factors, the research value of the data is limited.
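
The idea of research value as a product can be made concrete with a small sketch. The scoring scale is an assumption introduced here for illustration: each factor is given a hypothetical score between 0 and 1, which the report itself does not define or measure.

```python
from math import prod


def research_value(relevance, scope, quality, accessibility):
    """Multiply four factor scores, each assumed to lie between 0 and 1."""
    scores = (relevance, scope, quality, accessibility)
    if not all(0.0 <= score <= 1.0 for score in scores):
        raise ValueError("each score must lie between 0 and 1")
    return prod(scores)


# Hypothetical scores: moderately good on every factor.
balanced = research_value(0.8, 0.8, 0.8, 0.8)

# Hypothetical scores: excellent on three factors, very poor access.
one_weak = research_value(1.0, 1.0, 1.0, 0.1)

print(balanced, one_weak)
```

Because the factors multiply rather than add, a serious weakness in any single factor pulls the overall value down sharply, however strong the other three may be.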

By relevance we mean the interest of the data in terms of their potential for answering a research question.

The comprehensive geographical and thematic scope of UNRWA's data is their greatest advantage. The UNRWA data represent one of the few possibilities for comparison of the conditions for Palestinian refugees in different host countries. The fact that records were systematically collected over time creates opportunities for time series analysis, and further increases the value of the data.
Scope or coverage also relates to the populations or segments of a population the data cover. In UNRWA data, coverage has limitations in the areas of health and education, as information is only collected about those refugees who use UNRWA's services. In these areas, the question of coverage may imply severe constraints on the research potential of data. This problem is related to the ability to generalize to a larger population, and will be further discussed below.

The quality of data has two dimensions: reliability and validity. Data with little reliability have limited value for researchers. The reliability of data is determined by how data are produced, as the term refers to the accuracy of the various operations in this process. Here proper documentation of procedures for collection of information is of great importance. Data have high reliability if repeated measurements of the same phenomenon provide consistent results.

To provide an accurate representation, data must be valid as well as reliable. Validity means that the data actually measure the concept that they purport to measure. For example, "wage" or "salary" income alone is not always a valid measure of household income, as some families will supplement these earnings with income from home-based businesses. Data can have low reliability and high validity, or vice versa. Also, reliability and validity are not all-or-none properties; both are matters of degree. The validity and reliability of the various UNRWA data will be discussed in more detail in the following sections of this report.
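
The distinction can be sketched numerically under simplifying assumptions. All figures below are hypothetical, and the two helper functions are illustrative stand-ins: the spread of repeated measurements is used as a rough indicator of reliability, and the distance between their average and an assumed true value as a rough indicator of validity. In practice the true value is unknown and validity is a conceptual judgment, not just an offset.

```python
from statistics import mean, pstdev

# Hypothetical true household income, in arbitrary units.
TRUE_HOUSEHOLD_INCOME = 100

# Hypothetical repeated measurements of the same household.
wage_only = [70, 70, 71, 69]       # consistent, but omits home-business income
total_income = [85, 115, 90, 110]  # covers all income, but inconsistently


def spread(measurements):
    """Population standard deviation; a smaller spread suggests higher reliability."""
    return pstdev(measurements)


def offset(measurements, true_value):
    """Distance of the average from the true value; smaller suggests higher validity."""
    return abs(mean(measurements) - true_value)


for name, values in (("wage_only", wage_only), ("total_income", total_income)):
    print(name, spread(values), offset(values, TRUE_HOUSEHOLD_INCOME))
```

In this sketch the wage-only measure is highly reliable but not valid as a measure of household income, while the total-income measure is valid on average but unreliable.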

By accessibility we mean the ease with which data can be obtained by a researcher and arrayed in a form suitable for research. Access to data thus has both legal and technical dimensions. Legally, access may be constrained by the need to obtain permissions from the responsible authorities. To accomplish its mandate of providing services to refugees in a politically unstable landscape, UNRWA may have to place constraints on researchers, and the need for permissions may be duly justified. Introducing standardized procedures and application forms for permissions may, however, improve access for approved researchers from outside the Agency.

Regarding the technical dimension of accessibility, central storage and computerization of UNRWA's data would substantially improve access for researchers. Computerized data bases can be made accessible from anywhere through computer networks. An index of contents (such as lists of tables and variables) and documentation of data collection methods would further improve access to data.


al@mashriq                       960428/960613