Radical Statistics

Statistical exclusion and social exclusion: the impact of missing data

Ludi Simpson

Introduction

This article was prepared for a conference of the International Statistical Institute in Mexico in 1998. The ISI is the international professional club of statisticians, and the conference was organised by its Official Statistics and Survey Statisticians sections with the title 'Statistics for Social and Economic Development'. The title was the only encouragement to speakers to address any particular issues, problems or audiences.

I decided to broaden the understanding we have of non-response, usually restricted to censuses and surveys in the richest countries, to other types of missing data and to other countries.

What I have to say comes partly from my own work in Britain, including a review of census literature on non-response (Simpson and Middleton, 1997), partly from the ongoing work of the Radical Statistics Group in Britain (e.g. Irvine et al., 1979; Dorling and Simpson, 1999), and partly from the 90 responses to a letter sent in May 1998 to organisations and individuals in the handbooks of the International Statistical Institute (ISI) and its section the International Association for Official Statistics (IAOS). I asked for information about the social characteristics of missing data and for details of any social enquiry that had been recently postponed or cancelled. I thank all of the respondents. The paper discusses missing data in general, not limited to any one country.

The paper reviews:

  • the characteristics of missing data,
  • the consequences of these data being missing, and
  • the solutions or remedies that are available,

under each of three categories of missing data that provide a wider view of missing data than is usual:

  • the many questions that were never asked,
  • the many analyses that were never made available, and
  • the enquiries that are victims of people who do not reply to them.

The last category is what we normally think of as missing data, on which some findings have been published, and about which my respondents had most to say.

The paper ends with comments on the quality of administrative registers, the responsibilities of research and statistical staff, and an example from Nicaragua which illustrates much of the preceding general framework. There isn't space for the evidence for all the general results which I assert; there are references instead.

Missing questions

The economics of every enquiry limit the number of questions that can be asked. There are also whole areas of questioning, and whole enquiries, that are not carried out. For an extreme but relevant example, which government asks its poorest citizens how it could improve their conditions?

Mitchell's standard set of historical international statistical comparisons states at the outset that "It is glaringly obvious that the biggest single problem is lack of availability of the data we should like to have, even, in some cases, for quite recent periods" (1995, p.viii).

In the period 1975-1984, 79 of the 94 Less Developed Countries with a population of over 1 million took a census, compared with 49 out of 94 in the period 1945-54. This number dropped back to 71 in the most recent period 1985-94. The quality of basic census results and the speed of reporting them have improved, but collection of more detailed data needed for national demographic policy has not developed over the period.

Government and commercial agencies naturally address their own policies and their own interests, in their own language. For example, the addition of ethnicity in the census schedule of the UK and elsewhere has followed the enactment of race legislation, rather than preceded it, though social surveys of a more independent and academic nature have asked such questions for longer.

Official statistics organisations also tend to avoid qualitative research, which is usually necessary to understand the social and economic relationships between people and between organisations.

Another characteristic of social statistics is their general limitation to people and their current activities. One can easily find data to support a discussion of urbanisation, but not of the geography of capital; of immigration, but not of emigration; of the distribution of income, but not the distribution of accumulated wealth.

The consequences are mainly that new issues are missed. It is also true that issues of power are kept off the agenda of government enquiries. At best the government might be compared to a parent who does not see how their child's future is developing independently of them. At worst, the selectivity of official enquiries is an instrument of suppression of debates that should happen but are inconvenient to government.

As one solution, international bodies do encourage a minimum of social and economic statistics. The UN's System of National Accounts (SNA93) has for example encouraged measurement of the informal economic sector as described elsewhere in this conference. During the period 1971-1993, the UN Population Fund allocated US$300m to the collection of basic data (Cleland, 1996).

Some governments and their staff support independent research by non-government organisations, trades unions, and social movements, acknowledging that they serve honourable goals. The extent to which the UK government has acknowledged the role of its statistics outside government is discussed by Muriel Nissell (1995).

However, government support for independent research requires sensitive and co-operative approaches. In situations of conflict it is an unattainable goal. A truly independent statistical practice is indispensable to democratic representation of different interests.

Missing analyses

Once an enquiry has been carried out, with its selected questions, there is further selection in which tabulations, reports, and commentary on the results are produced. Restrictions on the release of data create a mountain of missing data, which could help to better understand our human condition in order to change it to our advantage. All too often reports do not exploit more than a small part of those data.

Although international agencies have encouraged the growth of statistical indicators world-wide, their standards may not meet a country's unique needs. The ILO's recommendations on measuring unemployment have been extended to meet the local conditions of Mexico, recognising work that is for only a few hours, and pressure from employees who seek extra work (Jarque, 1998). The economic ideology of international agencies has been converging, which has in turn prioritised resources for some analyses over others (Townsend, 1998).

Most government statisticians can think of examples for their own country of data which were not released because they would be embarrassing to government.

Categories used in output can be restrictive. For example, income categories may be designed for taxation purposes rather than commercial or social investigation of poverty; the occupations of women are often classified according to categories based on male occupations.

The selectivity of analyses has much the same consequences as the selectivity of questions: a focus on a limited number of priorities at the expense of a broader view acknowledging new issues and alternative views.

Occasionally corruption in the sale of data occurs on a grand scale (Komsomolskaya Pravda, 1998). Policy advisers who suggest the sale of data on the market as a step to democracy should not do so without considerable caution.

As for solutions, the cost of dissemination to public libraries and other forms of public access should be included in the budgets for data collection. Data can then be freely returned to the public they were collected from. With better dissemination of documented datasets a rich diversity of analyses can be encouraged. Reports should always draw on previous work with different approaches to the same subject matter.

Missing persons

Data are also missing because part of the target population for an enquiry did not respond. I will focus mainly on people in household enquiries, including surveys and censuses.

Government surveys that are not compulsory usually expect to incur around 10%-40% non-response in developed countries (de Heer, 1996). In longitudinal studies the burden on respondents of repeated sweeps, and the difficulty of tracing respondents who have moved, often lead to 50% attrition of the original sample after only three sweeps.

It is hard to evaluate the coverage of a census, because an evaluation requires an even better count than the census. Those that are missed by a census are very hard to count indeed. The 77 censuses conducted in the 1960 to 1980 rounds have been estimated to have median levels of under-enumeration of about three per cent in Asia, four per cent in Latin America, and five per cent in Africa, but these figures are subject to severe limitations (National Research Council, 1981, quoted in Cleland, 1996). Countries with a developed infrastructure do not expect to reduce their census undercount below the current 2-3%.

Gross under-enumeration is higher than these figures. The levels quoted here are net under-enumeration, equal to the overall under-enumeration minus the overall over-enumeration. In some circumstances over-enumeration can be considerable, through double-counting, counting ineligible people, and fraudulent returns.
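
The difference between gross and net under-enumeration can be illustrated with a small worked example. The figures below are invented for illustration and do not come from any census cited here; the function name and its arguments are likewise hypothetical.

```python
def undercount_rates(true_population, missed, overcounted):
    """Return (gross, net) under-enumeration as fractions of the true population.

    gross: share of the true population that the census missed
    net:   gross under-enumeration minus over-enumeration
    """
    gross = missed / true_population
    net = (missed - overcounted) / true_population
    return gross, net


# Hypothetical country: 1,000,000 people, 50,000 missed,
# 20,000 counted twice or counted when ineligible.
gross, net = undercount_rates(1_000_000, 50_000, 20_000)
print(f"gross under-enumeration: {gross:.1%}")
print(f"net under-enumeration:   {net:.1%}")
```

In this invented case a published net figure of 3% conceals the fact that 5% of people were missed.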

There are three general characteristics of non-respondents for which there is a great deal of evidence from many different countries:

  • The largest group of non-respondents are socially excluded, through economic poverty or through special legal or cultural status.
  • Those who have a negative view of government, for whatever reason, are less likely to be included in government enquiries.
  • Unsettled people are hard to include in enquiries. If they are in the process of migrating, they are hard to find and their 'current circumstances' difficult to define.

Thus response to surveys and censuses is lowest among the unemployed, those in poor housing (shared, temporary, or rented), among young men, in dense urban areas and among discriminated racial and cultural groups (e.g. Simpson and Middleton, 1997, for UK, Canada, Australia, and the USA; Vehova and Zaletel, 1996, for Slovenia; Coeffic, 1993, for France).

Each country, each epoch and each survey will have its own unique 'holes' in response. For example Israel's recent census undercounted the Druze in the Golan Heights (who consider themselves Syrian, not Israeli), settler groups (who protested at government policy through census non-co-operation), and some ultra-orthodox Jewish sects (who oppose both the counting of Jews and the notion of a Jewish nation state) (Ben-Moshe, 1998). The catastrophic changes in Europe's formerly socialist states have led to diminished response rates (Havasi and Marton, 1997). The Bangladeshi census has a higher non-response among women than men (Kantner, 1994), perhaps reflecting women's low cultural status with regard to official enquiries.

The rich are a group unlikely to respond to government enquiries in some areas (of Hungary, Jamaica, Philippines, Samoa, St. Vincent, and the USA: Havasi and Marton, 1997; Murthy, 1998; Collado, 1998; Tauasosi, 1998; Allen, 1998; Bryant and Dunn, 1995), especially in settlements which are guarded by armed employees. Even when contacted, individuals in the ruling class may refuse to respond to government enquiries if their well-being is not dependent on government services and they have other ways of influencing government policy (Murthy, 1998).

Questions about income receive a poor response from those whose calculation of income is difficult, including workers within the informal sector, the self-employed (which is a large sector in many countries), some professions such as lawyers and doctors, and share-holders (e.g. Lillard et al., 1986). Fear that data may be used for taxation purposes depresses both the response rate and the reported incomes of both self-employed and propertied classes (de Lucas, 1992).

Which consequences of non-response are seen as most important depends on the viewpoint. In the UK, those commercial agencies that target high incomes are not concerned that the census misses many of the poorest people. On the other hand, the validity of an enquiry into aggregate wealth would be destroyed by missing only a few high-income earners.

The concentration of certain social groups among non-respondents suggests that many applications are biased by incomplete data, although there are very few quantitative studies to test this hypothesis. Some general results can be expected:

  • Missing birth and death registrations result in under-estimated fertility and mortality rates, if unbiased population estimates are used as denominators.
  • Net migration flows to cities and to rich countries are generally under-estimated, and consequently the growth of regions and countries is also estimated with bias (Middleton and Simpson, 1998).
  • It is the norm, not the exception, that everyone living in poor city areas receives less government resources and less electoral power than expected, simply due to missing data. In the United States of America, the struggle to recover those missed by the Census has had such political consequences that it has been fought in the courts and in Congress, so far with success by the rural states that gain from ignoring urban non-response (National Research Council, 1995, Chapter 2; Bryant and Dunn, 1995).
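
The first of these results is simple arithmetic, sketched below under stated assumptions: the denominator is taken as unbiased, and the population size, the number of births and the registration completeness are all invented for illustration. The function name is hypothetical.

```python
def observed_rate_per_1000(true_events, completeness, population):
    """Rate per 1,000 population computed from registered events only.

    completeness is the assumed fraction of events that reach the register.
    """
    registered = true_events * completeness
    return 1000 * registered / population


population = 1_000_000   # assumed unbiased population estimate
true_births = 30_000     # hypothetical true number of births

true_rate = observed_rate_per_1000(true_births, 1.0, population)
observed = observed_rate_per_1000(true_births, 0.8, population)
print(f"true birth rate:     {true_rate:.1f} per 1,000")
print(f"observed birth rate: {observed:.1f} per 1,000")
```

With 80% registration the published rate is about 24 rather than 30 per 1,000: the shortfall in the rate is proportional to the shortfall in registration.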

The selectivity of missing data can lead to a very strong conclusion about social statistics: "It is questionable whether the social survey approach can produce valid information on precisely the target groups of social policies" (van Tuinen, 1995, p.10). That is a very serious consequence which, although a reasonable assessment of the threat, may be too pessimistic.

The best remedy is to achieve complete, 100% response in the first place. However, this requires not only improved fieldwork methods, but a reduction in social exclusion, and improvement in relations between government and people. This puts limits on what statisticians and researchers can achieve alone. Independent social research institutions can undertake social enquiries where attitudes to government do not prevent or bias responses.

Nonetheless, there are methodological remedies. John Cleland (1996) and Manuel de la Puente (1993) both highlight collection methods that are in tune with every respondent, making use of local and culturally familiar terms to improve response. This is an important but not an easy path, as most accepted statistical and demographic methods require the use of standard measures, at least for age, dates and income.

Survey theory has suggested many ways of getting improved response which may also provide solutions for censuses: make it easy to say 'yes' and hard to say 'no'. Do this by engaging respondents' interest, by appropriate incentives, and by designing questionnaires that encourage response to at least some pertinent core questions (Groves and Couper, 1996; Luppes and Barnes, 1997; Kersten and Bethlehem, 1984; Hofmans, 1998).

John Cleland (1996) argues the cost-effectiveness of simple large-sample survey designs: "Many survey practitioners justify the use of exceedingly long and complex questionnaires by pointing out that the average length of the interview has relatively little effect on fieldwork costs. That is correct, but what is often overlooked is that highly complex survey designs perpetuate a dependence on foreign technical assistance, and that this element typically accounts for about half of the total survey costs."

Making a response compulsory is not a total solution. In Britain, the compulsory census is the occasion for mass acts of civil disobedience every ten years.

One can get knowledge of who has been missed, through their answers to previous enquiries, through links to administrative records, through special validation surveys, and through comparison with independent demographic estimates of the target population.

This knowledge is important if it leads to a less biased and more complete database, through weighting of enumerated records, or imputation of missed records (Chambers et al., 1998; Haslinger, 1991; Little and Rubin, 1987). These adjustments should be achieved before release of any tabulations and analyses, but this is only attempted for surveys. Plans for such unbiased population databases have been developed in the UK (various papers in Simpson, 1998) and in the USA for the 2000 round of censuses, but are not yet practically tested.

Adjustment for non-response tends to remain incomplete, because people who are very hard to count cannot be fully described by secondary 'adjustment variables'.
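
A minimal sketch of weighting-class adjustment, the simplest form of the weighting of enumerated records mentioned above, may make the idea and its limits concrete. It is not the method of any of the cited papers. The two classes, the population totals (assumed known from an independent estimate) and the response figures are all invented, and the function names are hypothetical.

```python
def class_weights(population, respondents):
    """Weight for each class = known population / number of respondents."""
    return {c: population[c] / respondents[c] for c in population}


def weighted_rate(respondents, rate, weights):
    """Weighted average of class rates, each respondent carrying its class weight."""
    total = sum(weights[c] * respondents[c] for c in respondents)
    hits = sum(weights[c] * respondents[c] * rate[c] for c in respondents)
    return hits / total


# Hypothetical figures: the 'unsettled' class responds far less often.
population = {"settled": 8000, "unsettled": 2000}
respondents = {"settled": 7200, "unsettled": 800}
unemployed = {"settled": 0.05, "unsettled": 0.25}  # rate among respondents

equal = {c: 1.0 for c in population}
unadjusted = weighted_rate(respondents, unemployed, equal)
adjusted = weighted_rate(respondents, unemployed,
                         class_weights(population, respondents))
print(f"unadjusted: {unadjusted:.3f}, adjusted: {adjusted:.3f}")
```

The adjustment moves the estimate from about 7% to about 9%, but only on the assumption that non-respondents in each class resemble the respondents in that class. Where the hardest-to-count people differ from respondents even within a class, some bias remains.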

Administrative registers

Administrative registers that intend to cover the whole population are increasingly being suggested as an alternative or supplement to a population census (Vliegen and van der Stadt, 1988; Norway, 1998). But they tend to have the same biases, with similar consequences, as censuses and surveys (Bah, 1998, and the examples below). Their coverage must be subject to the same rigorous external validation.

Table 1 shows the bias in the electoral registration system intended to cover almost all households and all adults. It is used widely for sampling purposes in Britain. High rates of non-registration such as in London are entirely explained by the individual characteristics of those not registered, shown in Table 1 (taken from Heady et al., 1996).

Table 1: Percentage of eligible adults not registered for elections in Britain, 1991

Linking individual administrative records from more than one computerised database is intended to provide more detailed information than is available from surveys and censuses. Biases in each database may be reinforced because the common identifier required for linking may itself be imperfect. An investigation of poor academic achievement among pupils in Newcastle-on-Tyne used addresses to link school pupils' registers of attainment, subsequent employment, and attitudes. Each register had been claimed to be complete, but those pupils whose records could be linked were significantly different: their academic achievement was much better than that of those who were not linked, with the result that the project had 'lost' the very children it had intended to target as needing greater understanding (Table 2, taken from Coombes, 1999a, 1999b).

Table 2: Average academic achievement from records of Newcastle school children before and after linking with other datasets
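
The mechanism by which linkage loses the target group can be shown with a toy calculation. The records below are invented and are not the Newcastle data of Table 2; the field names are hypothetical.

```python
from statistics import mean

# Hypothetical pupil records: an attainment score, and whether the pupil's
# address could be matched across the registers being linked.
pupils = [
    {"score": 62, "linked": True},
    {"score": 58, "linked": True},
    {"score": 55, "linked": True},
    {"score": 49, "linked": True},
    {"score": 41, "linked": False},
    {"score": 36, "linked": False},
    {"score": 30, "linked": False},
    {"score": 29, "linked": False},
]

all_mean = mean(p["score"] for p in pupils)
linked_mean = mean(p["score"] for p in pupils if p["linked"])
unlinked_mean = mean(p["score"] for p in pupils if not p["linked"])
print(f"all pupils: {all_mean}, linked: {linked_mean}, not linked: {unlinked_mean}")
```

In this invented cohort, an analysis restricted to linked records would report an average of 56 where the full cohort averages 45, and would contain none of the lowest-attaining pupils.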

An example

My example concerns incomplete birth registration in Tisma, a town of 10,000 residents in Nicaragua.

Without a birth certificate in Nicaragua, one cannot vote, get the school certificate that allows entrance to University, or get a passport. In Tisma, the deputy mayor and local students made a census earlier this year and found 500 children without birth certificates. Most of these 500 children had not voted, nor finished schooling, nor had a passport. The deputy mayor - Nidia Olivas Trejos - negotiated with the Tisma Council and the Tisma Court to drop the prohibitive charges - 600 cordobas, about US$60 for each child - and undertook all the paperwork required. The first new birth certificates have been handed over at a public ceremony.

This type of campaign is not unique, I am sure. But it shows several important characteristics:

  • Inclusion on the register, and all the rights that followed, had to be bought, prohibitively for the poorest families.
  • Inclusion on the register came to have great personal value for the individuals.
  • There was democratic value in a complete register - statistics are now more complete not only of births but of population size - and will be used to bring a more just allocation of government resources to all people in Tisma.
  • The difficulties in obtaining a complete count were overcome by a combination of bureaucratic contributions, inspired by voluntary community effort.

Conclusion

As statisticians and research workers, we have a responsibility for minimising missing data, by asking the right questions, by making data fully available, and by ensuring that analyses are unbiased by missing units. More data are not necessarily better data. Researchers need to communicate with the public as well as with politicians in order to convince them of the necessity of methods that lessen bias (Richie, 1998).

If data have become biased by non-response - and they usually have - users expect the statisticians and research workers to highlight how this affects the conclusions drawn from the data.

We are seen as giving - independently of our employer - an assurance of validity, of completeness, of no bias, to the information that is released. Rarely can anyone else tell if that responsibility has been carried out well. The responsibility should be taken seriously.

ACKNOWLEDGEMENTS

Nittala Murthy, Adam Marton and Joris Nobel were very helpful in providing pertinent comments and literature. The work was supported by award R000236963 from the Economic and Social Research Council of the UK. The International Statistical Institute provided their handbooks of addresses, Danny Dorling suggested the title a long time ago, and Jay Ginn made many helpful editorial suggestions.

REFERENCES

Allen, S. (1998), Personal communication

Bah, S.M. (1998), The making, unmaking and remaking of a national but stratified vital statistics system in the Republic of South Africa. Internal paper, Pretoria: Central Statistical Office

Ben-Moshe, E. (1998), Personal communication

Bryant, B.E. and Dunn, W. (1995), Moving power and money: the politics of census taking, New York: New Strategist Publications

Chambers, R., Cruddas, M. and Jones, T. (1998), Small area estimation for a one number census - weighting vs. imputation, Chapter 14 in Simpson (1998)

Cleland, J. (1996), Demographic data collection in less developed countries 1946-1996, Population Studies, 50, pp. 433-450

Coeffic, N. (1993), L'enquête de mesure du degré d'exhaustivité du recensement de 1990, Chapter 2.3 in 'Le recensement de la population 1990, innovations méthodologiques', Paris: INSEE

Collado, M. (1998), Personal communication

Coombes, M. (1999a), 'Research potential of administrative datasets', chapter in Research for Policy, Proceedings of the Annual Conference of the Local Authorities Research and Intelligence Association, LARIA: Wokingham

Coombes, M. (1999b), Personal communication

de Heer, W. (1996), International response trends: developments and results of an international survey, Statistics Netherlands, PO Box 4481, 6401 CZ Heerlen, The Netherlands

de la Puente, M. (1993), Why are people missed or erroneously included by the census: a summary from ethnographic coverage reports, pages 29-66 in 1993 Research conference on undercounted ethnic populations, Washington DC: Bureau of the Census

de Lucas, A. (1992), Actitudes y representaciones sociales de la población de la Comunidad de Madrid en relación con los Censos de Población y Vivienda de 1991, Comunidad de Madrid, Consejería de Economía Departamento de Estadística, Calle Principe de Vergara 132, 28002 Madrid, Spain

Dorling, D. and Simpson, S. (eds.) (1999), Statistics in Society, London: Arnold

Gaudier, M. (1993), Poverty, inequality, exclusion: New approaches to theory and practice, Geneva: International Labour Organisation

Haslinger, A. (1991), Struktur, Folgen und Behandlung von Antwortausfällen im Mikrozensus, Statistische Nachrichten, 46, pp. 538-550

Havasi, E. and Marton, A. (1997), Nonresponse in the 1996 Income Survey, Proceedings of the 8th international workshop on household survey non-response

Heady, P., Bruce, S., Freeth, S. and Smith, S. (1996), 'The coverage of the electoral register', pp. 189-206 in I. McLean and D. Butler (eds.), Fixing the boundaries: defining and redefining single-member electoral districts, Aldershot: Dartmouth

Hofmans, M. (1998), 'Innovative weighting in POLS. Making use of core questions', Netherlands Official Statistics, 13, pp. 20-29

Irvine, J., Miles, I., and Evans, J. (1979), Demystifying social statistics, London: Pluto Press

Jarque, C.M. (1998), 'Statistical developments in Mexico', IASS/IAOS conference on statistics for economic and social development, 1-4 September, Aguascalientes, Mexico

Kersten, H.M.P. and Bethlehem, J.G. (1984), 'Exploring and reducing the nonresponse bias by asking the basic question', Statistical Journal of the United Nations, 2, pp. 369-380

Komsomolskaya Pravda (1998), 'Goskomstat corruption case detailed, 16 June 1998', Translated in Johnson's Russian list

Lillard, L., Smith, J.P. and Welch, F. (1986), 'What do we really know about wages? The importance of nonreporting and census imputation', Journal of Political Economy, 94(3), pp.489-506

Little, R. and Rubin, D. (1987), Statistical analysis with missing data, New York: Wiley

Luppes, M. and Barnes, B. (1997), 'On the use of incentives: an overview of policies in several countries', Netherlands Official Statistics, 12, pp. 26-39

Middleton, E. and Simpson, S. (1998), Simulation of the impact of non-response on census applications, In Simpson (1998)

Mitchell, B. (1995), International Historical Statistics: 1750-1988, 3 volumes, second revised edition, New York: Stockton

Murthy, N. (1998), Personal communication

National Research Council (1981), Data for the estimation of fertility and mortality, Washington D.C.: National Academy Press

National Research Council (1995), Modernizing the US Census, Washington DC: National Academy Press

Nissell, M. (1995), 'Social Trends and social change', Journal of the Royal Statistical Society series A, 158(3): pp. 491-504

Norway (1998), 'Reducing the costs of censuses in Norway through use of administrative registers', International Statistical Review, 66(2)

Richie, M. (1998), 'Communication, politics, and the census', Radical Statistics 69, (Autumn), pp. 47-51

Simpson, S. (ed.) (1998), Proceedings of the One Number Census Research Workshop, Leeds, May 12-13 1998, Centre for Census and Survey Research, University of Manchester, UK, M13 9PL

Simpson, S. and Middleton, E. (1997), Who is missed by a national census? A review of empirical results from Australia, Britain, Canada, and the USA, CCSR working paper 2, Centre for Census and Survey Research, University of Manchester, UK, M13 9PL

Vehova, V. and Zaletel, M. (1996), The matching project in Slovenia: who are the nonrespondents? Proceedings of the 6th International Workshop on Household Survey Non-response pp. 193-203, Helsinki: Statistics Finland

Tauasosi, T. (1998), Personal communication

Townsend, P. (1998), 'Ending world poverty in the 21st century', Radical Statistics 68, (Spring), pp. 5-14

van Tuinen, H. (1995), 'Social indicators, social surveys and integration of social statistics', Netherlands Official Statistics, 3, pp. 5-22

Vliegen, J.M. and van der Stadt, H. (1988), 'Is a census still necessary? Experiences and alternatives', Netherlands Official Statistics 3, pp. 27-34

Ludi Simpson

41 Park Crescent
Bradford BD3 0JZ
E-mail: ludi@man.ac.uk

 
