Received: 17 July 2009
Accepted: 4 October 2009
There is an identified need for the collection of ethnicity data in the healthcare setting. Accurate data on ethnicity are essential for informing policy makers, funders and public health experts about the incidence, prevalence and outcomes of specific conditions in population subgroups. There is emerging evidence that some ethnic groups are associated with an increased incidence of certain cancers, and disparities in access to services have been documented. Government initiatives are in place to collect ethnicity data in the healthcare setting, but the accuracy of the data needs to be validated. Cancer Research UK commissioned the Cancer Ethnicity (CanEth) project to gather robust evidence and identify solutions to improve the collection of ethnicity data for cancer. The project set out to review current literature focusing on methods, interventions and barriers addressing the collection of ethnicity data. The review identified a paucity of published evidence on ethnicity data collection. Many clinical articles used ethnicity data, but few discussed the methodology of data collection. In general, however, self-reported ethnicity is recognised as the best method of data collection, and is preferable to observer assessment. Training is needed to raise awareness of the importance of ethnicity data and its use to facilitate the reduction of inequalities.
black and minority ethnic groups, data collection, ethnicity, monitoring, profiling
The reduction of cancer inequalities was a key feature of the Cancer Reform Strategy published in 2007, which proposed to improve cancer outcomes and uptake of services by 2012, including those inequalities observed in black and minority ethnic (BME) populations (Department of Health, 2007). In cancer, ethnicity data collection and monitoring are particularly important because ethnic minority groups have been demonstrated to have later presentation, leading to poor survival (Smith et al, 1999; White, 2002). Also, some ethnic minority groups tend to demonstrate more risky behaviour. For example, smoking rates were reported to be highest in Bangladeshi males (44%), followed by Irish males (39%), compared with 27% in the general population, whereas Bangladeshi women are more likely to chew tobacco (26%) than to smoke cigarettes (White, 2002). Reports suggested that the incidence of both breast and colorectal cancer was lower in the South Asian population. However, incidence rates are increasing over time (Smith et al, 2003; Farooq and Coleman, 2005). With regard to other disease areas, South Asians in the UK are 50% more likely to die prematurely from coronary heart disease than the general UK population, and males and females of Pakistani and Bangladeshi origin are six times more likely than the general population to have diabetes (Townsend et al, 1988; Commission for Racial Equality, 2008).
The 2001 census classified 4.6 million people (7.9%) in the UK as belonging to a non-white ethnic group, with over 50% of these classified as Asian or British Asian (Office for National Statistics, 2001). This is an increase compared with the 5.5% of the population not defined as white in the 1991 census. The 2001 census identified 55% of the mixed race category as being 16 years of age or younger. For epigenetic modelling, a more detailed definition of ‘mixed race’ is required, such as mothers’, fathers’ and grandparents’ ethnicity and geographical origins/ancestry. To improve public services appropriate to the needs of BME patients, there is a need to break down ethnicity further to identify language, religion and culture, thus allowing more accurate information to be collected and resources to be optimally targeted.
In the UK, the ethnicity debate has often focused on the utility and classification of ethnicity data (Johnson, 1998, 2001, 2006; White, 2002; London Health Observatory, 2003; Greater London Authority, 2005). The quality of ethnicity data recording has been variable. Attempts to improve the completeness and quality require dedication and commitment (Liverpool John Moores University, 2000). Reports focusing on ethnicity tend to use the standard census categories, but frequently show significant numbers of cases reported as ‘not known’ or ‘did not answer question’, and consequently the impact and value of such work are limited (White, 2002; Greater London Authority, 2005). Recording of additional dimensions of diversity, such as religion or preferred language, is infrequent and often poorly conducted.
In general, collection of ethnicity data has long been recognised as poor in the UK, especially in primary care, with regard to completeness and accuracy (Pringle and Rothera, 1996; Kumarapeli et al, 2006; Jones and Kai, 2007). There are many reasons for the lack of routinely collected ethnicity data. These include the difficulty of an accurate classification, awareness of sensitivities when asking for these data, lack of motivation to collect or provide data, unwillingness or inability (due to language barriers) of individuals to provide information, and a lack of understanding of how such data can or will be used. Reports on health inequalities and outcomes across ethnic groups emphasise the need to overcome these barriers and record ethnicity accurately. The danger is that current policies are based on inaccurate data and, as such, may lead to inappropriate distribution of resources and services (White, 2002; London Health Observatory, 2003; Greater London Authority, 2005).
In 1995 it became UK government policy to collect ethnicity data in secondary care settings through Hospital Episode Statistics (HES). HES data collection has improved over time. For example, in London, 52% of records in 1996–1997 had incomplete data, whereas by 2001–2002 this figure had fallen to 35% (London Health Observatory, 2003).
In 2001–2002, an attempt was made to increase ethnicity profiling in primary care. However, at this time the work involved and the related costs were significant deterring factors (Jones and Kai, 2007). Recently, some primary care trusts have invested in the collection of ethnicity data, and these initiatives are supported by the incorporation of ethnicity into the Quality and Outcomes Framework for GPs (although restricted to new patients and only awarded one point) (Race for Health, 2007). Monitoring goals set for London for 2003–2006 by the Department of Health expected all GP practices and other primary care providers to record valid ethnicity codes for 75% of patients by 2005, and expected this figure to reach 95% by March 2006 (London Health Observatory, 2003). The ‘Professionals Responding to Cancer in Ethnic Diversity’ (PROCEED) project team provided training in competence and cultural awareness for healthcare professionals who were involved in cancer care at primary care level. The issues explored included cancer and ethnic diversity, language and communication, and culture and cancer (Cancer Research UK, 2006).
In 2005, the NHS produced a guide to ethnic monitoring in the NHS and social care, with several examples of good practice (Department of Health, 2005). There is limited information on the uptake of these guidelines and their practical applicability. Within the cancer setting, family history, ethnicity, social class, material deprivation, lack of access to services and subsequent delay times have all been adversely linked to outcome (i.e. survival) (Townsend et al, 1988; White, 2002; Farooq and Coleman, 2005; Woods et al, 2006). There is an urgent need for evidence on how ethnic data collection might be improved for cancer statistics, what mechanisms might be implemented for data quality validation checks, and a strategy for optimal use of this data in order to encourage improved collection.
This paper is the first part of a project commissioned by Cancer Research UK to assess ethnicity data collection for statistics of cancer incidence, management, mortality and survival in the UK. The report also includes a survey of healthcare professionals’ perceptions of ethnicity data collection, focus groups of consumers’ perceptions and willingness to provide ethnicity data in healthcare, and a validation exercise to assess the completeness and accuracy of ethnicity data in a feasibility study of GP practices (Iqbal et al, 2008).
This paper focuses on one part of the project, namely a systematic review undertaken to gather robust evidence and identify clear solutions and recommendations to improve the collection of ethnicity data for health statistics in the UK. This information is essential in order to obtain a better understanding of the uptake of services and health outcomes, to monitor trends, to target interventions and allocate resources to better meet the needs of BME groups, and to tackle health inequalities. The review examined the published literature discussing methods, interventions and barriers with regard to the collection of ethnicity data in primary and secondary care. It also included a separate search of key websites to identify relevant ‘grey literature’ such as government reports and other unpublished material which cannot easily be found via conventional database searches.
The databases used for this review were identified in the early stages of the project through consultation with a team of experts, including a specialist information scientist working for the Centre for Evidence in Ethnicity, Health and Diversity (CEEHD). The searches encompassed five bibliographic databases, namely Embase, PsycLIT, Medline, PsycINFO and Cinahl. The three key search areas were ethnicity, data collection OR data monitoring AND cancer or other chronic or long-term diseases such as stroke, diabetes and coronary heart disease (see Table 1). The search of published literature was split into two sections. The first search was limited to 2000–2007 with the aim of identifying recent literature. The second search used the same terms but was extended to 1990–1999 to capture literature before and after the National Institutes of Health Revitalization Act, which was passed in the USA in 1993 and prompted interest in reporting by ethnic group. The review was conducted in three stages, namely title, abstract and article review. Abstracts were reviewed by the researcher and by the co-authors as well as by members of an independent advisory board of experts.
Grey literature searches were conducted using the keywords data collection OR data monitoring AND ethnic OR ethnicity. The searches were performed in Google and Google Scholar. Only the first 50 pages were scanned, due to the huge volume of results. In addition, extensive searches were carried out on key websites such as the Specialist Library for Ethnicity and Health, the London Health Observatory, the Office for National Statistics and the Department of Health.
The findings are presented in sections based on seven themes which emerged during the course of the review as shown in Box 1.
The majority of the relevant published articles were from the USA (68%). However, the majority of guidelines found in the grey literature search were UK based (63%). Of the 35 articles included in the review, 19 articles (54%) were identified from published literature and a further 16 articles (46%) from grey literature. In total, 29 (83%) of the relevant documents were interested in all ethnic groups, with six (17%) focusing on particular groups; 26% of the relevant literature consisted of either guidelines, training materials or toolkits.
The review of published literature provided a total of 2404 ‘hits’, of which 720 were for the period 1990–1999 and 1684 were for the period 2000–2007. Upon review of the 2404 titles, only 322 seemed to suggest that they involved the methodology of either collecting or monitoring ethnicity data. A full review of these 322 abstracts revealed only 26 which potentially fulfilled our criteria (see Figure 1). The main reason for rejection (57% of cases) was that the paper was concerned with the use of ethnicity data rather than the methods for collection of such data. The full text of the 26 potential articles was reviewed, and only 19 of these articles included information about data collection or monitoring. One of the potentially relevant papers is included based on the abstract only, as the full paper is unavailable (Chattar-Cora et al, 2000) (see Figure 1 and Table 2).
Searches on key websites and Google and Google Scholar identified a wealth of information, with 53 reports being identified as possibly associated with ethnic data collection or monitoring. The main reasons for rejection were that the reports contained only opinion (i.e. discussion of the need for ethnicity data collection) or used ethnicity data for reporting outcomes. Of the 53 reports that were reviewed, 16 were included in this review (see Table 3).
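As a cross-check on the screening figures reported above, the arithmetic can be reproduced in a few lines. This is only an illustrative sketch: the counts are those stated in the text, while the variable names and the `pct` helper are assumptions introduced here and are not part of the original review.

```python
# Counts as stated in the text; all names below are illustrative only.
hits_1990_1999 = 720
hits_2000_2007 = 1684
total_hits = hits_1990_1999 + hits_2000_2007  # titles screened

published_included = 19  # published articles retained after full-text review
grey_included = 16       # grey literature reports retained (of 53 reviewed)
total_included = published_included + grey_included


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


summary = {
    "total_hits": total_hits,
    "total_included": total_included,
    "published_share": pct(published_included, total_included),
    "grey_share": pct(grey_included, total_included),
    "all_groups_share": pct(29, total_included),
    "specific_groups_share": pct(6, total_included),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The rounded shares agree with the percentages quoted for the 35 included documents.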
Six reports presented best practice evidence for ethnicity data collection and monitoring (Commission for Racial Equality, 2002; Department of Health, 2005; Health Scotland, 2005; Race for Health, 2006; Regenstein and Sickler, 2006; Health Research and Educational Trust, 2007). Examples of best practice in the UK are given in the report by the Department of Health (2005). Key reports where ethnicity data collection has been successful due to adequate resources, awareness and training (Race for Health, 2006; Regenstein and Sickler, 2006) also demonstrated the need to have a ‘use’ for the data in order to improve collection.
Recommendations for improving ethnicity data collection are largely concerned with standardisation of the method of collection, point of collection, ethnicity categories, data coding and storage, and lastly standardised responses to the patients’ frequently asked questions (Hasnain-Wynia et al, 2004; Ford and Kelly, 2005; Weinick et al, 2007). The UK Department of Health has implemented policy change within the primary and secondary care settings. The impact of accurate ethnicity data collection has not been fully realised, as there is still a long way to go before the data are complete and reliable (Department of Health, 2001; Hasnain-Wynia et al, 2004).
A United Nations report identified a total of 107 ethnicity questions asked by 95 countries in the census (United Nations Statistics Division, 2003). Only 12% of countries that collected ethnicity data had categories for ‘mixed identities’ or allowed multiple box selection. Other international guidelines indicate that the gold standard categories used within a country may be expanded so long as they can be concatenated back for national reporting purposes (Commission for Racial Equality, 2002; Race for Health, 2006; Weinick et al, 2007). There are also inconsistencies with the data types being used. These include coded tick box categories with and without boxes for free text, closed questions with yes/no responses and open questions for free text allowing people to describe themselves in their own words (United Nations Statistics Division, 2003).
The UK gold standard ethnicity categories are taken from the 2001 census ethnicity question which consists of 16+1 categories (‘+1’ being the code for ‘not stated’). The Commission for Racial Equality (CRE) report and the Department of Health guide to ethnic monitoring both state the importance of not offering patients this option (Commission for Racial Equality, 2002; Department of Health, 2005).
The UK Department of Health guidelines encourage the additional collection of data on religion, diet, language and the need for an interpreter (Department of Health, 2005). These additional indicators of ethnicity should be collected especially if they are relevant at a local level. The Office for National Statistics (ONS) recommends that data on nationality are also collected for planning and resource purposes (Office for National Statistics, 2003). Responses should be re-ordered depending on where the question is being asked (e.g. in England, ‘English’ should be at the top of the list). This ordering to emphasise groups of policy importance is also practised in other countries, such as New Zealand, where ‘Maoris’ is at the top of the coding list (Gardi, 2003).
The Individual Patient Registration Profile (IPRP) used by Lambeth Primary Care Trust collects data on ‘religion’, ‘language’ and ‘need for an interpreter’ in addition to ‘self-reported ethnicity’ (Race for Health, 2006). The ethnicity categories have been expanded in line with the make-up of the local population, but can be concatenated to the census categories. The data are stored on a dedicated central database which can link the IPRP data to research projects. Central Liverpool NHS Primary Care Trust has also carried out patient profiling by collecting detailed ethnicity data, including ‘spoken language’ and ‘reading language’ (Liverpool John Moores University, 2000). However, ‘country of birth’, which has been collected since 1841, is no longer deemed a reliable indicator of ethnic origin, as at least 50% of members of ethnic minorities are born in the UK (Gill et al, 2007).
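The idea that locally expanded categories can be concatenated back to the census categories can be sketched as a simple lookup. The example below is a minimal illustration under stated assumptions: the local categories are hypothetical (they are not taken from the IPRP), the census labels are simplified renderings of the 2001 categories, and the `roll_up` function name is invented for this sketch.

```python
from collections import Counter

# Hypothetical local categories mapped to simplified 2001 census labels.
# Neither the local list nor the exact label wording comes from the IPRP.
LOCAL_TO_CENSUS = {
    "Somali": "Black or Black British: African",
    "Nigerian": "Black or Black British: African",
    "Portuguese": "White: Any other White background",
    "Sri Lankan Tamil": "Asian or Asian British: Any other Asian background",
}


def roll_up(local_counts):
    """Aggregate counts recorded under local categories into census categories."""
    totals = Counter()
    for local_category, count in local_counts.items():
        if local_category not in LOCAL_TO_CENSUS:
            # Fail loudly rather than silently dropping an unmapped category.
            raise KeyError(f"No census mapping for {local_category!r}")
        totals[LOCAL_TO_CENSUS[local_category]] += count
    return dict(totals)


example = roll_up({"Somali": 3, "Nigerian": 2, "Portuguese": 5})
print(example)
```

Keeping the mapping many-to-one is what allows detailed local data to be reported nationally without loss of comparability.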
Self-reported ethnicity is the gold standard, and the reasons for this are discussed in many good practice guidelines and papers (Commission for Racial Equality, 2002; Department of Health, 2005; Regenstein and Sickler, 2006). If healthcare professionals determine ethnicity by observation, this can lead to stereotyping by skin colour and name, so it should only be used where self-reporting is not possible. In the USA the Health Research and Educational Trust toolkit and Hasnain-Wynia et al (2004) illustrate how staff should ask for these data, and emphasise the need for self-reporting (Hasnain-Wynia and Baker, 2006; Health Research and Educational Trust, 2007). Surveys conducted by the Robert Wood Johnson group showed that 61% of respondents usually asked the patient to self-report, but 25% filled in the ethnicity themselves on the basis of observation (Regenstein and Sickler, 2006). They felt that this method was easier for both them and the patient as it avoided any discomfort. They also felt that it was accurate, as they believed they knew their local population. It would be informative to separate the occasions when staff fail to ask from those when patients do not wish to provide the data; these areas will need to be tackled independently, as they stem from different problems (Department of Health, 2005). The method of collection should also be recorded alongside the data (i.e. self-reporting or observation), otherwise other important biases could occur if assumptions are made about the reporting method (Commission for Racial Equality, 2002; Buescher et al, 2005). Sugarman and Lawson (1993) demonstrated that racial disparity varied according to the method of collection, and the incidence of renal disease in American Indians/Alaska Natives increased from 268 per million to 312 per million after corrections to the coding.
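One way to record the method of collection alongside the data, and to distinguish 'not asked' from 'declined', is sketched below. This is an assumed record layout for illustration only; the field names, enumerations and helper function are hypothetical and do not describe any existing NHS or hospital system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CollectionMethod(Enum):
    SELF_REPORTED = "self-reported"
    OBSERVED = "observed"


class MissingReason(Enum):
    NOT_ASKED = "staff did not ask"
    DECLINED = "patient declined"


@dataclass(frozen=True)
class EthnicityRecord:
    patient_id: str
    ethnicity: Optional[str] = None                 # None when nothing was recorded
    method: Optional[CollectionMethod] = None       # how the value was obtained
    missing_reason: Optional[MissingReason] = None  # why the value is absent


def self_reported_only(records: List[EthnicityRecord]) -> List[EthnicityRecord]:
    """Keep only records whose ethnicity was supplied by the patient."""
    return [
        r for r in records
        if r.ethnicity is not None and r.method is CollectionMethod.SELF_REPORTED
    ]


records = [
    EthnicityRecord("p1", "Bangladeshi", CollectionMethod.SELF_REPORTED),
    EthnicityRecord("p2", "Indian", CollectionMethod.OBSERVED),
    EthnicityRecord("p3", missing_reason=MissingReason.NOT_ASKED),
    EthnicityRecord("p4", missing_reason=MissingReason.DECLINED),
]
print([r.patient_id for r in self_reported_only(records)])
```

Storing the method and the reason for a missing value as separate fields means that analyses need not assume how a value was obtained, and that the two sources of missing data can be addressed independently.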
Other methods of collection could include the use of name recognition software. Patients’ notes were used to successfully identify most patients in one study, demonstrating that names can be used with some precision when no other data are available (Chattar-Cora et al, 2000). It has been shown that name recognition software used in conjunction with other indicators such as country of birth results in increased accuracy (Sheth et al, 1997; Swallen et al, 1997; Warnakulasuriya et al, 1999).
The main barrier to ethnicity data collection is staff members’ lack of knowledge about the importance and use of the data. Site visits to six consortium member hospitals in the USA and a nationwide survey of 1000 hospitals found that 30% of respondents reported problems with or barriers to collecting ethnicity data (Hasnain-Wynia et al, 2004). The barriers reported were similar to those found in the Robert Wood Johnson report (Regenstein and Sickler, 2006), the most important being the reluctance of staff to ask for ethnicity data, due to fear of offending the patient or encountering resistance. Confusion about ethnicity categories, lack of a demonstrated need to collect the data, limitations of databases with regard to capturing this type of data, lack of resources, and lack of agreement among executive leaders about the need to collect these data were also reported (Hasnain-Wynia et al, 2004).
One of the main barriers to data collection is patients’ perceptions. Baker reported that 46% of patients were concerned that the data would be used to discriminate against them (Baker et al, 2007). Patients would be more willing to provide data if the reasons why the data were being collected were explained to them, and healthcare professionals should be comfortable asking for these data (Baker et al, 2005).
All of the best practice guidelines recommended that the main intervention required for completeness and accuracy of ethnicity data collection was staff training, followed by adequate resources for data collection and use (Commission for Racial Equality, 2002; Department of Health, 2005; Health Scotland, 2005; Race for Health, 2006; Regenstein and Sickler, 2006; Health Research and Educational Trust, 2007). The 2005 NHS guidelines state that staff training should be tailored to local need and should explain why ethnic monitoring is important, how to collect the data and what they will be used for. Local community groups could be asked to comment on the content of the training packs. All staff who may be involved in collecting ethnicity data, writing reports, or analysing or making decisions based on the data need to attend training. Training needs may differ from one group to another (Department of Health, 2005).
In the USA, the Health Research and Educational Trust toolkit provides a free national training package for the collection of ethnicity data (Health Research and Educational Trust, 2007). It is written for all levels of healthcare workers, including chief executive officers, clinicians, registration staff and database managers, as well as for patients, enabling users to select the information package that is most relevant to them. The toolkit explains the need for ethnicity data collection, the need for standardisation, how to ask the questions, training exercises and how the data are or could be used. The resources provided include training presentations, definitions of key terms, and a reference booklet for staff.
Apart from the best practice guidelines in the UK, the most comprehensive training package is the Ethnic Monitoring Tool developed by NHS Scotland (Health Scotland, 2005). This is aimed at NHS Scotland staff and provides information on why it is important to carry out ethnic monitoring, who is involved, and what needs to be put in place. Training materials can be downloaded and modified according to local needs. Training-for-trainers notes and role-play scenarios are also provided. The Lambeth Primary Care Trust project offers 1.5 days of training for staff, computer templates are provided, and resources are made available to mail a questionnaire to existing patients as well as collecting ethnicity data for those newly registered (Race for Health, 2006).
The importance of staff training was discussed in the Robert Wood Johnson Report, with different methods used across three hospitals. The training was delivered as part of the induction programme to all new staff in the first hospital, but was provided to all staff in the second hospital. The third hospital subjected members of staff working in the registration areas to quality review. Managers are able to identify staff who record a large number of unknowns or blanks, and implement training to address these problems (Regenstein and Sickler, 2006).
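The kind of quality review described for the third hospital could, in principle, be sketched as a per-staff rate of 'unknown' or blank entries. The example below is a minimal sketch under assumptions: the input format, the set of values treated as unknown, the 20% threshold and the function names are all hypothetical and are not taken from the Robert Wood Johnson report.

```python
from collections import defaultdict

# Values treated as 'unknown or blank'; an assumption for this sketch.
UNKNOWN_VALUES = {None, "", "not known"}


def unknown_rate_by_staff(entries):
    """Return, per staff member, the fraction of entries with no usable ethnicity.

    `entries` is an iterable of (staff_id, ethnicity) pairs.
    """
    totals = defaultdict(int)
    unknowns = defaultdict(int)
    for staff_id, ethnicity in entries:
        totals[staff_id] += 1
        if ethnicity in UNKNOWN_VALUES:
            unknowns[staff_id] += 1
    return {staff_id: unknowns[staff_id] / totals[staff_id] for staff_id in totals}


def staff_needing_training(entries, threshold=0.2):
    """List staff whose unknown rate exceeds the (hypothetical) threshold."""
    rates = unknown_rate_by_staff(entries)
    return sorted(s for s, rate in rates.items() if rate > threshold)


entries = [
    ("clerk_a", "Indian"),
    ("clerk_a", None),
    ("clerk_b", "Chinese"),
    ("clerk_b", "Pakistani"),
]
print(staff_needing_training(entries))
```

A rate of this kind identifies where training might be targeted, but it does not by itself show whether staff failed to ask or patients declined to answer.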
Completeness of ethnicity data is an ongoing problem. Reports based on incomplete or poor-quality data can provide misleading results. Many studies have compared self-reported data with official statistics and found inaccuracies (see, for example, Frost et al, 1994; Kelly et al, 1996; Buescher et al, 2005). It is important to have better data quality based on self-reported data. Ethnicity data were assessed in 376 recently diagnosed patients, and the findings showed that medical records are closely linked to self-defined ethnicity (Gotay and Holup, 2004).
Incompleteness of ethnicity data is a major problem for UK cancer registration, as registries depend on third parties to provide these data. Jack et al (2006) reported that ethnicity was recorded for only 23% of registry data, compared with 81% of HES data, and that linkage of records would be helpful to reduce duplication of work. In the USA, a Surveillance, Epidemiology and End Results (SEER) programme initiative to assess the completeness of data on country of birth reported that only 67% had recorded data, with completeness varying according to ethnic group, which suggests that there was bias in collection (Lin et al, 2001). Therefore country of birth should be used with caution for surveillance and reporting purposes.
The Centers for Disease Control (CDC) observed no improvement in race data collection between 1994 and 1997 (Centers for Disease Control and Prevention, 1999). However, an improvement has been seen in UK ethnicity data collection in secondary care since its inception in 1995 (London Health Observatory, 2003; Hospital Episode Statistics online, 2004). The importance of data collection is being recognised, but there is a long way to go before databases hold complete and self-validated ethnicity data. The Lambeth Primary Care Trust project demonstrates that, with dedicated resources, training and monitoring, improvements can be made and awareness increased.
This review has shown a need to increase awareness about the importance of routinely collecting ethnicity data. Ideally, ethnicity should be collected as mandatory at the GP reception level as a self-reported field which is subsequently validated by discussion with the GP, with an opt-out ‘not stated’ option for those patients who refuse to provide their ethnicity when asked to do so. It is well known that non-English-speaking patients will often register with a same-language-speaking GP, thus making this an ideal setting for self-reported data collection and validation for those members of ethnic minorities with language barriers. Data collection through the GP for all newly registered patients, as well as self-reported ethnicity for existing patients, may help to improve ethnicity data collection. Ethnicity data can also be collected at the first hospital visit. However, ideally databases could be linked between primary and secondary care systems so that demographic data are collected once only, with validation thereafter. Olatokunbo and Bhopal (2000) showed successful collection of ethnicity data in a primary care feasibility study, and also demonstrated the ease with which ethnicity could be included on hospital referral letters by means of an automated field. Linkage of ethnicity data from the UK census with health databases has also been demonstrated to be feasible in a retrospective cohort study that explored variations in myocardial infarction in South Asians (Fischbacher et al, 2007).
Ethnicity has been an optional data item in Cancer Registry datasets since 1993, and has been poorly recorded, with many patients coded as ‘not known.’ Incomplete data, conflicting data and lack of validation demonstrate the limited progress towards achieving a national policy for collecting ethnicity data. At the cancer registration level, identification of high-risk groups can only be based on the current data collected. If these data are not available, poorly collected or remain unvalidated, subsequent reports will be unreliable. It is also important for collected data to be used when reporting outcome measures such as access to healthcare and uptake of services, and to feed into policies designed to tackle inequalities (Raleigh, 2008). Use of these data in such reports is needed to demonstrate the importance of collection to both patients and healthcare professionals.
Aspinall (2009) predicts increased complexity as categories for collecting ethnicity data are expanded in order to better capture the increasingly diverse population of the UK. This will include the addition of new items, such as ‘national identity’, which aim to further capture the multi-dimensionality of ethnicity. These changes will lead to increasing difficulties in the analysis of these data, but will allow the identification of groups with more than one identity (e.g. British Muslims), which has not been possible in the past (Aspinall, 2009).
Projects such as PROCEED (Cancer Research UK, 2006) aim to provide training for GPs and hospital staff about engaging with ethnic minorities and cultural awareness. Other training, such as the NHS Scotland toolkit (Health Scotland, 2005) and the Department of Health training that was developed in conjunction with the 2005 guidelines, offers resources which can be used to raise awareness and improve the quality and completeness of ethnic data collection.
Some areas where initiatives have been assertively put in place (e.g. Lambeth Primary Care Trust, the Princes Park Health Centre and selected NHS boards in Scotland) have realised a significant improvement in data completeness and quality (Liverpool John Moores University, 2000; Race for Health, 2006; Information Services Division Scotland, 2009). Other areas where there is a low population of ethnic minorities, and where ethnic diversity is not deemed to be locally significant, should still be actively encouraged to collect and report these data in order to enable policy makers to determine high-risk groups and inequalities at a national level. It is imperative that the current levels of national awareness and motivation with regard to the importance of ethnic data collection are increased, otherwise we shall be unable to adequately tackle health inequalities for these ethnic minority patients.
We would like to thank Diane Clay, Information Officer for the Centre for Evidence in Ethnicity, Health and Diversity (CEEHD), who carried out the electronic searches, and the Advisory Board for their helpful comments on the interim study results. This work was commissioned by Cancer Research UK, and we would like to thank Vanessa Gordon-Dseagu, Catherine Foot and Ruth Yates for their support throughout this project.
This study was approved by South Birmingham Research Ethics Committee.