2.1 Data cleaning and processing
The EHINZ team does not take responsibility for the quality of data we acquire from other agencies. However, all team members are required to conduct basic checks for quality, accuracy, and completeness of all data we receive, analyse, interpret, and disseminate.
Examples of this include checking:
- if the variables received are what was requested
- if the variables have sensible values, investigating any missing values
- against previous extracts, if available, to see if values for previous years are the same/similar, with justification for any differences
- that the sub-categories add up to the totals
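The checks listed above can be partly automated. The following is a minimal sketch only, assuming the data has been read into a list of dictionaries (one per row); the function names and the example field names are hypothetical and are not part of any EHINZ tooling.

```python
def missing_variables(requested, received):
    """Return requested variable names that are absent from the received data."""
    return sorted(set(requested) - set(received))


def missing_value_counts(rows, fields):
    """Count, for each field, the rows where the value is None or an empty string."""
    return {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in fields
    }


def subcategories_match_total(parts, total, tolerance=0):
    """Check that sub-category values add up to the reported total.

    A small tolerance can be allowed for rounded or suppressed values.
    """
    return abs(sum(parts) - total) <= tolerance


# Hypothetical extract: counts by sex alongside a reported total.
example_rows = [
    {"year": 2020, "male": 40, "female": 60, "total": 100},
    {"year": 2021, "male": 45, "female": None, "total": 110},
]
print(missing_variables(["year", "male", "female", "total"], example_rows[0].keys()))
print(missing_value_counts(example_rows, ["male", "female", "total"]))
print(subcategories_match_total([40, 60], 100))
```

Checks such as comparing against previous extracts still need analyst judgement, because differences may have a valid explanation.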
Some data will have specific filters and/or exclusion criteria that need to be adhered to during analysis, so check any documentation (eg, data dictionary, metadata, or methods section) for details.
2.1.1 National Minimum Dataset (NMDS)
The National Minimum Dataset (NMDS) is a complex dataset, and filters need to be considered to reduce overcounting of hospitalisation events. The filters available to us include:
- transfers (within or between hospitals)
- Emergency Department short stays
- day cases
- overseas patients
- readmissions for the same condition
- waiting list or elective cases.
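As a rough illustration of how such filters might be combined, the sketch below assumes each hospitalisation event has already been given a boolean flag per filter. The flag names are hypothetical; the NMDS does not supply them in this form, and deriving them from the NMDS fields is what the guidance in APPENDIX 1 covers.

```python
# Hypothetical flag names, one per filter in the list above.
AVAILABLE_FILTERS = (
    "transfer",
    "ed_short_stay",
    "day_case",
    "overseas_patient",
    "readmission_same_condition",
    "waiting_list_or_elective",
)


def apply_filters(events, exclude):
    """Return the events that do not match any of the selected exclusion flags.

    events: list of dicts, each with boolean flags (a missing flag counts as False).
    exclude: names of the filters to apply for this analysis.
    """
    unknown = set(exclude) - set(AVAILABLE_FILTERS)
    if unknown:
        raise ValueError(f"Unknown filter(s): {sorted(unknown)}")
    return [
        event for event in events
        if not any(event.get(flag, False) for flag in exclude)
    ]


# Hypothetical events: only the first survives both selected filters.
example_events = [
    {"event_id": 1},
    {"event_id": 2, "transfer": True},
    {"event_id": 3, "overseas_patient": True},
]
kept = apply_filters(example_events, exclude=["transfer", "overseas_patient"])
print([event["event_id"] for event in kept])
```

Which filters to select depends on the indicator being analysed, so the selection is passed in rather than fixed.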
APPENDIX 1 is a guide to applying these filters. This appendix also contributed to a guide for analysing hospital discharge data in the StatsNZ Integrated Data Infrastructure (IDI) within the Virtual Health Information Network (VHIN):
2.2 Demographic groupings
Where possible, we report by standard demographic groupings (ie, Total NZ, age, sex, ethnicity, NZDep and geographic area groupings), and set up variables in the dataset accordingly.
2.2.1 Total NZ
We report environmental exposures and health outcomes for total New Zealand.
2.2.2 Age
We analyse data and output our information by age groups consistent with the Ministry of Health’s output as much as possible. Generally, it will be 5-year bands for children (0–14 years), and 10-year bands for adults (15+ years). Thus, the age groups (in years) are:
- 0–4, 5–9, 10–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, 85+
There are variations to this grouping for some work, for example, infants (0–12 months).
Which grouping to use can depend on the data we acquire and the purpose of the analysis. We state the age groups used and ensure that the same approach is used for both the numerator and denominator data.
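The default banding can be sketched as a small function. This is an illustration only, assuming age is recorded in completed whole years; the function name is hypothetical. Applying the same function to numerator and denominator data is one way to keep the two consistent.

```python
def age_group(age_years: int) -> str:
    """Map an age in completed years to the default age group label.

    5-year bands for 0-14 years, 10-year bands from 15 years, and 85+ as the top group.
    """
    if age_years < 0:
        raise ValueError("Age cannot be negative")
    if age_years >= 85:
        return "85+"
    if age_years < 15:
        lower = (age_years // 5) * 5
        return f"{lower}–{lower + 4}"
    lower = 15 + ((age_years - 15) // 10) * 10
    return f"{lower}–{lower + 9}"


for example_age in (0, 14, 15, 24, 84, 85):
    print(example_age, age_group(example_age))
```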
2.2.3 Sex
We use the term ‘sex’ (and not ‘gender’) and currently analyse the data by males and females only. Sex refers to a person’s biological sex characteristics, while gender is a person’s internal sense of identity.
For more information, see the following link from StatsNZ:
2.2.4 Ethnicity
Ethnicity is self-perceived, and a person can affiliate with more than one ethnic group.
By default, we group ethnicities using a modified Level 1 classification from Stats NZ, which corresponds to the Level 0 classification from the Ministry of Health (MoH) (ie, the ethnic group categories are Māori, Pacific Peoples, Asian, and European/Other).
For more information, see the following link from MoH:
Two main approaches are used to analyse ethnicity data. We use ‘Prioritised’ ethnic groups and also ‘Total response’ ethnic groups. The approach we use can depend on the data that is available.
Regardless of the approach chosen, it is important to clearly state the approach used and to apply this to both the numerator and denominator data.
Prioritised ethnic groups:
- Every person in the dataset is assigned to ONE ethnic group. For example, if a person selects Māori and European, then we categorise them as Māori.
- Priority of ethnic group categorisation: Māori, Pacific Peoples, Asian, European/Other.
- Advantages: can compare between ethnic groups; the sum of the ethnic group populations will equal the total population of New Zealand.
- Disadvantage: everyone is assigned to only one ethnic group, even if they identify with more than one ethnic group.
Total response ethnic groups:
- Every person in the dataset is counted in EACH ethnic group they report.
- Used when comparing Māori vs non-Māori, or Pacific Peoples vs non-Pacific Peoples.
- Advantage: allows better representation for those who identify with multiple ethnic groups.
- Disadvantages: cannot compare between ethnic groups; the sum of the ethnic group populations will exceed the total population of New Zealand.
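The difference between the two approaches can be shown with a short sketch. It assumes each person's reported ethnic groups have already been coded to the four default categories; the function names and example data are hypothetical.

```python
from collections import Counter

# Priority order as stated above.
PRIORITY_ORDER = ["Māori", "Pacific Peoples", "Asian", "European/Other"]


def prioritised_group(reported_groups):
    """Assign a person to the single highest-priority group they reported."""
    for group in PRIORITY_ORDER:
        if group in reported_groups:
            return group
    raise ValueError("No recognised ethnic group reported")


def prioritised_counts(people):
    """Count each person once, in their prioritised ethnic group."""
    return Counter(prioritised_group(groups) for groups in people)


def total_response_counts(people):
    """Count each person in every ethnic group they reported."""
    return Counter(group for groups in people for group in set(groups))


# Hypothetical responses for three people.
example_people = [
    {"Māori", "European/Other"},
    {"Asian"},
    {"Pacific Peoples", "Asian"},
]
print(prioritised_counts(example_people))
print(total_response_counts(example_people))
```

In this example the prioritised counts sum to the number of people (3), while the total response counts sum to 5, because two people are counted in two groups each.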
Please note, when using ethnicity data from Census 2018, care should be taken when cross-tabulating with other demographic groups. Stats NZ has noted that the 2018 Census had a lower-than-expected response rate, which led to the introduction of new methods to produce the dataset, including the use of data from alternative sources. Stats NZ and the 2018 Census External Data Quality Panel (EDQP) have produced a rating system to help users understand the quality-related issues and impacts of the 2018 Census dataset. Further information about the Stats NZ and EDQP documentation can be found at https://www.stats.govt.nz/2018-census/data-quality-for-2018-census
2.2.5 New Zealand Index of Deprivation (NZDep)
The University of Otago, Wellington (UOW) calculates NZDep using a set of variables from the Census to estimate the level of area deprivation for people in each small area geography in New Zealand. The calculations for the current NZDep (ie, NZDep2018) are based on Statistical Area 1 (SA1), whereas previous iterations of the NZDep were based on meshblocks. We use the NZDep ordinal scale, which divides New Zealand into tenths (deciles) based on the distribution of a continuous score.
For details about how the NZDep is derived and its recommended usage, see the following link from the University of Otago:
Decile 1 represents the areas with the least deprived NZDep scores, and Decile 10 represents the areas with the most deprived NZDep scores. Deciles are grouped in pairs, with Quintile 1 consisting of Deciles 1 and 2, and Quintile 5 consisting of Deciles 9 and 10.
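The pairing of deciles into quintiles can be written as a one-line calculation. A minimal sketch, with a hypothetical function name:

```python
def nzdep_quintile(decile: int) -> int:
    """Convert an NZDep decile (1-10) to a quintile (1-5) by pairing deciles."""
    if not 1 <= decile <= 10:
        raise ValueError("NZDep decile must be between 1 and 10")
    return (decile + 1) // 2


for example_decile in range(1, 11):
    print(example_decile, nzdep_quintile(example_decile))
```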
According to the UOW report, we should refer to ‘areas that have the most deprived NZDep scores’ rather than ‘the most deprived areas’.
Current years of data: NZDep2001, NZDep2006, NZDep2013, NZDep2018.
We use the year closest to the year of the outcome or exposure being analysed.
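Selecting the closest NZDep year can be sketched as follows, using the four years listed above. The function name is hypothetical. With whole calendar years, no year falls exactly halfway between two of these NZDep years, so no tie-breaking rule is needed.

```python
NZDEP_YEARS = (2001, 2006, 2013, 2018)


def closest_nzdep_year(data_year: int) -> int:
    """Return the NZDep year closest to the year of the outcome or exposure."""
    return min(NZDEP_YEARS, key=lambda nzdep_year: abs(nzdep_year - data_year))


for example_year in (2003, 2010, 2016):
    print(example_year, f"NZDep{closest_nzdep_year(example_year)}")
```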
Please note, health outcomes or environmental exposures by NZDep can be reported over time. However, it is not advisable to compare actual NZDep scores, deciles or quintiles for an area over time.
For further information about NZDep refer to the University of Otago website:
Please note, when using NZDep data based on Census 2018, care should be taken when cross-tabulating with other demographic groups. Stats NZ has noted that the 2018 Census had a lower-than-expected response rate, which led to the introduction of new methods to produce the dataset, including the use of data from alternative sources. Stats NZ and the 2018 Census External Data Quality Panel (EDQP) have produced a rating system to help users understand the quality-related issues and impacts of the 2018 Census dataset. Further information about the Stats NZ and EDQP documentation can be found at https://www.stats.govt.nz/2018-census/data-quality-for-2018-census
2.2.6 Geographic area groupings
Results can be produced at varying geographical levels. Geographic area groupings are based on the document ‘Statistical standard for geographic areas 2018’ published by Stats NZ: http://archive.stats.govt.nz/methods/classifications-and-standards/classification-related-stats-standards/geographic-areas.aspx
Please note, a summary document providing further background information and maps outlining the different area boundaries and how they align with each other can be found at https://www.stats.govt.nz/methods/statistical-standard-for-geographic-areas-2018.
2.2.6.1 Administrative, statistical and electoral area groupings
The Statistical Standard for Geographic Areas 2018 divides geographic areas into two main groups.
Statistical geographies are defined and maintained by Stats NZ and include four levels:
- Meshblock (MB)
- Statistical Area 1 (SA1)
- Statistical Area 2 (SA2), which replaced Area Units (AU) in 2018
- Urban Rural (UR) and Urban Rural Indicator (IUR), which replaced the geography of Urban Area (UA) in 2018.
Each level is nested within the next level in the list above (for example, meshblocks are nested within SA1s), and boundaries mostly align with territorial authority and regional council geographies.
Administrative and electoral boundaries were established in legislation under the Local Government Act 2002 and the Electoral Act 1993. The main areas in use by the EHINZ team are:
- Regional Council (REGC)
- Territorial Authority (TA)
- Community Board/Local Board (CB/LB)
- District Health Board (DHB)
Please note, due to changes in geographical boundaries, some data will need to be adjusted to align with these changes over time.
2.3 Other subgroups
Apart from demographic groupings (ie, age, sex, ethnicity, NZDep, geographic area groupings, and Urban Rural), some indicators require data to be disaggregated by other subgroups. Some examples include:
- total energy consumed, by fuel type and sector: data is broken down by the type of fuel used (eg, oil) and by the sector consuming the fuel (eg, domestic transport sector)
- number and density of livestock in New Zealand, by type: data is broken down by the type of livestock (eg, dairy cattle)
- number of motor vehicles in the fleet, by vehicle type and fuel type: data is broken down by the type of vehicle (eg, truck) and fuel type (eg, diesel)
- monitoring sites exceeding the national environmental standard (one-hour average) for nitrogen dioxide (and similar indicators monitoring other air pollutants): data is broken down by monitoring station for the respective air pollutant.
Key components of monitoring the environmental health of New Zealand are analysing demographic and other subgroups of New Zealand, comparing them against total New Zealand, and observing trends over time.
To do this, the EHINZ team uses a selection of analytical techniques to calculate common statistical estimates including counts, proportions, means, rates, and ratios. Our analytical methodologies draw heavily from the United Kingdom Association of Public Health Observatories (APHO) documentation and spreadsheet. Their work provides a toolkit to calculate proportions, means, crude rates and age-standardised rates based on the Poisson and Binomial distributions, along with their corresponding confidence intervals.
The APHO documentation (particularly, the APHO Technical Briefing 3 – Commonly used public health statistics and their confidence intervals) and analytical tool (specifically, the Tool for calculating common public health statistics and their confidence intervals) can be found on https://fingertips.phe.org.uk/profile/guidance
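As an illustration of the kind of calculation involved, the sketch below computes a proportion with a Wilson score interval (Binomial) and a crude rate with Byar's approximation (Poisson). The choice of these two interval methods is an assumption made for this example; check the APHO briefing and tool for the methods the team actually applies. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence=0.95):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def wilson_interval(events, n, confidence=0.95):
    """Wilson score confidence interval for a proportion (events out of n)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("Require n > 0 and 0 <= events <= n")
    z = z_value(confidence)
    p = events / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width


def byar_count_limits(events, confidence=0.95):
    """Byar's approximate confidence limits for a Poisson count."""
    if events < 0:
        raise ValueError("Event count cannot be negative")
    z = z_value(confidence)
    if events == 0:
        lower = 0.0
    else:
        lower = events * (1 - 1 / (9 * events) - z / (3 * sqrt(events))) ** 3
    upper_base = events + 1
    upper = upper_base * (1 - 1 / (9 * upper_base) + z / (3 * sqrt(upper_base))) ** 3
    return lower, upper


def crude_rate_with_ci(events, population, per=100_000, confidence=0.95):
    """Crude rate per `per` population, with limits scaled from the count limits."""
    lower, upper = byar_count_limits(events, confidence)
    scale = per / population
    return events * scale, lower * scale, upper * scale


print(wilson_interval(50, 100))
print(crude_rate_with_ci(100, 100_000))
```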
Age-standardisation is a key analytical technique used in our analyses, and a detailed explanation about this is found on https://www.health.govt.nz/system/files/documents/publications/standardising-rates-disease_0.pdf
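For orientation, direct age-standardisation weights each age-specific rate by a standard population. The sketch below shows the point estimate only, with made-up numbers; the function name is hypothetical, and the choice of standard population and the confidence interval method are left to the documentation referenced above.

```python
def directly_standardised_rate(events, populations, standard_population, per=100_000):
    """Directly age-standardised rate.

    events, populations and standard_population are sequences with one entry
    per age group, in the same order. Each age-specific rate (events / population)
    is weighted by that age group's share of the standard population.
    """
    if not (len(events) == len(populations) == len(standard_population)):
        raise ValueError("All inputs must have one entry per age group")
    if any(population <= 0 for population in populations):
        raise ValueError("Each age group needs a population greater than zero")
    weighted_sum = sum(
        weight * count / population
        for count, population, weight in zip(events, populations, standard_population)
    )
    return weighted_sum / sum(standard_population) * per


# Made-up example with two age groups.
example_rate = directly_standardised_rate(
    events=[10, 20],
    populations=[1000, 1000],
    standard_population=[1, 3],
)
print(round(example_rate, 1))
```

In this made-up example the crude rate is 1,500 per 100,000 (30 events in 2,000 people), while the standardised rate is higher because the standard population gives more weight to the age group with the higher rate.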
APPENDIX 2 is a summary table of the common output estimates and statistical terms we use, including confidence intervals and statistical significance. The key statistical tests we use are also summarised there.