Documents 2010/09

A quality study linking survey data and register data

Evaluating employment classification

Individual data on employment are used for a wide range of official statistics and research purposes. For this purpose, Statistics Norway collects data both from a sample survey and from administrative registers. It is important that each unit has a correct employment status classification, especially when producing detailed tables or measuring labour market flows. This report assesses the quality of the employment data and also discusses theory and method. The analyses use data that have already been collected, and therefore represent a cost-effective alternative to quality studies that rely on additional data collection.

We present a quality assessment of employment data sources used for official statistics. Specifically, we want to evaluate the employment classification at the micro level, linking survey-based and register-based employment data at the individual level. We do not assume that either data source provides all the true answers. Instead, we divide each data source into subgroups of varying quality. The survey subgroups with the best quality are used to check the corresponding register records, and vice versa. The survey data are collected for the Norwegian Labour Force Survey (LFS), a relatively large sample survey. The register data combine records from the employee register, the tax return register (for the self-employed) and The Norwegian Tax Administration’s “End of the Year Certificate Register”. The certificate register consists mainly of wage data, which are used to classify jobs that are not registered in the employee register.

Overall, we find a high agreement rate at the micro level: over 90 percent of individuals are assigned the same employment status in both sources. At the macro level, we try to distinguish between systematic and random measurement errors. The register data appear to overestimate employment systematically, by about 1.5 percent, compared with an estimated random error of only 0.1 percent. In official Norwegian employment statistics, the total number of employed persons in the register-based statistics is adjusted to match the LFS employment estimate for the 4th quarter. Consequently, the measurement errors estimated for subgroups do not show up as a divergence between the two official statistics.

Proxy (indirect) interviews by family members are allowed in the LFS. We find that proxy interviews underestimate the employment rate by about 5 percentage points when controlling for register employment status. Proxy interviews make up about 15 percent of the total sample, so the overall effect on the total employment rate is noticeable but not alarming. Among young people, however, proxy interviews are both frequent and associated with substantial underestimation of employment. The effect seems especially pronounced among students, whose employment rate is underestimated by nearly 15 percentage points.

Overall, we find an overestimation of the employment rate of about 1.4 percentage points, and an agreement rate of over 92 percent. However, quality varies considerably between the source registers. Only 68 percent of those classified as employed from the wage-certificate data are also classified as employed in the survey data at the same point in time. Small jobs are overrepresented among those classified as employed from the wage-certificate data, so misclassifications in this group will particularly affect analyses of young people and other marginally employed persons. For the tax return register data, the corresponding figure is 87 percent. This could mean misclassification of up to 13 percent, which would clearly affect register-based analyses of self-employment.
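The micro-level measures discussed above can be illustrated with a small 2×2 cross-classification of linked individuals (register status against LFS status). The Python sketch below is not the method used in the report. It assumes simple textbook definitions: the agreement rate is the share of individuals with the same status in both sources, the net difference (register minus survey employment rate) serves as a rough indicator of systematic error, and the confirmation rate is the share of register-employed persons who are also employed in the survey, the kind of quantity behind the 68 and 87 percent figures. The class `CrossTab`, the helper `proxy_contribution` and all counts are hypothetical, invented for illustration, and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class CrossTab:
    """Hypothetical 2x2 cross-classification of linked individuals.

    Each count is the number of persons with a given combination of
    register status and LFS (survey) status in the same reference period.
    """

    both_employed: int  # employed in both register and survey
    register_only: int  # employed in register, not employed in survey
    survey_only: int  # employed in survey, not employed in register
    neither: int  # not employed in either source

    @property
    def n(self) -> int:
        """Total number of linked individuals."""
        return self.both_employed + self.register_only + self.survey_only + self.neither

    def agreement_rate(self) -> float:
        """Share of individuals classified with the same status in both sources."""
        return (self.both_employed + self.neither) / self.n

    def net_difference_pp(self) -> float:
        """Register employment rate minus survey employment rate, in percentage points.

        The shared cell both_employed cancels out, so only the two
        off-diagonal cells matter. A consistently positive value would
        point to systematic overestimation in the register.
        """
        return 100 * (self.register_only - self.survey_only) / self.n

    def confirmed_by_survey(self) -> float:
        """Share of register-employed persons who are also employed in the survey."""
        return self.both_employed / (self.both_employed + self.register_only)


def proxy_contribution(proxy_share: float, proxy_effect_pp: float) -> float:
    """Back-of-envelope effect of proxy interviews on the overall rate (pp).

    Assumes the proxy effect is identical for every proxy-interviewed person,
    so the overall effect is simply share times effect.
    """
    return proxy_share * proxy_effect_pp


if __name__ == "__main__":
    # Invented counts for illustration only; not data from the study.
    tab = CrossTab(both_employed=650, register_only=40, survey_only=20, neither=290)
    print(f"agreement rate:      {tab.agreement_rate():.3f}")
    print(f"net difference (pp): {tab.net_difference_pp():.2f}")
    print(f"confirmed by survey: {tab.confirmed_by_survey():.3f}")
    print(f"proxy effect (pp):   {proxy_contribution(0.15, 5.0):.2f}")
```

With the rounded figures from the text (about 15 percent proxy interviews and an effect of about 5 percentage points), `proxy_contribution(0.15, 5.0)` gives roughly 0.75 percentage points. This holds only under the strong assumption that the effect is the same for everyone interviewed by proxy; the text notes that it is much larger among young people and students.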

About the publication

Title

Evaluating employment classification. A quality study linking survey data and register data

Author

Ole Villund

Series and number

Documents 2010/09

Publisher

Statistics Norway

Topic

Methods and documentation

ISBN (online)

978-82-537-7834-1

ISBN (printed)

978-82-537-7833-4

Number of pages

27

Language

English

About Documents

Documentation, descriptions of methods, models and standards are published in the series Documents.

Contact