Leveraging Linked Census Data: Resources and Opportunities from Full Count IPUMS Data

PAA 2023 Pre-Conference Workshop

Register for the workshop via PAA

Workshop Summary

IPUMS disseminates full count census enumerations for nine census years from 1850 to 1940. These data cover almost 700 million individual records and make it possible to link records automatically across census years, constructing millions of individual life histories and tracing millions of families over multiple generations. The IPUMS Multigenerational Longitudinal Panel (MLP) project carries out this linkage and currently delivers crosswalks that link individual records between full count censuses from 1850 to 1940. The linking keys will soon be available through the IPUMS USA data access system, allowing users to select and subset samples of full count data more efficiently. We are also developing links to administrative data, including the Social Security NUMIDENT, that will be used to enhance the census linkages in subsequent versions of the data.

Workshop Objectives

  1. Introduce full count IPUMS data and the challenges and opportunities of census linkage.
  2. Explain the MLP linking strategy and machine learning algorithm.
  3. Demonstrate how to access the data through the IPUMS data access system, with tips on managing file size and using linking keys and crosswalks to create longitudinal panels.
  4. Share information on opportunities to link IPUMS MLP to additional data, including modern surveys on older Americans and the rich collection of data available in Federal Statistical Research Data Centers (FSRDCs).
  5. Provide opportunities for individual consultations with data experts. 
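
As a rough illustration of objective 3, the sketch below shows one way a crosswalk of linked record identifiers could be used to join two census extracts into a two-wave panel. It is a minimal example under stated assumptions, not the MLP workflow itself: the function `build_panel`, the field names `histid`, `histid_a`, and `histid_b`, and the toy records are all hypothetical, and the actual crosswalk layout should be checked against the IPUMS MLP documentation.

```python
def build_panel(crosswalk, census_a, census_b, id_a="histid_a", id_b="histid_b"):
    """Join two census extracts through a crosswalk of linked record IDs.

    All field names here are hypothetical placeholders for illustration.
    """
    # Index each extract by its record identifier for constant-time lookup.
    a_by_id = {row["histid"]: row for row in census_a}
    b_by_id = {row["histid"]: row for row in census_b}

    panel = []
    for link in crosswalk:
        rec_a = a_by_id.get(link[id_a])
        rec_b = b_by_id.get(link[id_b])
        # Keep only links whose records appear in both extracts.
        if rec_a is None or rec_b is None:
            continue
        # Suffix every field so the two waves stay distinguishable.
        row = {f"{key}_a": value for key, value in rec_a.items()}
        row.update({f"{key}_b": value for key, value in rec_b.items()})
        panel.append(row)
    return panel


# Toy records standing in for two census extracts (made-up values).
census_1900 = [
    {"histid": "A1", "age": 10},
    {"histid": "A2", "age": 35},
]
census_1910 = [
    {"histid": "B7", "age": 20},
    {"histid": "B9", "age": 61},
]
# Toy crosswalk: the second link points at a record missing from census_1900.
crosswalk = [
    {"histid_a": "A1", "histid_b": "B7"},
    {"histid_a": "A3", "histid_b": "B9"},
]

panel = build_panel(crosswalk, census_1900, census_1910)
```

Subsetting each extract to the records of interest before joining keeps the lookup tables small, which matters when the source files hold tens of millions of rows.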

The power of harmonized, full count data

Full count census data represent a new class of source material for social scientists. These harmonized full count data will allow innovative analyses of spatial change, support consistent analyses of the impact of neighborhood context on individual behavior, and permit studies of the smallest subpopulations. By linking individuals and families across censuses, analysts can create national longitudinal panels that trace the characteristics of individuals over their lives and families over multiple generations.

Although these data have rich research potential, the massive expansion in the scale and scope of census microdata poses new challenges for researchers. Some researchers lack the computing power necessary to take advantage of these data, and even those with sufficient resources still require new programming strategies to manipulate and analyze data of this scale. Record linkage poses a host of problems, from analytic decisions to computational capacity. Restrictions on data access constrain data sharing and can cause confusion about how to access different resources. The goal of this workshop is to lower barriers to using this important resource, with a particular focus on how to create manageable linked datasets and how to request access to additional resources that can be linked to IPUMS MLP.

Tentative Agenda

  • Welcome & Introductions (15+ minutes): Cathy Fitch
  • IPUMS USA full count census data (45 minutes): Cathy Fitch
  • The MLP linking strategy (1 hour): Jonas Helgertz
  • How to access full count and MLP data (1 hour): Matt Nelson
  • Lunch Break (1 hour)
  • IPUMS MLP and the Census Bureau’s Data Linkage Infrastructure (1 hour): Katie Genadek 
  • The 1940 Census Linked to Modern Surveys of Older Americans (1 hour): Rob Warren 
  • Hlink record linkage software (30 minutes): Jake Wellington (IPUMS Data Engineer)
  • Exercise and/or consultations (90 minutes): Matt Nelson and Cathy Fitch (exercise) and other experts (consultations)

Workshop Instructors

  • Catherine Fitch is the associate director of the Institute for Social Research and Data Innovation (ISRDI) and the Minnesota Research Data Center (MnRDC) at the University of Minnesota.
  • Steven Ruggles is a Regents Professor of History and Population Studies at the University of Minnesota, and the Director of the Institute for Social Research and Data Innovation. 
  • Katie Genadek is the Director of the Decennial Census Digitization and Linkage (DCDL) project at the U.S. Census Bureau. 
  • Jonas Helgertz is an Associate Professor in Economic History at the Department of Economic History, Lund University. Dr Helgertz is also affiliated with the Centre for Economic Demography at Lund University and the Institute for Social Research and Data Innovation, University of Minnesota. 
  • Matt Nelson is a Research Scientist with IPUMS at the Institute for Social Research and Data Innovation at the University of Minnesota. 
  • John Robert Warren is a professor of sociology at the University of Minnesota.