What is IPUMS?
IPUMS provides census and survey data from around the world, integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within their family and community contexts. Data and services are available free of charge.
Over the past 25 years, IPUMS has received 70 federal grants and contracts totaling over $140 million to curate, integrate, and disseminate government-produced data collections. Major funding for these projects has come from the National Institutes of Health, the National Science Foundation, and the Food and Drug Administration. IPUMS includes data produced by a broad range of agencies, including the Census Bureau, the Bureau of Labor Statistics, the National Science Foundation, the National Center for Health Statistics, the Centers for Disease Control and Prevention, and the National Aeronautics and Space Administration.
In collaboration with 105 national statistical agencies, nine national archives, and three genealogical organizations, IPUMS has created the world’s largest accessible database of census microdata. IPUMS includes almost a billion records from U.S. censuses from 1790 to the present and over a billion records from the censuses of over 100 other countries. We have also harmonized survey data comprising over 30,000 integrated variables and 150 million records, drawn from sources including the Current Population Survey, the American Community Survey, the National Health Interview Survey, the Demographic and Health Surveys, and an expanding collection of labor force, health, and education surveys. In total, IPUMS currently disseminates integrated microdata describing 1.4 billion individuals drawn from over 750 censuses and surveys.
In addition to census and survey microdata, IPUMS integrates and disseminates the nation’s most comprehensive database of area-level census data and electronic boundary files describing census geography from 1790 to the present. IPUMS NHGIS includes 366 billion data points and 28 million map polygons describing U.S. census geographic units. IPUMS Terra archives and disseminates a third class of data: raster data derived from satellite imagery, climate models, and other sources.
Our signature activity is harmonizing variable codes and documentation so that they are fully consistent across datasets. This work rests on an extensive technical infrastructure developed over more than two decades, including the first structured metadata system for integrating disparate datasets. Using a data warehousing approach, we extract, transform, and load data from diverse sources into a single, unified schema so that data from different sources become compatible. This large-scale integration makes thousands of population datasets interoperable. We have created software for consistency checking, automated data cleaning and editing, sampling, disclosure control, database harmonization, metadata creation, and parsing. Our data projects use machine learning for automated string classification and record linkage, and they use parallel processing to manipulate large datasets in our high-performance computing environment.
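To make the idea of harmonizing variable codes concrete, here is a minimal Python sketch of one simple approach: a per-sample translation table that maps each dataset's own codes onto a single integrated coding scheme, after which records from different sources can be combined. The sample names (`census_a`, `survey_b`), the variable `marital_status`, every code value, and the `harmonize` function are hypothetical and invented for illustration. They are not IPUMS codes, and this is not the IPUMS metadata system, which is far more elaborate.

```python
"""Hypothetical sketch of recoding two samples onto one integrated scheme.

All sample names, variable names, and codes are invented for illustration;
they are not IPUMS codes or IPUMS software.
"""

# Integrated (harmonized) codes shared by every sample (hypothetical).
INTEGRATED_LABELS = {
    1: "Married",
    2: "Never married",
    3: "Widowed",
    4: "Divorced or separated",
}

# Per-sample translation tables: source code -> integrated code (hypothetical).
TRANSLATION_TABLES = {
    "census_a": {"M": 1, "S": 2, "W": 3, "D": 4},
    # survey_b distinguishes divorced (40) from separated (41); both map to 4.
    "survey_b": {10: 1, 20: 2, 30: 3, 40: 4, 41: 4},
}


def harmonize(sample: str, records: list, var: str = "marital_status") -> list:
    """Return copies of records with the sample's source code replaced by the integrated code."""
    table = TRANSLATION_TABLES[sample]
    out = []
    for rec in records:
        new = dict(rec)              # copy so the source records stay unchanged
        new[var] = table[rec[var]]   # a KeyError here flags an unmapped source code
        new["sample"] = sample       # keep track of where each record came from
        out.append(new)
    return out


if __name__ == "__main__":
    combined = harmonize("census_a", [{"id": 1, "marital_status": "W"}]) + harmonize(
        "survey_b", [{"id": 7, "marital_status": 41}]
    )
    for rec in combined:
        print(rec["sample"], rec["id"], INTEGRATED_LABELS[rec["marital_status"]])
```

Running the script prints `census_a 1 Widowed` and `survey_b 7 Divorced or separated`: two differently coded sources now share one set of codes and labels. Keeping the mappings in data tables rather than in code is what lets such translation tables double as documentation, which is the spirit of a metadata-driven approach.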
IPUMS is part of the Institute for Social Research and Data Innovation at the University of Minnesota and is directed by Regents Professor Steven Ruggles.