Changes to Census Bureau Data Products

The Census Bureau has announced a new set of standards and methods for disclosure control in public use data products. According to the Census Bureau, the new approach “marks a sea change for the way that official statistics are produced and published” and represents “the death knell for public-use detailed tabulations and microdata sets as they have been traditionally prepared.” The reason for these changes is concern about respondent confidentiality, even though the decennial census and American Community Survey (ACS) research data files have an unblemished record of confidentiality. As the Census Bureau acknowledges, there has never been a single documented case where the identity of a respondent in the ACS or decennial census has been revealed by someone outside the Census Bureau.

IPUMS is concerned that scientists, planners, and the public will soon lose the free access we have enjoyed for the past six decades to reliable public Census Bureau data describing American social and economic change. This page reports what we have learned about the new data products.

Use this form to join our mailing list for updates on the Bureau’s evolving plans and to tell us about how the proposed changes might affect your research.

DIFFERENTIAL PRIVACY IN THE 2020 CENSUS

The Census Bureau has already begun using a new disclosure avoidance system for the summary files of the 2020 Census. These data files cover a limited range of subjects, since the census asks only a few questions, but they are still one of the nation’s most used public data resources, essential for redistricting, allocation of funds, urban and regional planning, and studies of residential segregation. Given the complete coverage of the decennial census, these data provide a crucial high-quality baseline for surveys and estimates throughout each decade. They are also the only source of high-quality nationwide data for small areas, for which survey sample sizes (from the American Community Survey or other sources) are typically too small to produce reliable estimates.

The Census Bureau plans to release only "differentially private" data from the 2020 Census. These data will have intentional errors added to nearly all statistics, including even the total populations of all geographic units below the state level.
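
To give a feel for what intentional error can mean at different geographic scales, the sketch below adds random integer noise to a few population counts. It is a toy illustration only: the place names, the counts, and the noise scale are all invented, and the rounded Gaussian noise is a stand-in chosen for simplicity, not the Bureau's production algorithm.

```python
import random

def add_noise(count, scale, rng):
    """Add rounded Gaussian noise to a count, clipping at zero.

    Stand-in for illustration only. Clipping keeps counts non-negative,
    which also means the noise is no longer exactly zero-mean for tiny counts.
    """
    noisy = count + round(rng.gauss(0, scale))
    return max(noisy, 0)

rng = random.Random(2020)  # fixed seed so the run is repeatable
scale = 5.0                # assumed noise scale, not an official value

# Hypothetical geographic units and populations.
true_counts = {"block A": 12, "tract B": 4200, "county C": 310000}

noisy_counts = {name: add_noise(c, scale, rng) for name, c in true_counts.items()}

for name, true in true_counts.items():
    noisy = noisy_counts[name]
    rel_error = abs(noisy - true) / true
    print(f"{name}: true={true} noisy={noisy} relative error={rel_error:.2%}")
```

Under these assumptions, noise of the same absolute size is a far larger share of a small count than of a large one, which is why small-area users have the most at stake.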

The Census Bureau justifies the new disclosure controls by citing the threat of database reconstruction, which is a technique for inferring individual-level responses from tabular data. Our analysis, however, determined that the threat of database reconstruction was minimal. The Census Bureau's attempt to reconstruct the 2010 Census from published tabulations was incorrect in most cases, and did not perform much better than random guesses of people's characteristics. As Acting Director of the Census Bureau Ron Jarmin concluded, “The accuracy of the data our researchers obtained from this study is limited, and confirmation of reidentified responses requires access to confidential internal Census Bureau information … an external attacker has no means of confirming them.”

To allow others to assess the impact of differential privacy on data usability, the Census Bureau has produced a series of demonstration products, each providing a different version of differentially private 2010 census data that users can compare with the originally published 2010 data. The most recent demonstration data, released in June 2021, are based on the production system for 2020 Redistricting Data, so the added errors in this demo product are representative of those in published 2020 data tables.

IPUMS, along with collaborators at the University of Washington, the University of Tennessee, and NORC at the University of Chicago, received a grant from the Alfred P. Sloan Foundation to analyze the demonstration files, and other groups from Harvard and CUNY also undertook analyses. These studies investigated only earlier versions of the demo data, and not all of the studies have been publicly released, but results thus far suggest that the new disclosure avoidance system will have adverse impacts on redistricting and on many research applications.

SYNTHETIC MICRODATA FROM THE AMERICAN COMMUNITY SURVEY

The American Community Survey (ACS) microdata is by far the most intensively used dataset disseminated by IPUMS and is a core dataset across social science and health research. Common topics of analysis include poverty, inequality, immigration, internal migration, ethnicity, disability, transportation, fertility, marriage, occupations, education, and family structure.

At the April 2021 ACS Data Users conference, the Census Bureau announced that it will replace the ACS research data with “fully synthetic” data over the next three years. A week after the conference, following an uproar on Twitter, the Census Bureau backtracked, and now says that there is no firm timeline on implementation of simulated ACS data. The Census Bureau has not announced any formal process for evaluation of the change, as is required under the Administrative Procedure Act.

The Bureau has not finalized the details of its methods, but the idea is to develop statistical models describing the interrelationships of the variables in the ACS and then construct a simulated population consistent with those models. Such modeled data capture relationships between variables only if they have been intentionally baked into the model. Accordingly, synthetic data are poorly suited to studying unanticipated relationships, which impedes new discovery. Most analyses currently conducted with the ACS are likely to become impossible with the shift to synthetic data. For example, the ACS makes it easy for investigators to measure ethnic intermarriage, or the impact of a partner’s education on women’s fertility. The synthetic data would likely incorporate only individual-level interrelationships among variables, so analysis across household members would be impossible.
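
The point about relationships that are not built into the model can be illustrated with a deliberately extreme toy example. The sketch below builds hypothetical couples whose education levels tend to match, then "synthesizes" them with a generator that models individuals only. Everything here is an assumption made for illustration: the education categories, the 70 percent matching rate, and the resampling method are invented, and since the Bureau has not finalized its methods, this does not represent them.

```python
import random

LEVELS = ["HS", "BA", "Grad"]  # hypothetical education categories

def make_real_couples(n, rng):
    """Build couples in which partners often share an education level."""
    couples = []
    for _ in range(n):
        a = rng.choice(LEVELS)
        # Assumed: 70% of the time the partner is forced to match;
        # otherwise the partner's level is drawn at random.
        b = a if rng.random() < 0.7 else rng.choice(LEVELS)
        couples.append((a, b))
    return couples

def synthesize_individually(couples, rng):
    """Resample each partner independently from the pooled individual data.

    The overall distribution of education is preserved, but nothing
    links one partner's value to the other's.
    """
    pool = [level for couple in couples for level in couple]
    return [(rng.choice(pool), rng.choice(pool)) for _ in couples]

def share_same(couples):
    """Fraction of couples whose partners have the same education level."""
    return sum(a == b for a, b in couples) / len(couples)

rng = random.Random(0)  # fixed seed so the run is repeatable
real = make_real_couples(5000, rng)
synthetic = synthesize_individually(real, rng)

print(f"share with matching education, real:      {share_same(real):.2f}")
print(f"share with matching education, synthetic: {share_same(synthetic):.2f}")
```

In this toy setup the synthetic couples match only about as often as chance would predict, even though the individual-level distribution of education is unchanged. A real synthesizer could model household relationships explicitly, but only those its designers chose to include.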

The Bureau apparently recognizes that the synthetic ACS microdata will not be suitable for research. The Bureau therefore proposes a system whereby investigators would develop analyses using synthetic data, and then submit them to the Census Bureau for “validation” using real data. This would preclude exploratory analyses on the real data, and would probably be logistically infeasible.

SMALL-AREA DATA FROM THE AMERICAN COMMUNITY SURVEY

The Census Bureau has announced that the ACS summary data will also be made "formally private" by 2025 at the earliest, but it has provided no further details about either the methods or the timeline for achieving this goal.

RESOURCES

Updates & Research Reports

Demonstration Data

  • Census Bureau DAS Demonstration Data & Metrics
  • IPUMS NHGIS Privacy-Protected Demonstration Data
    • In these data files, NHGIS has linked together two versions of 2010 Census summary tables: (1) original tables from the 2010 Census Summary Files, and (2) new tables based on different vintages of the Census Bureau's differentially private demonstration data.
    • The 2021-06-08 vintage includes demo data based on the final production system for 2020 Redistricting Data, so this vintage may be used to model the error distribution in published 2020 data tables.
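
One simple way to work with linked files of this kind is to difference the two versions of each count and summarize the result. The sketch below does that for a handful of made-up rows. The field names `geoid`, `original`, and `demo` and all of the values are hypothetical; they do not reflect the actual NHGIS file layout or any real census figures.

```python
from statistics import mean, median

# Hypothetical linked rows: one count from the original 2010 tables and
# the corresponding count from a demonstration vintage.
rows = [
    {"geoid": "A", "original": 120, "demo": 117},
    {"geoid": "B", "original": 45, "demo": 49},
    {"geoid": "C", "original": 8, "demo": 5},
    {"geoid": "D", "original": 2310, "demo": 2312},
    {"geoid": "E", "original": 0, "demo": 1},
]

def error_summary(rows):
    """Summarize the differences between demo and original counts."""
    errors = [r["demo"] - r["original"] for r in rows]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": mean(errors),              # signed, so it reveals bias
        "mean_abs_error": mean(abs_errors),
        "median_abs_error": median(abs_errors),
        "max_abs_error": max(abs_errors),
    }

summary = error_summary(rows)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With real data, computing the same summary separately by geographic level or population size would show where the added error matters most.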

We will continue to gather relevant information for the IPUMS user community, post it here, and share it via IPUMS Twitter.