Further Guidance from the Census Bureau

From John M. Abowd, Chief Scientist and Associate Director for Research and Methodology

Thank you all for helping us understand the many, many use cases for decennial census data. I hope the clarification below will be useful.

Please do not be intimidated by the spreadsheet in the Federal Register Notice. If it is more helpful, give us feedback as if you were preparing a peer review of our 2010 publication products vis-à-vis your scientific uses.

Here are two specific guidelines that will make your input more relevant.

First, we do want to know the geographic and other levels of detail that served as the inputs to your analyses, but we also need to know the criteria you used to assess the outputs of those analyses, and, in particular, the level of geography and other detail over which your error measures were calculated.

The launch of the Opportunity Atlas, our joint project with Harvard and Brown, on October 1st illustrates this point: that use case required tract-level accuracy because one objective was to permit better targeting of remedial programs.

That project used a brand-new noise-injection confidentiality protection system that was based on differential privacy, but did not fully meet the DP guarantees. It was developed jointly by the Census Bureau and the Harvard team, and peer-reviewed by the Harvard Privacy Tools Project as part of the Cooperative Agreement between Georgetown University (which includes personnel from the Harvard project) and the Census Bureau. That system worked better than the suppression-based rules that the team had been assuming would apply before the Data Stewardship Executive Policy Committee instructed the Disclosure Review Board (DRB) not to use such rules any longer at the sub-state level. Had the Atlas used the suppression rules, all of the aggregate analyses based on the public-use data would have been seriously biased. Under the new system, those analyses are both unbiased and inference-valid (the standard errors are correct).

You may privately communicate with Raj Chetty and/or Salil Vadhan if you want further details. I did not copy them here, to allow you to provide the context of your inquiry. (https://www.census.gov/newsroom/blogs/research-matters/2018/09/the_opportunity_atla.html and https://privacytools.seas.harvard.edu/people/salil-vadhan).
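To make the unbiasedness and inference-validity claims concrete, here is a minimal sketch of noise injection in the style of a differentially private mechanism. This is an illustration of the general technique only, not the actual Atlas system; the choice of the Laplace mechanism, the epsilon value, and the function names are all assumptions for the example.

```python
import numpy as np

def laplace_noise_injection(true_counts, epsilon, sensitivity=1.0):
    """Illustrative noise injection: add Laplace noise scaled to epsilon.

    Under the pure-DP Laplace mechanism, noise drawn from
    Laplace(0, sensitivity/epsilon) is added to each statistic.
    Because the noise has mean zero, aggregate analyses of the
    released data remain unbiased, and because its variance,
    2 * (sensitivity/epsilon)**2, is known, it can be folded into
    standard errors so that inference remains valid.
    """
    rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=len(true_counts))
    return np.asarray(true_counts, dtype=float) + noise

# Hypothetical tract-level counts with a modest privacy-loss budget
counts = [120, 35, 0, 310]
noisy = laplace_noise_injection(counts, epsilon=0.5)
```

Contrast this with suppression: dropping small cells entirely introduces selection bias into any aggregate built from the published cells, whereas mean-zero noise does not.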

Second, in responding to some of the replies that we have already received, I have made the following statement. If you routinely use tabulations from the American Community Survey, then you are implicitly accepting the following fitness-for-use criterion, which should not be interpreted as the only valid fitness-for-use measure, but is certainly an important one:

"Another kind of data release rule, data quality filtering, applies to ACS 1-year and 3-year estimates. Every detailed table consists of a series of estimates. Each estimate is subject to sampling variability that can be summarized by its standard error. If more than half of the estimates in the table are not statistically different from 0 (at a 90 percent confidence level), then the table fails to meet the rule’s requirements and is restricted from publication. Dividing the standard error by the estimate yields the coefficient of variation (CV) for each estimate. (If the estimate is 0, a CV of 100 percent is assigned.) To implement this requirement for each table at a given geographic area, CVs are calculated for each table’s estimates, and the median CV value is determined. If the median CV value for the table is less than or equal to 61 percent, the table passes for that geographic area and is published; if it is greater than 61 percent, the table fails and is not published."

http://www2.census.gov/programs-surveys/acs/methodology/design_and_methodology/acs_design_methodology_report_2014.pdf (page 190)
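The median-CV rule quoted above can be expressed in a few lines. This is a sketch of the rule as stated, not the Census Bureau's production code; the function name and interface are mine.

```python
import statistics

def passes_median_cv_filter(estimates, standard_errors, cv_threshold=0.61):
    """ACS-style data quality filter (median-CV rule).

    For each estimate, CV = standard_error / estimate; an estimate
    of 0 is assigned a CV of 100 percent. The table is published for
    a geographic area only if the median CV across its estimates is
    at or below the threshold (61 percent).
    """
    cvs = []
    for est, se in zip(estimates, standard_errors):
        if est == 0:
            cvs.append(1.0)  # zero estimates are assigned a 100% CV
        else:
            cvs.append(se / abs(est))
    return statistics.median(cvs) <= cv_threshold

# CVs here are [0.2, 0.8, 1.0, 0.1]; median 0.5 <= 0.61, so it passes
print(passes_median_cv_filter([50, 10, 0, 200], [10, 8, 5, 20]))  # True

# CVs here are [0.8, 1.0, 1.0]; median 1.0 > 0.61, so it fails
print(passes_median_cv_filter([5, 0, 2], [4, 3, 2]))  # False
```

The 61 percent threshold and the 90-percent-confidence test in the quoted rule are two views of the same cutoff: an estimate is statistically different from 0 at 90 percent confidence when estimate/SE is at least 1.645, i.e., when CV is at most 1/1.645, roughly 0.61.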

Assuming that the only source of error is the noise injection from the disclosure avoidance system, please explain why a fitness-for-use rule similar to the one stated above would be appropriate or inappropriate for your use case. Note that we have no plans at the moment to implement suppression based on such a rule; we intend only to assess fitness for use, after applying disclosure avoidance noise injection, with criteria that could be made very similar to the ACS rules.