The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Interesting Public Records Act Case
From Silverman v. Ariz. Health Care Cost Containment Sys., decided Thursday by the Arizona Court of Appeals (in an opinion by Chief Judge Kent E. Cattani, joined by Judge Cynthia J. Bailey and Vice Chief Judge David B. Gass):
This public records case presents a narrow issue of potentially broad import. Arizona law does not require a public entity to create any new record in response to a public records request. But does using encryption to redact non-disclosable information stored in an electronic database necessarily constitute creation of a new record? We hold that it does not.
This concept is particularly important in a case like this one, in which the public entity uses non-disclosable data as a critical part of its database structure (as the relational keys linking different tables). Thus, requiring the agency to use a one-way cryptographic hash function to redact the non-disclosable data—substituting a unique hashed value that masks protected information without destroying its function in the database—is necessary to ensure a requestor receives, to the extent possible, a copy of the real record.
And because such encryption only hides a limited aspect of the record—without adding to, aggregating, analyzing, or changing any of the underlying information—it does not create anything new and does not result in the creation of a new record. Accordingly, and for reasons that follow, we reverse the superior court's dismissal of the journalists' public records lawsuit at issue here and remand for further proceedings consistent with this opinion….
The Arizona Health Care Cost Containment System ("AHCCCS") oversees the Arizona Long-Term Care System ("ALTCS"). Appellants Amy Silverman, Alex Devoid, and TNI Partners (d/b/a Arizona Daily Star) are journalists researching issues related to services for Arizonans with developmental disabilities, including those services provided by ALTCS. Appellants are seeking public records from AHCCCS to learn what factors affect eligibility decisions during the ALTCS application and screening process.
In February 2020, Appellants submitted a public records request for data in AHCCCS's databases for multiple categories of information provided in or related to ALTCS applications. Appellants acknowledged that healthcare-related information would have to be de-identified to comply with privacy rules under the Health Insurance Portability and Accountability Act ("HIPAA"). Noting that the requested data might be contained in multiple tables, Appellants requested that, for de-identified data, AHCCCS "include a unique identifier, such as a hash key, to replace" information necessary to distinguish different individuals' records. Appellants' request expressly did not ask AHCCCS to "join tables together … or to conduct any type of analysis on the data," provided any existing relational keys remained intact….
Appellants eventually sued under the Arizona public records act, and here's how the court of appeals analyzed this:
Under Arizona law, "[p]ublic records and other matters in the custody of any officer shall be open to inspection by any person at all times during office hours." This statutory mandate reflects Arizona's strong presumption in favor of open government and disclosure of public documents. Public policy favors subjecting agency action "to the light of public scrutiny" and ensuring that citizens are "informed about what their government is up to."
A requestor is generally entitled to review a copy of the "real record," even one maintained in an electronic format, subject to redactions necessary to protect against risks to privacy, confidentiality, or the best interests of the state. Thus, upon request, a public entity must search its electronic databases to identify and produce responsive records. But the entity need not tally, compile, analyze, or otherwise provide information about the information contained in existing public records, which would in effect create a new record in response to the request. Nor is the entity required to compile the data in a form more useful to a requestor….
Using a one-way cryptographic hash function to substitute a unique hashed value for protected information does not add to or change any of the underlying information (much less aggregate or analyze the data); it just hides a limited aspect of it. Redaction-by-encryption does not create anything new, but rather represents a better-tailored redaction process that eliminates only information that is in fact protected….
We acknowledge that redaction-by-encryption is different than traditional redaction-by-deletion (or redaction-by-obscuring-text-behind-a-black-box), and it may only be feasible in the context of electronically stored records. But when public records are stored in that format, differences occasioned by newer forms of data storage may call for differences in how the data is disclosed. For example, embedded metadata is an inherent part of a public record maintained in an electronic format, even though such metadata was nonexistent and effectively meaningless for the same record stored on paper. Accordingly, applying redaction-by-encryption as a more tailored form of redaction (even if made possible only by electronic storage) serves to ensure that the requestor receives access to the "real record" to the greatest extent possible.
The most analogous authority construing the federal Freedom of Information Act ("FOIA") bears this out. [Details omitted. -EV]
We note that redaction-by-encryption does not entitle Appellants to anything more than the public record as it actually exists….
Accordingly, to the extent the tables and fields in the existing databases (pre-redaction) are not in fact linked—and the record is not clear on that issue—AHCCCS is not required to create new links to serve Appellants' purposes. But to the extent the links exist pre-redaction, all Appellants' complaint seeks, and what they are potentially entitled to, is preservation of those links that form part of the "real record." …
To be sure, the journalists' request may ultimately prove unduly burdensome given the scale of data involved, and redaction (by encryption and otherwise) may ultimately prove insufficient to adequately anonymize the data given the type of data requested. But those questions require evidentiary development and must be considered on their facts, not as questions of law….
Plaintiffs are represented by Arizona State's First Amendment Clinic, and in particular by attorneys Jake Karr (who orally argued the case, and who's now at the NYU Technology Law & Policy Clinic), Gregg P. Leslie, and Zachary R. Cormier, and law students Jack Prew-Estes, Jake Nelson, Maria McCabe, and Vanessa Stockwill.
It's entirely possible that someone receiving the data could infer what the encrypted data were, just by looking at the frequency of different values. But reasonably it's not a new record, any more than a printed copy would be a different record from the electronic one.
No, I don't think that's really possible.
Depends what is encrypted. If there are relatively few possible values, then one may infer from the expected frequency which encrypted value is which.
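To make the frequency point concrete, here is a minimal sketch, assuming a hypothetical low-cardinality column (an application status with three possible values) that was replaced with unkeyed SHA-256 hashes. The column, its values, and the helper names are invented for illustration; nothing here comes from the AHCCCS data.

```python
import hashlib
from collections import Counter


def sha256_hex(value: str) -> str:
    """Unkeyed SHA-256 of a string, as a hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def guess_by_frequency(hashed_column, expected_order):
    """Pair hashed values (most frequent first) with plaintext values
    listed in the order an outsider expects them to occur."""
    ranked = [h for h, _ in Counter(hashed_column).most_common()]
    return dict(zip(ranked, expected_order))


# Hypothetical released column: 6 approvals, 3 denials, 1 pending.
released = [sha256_hex(v) for v in ["approved"] * 6 + ["denied"] * 3 + ["pending"]]

# The outsider only knows the rough ranking of outcomes, not the hashes.
guesses = guess_by_frequency(released, ["approved", "denied", "pending"])
```

With a field that has one distinct value per person, every hash appears about equally often, so this kind of ranking gives an outsider nothing to work with.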
It sounds like they want to encrypt the "AHCCCS ID". You're not realistically going to use a frequency attack on a hash of a unique ID to decrypt it. Hashes don't really work that way.
The greater concern is re-linking records based on other known information. If you know someone has a particular record, you don't *need* to decrypt the ID to find all their other records. You just find that one record, note the hash, and look for any other record with the same hash.
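A rough sketch of that re-linking step, assuming two hypothetical tables that share a hashed ID column. The table names, fields, and IDs below are made up for illustration; the point is only that the hash is never reversed.

```python
import hashlib


def hash_id(member_id: str) -> str:
    """Stand-in for whatever one-way hash replaces the real ID."""
    return hashlib.sha256(member_id.encode("utf-8")).hexdigest()


# Hypothetical released tables; the hashed ID still works as the join key.
applications = [
    {"id_hash": hash_id("A001"), "county": "Pima", "birth_year": 1984},
    {"id_hash": hash_id("A002"), "county": "Maricopa", "birth_year": 1990},
]
assessments = [
    {"id_hash": hash_id("A001"), "outcome": "denied"},
    {"id_hash": hash_id("A002"), "outcome": "approved"},
    {"id_hash": hash_id("A001"), "outcome": "approved"},
]


def relink(known_facts, first_table, second_table):
    """Find the one row matching facts already known about a person,
    then return every row in the other table with the same hash."""
    matches = [
        row for row in first_table
        if all(row.get(field) == value for field, value in known_facts.items())
    ]
    if len(matches) != 1:
        return []  # ambiguous or absent: nothing can be pinned on one person
    target = matches[0]["id_hash"]
    return [row for row in second_table if row["id_hash"] == target]


linked = relink({"county": "Pima", "birth_year": 1984}, applications, assessments)
```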
As the judge says, whether there's a security concern is a factual matter and not one to be determined as a matter of law before any evidence is presented.
It didn't seem clear what they were encrypting. Encrypting an internal surrogate key would be sort of pointless; if this key is available elsewhere, one might be able to get the original key by correlating other information.
I remember a news story from long ago about a report on military aircraft somewhere that redacted the number not ready for operation, but left unredacted the total number and the percentage not ready. The people who do redaction have not necessarily thought it all through.
In the context above, what is being encrypted is almost certainly patient SSNs. Since there are 10^9-1 theoretical SSNs and on the order of 500 million actual ones, you're not going to be able to deduce a pattern from that. Even if you knock it back to the subset of patients in Arizona, it's still a whopping big number - way too many to crack the code.
You do realize, don't you, that until 2011, SS numbers were NOT randomly assigned? And they're still only partially random, with the first 3 digits identifying the location where they were issued.
So, no, there is absolutely a pattern there.
What code would need to be cracked? The court's opinion is not consistent in calling it encryption or a hash, but an unkeyed hash (which is normally the default, unless someone says "keyed hash") would be inappropriate for this kind of use. An unkeyed hash means the output is entirely determined by the input, and the billion possible SSNs can be exhaustively computed. Then a hashed SSN can simply be looked up in a table to find the true one, revealing personally identifiable information about the person associated with that record.
Assigning arbitrary "cover values" per anonymized column would protect against that kind of reverse-lookup attack, as would many kinds of encryption. The challenge is to make it infeasible to un-obfuscate the data given whatever other data is directly or indirectly associated with the cover values. If several data tables contain "foreign keys" (values that each represent a unique row in some other table) then those links tend to reveal more information about the person or people involved.
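As a minimal sketch of the two alternatives just mentioned, assuming the agency keeps a secret key (or a private mapping table) that is never released: a keyed hash via HMAC-SHA-256, and arbitrary random cover values. The function and class names are hypothetical, and this is not a claim about what AHCCCS does or should do.

```python
import hashlib
import hmac
import secrets


def keyed_pseudonym(value: str, key: bytes) -> str:
    """Keyed hash: same input and key give the same output, so joins still
    work, but nobody without the key can precompute a lookup table."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class CoverValues:
    """Arbitrary random stand-ins, remembered in a private mapping so the
    same real value always receives the same cover value."""

    def __init__(self):
        self._assigned = {}

    def cover(self, value: str) -> str:
        if value not in self._assigned:
            self._assigned[value] = secrets.token_hex(16)
        return self._assigned[value]


key = secrets.token_bytes(32)  # stays with the agency, never released
covers = CoverValues()
```

Either approach keeps relational keys consistent across tables while defeating the precomputed-table attack. Neither addresses re-identification through the other fields that remain readable.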
You'd think that any agency capable of implementing a cryptographic hash on the data would also be capable of correctly salting it, but that may be giving too much credit to the government.
But otherwise, yeah, implementing a brute-force attack against mere billions would take no time at all - I just wrote a program that ran through all SSNs as strings, hashed using SHA-256, and it took about two minutes to process and write the list to disk.
Without doing something, an unsalted or poor hash would be no different than turning over the original data.
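For readers who want to see the shape of that exhaustive lookup, here is a scaled-down sketch. It assumes an unsalted, unkeyed SHA-256 over a fixed-width numeric ID and uses a 4-digit space so it finishes instantly; the same loop over nine digits is the billion-entry table discussed above. The names are hypothetical.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Unsalted, unkeyed SHA-256 of a string, as a hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup(digits: int) -> dict:
    """Hash every zero-padded number with the given digit count and
    return a mapping from hash back to the original number."""
    table = {}
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        table[sha256_hex(candidate)] = candidate
    return table


table = build_lookup(4)            # 10,000 candidates in this toy space
released_hash = sha256_hex("0427")  # what a requester would receive
recovered = table[released_hash]    # one dictionary lookup undoes the hash
```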
It depends very much on what fields are obfuscated and what fields remain readable. The general computer-science field that studies this question is called differential privacy, and the answers are usually non-trivial. As the court says, the answer would be very fact-specific.
Privacy laws often forbid release of personally identifiable information (PII) in regards to health, banking, or other data. An arbitrarily assigned, individually unique ID is PII if it can be used to look up that person's data -- but PII can also be a name and address, or a common name and an uncommon disease. If there are three people named "David Sosa" in a data set, and no other name has exactly three people in the data set, then an obfuscated version of the name plus other information will probably be PII in the context of that data set: Knowing which name corresponds to exactly three members of the data set will reveal what real name goes with that obfuscated name.
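One rough way to look for this problem before release is to count how many rows share each combination of readable fields and flag the small groups. The sketch below assumes hypothetical field names and a threshold `k` chosen by whoever does the review; it is a simple group-size check, not a full privacy analysis.

```python
from collections import Counter


def small_groups(rows, fields, k):
    """Return each combination of the given fields that appears in
    fewer than k rows, with its row count."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical released rows: names are obfuscated, conditions are not.
rows = [
    {"name_hash": "h1", "condition": "asthma"},
    {"name_hash": "h1", "condition": "asthma"},
    {"name_hash": "h1", "condition": "rare-x"},
    {"name_hash": "h2", "condition": "asthma"},
    {"name_hash": "h2", "condition": "asthma"},
]

# Any combination held by a single row points at one person.
risky = small_groups(rows, ["name_hash", "condition"], k=2)
```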
An "Interesting Case" by way of most nodding "Yup." And a rare case, as such, at that!
Agreed. Redacting information doesn't create a "new record". The method doesn't matter. If it did, vast quantities of records would become unattainable. Every records law I'm aware of requires redaction of confidential information.
OFF TOPIC Question for Eugene (who used to be good at math, and may still be):
People say that the sky is blue because blue light scatters more readily than lower-frequency light, because the scattering of light of given frequency is proportional to the frequency raised to the fourth power. OK. Then why isn't the sky purple? Purple (violet) light has higher frequency than blue.
Can you explain this?
Far out man but you need to stop licking toads.
Not enough purple in sunlight for that to matter. The Sun's peak output is in green, and as wavelengths get shorter than that, they drop off increasingly fast. By the time you get up to purple, sunlight is pretty dim.
(Probably not!) Coincidentally, the peak sensitivity of your eyes is also in green, and by the time you reach purple, your eyes can barely detect the photons.
So, you don't notice the purple component, though it's over-represented relative to regular sunlight.
Thanks!