The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Avoid Super-Embarrassing Redaction Failures
I just saw one in a recent filing from an AmLaw top 20 law firm (by gross revenue rankings).
[I posted a version of this post in 2020 and 2024, but I've seen the problem enough since to think it was worth mentioning again.]
I have often run across documents written by lawyers that looked redacted—but all the supposedly secret information in them could be extracted with literally three keystrokes (ctrl-A, ctrl-C, ctrl-V). One was a court filing that was filed pursuant to a court order authorizing the redaction; but the material so carefully marked secret proved not to be secret at all.
Another carefully tried to hide the real name of a litigant whom the lawyer was trying to keep pseudonymous; but the name was one copy-and-paste away from being visible. What's more, when the documents were posted online in searchable spaces, search engines indexed the supposedly hidden material, so searching for the real name would find the document in which the lawyer had been trying to redact the name.
For at least one of the documents, I know what improper redaction mechanism was used: The lawyer used Google Docs to highlight passages using black highlighter, and then saved the document as a PDF. That looked blacked out on the screen; but the underlying text still remained in the PDF document—as far as the software was concerned, the text wasn't removed but was just set in a different color. (Something similar would happen with Microsoft Word.)
By pressing ctrl-A in the PDF viewer, I selected the whole document. (You can also just select the passage that contains the redactions.) By pressing ctrl-C, I copied the selected text to the clipboard. And then by pressing ctrl-V in another app, I pasted it with all the formatting, including the highlighting, removed. (In some situations, it takes a ctrl-shift-V.) The text was then completely visible. Commenter anorlunda on an earlier post explained the problem well:
Users are trained WYSIWYG. What you see is what you get. That's brilliant marketing, but when you make black text on a black background, what you see is nothing, but what you get is something else. So redaction contradicts our training.
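To make the gap between what you see and what you get concrete, here is a toy model in Python. It is only a sketch, not how any real word processor or PDF format stores text: the `Run` class, the color names, and the sample name "Jane Roe" are all made up for illustration. The point is just that highlighting changes a style attribute, while copying reads the characters.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A run of text with foreground and background colors (toy model)."""
    text: str
    fg: str = "black"
    bg: str = "white"


def render(runs):
    """What the screen shows: text is unreadable when fg equals bg."""
    return "".join("#" * len(r.text) if r.fg == r.bg else r.text for r in runs)


def copy_as_plain_text(runs):
    """What copy-and-paste yields: the characters, with the styling dropped."""
    return "".join(r.text for r in runs)


# Hypothetical document with one passage "redacted" by black highlighting.
doc = [
    Run("The plaintiff, "),
    Run("Jane Roe", fg="black", bg="black"),
    Run(", alleges..."),
]

print(render(doc))              # the name appears as a solid bar
print(copy_as_plain_text(doc))  # the name is fully readable
```

Real redaction has to remove the characters from the file, not just change how they are drawn.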
To the best of my knowledge, Adobe Acrobat Pro redaction actually deletes the underlying text, if you mark the text for redaction and then apply the redactions. I'm sure there is other software available to do this, including free software. Just make sure that whatever you do, the redaction is actually complete.
Correct on both counts, though I have yet to find anything remotely as straightforward to use as Acrobat Pro (and I say this as someone who probably has above-average patience for user experience tradeoffs in free/open-source software).
Amen. Many big shops have redaction policies requiring multiple sets of eyes for the redacted content itself, but I wonder how many explicitly require specific tools and double checking that those tools were the ones used.
I think a key point is that someone should test the redaction, e.g. by searching (Ctrl+F) for a string they know should be removed. (And also test the unredacted file for that string so you know your test is meaningful.) If you're not in the habit of verifying that your tool worked the way you expected, then…well, conversations about technical competence come to mind.
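That two-sided test can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it takes the text of each version as a plain string (obtained however you like, for example by select-all and copy from the PDF viewer), and the function name `check_redaction` and the sample strings are hypothetical.

```python
def check_redaction(original_text, redacted_text, secrets):
    """Return a dict mapping each secret string to a status.

    A secret that is absent from the original is flagged as untestable,
    because its absence from the redacted version would prove nothing.
    """
    original = original_text.casefold()
    redacted = redacted_text.casefold()
    results = {}
    for secret in secrets:
        needle = secret.casefold()
        if needle not in original:
            results[secret] = "untestable: not found in original"
        elif needle in redacted:
            results[secret] = "LEAK: still present after redaction"
        else:
            results[secret] = "ok"
    return results


# Hypothetical example: the "redacted" text is what copy-and-paste returned.
report = check_redaction(
    original_text="Plaintiff Jane Roe sued.",
    redacted_text="Plaintiff Jane Roe sued.",
    secrets=["Jane Roe"],
)
print(report)
```

A plain substring search like this will miss a secret that is split across a line break or hyphenated in the extracted text, so a clean result is weaker evidence than a hit.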
The last time I managed a document review project, my recollection was that documents were redacted in Acrobat Pro, then the entire database was converted to .tif files, then the image files were OCR'd. Then, finally, the OCR'd files were globally searched for various sensitive terms/strings, which both caught underinclusive redactions and confirmed that the redacted text was gone. The really time-consuming parts were run overnight or on weekends.
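The final global search step might look something like the Python sketch below. It is not the tooling that project used. It assumes the OCR output is already available as a mapping from file name to text, and the file names, terms, and function names are hypothetical. Because OCR often breaks a phrase across lines, the pattern allows any whitespace between the words of a term.

```python
import re


def term_pattern(term):
    """Compile a case-insensitive pattern that tolerates any whitespace
    (including line breaks) between the words of a term."""
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def scan_corpus(ocr_texts, terms):
    """Return sorted (file name, term) pairs for every sensitive term found."""
    # Skip blank terms, which would otherwise match every file.
    patterns = {t: term_pattern(t) for t in terms if t.strip()}
    hits = []
    for name, text in ocr_texts.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits.append((name, term))
    return sorted(hits)


# Hypothetical OCR output for two files.
ocr_texts = {
    "doc001.txt": "Plaintiff [REDACTED] sued.",
    "doc002.txt": "Payment was made to Jane\nRoe in March.",
}
hits = scan_corpus(ocr_texts, ["Jane Roe", "123-45-6789"])
print(hits)
```

OCR misreads (an "l" for a "1", say) will still slip past an exact-term search, so a clean result narrows the risk rather than eliminating it.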
Or just print it out and, if necessary, hand black out the material to be redacted, then scan the result as a new pdf.
I know this violates the preference/requirement that there be searchable text but the result could still be fed through OCR if someone needed to search text in the document.
> Or just print it out and, if necessary, hand black out the material to be redacted
Careful. Your blackout marker may look to the human eye like it has completely obscured the text, but are you sure that the original text isn't just a tiny bit darker or a tiny bit lighter? It wouldn't be shocking to find that a scanner would pick up that tiny difference, and then somebody with a color inspection tool could recover the text.
I just did that experiment: I printed some text on my laser printer, and then blacked it out with my Sharpie. To a casual glance, it looked blacked out, but when I scanned it I didn't even need a color inspection tool; I could easily see the text. When I looked again at the paper, at the right angle I could easily see the text because the laser print is a lot more reflective than the Sharpie.
Results will vary, of course, depending on the printer, the marker, the paper, et cetera... but don't be so sure about the security of blacking it out by hand.
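For anyone wondering how a "tiny bit darker or lighter" difference gets recovered, here is a minimal Python sketch of a contrast stretch. The pixel values are invented for illustration (marker ink scanned as 8, toner under it as 12, on a 0 to 255 grayscale); a real scan would be noisier and would need a real image tool.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest becomes 0
    and the lightest becomes 255."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Truly uniform region: there is nothing to recover.
        return [[0 for _ in row] for row in pixels]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in pixels]


# Hypothetical scan of a blacked-out region. To the eye, 8 and 12 both look black.
region = [
    [8,  8,  8,  8, 8],
    [8, 12, 12, 12, 8],
    [8,  8, 12,  8, 8],
    [8,  8, 12,  8, 8],
]

stretched = stretch_contrast(region)
for row in stretched:
    # Mark the pixels that ended up light after stretching.
    print("".join("#" if v >= 128 else "." for v in row))
```

After stretching, the four-level difference becomes full black versus full white, and the shape of the hidden letter stands out.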