The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Avoid Super-Embarrassing Redaction Failures
I just saw one in a recent filing from an AmLaw top 20 law firm (by gross revenue rankings).
[I posted versions of this post in 2020 and 2024, but I've seen the problem often enough since then to think it worth mentioning again.]
I have often run across documents written by lawyers that looked redacted—but all the supposedly secret information in them could be extracted with literally three keystrokes (ctrl-A, ctrl-C, ctrl-V). One was a court filing that was filed pursuant to a court order authorizing the redaction; but the material so carefully marked secret proved not to be secret at all.
Another carefully tried to hide the real name of a litigant whom the lawyer was trying to keep pseudonymous; but the name was one copy-and-paste away from being visible. What's more, when the documents were posted online in searchable spaces, search engines indexed the supposedly hidden material, so searching for the real name would find the document in which the lawyer had been trying to redact the name.
For at least one of the documents, I know what improper redaction mechanism was used: The lawyer used Google Docs to highlight passages using black highlighter, and then saved the document as a PDF. That looked blacked out on the screen; but the underlying text still remained in the PDF document—as far as the software was concerned, the text wasn't removed but was just set in a different color. (Something similar would happen with Microsoft Word.)
By pressing ctrl-A in the PDF viewer, I selected the whole document. (You can also just select the passage that contains the redactions.) By pressing ctrl-C, I copied the selected text to the clipboard. And then by pressing ctrl-V in another app, I pasted it with all the formatting, including the highlighting, removed. (In some situations, it takes a ctrl-shift-V.) The text was then completely visible. Commenter anorlunda on an earlier post explained the problem well:
Users are trained WYSIWYG. What you see is what you get. That's brilliant marketing, but when you make black text on a black background, what you see is nothing, but what you get is something else. So redaction contradicts our training.
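To make the mechanism concrete, here is a minimal Python sketch. It assumes a simplified, uncompressed fragment of a PDF page content stream written by hand for illustration; the name "Jane Roe" and the rectangle coordinates are hypothetical. The fragment draws text and then paints a filled black rectangle over part of it. The rectangle changes what a viewer displays, but the text-showing operators (and the strings they carry) are still in the file.

```python
import re

# Hand-written, simplified page content stream (hypothetical example).
# "Tj" shows a text string; "0 0 0 rg" sets the fill color to black;
# "re" adds a rectangle to the path and "f" fills it.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 720 Td (Plaintiff ) Tj (Jane Roe) Tj ( alleges that) Tj ET
0 0 0 rg
120 716 52 14 re f
"""

def extract_shown_text(stream: bytes) -> str:
    """Join every literal string passed to the Tj (show text) operator."""
    pieces = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return b"".join(pieces).decode("latin-1")

# The black rectangle is drawn on top, yet the name is still recoverable.
print(extract_shown_text(CONTENT_STREAM))  # Plaintiff Jane Roe alleges that
```

Real PDFs usually compress their content streams and may encode text in font-specific ways, so this is only an illustration of why a cover-up is not a removal, not a general-purpose extractor.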
To the best of my knowledge, Adobe Acrobat Pro redaction actually deletes the underlying text, if you mark the text for redaction and then apply the redactions. I'm sure there is other software available to do this, including free software. Just make sure that whatever you do, the redaction is actually complete.
Correct on both counts, though I have yet to find anything remotely as straightforward to use as Acrobat Pro (and I say this as someone who probably has above-average patience for user experience tradeoffs in free/open-source software).
Amen. Many big shops have redaction policies requiring multiple sets of eyes for the redacted content itself, but I wonder how many explicitly require specific tools and double checking that those tools were the ones used.
"Just make sure that whatever you do, the redaction is actually complete."
I (retired federal special agent) recall some training we received in the mid 90s about handling/processing computer generated documentation and (IIRC) there were 14 places within Microsoft where 'deleted' or 'redacted' information could be retrieved.
There's always a server somewhere!
I think a key point is that someone should test the redaction, e.g., by searching (Ctrl+F) for a string they know should be removed. (And also test the unredacted file for that string so you know your test is meaningful.) If you're not in the habit of verifying that your tool worked the way you expected, then… well, conversations about technical competence come to mind.
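The test-plus-control idea in this comment can be sketched in a few lines of Python. This assumes you have already extracted the text of both files by some means (for example, select-all and copy from a PDF viewer); the function name and the three result labels are hypothetical.

```python
def check_redaction(original_text: str, redacted_text: str, secret: str) -> str:
    """Compare extracted text from the unredacted and redacted files.

    Returns "inconclusive" if the secret is not even found in the original
    (so the search proves nothing), "leak" if it survives in the redacted
    file, and "pass" otherwise.
    """
    needle = secret.casefold()
    if needle not in original_text.casefold():
        return "inconclusive"  # control failed: the test is not meaningful
    if needle in redacted_text.casefold():
        return "leak"
    return "pass"
```

A "pass" only says that this particular string was not found in the extracted text; it says nothing about metadata or about text the extraction step could not read.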
The last time I managed a document review project, my recollection was that documents were redacted in Acrobat Pro, then the entire database was converted to .tif files, then the image files were OCR'd. Then, finally, the OCR'd files were globally searched for various sensitive terms/strings, which both caught underinclusive redactions and confirmed that the redacted text was gone. The really time-consuming parts were run overnight or on weekends.
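The final global-search step of a workflow like this might look roughly like the following Python sketch. It assumes the OCR output is already available as plain text keyed by file name; the function names and sample data are hypothetical. Whitespace is collapsed before matching because OCR output often breaks a name across lines.

```python
import re

def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so line breaks don't hide a term."""
    return re.sub(r"\s+", " ", text).casefold()

def find_sensitive_terms(ocr_texts: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each file name to the sensitive terms still present in its OCR text."""
    hits: dict[str, list[str]] = {}
    for name, text in ocr_texts.items():
        haystack = normalize(text)
        found = [term for term in terms if normalize(term) in haystack]
        if found:
            hits[name] = found
    return hits

# Hypothetical OCR output for two documents.
sample = {
    "doc1.txt": "Payment to Jane\nRoe on 3 May",
    "doc2.txt": "Payment to [REDACTED] on 3 May",
}
print(find_sensitive_terms(sample, ["Jane Roe", "123-45-6789"]))
# {'doc1.txt': ['Jane Roe']}
```

OCR errors can make an exact-match search miss a term that is still legible to a human, so a clean result is evidence rather than proof.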
Or just print it out and, if necessary, hand black out the material to be redacted, then scan the result as a new pdf.
I know this violates the preference/requirement that there be searchable text but the result could still be fed through OCR if someone needed to search text in the document.
YES! Please do that.
Of course criminals are sloppy and don't black out the material properly.
In some cases you can just copy the redacted material on a regular copier and use the lighten function and it allows you to see through the redaction.
There was a high-profile Sharpie redaction fail in a big litigation just a couple of years ago. (Click on the "findings" link in the article to see a sample of the document, which from the extremely light overall contrast I presume has been enhanced a good deal and didn't really look that way to the human eyes that let it out the door.)
If you have to do it, every scanner I've ever used can be set to resolve only black and white rather than the grayscale that allows this sort of leak. But that would make the rest of the document look fairly grainy (like a fax, albeit with higher resolution).
Back when I had to do this, and didn't have software I trusted, I used a four-step process:
1) Print document
2) Black sharpie over text to be redacted (on back side too, if I was feeling especially anal)
3) Copy on dark setting
4) Scan copy into pdf
> Or just print it out and, if necessary, hand black out the material to be redacted
Careful. Your blackout marker may look to the human eye like it has completely obscured the text, but are you sure that the original text isn't just a tiny bit darker or a tiny bit lighter? It wouldn't be shocking to find that a scanner would pick up that tiny difference, and then somebody with a color inspection tool could recover the text.
I just did that experiment: I printed some text on my laser printer, and then blacked it out with my Sharpie. To a casual glance, it looked blacked out, but when I scanned it I didn't even need a color inspection tool; I could easily see the text. When I looked again at the paper, at the right angle I could easily see the text because the laser print is a lot more reflective than the Sharpie.
Results will vary, of course, depending on the printer, the marker, the paper, et cetera... but don't be so sure about the security of blacking it out by hand.
That's why you scan it to a black and white output, so that there aren't any subtle differences to inspect; it's all either black or white. Yes, I've had to redact stuff occasionally, and it's always a pain to do it properly.
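A toy Python sketch of why a pure black-and-white scan helps, assuming an 8-bit grayscale image represented as nested lists and a hypothetical fixed threshold of 128 (real scanners choose thresholds in their own ways). The sample pixel values are invented: paper at 240, marker ink at 40, and toner under the marker at 25.

```python
def to_black_and_white(gray_rows, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray_rows]

# Hypothetical grayscale scan of a marked-over strip.
scan = [
    [240, 40, 40, 25, 25, 40, 240],
    [240, 40, 25, 40, 25, 40, 240],
]
bw = to_black_and_white(scan)

print(sorted({px for row in scan for px in row}))  # [25, 40, 240]
print(sorted({px for row in bw for px in row}))    # [0, 255]
```

After thresholding, the marker and the toner beneath it collapse to the same value, so there is no remaining difference to enhance. This only works if both fall on the same side of the threshold, which is worth checking on the actual output.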
Don't forget to open the file in a file editor and run a search on it, just in case there's something buried in the metadata.
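A rough Python equivalent of that file-editor search, offered as a sketch under stated assumptions: it looks for the term in the raw bytes (where uncompressed metadata often lives) and also inflates any stream that happens to be zlib/Flate-compressed. The function name is hypothetical, the stream-finding regex is deliberately naive, and text stored with other filters or font-specific encodings will not be found this way.

```python
import re
import zlib

def search_pdf_bytes(data: bytes, term: str) -> list[str]:
    """Report where `term` appears: in the raw bytes or in inflated streams."""
    needles = {term.encode("utf-8"), term.encode("utf-16-be")}
    places = []
    if any(n in data for n in needles):
        places.append("raw bytes")
    # Naive stream finder; the lookbehind avoids matching inside "endstream".
    pattern = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
    for index, match in enumerate(pattern.finditer(data)):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or uses a different filter
        if any(n in inflated for n in needles):
            places.append(f"stream {index}")
    return places
```

Typical use would be `search_pdf_bytes(open("filing.pdf", "rb").read(), "Jane Roe")`, where the file name and search term are placeholders. An empty list means only that this simple search found nothing.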
Sorry I missed the split thread down here before posting above. This indeed has spectacularly failed in the real world.
Yeah, sorry, I don't comment here often enough to remember that there's a reply feature.
"Users are trained WYSIWYG. What you see is what you get." Guilty as charged.
If we think about training and user's manual language, it becomes clear that the only proper way to do it is to have a feature called "REDACT", then to implement it correctly, and then train new users on its use.
A properly implemented REDACT would also prevent the use of UNDO, and propagate the change to all backup copies of the file. That is because another possible embarrassing way to err is to distribute the unredacted version. The only secure way to prevent that is to make sure that there are no unredacted versions remaining.
If your intent is that some people can see the redacted information but other people cannot, then access control must be person-centric, not copy-centric. The concept of a redacted copy is from pre-computer days; it should not be used today.
In a legal context, I cannot think of any situation in which it would be appropriate to eliminate all unredacted versions. But there should always be a flag in a document management system saying "Confidential" — for, e.g., privileged documents — and the unredacted version should get that tag.
You can also print it out, and then use an X-Acto knife to cut out the redacted terms, and then scan it back in. But that seems like a lot more work than just using Acrobat correctly.
Just remember not to use the document feeder to scan those X-ACTO Redacted™ versions or you'll likely spend far more time extracting scraps from the document feeder and repeating the redaction on a new copy than you expected to spend 🙂
Never use highlighting to redact. Instead, cover it with a black square. Acrobat has a tool called "Remove Hidden Information." By using this tool, you remove form fields as well as text behind a black square.
I'm a paralegal and this is what I do to every legal document.
Black highlighter? That's a first.
I was expecting it to be something a little more excusable like pasting a black box and expecting the PDF to flatten it.
Lawyers really aren't very smart.
I use Foxit for PDF editing; it has a redaction feature. Yes, it's vitally important to use the feature designed for this very purpose, and not play around with backgrounds or other formatting.
In a pinch they could have exported their redacted PDF as an image and then imported the image into a new PDF.
That assumes that, within the redacted area, the pixels where black redaction covers black text have exactly the same values as the pixels where it covers white background. A color difference of just 1 bit may be far too little for human eyes to notice, but enough for a computer to recover the redacted characters.
That is the point that Life of Brian and Jordan Brown made above. Redacted to human eyes is not the same as redacted to electronic eyes.
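The one-level difference described in this comment is easy to demonstrate with a small Python sketch. It assumes an 8-bit grayscale crop of a "blacked out" region represented as nested lists, with invented pixel values of 0 and 1; a simple contrast stretch turns that invisible difference into full black versus full white.

```python
def stretch_contrast(gray_rows):
    """Linearly rescale pixel values so the darkest becomes 0 and the lightest 255."""
    flat = [px for row in gray_rows for px in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Truly uniform region: nothing to recover.
        return [[0 for _ in row] for row in gray_rows]
    scale = 255 / (hi - lo)
    return [[round((px - lo) * scale) for px in row] for row in gray_rows]

# Hypothetical crop: looks uniformly black, but covered glyph pixels are 1, not 0.
region = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
for row in stretch_contrast(region):
    print("".join("#" if px else "." for px in row))
# ..#..
# .###.
# ..#..
```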
When I worked in-house at a small company and had to respond to subpoenas for, eg, medical information, I copied the files, blacked it out with a Sharpie on both sides (front and back), copied that again, blacked it out again on the copy, and scanned it in. Then I played with the scan - lighten, sharpen, all that, and it was always fine.
Another one: in the PDF, put a black box over the information to be redacted. Print it out and scan the print.
PDF files can be edited with a text editor, allowing someone with only a little technical knowledge to perform a true redaction by editing the text itself rather than messing with markup. I have done this myself.
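For readers curious what such a hand edit involves, here is a heavily hedged Python sketch. It assumes the sensitive text sits in the file as a plain, uncompressed literal string, which is often not the case in real PDFs. The function name is hypothetical. The replacement is kept the same length as the original because PDF files record byte offsets and stream lengths, and changing the file size can corrupt them.

```python
def overwrite_in_place(pdf_bytes: bytes, secret: bytes, filler: bytes = b"X") -> bytes:
    """Replace every occurrence of `secret` with same-length filler bytes."""
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    if not secret:
        raise ValueError("secret must not be empty")
    patched = pdf_bytes.replace(secret, filler * len(secret))
    # Same length in, same length out, so recorded byte offsets stay valid.
    assert len(patched) == len(pdf_bytes)
    return patched

# Hypothetical uncompressed fragment of a content stream.
fragment = b"BT /F1 12 Tf 72 720 Td (Jane Roe alleges) Tj ET"
print(overwrite_in_place(fragment, b"Jane Roe"))
# b'BT /F1 12 Tf 72 720 Td (XXXXXXXX alleges) Tj ET'
```

Note that same-length filler still reveals how long the removed text was, and that copies of the text elsewhere in the file (metadata, compressed streams, earlier revisions) are untouched, so a dedicated redaction tool followed by a verification search remains the safer route.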
A SaaS offering that aims to solve this problem with automation: Blackmarker (https://blackmarker.com).