The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Avoid Super-Embarrassing Redaction Failures
A Public Service Announcement, especially for the lawyers among our readers.
In the last several weeks, I ran across two documents written by lawyers that looked redacted—but all the supposedly secret information in them could be extracted with literally three keystrokes (ctrl-A, ctrl-C, ctrl-shift-V). One was a court filing that was filed pursuant to a court order authorizing the redaction; but the material so carefully marked secret proved not to be secret at all. Ugh.
For at least one of the documents, I know what improper redaction mechanism was used: The lawyer used Google Docs to highlight passages using black highlighter, and then saved the document as a PDF. That looked blacked out on the screen; but the underlying text still remained in the PDF document—as far as the software was concerned, the text wasn't removed but was just set in a different color.
By pressing ctrl-A in the PDF viewer, I selected the whole document. By pressing ctrl-C, I copied all the text to the clipboard. And then by pressing ctrl-shift-V in another app, I pasted it with all the formatting, including the highlighting, removed. (Ctrl-V in Word works, too, if I select the Keep Text Only paste option.) The text was then completely visible.
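For readers who think in code, the mechanism can be sketched with a toy document model in Python. This is only an illustration, not how Google Docs or the PDF format actually stores text: the `Run` class and the three functions are hypothetical names invented for this example. The point is that a highlight is an attribute attached to the text, so stripping formatting leaves the characters intact, while real redaction replaces the characters themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """A stretch of text plus its formatting (hypothetical toy model)."""
    text: str
    highlight: Optional[str] = None  # e.g. "black", or None for no highlight


def what_you_see(runs: List[Run]) -> str:
    """Approximate the on-screen view: black-highlighted text looks like a bar."""
    return "".join(
        "\u2588" * len(r.text) if r.highlight == "black" else r.text
        for r in runs
    )


def copy_as_plain_text(runs: List[Run]) -> str:
    """Approximate select-all, copy, paste-without-formatting: styling is dropped."""
    return "".join(r.text for r in runs)


def truly_redact(runs: List[Run]) -> List[Run]:
    """Replace the characters themselves, so nothing is left to recover."""
    return [
        Run("\u2588" * len(r.text)) if r.highlight == "black" else r
        for r in runs
    ]


doc = [
    Run("The witness is "),
    Run("Jane Roe", highlight="black"),
    Run("."),
]

print(what_you_see(doc))                      # looks redacted
print(copy_as_plain_text(doc))                # the name is still there
print(copy_as_plain_text(truly_redact(doc)))  # the name is gone
```

The first two lines of output differ even though they come from the same data, which is the whole problem: the screen shows one thing and the file holds another.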
To the best of my knowledge, Adobe Acrobat Pro redaction actually deletes the underlying text, if you mark the text for redaction and then apply the redactions. I'm sure there is other software available to do this, including free software. Just make sure that whatever you do, the redaction is actually complete.
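One rough way to check a finished file is to pull the text back out of it (for example with the copy-and-paste steps described in this post) and search it for the strings that were supposed to be gone. The Python sketch below assumes you already have that extracted text as a string; `find_leaks` and the sample data are hypothetical names and values made up for illustration.

```python
import re
from typing import Iterable, List


def _normalize(s: str) -> str:
    """Lowercase and drop whitespace so line wrapping cannot hide a match."""
    return re.sub(r"\s+", "", s).lower()


def find_leaks(extracted_text: str, secrets: Iterable[str]) -> List[str]:
    """Return every secret string that still appears in the extracted text."""
    haystack = _normalize(extracted_text)
    return [s for s in secrets if _normalize(s) and _normalize(s) in haystack]


# Hypothetical example: text copied out of a badly "redacted" PDF.
pasted = "Plaintiff paid Acme\nHoldings $1,250,000 under the settlement."
secrets = ["Acme Holdings", "$1,250,000", "John Doe"]

print(find_leaks(pasted, secrets))
```

An empty result only shows that those particular strings are not in the extractable text. It says nothing about metadata, embedded images, or earlier versions of the file, so treat it as a sanity check rather than proof.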
Of course, the most reliable redaction mechanism is still printing, blacking out the material completely, and then scanning it back into a new file. But this option won't work for court filings in the many courts that require full-text-searchable PDFs generated directly from the computer, rather than from a scanner.
UPDATE: Commenter anorlunda explains the problem well:
Users are trained WYSIWYG. What you see is what you get. That's brilliant marketing, but when you make black text on a black background, what you see is nothing, but what you get is something else. So redaction contradicts our training.
Lawyers are clueless about technology. News at 11.
How does Adobe Acrobat Pro show redactions? One of the problems with black highlighter is leaving the length intact, which sometimes makes it possible to make plausible guesses. Does Adobe replace the text with properly sized black rectangles to preserve line and page numbers? Does it replace with <REDACTED&gr; and extra newlines to retain line breaks>
Well, hopefully it uses the right right angle bracket 🙂
It looks just like the relevant text was written-over with black marker (except, as Eugene notes, the underlying text is deleted).
A colleague of mine once redacted a document, but did not finalize the redactions.
What are the ethical implications of attempting to view such redacted materials? Isn’t it akin to an inadvertent disclosure of confidential information?
I just skimmed the rules in one state about inadvertent disclosure of confidential information, and it seems to say that a member of the bar who inadvertently receives privileged information isn’t allowed to look at it.
I didn’t see any limitations on the scope of the requirement. I wonder if it applies to, say, a lawyer who receives the information outside of the practice of law when he is doing something akin to journalism. That would seem to raise 1A issues.
Assuming they are not holding back or misleading, Acrobat Pro’s redaction is pretty robust, and endorsed by the likes of the NSA and Australia’s signals directorate (links too long to paste–need to Google).
There are distinct risks associated with analog redaction (other than thoroughly cutting out the relevant text): http://www.cse.lehigh.edu/~lopresti/Publications/2005/spie05a.pdf
Magic marker over physical text just stops the casual reader and not someone determined.
Making a xerox copy improves it though.
If you’re using a Windows machine, you can also just paste your copied text into Notepad, which automatically strips all formatting codes.
Technical illiteracy among word processing users has been with us since the first electronic machines.
Users are trained WYSIWYG. What you see is what you get. That’s brilliant marketing, but when you make black text on a black background, what you see is nothing, but what you get is something else. So redaction contradicts our training.
In MS Word or Google Docs, you must buy third party add-ons to redact, or save your file as an “image only” PDF. The “image only” is critically important. Personally, I don’t trust PDF because I cannot audit what they do by inspection of the internals of the file. Where secrecy is involved, trusting third parties is a problem.
I use Acrobat Pro, but there are other pdf editors that can redact (e.g., Nitro Pro). There are even some online tools for redacting, although I haven’t tested any of them.
You need to be careful with analog redactions, at least if you are blacking out over existing text, and then scanning. Digital contrast adjustments may recover more than you suspect possible. Chemical and physical attacks may combine with digital methods, and deliver astonishing results. Dead Sea scrolls, rolled up and actually burned to charcoal in an ancient library fire, have been effectively x-ray scanned without unrolling them, and information recovered.
One alternative (not saying I know of anyone who does this) might be, after blacking out text in the original, shoot everything onto orthographic film. Orthographic film (customarily used for negatives as an intermediate step in making offset printing plates) creates extreme contrast. Everything in the original is rendered either perfectly clear or perfectly opaque on the negative, so no in-between-density information is recorded. Everything greater than 50% density on the original becomes completely clear on the negative. All the less-than-50% density stuff turns uniformly black on the negative. Black type turns clear, white paper turns black. In-between densities become clear or black according to which end they are closer to.
That probably would solve the redaction problem reliably, but if it did not, you could readily see on a light table that the black-outs (clear blocks on the negative) had unwanted black details in them. If all your black-out blocks are perfectly clear, there is no information left in them. And then, once your negatives are ready and checked, you can conveniently make printing plates to create as many printed copies as you need.
While perhaps technically true, that is the most ridiculous proposal I have ever read on VC.
It is an efficient process, and efficiently secure. Only blacking over the redacted material separates the process I describe from the standard analog offset printing process. After blacking out, all the other steps proceed as before.
Measured by ease of production, digital methods are becoming more efficient, but as the OP has noted, they create occasional security hazards.
I think you are describing high contrast lithographic film, which was also commonly orthochromatic (insensitive to red light).
Save a scanned text page with redaction (blacked out text) as a monochrome bitmap removing any contrast within the blacked out material. Create a PDF using the monochrome images. Print the PDF.
You know that making something monochrome is not the same as removing all contrast, right?
“You know that making something monochrome is not the same as removing all contrast, right?”
Making something a monochrome bitmap is literally removing all contrast within the blacked out material, as the poster said.
TwelveInch, no. Contrast in a bitmap is ability to distinguish adjacent pixels. Varying hues among the pixels is one way to produce contrast, but is not the only way. Even with a monochrome hue enforced among all the pixels, distinctions based on differences in saturation or brightness remain possible, and those variations also deliver contrast. That is why black and white photography can be a thing.
All commonly used digital color models take contrast into account, and distinguish color and contrast by use of some system (there are several) of 3-axis evaluation, which assigns separate values to 3 defined attributes, whether Hue, Saturation, and Brightness, or CMY, RGB, or something known as L*a*b* color. Results from each of these systems can be translated with more or less fidelity into any of the others.
The L*a*b* color system has been widely adopted as the behind-the-scenes computer standard which keeps your WYSIWYG color experience seeming intuitive—which the L*a*b* system is not. You might think you are making an RGB color adjustment in Photoshop, but a L*a*b* process behind the scenes will deliver your RGB result.
However, Nieporent has made a fair point. What might be enough for routine legal purposes might be all that is needed, without further insight. A fully technical discussion could seem pedantic, absent any need for extreme security. And you would need an actual color scientist (not me) to conduct it.
But just to let you know, when it comes to bitmaps, and the ways computers (and screens, and printers) use them, entirely other issues relating to contrast in monochromatic bitmaps also arise behind the scenes. Those can be considered under other headings, including hinting and anti-aliasing.
Once again, commenter anorlunda was right on, when he pointed to WYSIWYG as a potential stumbling block. When WYSIWYG graphics were being invented, things turned out more complicated than folks initially expected. It took a long time to get it right. Experience taught that the equivalence the WYSIWYG notion touts is based partly on optical (meaning brain/eye) continuity across varying digital systems. Workers learned that in common instances that could best be achieved paradoxically, by subtly adjusting different systems’ outputs to fool the eye into recognizing as equivalence a carefully engineered mix of similarities and differences.
The point? Anyone relying on dynamically-managed digital bitmaps to stand in for uniformity will likely be coping unawares with a lot of behind-the-scenes non-uniformity, which is nevertheless highly systematic and full of information. And 3 axes are involved, not one axis. And other trip-me-ups besides.
Sigh. A monochrome bitmap means 1 bit per pixel. So no variations.
How does that 1 bit tell you which monochrome color?
Before I decide you are just making stuff up, tell me what color editing software I can go to to see this 1 bit per pixel in action.
Sigh again. Try Windows paint. Save as monochrome bitmap, like the original comment said. If you want, you can google how to read the metadata and look at it with a hex editor.
Go Nuts.
Stephen,
Monochrome 1bit resolution bitmaps are not grey scale. They are absolute black and white. The bit is 1 for black and 0 for white I believe in the standard BMP format (don’t quote me on this). There are no sub channels, no transparency, no RGB or other data. It’s strictly a grid of pixels either on or off. So no, there’s no contrast because there’s no variation.
https://docs.microsoft.com/en-us/previous-versions/ms969901(v=msdn.10)?redirectedfrom=MSDN
Read the part on 1 bit color resolution.
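The disagreement in this sub-thread can be made concrete with a small Python sketch. The pixel values are invented for illustration, and the 1-for-black convention is arbitrary here, not a claim about how the BMP format encodes its bits. The sketch shows a grayscale "scan" in which the ink under the marker is slightly darker than the marker itself, and what is left after every pixel is forced to one of two values.

```python
from typing import List

# Hypothetical 8-bit grayscale scan (0 = black, 255 = white) of a marked-over
# word: the marker reads as very dark gray, the ink under it slightly darker.
scan: List[List[int]] = [
    [250, 40, 38, 41, 39, 251],
    [249, 39, 12, 10, 40, 250],
    [251, 41, 11, 13, 38, 249],
    [250, 40, 39, 40, 41, 250],
]


def to_one_bit(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (black) or 0 (white); nothing in between survives."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def distinct_levels(rows: List[List[int]], left: int = 1, right: int = 5) -> int:
    """Count the different pixel values inside the blacked-out columns."""
    return len({px for row in rows for px in row[left:right]})


print(distinct_levels(scan))              # several gray levels under the marker
print(distinct_levels(to_one_bit(scan)))  # a single level after thresholding
```

This only works because the threshold sits well above both the marker and the ink. If the cutoff fell between the two (say, at 20 in this made-up data), the letter shapes would survive the conversion, so the threshold matters as much as the bit depth.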
Ikepska, thank you for that, and yes I am aware. But Google the Windows Paint application mentioned by TwelveInch. Look at various results. Do you see examples that produce text you can present in a legal brief? What I mainly find in the Google results are complaints, saying that the process does not work for text. To understand why that might be, Google for more info regarding my previous remark about hints and anti-aliasing in text.
Note also, a grey scale image is monochrome, but not the only monochrome way to deliver a photographic image in any particular color. A half-tone image, which simulates grey-scale by varying dot sizes in a screen pattern, is a time-honored printer’s method for producing photographs using only a single ink, and thus in principle uses 1 bit analog color.
And of course a laser printer delivers something akin to a high-resolution bitmap. But in practice, that comparison is more confusing than useful. There is other stuff going on out of sight to make that illusion work.
…if you’re trying to preserve the nation’s nuclear secrets. I assure you that when we’re redacting documents in litigation, marker+scan is quite sufficient. (Though properly redacting in Acrobat is perfectly fine.)
It’s as if highlighting isn’t the same as redaction.
Maybe a better solution is to actually delete or replace the text?
Deleting is what’s done at the ITC, although by default – and opposite of the district court – the bulk of filings are made under seal anyway, so redactions aren’t an immediate concern in most cases.
Anyway, this (redaction fail described in the posting) is a recurring problem that’s been around for years. See, e.g., https://slashdot.org/story/06/06/22/138210/more-pdf-blackout-follies.
For those who care to RTFM, the ABA and ED Cal (among many others, no doubt), have useful guidance on how — and how not — to redact. See http://www.abajournal.com/news/article/paul-manaforts-attorneys-failed-at-redacting-learn-how-to-do-it-right; http://www.caed.uscourts.gov/caednew/index.cfm/cmecf-e-filing/redaction-requirements/how-to-redact/.
Relatedly, both guides also helpfully advise on scrubbing metadata from documents, another area where lawyers frequently blunder. At least for metadata, there are now vendor tools available that will automatically scrub metadata on documents sent via email.
Why not just delete the text and put in ellipses?
When you file a publicly accessible redacted version alongside the sealed unredacted version, you’re generally supposed to make sure that all the pages are identical except for the redactions. That’s hard to arrange if you’re deleting text and inserting ellipses in a word processor, because it throws off the spacing (and often the pagination more generally).
Prof. Volokh, that is only true if you are using a proportional font, as opposed to a monospace font. If you require the results you describe, you should be using a monospace font.
To redact, one could replace characters with the block character (█, code 219 in the extended-ASCII code page 437, U+2588 in Unicode). In a monospace font, this will not affect line length, pagination, etc.
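As a rough sketch of that replacement in Python (the `block_out` helper and the sample sentence are hypothetical, made up for illustration), each character of the sensitive string is swapped for a solid block, so the character count of the line does not change:

```python
FULL_BLOCK = "\u2588"  # the solid block character


def block_out(text: str, secret: str) -> str:
    """Replace each character of every occurrence of `secret` with a block."""
    if not secret:
        return text
    return text.replace(secret, FULL_BLOCK * len(secret))


line = "Settlement amount: $1,250,000 payable to Acme Holdings."
redacted = block_out(block_out(line, "$1,250,000"), "Acme Holdings")

print(redacted)
print(len(line) == len(redacted))  # same character count, so same width in monospace
```

Because the length is preserved, a reader can still count the blocks and guess at what was removed, and the sketch does nothing about copies of the original text kept elsewhere, such as a document's version history.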
However, beware! Many (most?) word processor apps, like google docs, are journaling, and allow one to view previous versions of the document. You must make sure previous versions are eradicated.
Lesson: get an app that is certified for this purpose.
I posit that lawyers, and others responsible for handling secured or sensitive information who are casually taking a DIY approach are committing malpractice. (This includes not only redaction, but email, too.) If they undertake to handle such information professionally, they are professionally obligated to know what they are doing and how to do it. This can be accomplished, in part, by continuing education. There are plenty of resources available.
ThePublius: Most courts these days (though not all), in my experience, expect briefs to be prepared using a proportional font. I can’t vouch for whether they require this, but that seems to be the custom; and proportionally spaced text is generally easier to read, especially for readers who are used to it. So there’s excellent reason to write one’s briefs using a proportionally spaced font.
Then the question is what to do for the redacted public copies of those few briefs that do need to be redacted. My sense is that using software that does real redaction (such as Adobe Acrobat) is on balance better than writing the briefs using a harder-to-read font, just so one can redact using a word processor.
Prof. Volokh, I had no idea about courts’ font requirements! I believe you, but can you please give an example? Do they vary? Anyone requiring Comic Sans?
BTW, I don’t think proportional fonts are any easier to read than monospace. People were fine with monospace for centuries. More recently, think typewriters. Also, think text-only displays, computer source code, etc., etc.
I did some research. Wow, had no idea courts were so specific about fonts, and so inconsistent!
So, professional-grade, verified redaction tools are in order, in my opinion. Just as a lawyer who represents himself has a fool for a client, a lawyer who redacts for himself has a fool for an information security officer – or something like that.
I wonder if print.save_as_pdf would work?
If the document is already in a monospace font, you might be able to get away with that approach to redaction (though I would strongly recommend against it for multiple reasons). Changing the font just so you could redact it, however, would clearly be against the rules and could invalidate your production.
Wait. Why would a lawyer be using Google Docs for confidential information? Wouldn’t sharing said information with a third party not bound by the same privilege rules (i.e., Google) be a no-no?
Lawyers shouldn’t be using google docs, there is no native doc encryption, and they are stored in the open on google’s servers.
Lawyers should use end-to-end encryption, for virtually all client and client-related communication.
I know they don’t. It’s a bad situation.
Ty, if the lawyer hires a reputable vendor who is bound by contract to protect the information to the same levels that the lawyer is, then the vendor “inherits” the privilege obligations. That’s why you can’t simply subpoena the expert witness hired by a lawyer as a way to get around the lawyer’s privilege.
That said, I don’t personally think that GoogleDocs can or has any intention of living up to the necessary levels of privacy and security. If I ever found out my lawyer was using GoogleDocs, I’d fire him the same day.
This really sounds like a problem lawyers inflicted on themselves. First and foremost, because they can’t trust each other not to try to circumvent good-faith (if poorly executed) redaction attempts.
And then because of other self-imposed requirements and expectations, they have no easy ways out, leading to needing third-party solutions.
Well, at least somebody is getting paid.
EscherEnigma: Court documents are public records, accessible to anyone in the public — including the opposite parties (not just their lawyers) and the media. For the few not fully public documents, the point of redactions is precisely to have a version of the document that the public can access. So even if all lawyers completely refused to try to circumvent redaction attempts, redaction failures could still cause huge problems to the party whose secrets become accessible (and to the party’s lawyer).
This might alarm you, Mr. Volokh, but the poor behavior of others isn’t an excuse for one’s own poor behavior.