The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
New Draft Article: "Data Scanning and the Fourth Amendment"
Just posted to SSRN.
I have recently posted a new draft article, Data Scanning and the Fourth Amendment, to SSRN. The abstract:
A crucial question of Fourth Amendment law has recently divided courts: When government agents conduct a digital scan through a massive database, how much of a "search" occurs? The issue pops up in contexts ranging from geofence warrants and reverse keyword searches to the installation of Internet pen registers. When a government agent runs a filter through a massive database, resulting in a list of hits, is the scale of the search determined by the size of the database, the filter setting, or the filter output? Fourth Amendment law is closely attuned to the scale of a search. No search means no Fourth Amendment oversight, small searches ordinarily require warrants, and limitless searches are categorically unconstitutional. But how broad is a data scan?
This essay argues that the Fourth Amendment implications of data scans should be measured primarily by filter settings. Whether a search occurs, and how far it extends, should be based on what information is exposed to human observation. This standard demands a contextual analysis of what the output reveals about the dataset based on the filter setting. Data that passes through a filter is searched or not searched depending on whether the filter is set to expose that specific information. The proper question is what information is expressly or implicitly exposed, not what raw data passes through the filter or the raw data output. The implications of this approach are then evaluated for a range of important applications, among them geofence warrants, reverse keyword searches, and Internet pen registers.
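As a rough illustration of the three candidate measures named in the abstract (database size, filter setting, filter output), here is a minimal Python sketch. The records, field names, and the `scan` helper are all invented for illustration and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user: str
    place: str
    hour: int


# Hypothetical toy database; every value here is made up.
DATABASE = [
    Record("u1", "bank", 14),
    Record("u2", "bank", 9),
    Record("u3", "park", 14),
    Record("u4", "cafe", 14),
    Record("u5", "bank", 14),
]


def scan(records, predicate):
    """Run a filter over every record and return only the hits."""
    return [r for r in records if predicate(r)]


def at_bank_at_two(r):
    # The "filter setting": the condition the filter is set to expose.
    return r.place == "bank" and r.hour == 14


hits = scan(DATABASE, at_bank_at_two)

database_size = len(DATABASE)  # records the machine passed over
output_size = len(hits)        # records actually shown to a person
print(database_size, output_size, [r.user for r in hits])
```

The machine touches all five records, but a person sees only two. The output also implicitly tells the reader that the other three users did not match the filter, which is the kind of implicit exposure the abstract mentions.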
The idea for this article started with my blog posts here reacting to the Fifth Circuit's geofence warrant ruling in United States v. Smith, but I think the issue is one that applies more broadly. Indeed, the more that lower courts construe the Fourth Amendment broadly on what data is protected, the more Fourth Amendment protection depends on how you answer the scanning question.
This is a first draft, and comments are very welcome. I especially welcome comments on the technology discussions (mostly in Section I), including about whether I get the basics correct, whether the examples and analogies work, and whether the terminology is on or off. Thanks.
Thank you for this paper! My background is in technology, not the law. It was very interesting reading how the courts have wrangled with the issues of digital searches, and I agree it would be very beneficial to apply a single standard.
I very much like the idea of inspecting the filter/query terms to analyze whether or not a digital search reaches to the Fourth Amendment. However, I think that looking at filter terms alone is insufficient.
There really is no such thing as "unstructured" data. All digital information has some degree of structure. Even a simple text file carries a name, a datetime stamp, a file size, storage information, and so on. I would imagine that any of this information could be implicated in a 4A search.
Technologists have distinguished between "unstructured" and "structured" data only to highlight that some applications expect their data to be queried by end users, while others only expect their data to be configured by them. Underneath it all, everything is data and everything is queryable.
So, to really determine whether or not a digital search is a 4A search, we need to know BOTH the fields that could potentially be inspected, and the proposed filter/query that will be run against them.
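To make the "fields plus filter" idea concrete, here is a small Python sketch in which a query must declare both the fields it may inspect and the fields it may return. The `run_query` function, the file records, and the field names are hypothetical and only meant to illustrate the point.

```python
# Hypothetical file records with metadata fields and a content field.
FILES = [
    {"name": "notes.txt", "modified": "2024-03-01", "size": 120, "body": "meet at noon"},
    {"name": "todo.txt", "modified": "2024-04-15", "size": 40, "body": "buy milk"},
]


def run_query(rows, inspect_fields, predicate, return_fields):
    """Apply the predicate to the declared fields only; return only return_fields."""
    hits = []
    for row in rows:
        # The predicate sees nothing beyond the fields declared up front.
        visible = {field: row[field] for field in inspect_fields}
        if predicate(visible):
            hits.append({field: row[field] for field in return_fields})
    return hits


# The filter inspects only the modification date and exposes only the file name.
result = run_query(
    FILES,
    inspect_fields=["modified"],
    predicate=lambda v: v["modified"] == "2024-03-01",
    return_fields=["name"],
)
print(result)
```

In this sketch a predicate that tried to read `body` without declaring it would fail with a `KeyError`, so a reviewer could see from the declaration alone which fields a proposed search can reach.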
As to the distinction between "non-content" and "content", I think for 4A purposes, the question is whether or not any particular field being queried is public or private in the particular context of the search. It may not be that a field is always public, for example.
In any event, thank you for writing this paper, I am sure it will be very helpful.
"and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized"
How particular, exactly, is "give me all the cell phones in DC on Jan 6th"?
I find this unconvincing.
Let us postulate a medical storeroom in which are stored lots and lots of bottles of pills (each made of opaque plastic, so you can't see the pills themselves.) Each bottle has a label describing its contents : "100 x 25mg fantastaglobin Y", "50 x 20mg suckastatin" etc.
You search first for bottles with labels mentioning "angeldustium", and then you open these and check the contents to see whether they really do contain "angeldustium."
To me, the claim that you have not searched the whole medical storeroom would be disingenuous.
If your warrant permitted you to search "such bottles in the storeroom as are marked 'angeldustium'" it would permit you to do the second part of the search, i.e. checking the bottle contents. But it would not permit you to do the first part of the search: the search of the storeroom to find the bottles marked "angeldustium."
You cannot do the second part of the search without doing the first part. And the first part is a search of the storeroom not a search of the bottles.
This was covered pretty well in the article, I thought. What constitutes a search is what is seen by the person requesting it: the "output" of the search, if you will. So in your hypothetical, you ask for bottles labeled "X", and the pharmacist gives you bottles labeled "X" (if any). As long as the pharmacist doesn't give you any information about all the other bottles, YOU haven't searched them, the pharmacist has. And doing so doesn't confer any additional powers or controls on the pharmacist, who already needs to know all the possible information as part of the job.
What makes the pharmacist hand over the bottles marked X ? A passing whim ? An intuition that you might be interested ? Or an order from you. I think it's the order.
And if the pharmacist is acting under your orders, she's part of your search team.
I don't think the cops can get round the need for a warrant to search a house by paying a couple of boys to climb in and go through the house for drugs, money and guns, dumping what they find on the sidewalk. And then you, the cops, come along and say "wow, lookee here, drugs'nmoney'nguns all from No. 32 ! Who'da thunk it ?" The boys are your agents. They're on the search team.
I'm not a techie, but I don't believe that the computer / database "knows" which are the responsive records. I just think it can find them really quickly by performing its own search, either on the whole database or on an index. But, like the pharmacist and the boys, it searches at your direction.
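For what it is worth, the scan-versus-index distinction can be sketched in a few lines of Python. This is a toy model using the made-up pill labels from the storeroom example, not a description of how any real database engine works.

```python
from collections import defaultdict

# Hypothetical storeroom labels, in shelf order.
LABELS = ["fantastaglobin Y", "suckastatin", "angeldustium", "suckastatin", "angeldustium"]


def full_scan(labels, wanted):
    """Look at every label in turn; return matching positions and the number examined."""
    examined = 0
    hits = []
    for position, label in enumerate(labels):
        examined += 1
        if label == wanted:
            hits.append(position)
    return hits, examined


def build_index(labels):
    """Read every label once, ahead of time, and record where each one sits."""
    index = defaultdict(list)
    for position, label in enumerate(labels):
        index[label].append(position)
    return index


scan_hits, scan_examined = full_scan(LABELS, "angeldustium")
index = build_index(LABELS)
index_hits = index.get("angeldustium", [])
print(scan_hits, scan_examined, index_hits)
```

Both routes return the same positions. The index lookup is fast only because every label was already read once when the index was built, so in either case something has passed over the whole storeroom at some point.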
The end point of Orin's theory is that law enforcement drones "this-is-not-a-search-because-I-am-a-drone-not-a-human" search whatever their human masters like, and deliver the goodies to their human masters, who say "Search ? What, me ? All my eyes have passed over is the stuff that friendly drone found. And btw if you say this is illegal, terrorists will immediately kill 500,000 people, so that's on you."
Like Lee I'm unpersuaded: If you search an entire hotel for a needle, it is not a small search just because the needle is small. You have searched the entire hotel. If you search one specific room for an elephant, it's a small search, even if the elephant is big. The size of the search is dictated by the size of the area searched, not of the article sought, because it's the former that dictates what else is present, and the concern here is intrusiveness.
But I think this elides the bigger issue with digital searches: The distinction between searching the data "in situ" and copying the database and searching the copy. It's quite common for the government to make a copy of a database, and then do their search on the copy. Think of the data FISA searches are conducted on, or the police searching somebody's phone.
In these cases, we basically have to take the government's word for the scope of the search actually conducted. And the very existence of the Bill of Rights is predicated on the government not actually being worthy of being trusted! That there are abuses that the government might be inclined to commit, that need to be guarded against.
Taking the government's word for something, like the scope of the search that will actually be conducted, is contrary to the whole spirit of the Bill of Rights. We really need better safeguards on digital searches to make sure the government only conducts the authorized search.
"That there are abuses that the government might be inclined to commit, that need to be guarded against."
And which entity decides if the govt has committed an abuse and then orders the govt to cease what it's doing?
https://thehill.com/homenews/5192090-judge-finds-trump-unlawfully-fired-head-of-federal-employee-labor-board/
Letting the searcher get a complete copy of all the records to be searched without regard to any filter is precisely the kind of thing the author is arguing against. His argument is "show me your search parameters first", and then I can determine whether or not you are requesting private information protected under the 4th amendment.
I see your hotel and raise you a room. I think Orin shoots himself in the foot early on, thus :
Think of how a physical investigation works. When officers search a room for evidence, they move physical things to bring new things into their line of sight. Searching is exposure, enabling human senses to observe and human minds to recognize
But who would argue that the officers did not search the room ? Sure, while searching it they may have moved things, and in so doing spotted things that they wouldn't have spotted if they hadn't moved them. But it would be absurd to say that what they searched was limited to the things their eyes passed over. They searched the room.
The reference that Orin gives is to a judgement saying :
See Arizona v. Hicks, 480 U.S. 321, 325 (1987) (concluding that lifting stereo equipment in a way that revealed the serial numbers on the bottom was a search; “taking action, unrelated to the objectives of the authorized intrusion, which exposed to view concealed portions of the apartment or its contents, did produce a new invasion of respondent's privacy”)
Which is correct. If the officers were not already searching the room, then lifting up the stereo equipment is a search of the stereo equipment. But if they're already searching the room the stereo equipment is already being searched merely by being in the room.
I'm a literal idiot, so I had copilot write this for me:
This paper provides a thought-provoking analysis of the Fourth Amendment implications of data scanning, but I find myself wrestling with an emerging concern it doesn't address: the role of artificial intelligence in searches. For instance, imagine law enforcement relying on an opaque, black-box AI model purchased from BigCop to analyze vast datasets and produce a list of likely suspects. Would this process trigger Fourth Amendment protections? If the underlying mechanisms of the AI are inscrutable—even to its operators—how do we assess the constitutionality of such searches? The filter-focused approach emphasized in the paper seems ill-equipped to handle the unique challenges posed by AI's opacity, potential biases, and wide-reaching inferences. As AI becomes increasingly central to investigative tools, omitting this dimension feels like a missed opportunity to future-proof Fourth Amendment jurisprudence.
Siri, who are the usual suspects?
Hypo 1: the police think Adam is up to something and get a warrant for his bank records. They go to the bank and a bank employee goes into a room with everyone's bank records and selects Adam's file and hands it over.
Hypo 2: same as #1 but the records are in a big data base and the bank employee does a query like 'select * from transactions where customer=Adam', yielding the same results as #1.
Hypo 3: an informant tells the police 'I don't know the name of the man who hired me for the hit, but I waited in the car while he got the $7500 from Capitol Bank at about 2PM Thursday.' The police get a warrant and go to the bank and say 'tell us everyone who withdrew $7500 in cash on Thursday afternoon', which the bank does by searching 'select * from transactions where date=that Thursday and amount>7000 and amount<8000', because that's how their database works. Or even 3A, the bank employee just scrolls through all the cash withdrawals for Thu afternoon (because that's easier) until he sees $7500 and says 'your guy is Fred Smith'.
I'm not sure I see 3 or 3A as different from 1 or 2.
Does it/should it matter if the file cabinets for #1 aren't labeled, so the bank employee has to open every drawer looking for Adam's records?
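For anyone who wants to run Hypos 2 and 3 side by side, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and every row (including the customer "Beth") are invented for illustration; only the shape of the two queries follows the hypos.

```python
import sqlite3

# In-memory database with a hypothetical transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Adam", "Thursday", 200),
        ("Adam", "Friday", 1500),
        ("Fred Smith", "Thursday", 7500),
        ("Beth", "Thursday", 60),
    ],
)

# Hypo 2: the filter names a known customer.
hypo2 = conn.execute(
    "SELECT customer, day, amount FROM transactions "
    "WHERE customer = ? ORDER BY amount",
    ("Adam",),
).fetchall()

# Hypo 3: the filter names a day and an amount range, not a person.
hypo3 = conn.execute(
    "SELECT customer, day, amount FROM transactions "
    "WHERE day = ? AND amount > 7000 AND amount < 8000",
    ("Thursday",),
).fetchall()

print(hypo2)
print(hypo3)
conn.close()
```

In both queries the engine evaluates the condition against every row in this small table and returns only the matches. What differs is what the filter is set to expose: one person's records in Hypo 2, and the identity of whoever fits a transaction pattern in Hypo 3.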