Katherine Mangu-Ward | June 17, 2008
There's no shortage of data available to police,
meteorologists, and other soothsayers. But could there be a point
where more data means worse predictions? Yes,
sayeth camera skeptic and BoingBoing impresario Cory
Doctorow:
Take London: cover every square inch of the city with CCTVs and you'll get so much information that you'll never make any sense of it. Scotland Yard says that CCTVs help solve fewer than 3% of all crimes, while a study in San Francisco found that at best, criminals simply move out of camera range, while at worst they assume no one is watching.
Similarly, if you take fingerprints from every person who applies for a visa – or worse still, from every person in Britain who has to carry one of the proposed new biometric cards – you will fill the databases with chaff that slows down searches, generates endless false matches, and threatens everyone in the database with the worst kind of identity theft.
Check out Doctorow's recommendations for political books for young adults. And check out his book, Little Brother.
In other words, getting data is never a bad thing (assuming the
data is accurate). You can always filter it if you need a more
manageable search. You lose information that way, but you're never
worse off than you would be if you didn't have any of this data at
all.
This is more of a case of technology not improving things as much
as was hoped. If all I was concerned with was raw crime solving
ability, I'd prefer having cameras everywhere to having cameras
nowhere. Even if I have to ignore all the camera video because it's
useless, I'm not any worse off than I would be if there were no
cameras.
Since you invoked meteorology, I have to throw in my two
cents...
I don't see any way in which "too much" data would make predictions
worse. To be sure, you can reach a point at which you simply cannot
use or compute all the data, but that doesn't negatively impact
your predictions; it just means you've wasted your time/resources
collecting excess data. You could also argue that an effort to
collect large amounts of data may compromise the quality of said
data, but that's not an inherent problem.
I guess ultimately I'm expanding on Chris's point: matching
suspects is different than making predictions. I suppose
technically, if your data resolution was greater than your actual
model, you'd have to use processing resources to interpolate a
dataset with matching resolution, but you already do that with data
sets that are too small. Bottom line is, if you're talking about
numerical predictions, the excess data is much more likely to
simply be ignored rather than clogging things up.
Even if I have to ignore all the camera video because it's
useless, I'm not any worse off than I would be if there were no
cameras.
You mean all that stuff costs $0.00?
Good point, Finger, but that doesn't mean that the data itself hampers efforts to fight crime, as KMW is arguing. But yes, the costs and threats to privacy do have to be considered in the final analysis.
Perhaps the data doesn't actually hurt the investigation, but it does seem that drinking from the firehose is unproductive compared with more intelligent approaches to, um, intelligence.
What are we talking about here? Initially Doctorow asks whether
or not too much data causes us to make worse predictions, but veers
off course to make the point that CCTV is ineffective in criminal
prosecution.
Leaving aside whether or not either of those statements happens to
be true and in what context, the entire post is a bit of a
non sequitur.
For a perspective on the history and culture of data collection
try to find Ian Hacking's "Biopower and the Avalanche of Printed
Numbers".
Doctorow is (probably incorrectly) assuming that the cams are
designed with best intentions, rather than their probable goal of
just asserting authority over the proles and letting them know
they're being watched.
And, since this post extols Doctorow, let me offer an OT comment
about his fellow BB poster XeniJardin.
Here's a couple of my
comments she deleted from BoingBoing. That follows (and might
be related to) an incident from a few years ago when I added some
negative information to her WP page about some pictures she posted
to BB that supposedly involved the MMP. But one of them was an
inflammatory poster not from the MMP, and she failed to note
that.
That was eventually deleted from her WP entry, as you can read
about on this archived WP talk page:
tinyurl.com/54xdvn
Chris Potter: Doctorow was fairly on-target about the ways in
which too much data can be a bad thing: false positives and
increased handling time/costs.
Figuring out that false positives are false takes time and/or
money, and it's time and/or money that could have been spent doing
something productive in terms of solving the case. Similarly, if
each trawl through your database takes a half hour because it's a
few terabytes of data, you not only slow down investigations, you
probably limit the extent to which the data can be/will be
deployed. Nobody's going to check everyone getting onto an airplane
for being a terrorist if it takes 30 minutes (or, really, even 10
minutes) to check each person.
Now, of course, it's trivially easy to imagine scenarios in which
having more data helps solve cases, and, on balance, it probably is
true that they more often help than hurt in the narrow, focused
sense. I think that the arguments about privacy, abusability, etc.
are better than the "too much data hurts" one. But there is
something to that argument.
Doctorow is (probably incorrectly) assuming that the cams
are designed with best intentions, rather than their probable goal
of just asserting authority over the proles and letting them know
they're being watched.
In my (pretty large) city, there's a traffic intersection that has
SIXTEEN CAMERAS (4 pointing in each direction). There can't be
enough cops to watch 'em all, so the objective must be simple
intimidation -- or "deterrence" of whatever the poobahs think
should be deterred at that corner.
Chris Potter-
Do you think, in the final analysis, that Tony Blair and Gordon
Brown have given much consideration to privacy?
What right does the state have to view one's comings and goings? There are too many people, including, it would appear, far too many libertarians, who accept omnipresent cameras as inevitable.
I think Orwell might have preferred his own imagery of a boot
stomping on a human face.
Doesn't that get the point across better?
far too many libertarians, who accept omnipresent cameras as
inevitable.
I have no problem with everpresent cameras, in fact I
wholeheartedly approve of having a true record of what's happening
out in the public world. What's objectionable is the state owning
and operating them.
Let Little Brother watch with a thousand prying eyes, and train an
especially watchful glare on those entrusted with the state's
monopoly on violence.
liberty mike,
There is no expectation of privacy in public spaces. Nor should
there be. How is this any more of a privacy violation than posting
a policeman at every corner?
I keep trying to post a huge screed on why "more data is not
better"
It keeps getting sucked up in the intertubes. I think it's the
matrix fucking with me.
Short answer why the "more data" approach isn't at all good for focused
problems - see the Will Rogers quote here =
http://www.bobcongdon.net/blog/2004/06/boil-ocean.html
Doctorow was fairly on-target about the ways in which too
much data can be a bad thing: false positives and increased
handling time/costs.
That's not the fault of the data collection, it's the fault of a
particular way of using the collected data. If searches are taking
too long or turning up false positives, filter the data set, as I
suggested before. You'll still be better off than if you had no
data at all.
And Doctorow's quoted point had nothing to do with the costs, but
rather the claim that more data directly hampers
investigations.
Chris Potter | June 17, 2008, 8:14pm | #
Doctorow was fairly on-target about the ways in which too much data
can be a bad thing: false positives and increased handling
time/costs.
That's not the fault of the data collection, it's the fault of a
particular way of using the collected data.
You don't consider data collection a cost?
Or the time spent eliminating an unlimited number of negatives a
cost, either?
What do you do for a living?
maybe you're confusing "investigations" with "tangible
results".
I mean, it's great that they have 200billion hours of tv footage of
guys pissing in doorways, but it aint exactly doing anything about
it.
the panopticon effect is useless when you don't have a controlled
population. Soon, we'll need to facial-scan everyone to make sure
the SYSTEM works.
Chris Potter-
Posting a policeman at every corner is obnoxiously repugnant to a
free society. I have a right to not have my image and likeness
video-taped without my consent. I have a right to determine who
will examine, use and/or distribute my image.
What is really lame is the proposition that one "consents" to
having his image videotaped, analyzed, used and/or distributed just
because he is locomoting on a public street. That is totalitarian
clap trap.
If the framers had intended to give the state the right to
establish permanent surveillance, they would have provided for such.
They did not.
Chris Potter-
This place was birthed by extremely radical folk who,
overwhelmingly, were adherents to natural rights philosophy.
Posting a cop at every corner is utterly inconsistent with natural
rights philosophy. Is your philosophy more appealing than that of
the framers?
GILMORE,
I, um, collect...uh, data.
But not that kind. ;-)
Listen, I'm down with the concern that the costs might be high
enough that it's not worth it. That's not what the claim was. Read
the friggin title of the post. KMW and Doctorow are saying that
collecting large amts of data inherently leads to bad results.
I have a right to not have my image and likeness video-taped
without my consent. I have a right to determine who will examine,
use and/or distribute my image.
You have a right to life, bodily integrity, liberty, and the use of
your property. Where do these supposed rights fit in?
What is really lame is the proposition that one "consents" to
having his image videotaped, analyzed, used and/or distributed just
because he is locomoting on a public street. That is totalitarian
clap trap.
More like a necessary guiding principle for a high-tech society.
Are you saying that if a mom videotapes her family celebrating one
of the children's graduation in a public space, and you happen to
walk through the background, she has to track you down and ask your
permission before she can show the video to anyone?
If the framers had intended to give the state the right to
establish permanent surveillance, they would have provided for such.
They did not.
The Constitution doesn't enumerate powers for state govts, just the
federal one. State govts have all the powers that are not forbidden
to them in the US Constitution or in their own constitutions.
Chris-
It is axiomatic that the natural rights philosophy undergirded the
"american experiment in ordered liberty" and that "the language of
the Declaration of Independence provided the standard American
expression of that philosophy." Kimberly C. Shankman and Roger
Pilon, Reviving the Privileges or Immunities Clause to Redress the
Balance Among States, Individuals, and the Federal Government, 3
Texas Rev. of Law & Policy 1, 12 (1998).
Posting a cop at every corner is utterly inconsistent with
natural rights philosophy.
How so? You keep claiming this, I don't find it convincing. In any
case, despite my religious views, I'm becoming more of a
utilitarian than a natural rights philosopher in legal matters
these days. Natural rights legalism leads to some pretty hideous
conclusions.
liberty mike,
The Founders weren't God. Nor is Ron Paul, but we've already
discussed that on other threads and you seem recalcitrant on that
point.
No offense to the framers, they did a good job for their time, but
I'm not going to surrender my will to wife-beating slaveholders any
time soon.
Chris -
No they do not.
1. Read the state constitutions.
2. Have you ever heard of the 9th amendment?
Do you understand natural rights philosophy? You do know that John
Adams was a self-described natural rights adherent? Are you familiar
with the writs of assistance cases argued by James Otis in 1761? Do
you know of the relationship between Otis and Adams?
Our rights do not come from the state or what some majority
ordains. They inhere; they are god given. That is what Mr. Adams
believed. Ditto Mr. Jefferson.
The natural rights philosophy undergirds the ninth amendment. Thus,
"where in the constitution does it say one has a right of privacy"
is not the question, as the framers' conception of rights included
the proposition that the sum of all of our rights could never be
catalogued -thus the ninth amendment. No 9th amendment =
no ratification = no USA.
Nor am I going to surrender my rights to pusillanimous pussies who want uniformed thugs on every corner.
Chris-
Of course they were not gods. As I have often been forced to admit,
they sure didn't always practice what they preached.
That's not the fault of the data collection, it's the fault
of a particular way of using the collected data. If searches are
taking too long or turning up false positives, filter the data set,
as I suggested before. You'll still be better off than if you had
no data at all.
If you could filter false positives out of a data-set, you
wouldn't get false positives.
Your argument is, "If we ignore the cases where large data sets
cause bad results, large data sets cause only good results."
That's... true, I guess.
You still haven't shown that having a police officer* on every
corner conflicts with the Bill of Rights or natural rights theory,
or explained why, given that the Framers were not God, I should
care what their philosophy was.
* I mean of course an officer bound by the same laws as everyone
else, not the unaccountable gods-in-their-own-eyes that infest our
PDs and other LEAs today.
Not quite, Mr Sullivan. I was saying filter on some quicker basis, even just randomly throw out huge chunks if that's what you need to do to get the data set down to size. True, you might wind up throwing out data that would actually be helpful in the process, but in no case do you wind up worse off than you would if you had no data at all to work with.
Let me give an example: Detectives Smith and Jones are
old-school detectives who don't like all this new high-tech stuff.
They prefer the old-fashioned ways of questioning eyewitnesses and
sniffing around the crime scene after the fact. However, the mayor
decides to hedge his bets by investing in a camera system to
blanket public places and a geek to run facial recognition and
tracking software on the collected video, while still letting Smith
and Jones do their investigation their way.
Now, maybe the geek turns up false positives left and right, has to
throw out 3/4 of the video randomly to be able to search properly,
and runs up the electricity bill. But there's no way he actually
makes things worse than if there was just Smith and Jones doing
their detective work.
* I mean of course an officer bound by the same laws as
everyone else, not the unaccountable gods-in-their-own-eyes that
infest our PDs and other LEAs today.
Chris --
There's the rub. What the framers understood (better than most) is
the potential for abuse once the means of abuse are granted to the
government. There'd be no objection to cameras everywhere if we
could be absolutely assured that those videos would never be used
to further the personal political or military or power agenda of
any particular individual or group.
Fat chance.
Chris-
You do not have a right to impose your fear on me.
To your point, yes I have. Read the 9th amendment. It says "the
enumeration in the constitution of certain rights, shall not be
construed to deny or disparage others retained by the
people."
Our government is one of limited power or enumerated powers. There
is no grant of authority in the federal constitution or in any
constitution of the original 13 that enables government to maintain
permanent surveillance over the citizenry. Period.
On this point, I'll go with the framers over those that think they
or the majority have the right to impose their fear on the rest of
us.
Only a radical fearmongering pussy would argue that the state has a
right to maintain permanent surveillance on the citizenry. It is
entirely unreasonable and utterly at odds with first principles.
Oh, of course, those who stood to benefit by the permanent
surveillance state would be in its favor. Yes, my position is first
a moral one and superior to the "moral" position that some mob can
maintain a permanent surveillance state. Second, there is no
practical justification either-unless one believes that there is an
Atta in every attic waiting for the chance to do some evil.
Chris-
But I forget-you are not a libertarian. If a libertarian does not
understand that libertarianism is rooted in natural rights
philosophy, then he is ready to be hannitized.
Crusader Rabbit-
I agree except that you are not speaking for me in your
hypothetical-as I do not consent to one pence of what I produce
being confiscated for rent seekers, state actors or other
parasites.
This point has probably already been made, and doesn't touch on the implications for liberty, but there's a difference between data flow and data amount (the former being the first derivative of the latter). The rate is what really matters; this is why there is an optimum number of gauges on a car's dashboard. As another example, it did not matter that there were about 1000 different alarms on the control panel at Three Mile Island. What mattered is that at any given time, about a third of them were locked in (i.e. continuously in an alarming condition but silenced), so a true problem was difficult to ascertain.
Our government is one of limited power or enumerated
powers.
The federal govt, yes.
The state govts, no.
Does your state's constitution specifically grant it power to set
speed limits and parking rules? And while you are doing a great job
of repeating that surveillance violates your natural rights, you
have given no evidence for this viewpoint.
But I forget-you are not a libertarian
liberty mike, you must really enjoy having a drink!
Now that small remote-control helicopters are available in most toy stores, it's only a matter of time before they can be equipped with paint-ball guns that can be aimed at surveillance camera lenses.
In my (pretty large) city, there's a traffic intersection
that has SIXTEEN CAMERAS (4 pointing in each direction). There
can't be enough cops to watch 'em all, so the objective must be
simple intimidation -- or "deterrence" of whatever the poobahs
think should be deterred at that corner.
Any idea if they are police- or transportation-run? Our local
intersections often have four cameras, which are used to
automatically adjust traffic light timing. Those are not "real"
cameras, in the sense that they do not transmit images of the
vehicles elsewhere. There are other spots where there are
transportation cameras hooked up for remote viewing to watch for
crashes, traffic backups, etc. I think there is even a website from
our state DoT.
Or maybe your state has red-light cameras that send automatic
tickets.
Zubon --
These are so-called red light cameras, with elaborate flash and
other accoutrements to go with them. Lots of cool (and expensive)
equipment that no doubt got the juices of the city procurement
folks flowing.
But I assume the police can use them for other purposes, too. Once
the image is captured, imagination is the only limit on what it can
be used for.
yes, and no.
if you're trying to do an analysis that for example needs to draw
some conclusion from all, or a combination of this data, then yes
more data is more problems. usually some methods are needed to try
to achieve a more useful set of data.
if you're just monitoring, recording, or archiving for retrieval
when necessary, all you need is more resources.
.. so in conclusion, yes you really can't gather a ton of data and think you can extract something useful from it automatically.
It has been my experience that most people inadequately
appreciate the importance of sensory and perceptual adaptations to
environment in creating a successful species. That is to say,
species that are dominant in their environments appear to have
evolved sensory apparatus that is well-tailored for the environment
and the species' role in it. And, as much as sensory mechanisms
provide ways to experience the world, they also entail built-in
filters that shape the organism's experience, sieving out
irrelevant or distracting data points, even before the brain can
try to make sense of them. The basic sensory limitations that an
individual has have PROVEN, through evolution, to admit just enough
information for the success of the species, and to reject the rest
so as not to overtax (or confuse) the brain's ability to discern
patterns in the input.
I observe, on the other hand, that classical paranoids seem overly
preoccupied with very tiny details, picked from a broad canvas of
as much information as they can hold in their heads at one time.
Perhaps a touch of this is conducive to survival, but I can easily
imagine a situation of "too much" information, in which the
paranoid mind connects many dots he might otherwise ignore, to
create "patterns" where there really are none. The worldview caused
by recognition of such bogus patterns would lead to behavior that
were inappropriate and counterproductive for the envrionment,
probably to the point of negatively affecting the individual's
chances at survival or reproduction.
It is not much of a stretch to imagine that a situation of "too
much information," which is potentially bad for individuals and
species might also be bad for their organizations and institutions,
as well.
Once the work of Jeff Hawkins and associates
(http://www.numenta.com/, http://www.onintelligence.com/) attains
some maturity, and artificial intelligences in the mold he
prescribes are hooked into camera networks, for example, it will
be interesting to see which cameras need to be enhanced (and how),
and which need to be turned off entirely, to get optimum
pattern-recognition results. I think we will see that "legitimate"
peacekeeping and public-safety functions will require only a few
cameras, and that too many will simply serve to confuse or overload
the AI.
"behavior that were inappropriate and counterproductive FOR the envrionment" should be "... IN the environment..."
If searches are taking too long or turning up false
positives, filter the data set,
You're presuming the data can be meaningfully filtered.
For instance, ATF records instances when they trace the ownership
of a firearm. Apparently the individual records do not contain a
data field for the reason a law enforcement agency requested the
firearm be traced. Therefore it's impossible to filter either for
or against "the firearm was used in a crime" or "the firearm was
found somewhere" or "the firearm was unusual" or "it was a slow day
and we had nothing else to do."
Filters also take some time to run. If, for instance, you take a
database that stores the images of bullets for comparison and
expand it from 500,000 bullets found at crime scenes (each with six
images to store) by adding 200,000,000 bullets from legally-owned
firearms, just filtering out the 99.75% of unlikely matches could
become enough of a problem to make you reconsider adding the extra
data.
Not to mention funding the data collection and storage space.
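The arithmetic behind that 99.75% figure can be checked with a minimal Python sketch. The two bullet counts come from the comment above; applying the six-images-per-bullet figure to the added records as well is an assumption made here for illustration, and nothing in the sketch models real matching software.

```python
# Figures from the comment above.
crime_scene_bullets = 500_000
legal_firearm_bullets = 200_000_000
images_per_bullet = 6  # ASSUMPTION: the added records also store six images each

total_bullets = crime_scene_bullets + legal_firearm_bullets
share_unlikely = legal_firearm_bullets / total_bullets  # records to filter out
growth_factor = total_bullets / crime_scene_bullets     # size relative to the original
total_images = total_bullets * images_per_bullet

print(f"records after expansion: {total_bullets:,}")
print(f"share from legally owned firearms: {share_unlikely:.2%}")
print(f"database is {growth_factor:.0f}x its original size")
print(f"images to store: {total_images:,}")
```

Under those numbers the database grows to roughly 400 times its original size, so any filter has to discard almost everything it touches before a comparison even starts.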
Now, maybe the geek turns up false positives left and right,
has to throw out 3/4 of the video randomly to be able to search
properly, and runs up the electricity bill. But there's no way he
actually makes things worse than if there was just Smith and Jones
doing their detective work.
Unless the mayor orders Smith and Jones to get back to the office
and watch the video.
Not quite, Mr Sullivan. I was saying filter on some quicker
basis, even just randomly throw out huge chunks if that's what you
need to do to get the data set down to size.
The point is, you don't know when you need to do this until you
know if you've gotten false positives. And you don't know if a
positive is false until you've already spent resources following up
on it.
Sure, it might be that you can make some reasonable guesses. If you
run a search against a national database for "Michael Sullivan,"
you'll get thousands of results, and obviously in most
circumstances you could say, "Wait, clearly there are a ton of
false positives, here."
But that doesn't really save you. Okay, so you chop the data-set
down to the point where it's only 2 Michael Sullivans, and then you
use that data-set from then forward. But then John Smith shows up,
and as common a name as Michael Sullivan is, there are more John
Smiths, and you're back in false positive land.
And, in practice of course, that's not how it works. How it works
is I try to get on a flight or get a job, and someone asks a
national database, "Uh, hey, I've got a Michael Sullivan here. Is
he a terrorist?" And the database says, "Well, there's a Michael
Sullivan who's connected with the IRA. Cavity search the sucker! Or
don't give him a job!" Nobody's winnowing down the data-set because
the person making the search doesn't look and see, "Oh, hey, there
are thousands of Michael Sullivans in this data-set," they just see
a flag that says, "IRA."
In other words, getting data is never a bad thing (assuming the data is accurate). You can always filter it if you need a more manageable search. You lose information that way, but you're never worse off than you would be if you didn't have any of this data at all.
Well...sorta. Suppose you have a large database with say, 3,400,000
people in it. The probability of a false positive is 1 in 1.1
million. The probability that you'll get more than one match is
going to be about 0.81 or 81%. So if you do use such a database
then there will indeed be instances where you have false positives.
With things like DNA or fingerprints you'll have a problem, since
only one of your multiple hits is the subject you really
want.
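For anyone who wants to check the 0.81 figure, here is a minimal Python sketch. It assumes each record in the database is an independent comparison with the same chance of matching (a simplification; the function name is made up for illustration), which makes the number of chance matches binomial.

```python
def prob_more_than_one(n_records, p_match):
    """P(two or more chance matches) among n independent comparisons."""
    p_zero = (1 - p_match) ** n_records
    p_one = n_records * p_match * (1 - p_match) ** (n_records - 1)
    return 1 - p_zero - p_one

n_records = 3_400_000     # database size from the comment above
p_match = 1 / 1_100_000   # chance that any one record matches by luck

expected_matches = n_records * p_match
p_multiple = prob_more_than_one(n_records, p_match)

print(f"expected chance matches: {expected_matches:.2f}")
print(f"P(more than one match):  {p_multiple:.2f}")
```

With about three chance matches expected per search, getting two or more hits is the normal outcome rather than the exception.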
People often think that things like DNA evidence are "slam dunk"
evidence. In terms of indicating innocence it's pretty good. In
terms of pointing towards guilt it can be highly misleading.
Consider the case of William Puckett, a true case. They had a partial
DNA profile. Chances of a random person matching that profile: 1
in 1.1 million. The cops ran it through a database of 338,000 DNA
profiles. What is the probability you'll get one hit, irrespective
of that person actually being guilty or not--i.e. what is the
unconditional probability of a match? About 0.23 or 23%. In other
words, give me 100 such databases and 23 of them will spit out a
match. So given a match does that mean the person has to be guilty?
No. Do prosecutors tell jurors this kind of information? Not in
Puckett's case. Should they have? I'd say so.
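The 23% figure can be reproduced with a short Python sketch, again assuming that every profile in the database is an independent comparison with the same 1 in 1.1 million chance of matching (an idealization, not a description of how the search was actually run). The helper name is hypothetical.

```python
def match_probabilities(n_profiles, p_match):
    """Return (P(no hit), P(exactly one hit), P(at least one hit))."""
    p_none = (1 - p_match) ** n_profiles
    p_exactly_one = n_profiles * p_match * (1 - p_match) ** (n_profiles - 1)
    return p_none, p_exactly_one, 1 - p_none

n_profiles = 338_000      # database size quoted above
p_match = 1 / 1_100_000   # random-match chance quoted above

p_none, p_exactly_one, p_at_least_one = match_probabilities(n_profiles, p_match)

print(f"P(exactly one hit):  {p_exactly_one:.2f}")
print(f"P(at least one hit): {p_at_least_one:.2f}")
```

Under these assumptions the chance of exactly one hit comes out at about 0.23, matching the figure above, and the chance of at least one hit is slightly higher, about 0.26. Either way, a hit from a search like this is far from the 1 in 1.1 million long shot the raw statistic suggests.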
Now, does it help you narrow down your list of suspects? Maybe.
Keep in mind you could just have gotten a hit via dumb luck. You
could be spending your time investigating an innocent person while
the guilty party gets away.
That's not the fault of the data collection, it's the fault of a particular way of using the collected data. If searches are taking too long or turning up false positives, filter the data set, as I suggested before. You'll still be better off than if you had no data at all.
Uhhhmmm, no. The issue of false positives isn't just about
filtering. Filtering just makes the database "smaller" thus
limiting the number of "trials" and thus reducing the chance of a
false positive, it does not eliminate it. And it depends on the
probabilities in question. Filtering can help, but it can't
eliminate the problem, and the larger the database, the greater the
chance of a false positive. Your solution of filtering is basically
saying, "Yeah you're right so make the database smaller." Granted
it is doing so in a logical manner, but you are basically conceding
the point.
And Doctorow's quoted point had nothing to do with the costs, but rather the claim that more data directly hampers investigations.
It can if you don't understand how to utilize the results and aren't
careful in your analysis. Most cops don't know crap about
probability on a basic level. Introduce things like conditional
probability and Bayes Theorem and 99.9% of them have probably just
fallen asleep. And prosecutors don't like the idea that their
preferred suspect isn't the guilty party due to some arcane theorem
in a text book somewhere.
And there are plenty of geeks to point out these problems with
current methods and current data bases. Guess what, police,
prosecutors, etc. don't like them. Why? I don't really know. Turf
protection, or they don't like some pencil-necked geek telling them
they are wrong. For whatever reason, when someone points out the problem,
they hunker down and insist they got the right guy. Maybe they
have, maybe they haven't.
Not quite, Mr Sullivan. I was saying filter on some quicker basis, even just randomly throw out huge chunks if that's what you need to do to get the data set down to size.
You do realize you could be throwing out the actual guilty party
while keeping nothing but the false positives? At this point you've
just made the case for the position you are trying to refute.
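A rough Python sketch of that risk, reusing the hypothetical 3,400,000-record database and 1 in 1.1 million match chance from earlier in the thread and the "throw out 3/4" figure from the detective example. The assumptions are mine and are deliberately simple: the culprit has exactly one record in the full database, a retained culprit always produces a hit, and innocent records match independently.

```python
n_records = 3_400_000     # hypothetical database size used earlier in the thread
p_false = 1 / 1_100_000   # chance that one innocent record matches by luck
keep_fraction = 0.25      # randomly throw out 3/4 of the data

kept_records = int(n_records * keep_fraction)

# ASSUMPTION: the culprit's single record is as likely to be discarded as any other.
p_culprit_kept = keep_fraction
# Chance that at least one innocent record in the kept data still matches.
p_some_false_hit = 1 - (1 - p_false) ** kept_records
# Culprit discarded AND the search still returns at least one hit.
p_only_false_hits = (1 - p_culprit_kept) * p_some_false_hit

print(f"culprit survives the filter:        {p_culprit_kept:.2f}")
print(f"at least one chance match remains:  {p_some_false_hit:.2f}")
print(f"hits, but none of them the culprit: {p_only_false_hits:.2f}")
```

Under these assumptions, about 40% of the time the filtered search returns hits that are all innocent people, because random discarding removes the true match just as readily as it removes the chaff.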