Digging a Memory Hole?
The lefty blogs are all atwitter over the observation that WhiteHouse.gov has recently begun preventing search engines from indexing and archiving pages in directories named "Iraq." There's speculation that this is a means of making it harder to spot retroactive changes to online documents (something that's apparently not uncommon). I'd thought there was a simpler way to do that if that's all they were after, but who knows. Either way, an interesting tidbit.
In other news, tinfoil futures posted a sharp rise today.
That's one pretty clumsy robots.txt they have there. I wonder who gets the delightful job of maintaining it. It's probably some low-level flunky, but I can't shake the image of Condoleezza Rice manually duplicating each and every entry and adding "iraq" to the end of it.
--G
It would appear that "tinfoil" has replaced "abracadabra" among rightists looking for a magic word to make inconvenient facts disappear.
Yeah, tinfoil hat territory.
It's a fucking fact that anyone with an internet connection can see for themselves.
Disallow: /911/911day/iraq
Disallow: /911/iraq
Disallow: /911/patriotism/iraq
Disallow: /energy/iraq
etc.
It didn't happen by accident.
What possible defensible reason can anyone offer for it?
it is fascist to hide documents from google
FASCIST@!@!!
OK, R.C. Dean gave us the "tinfoil hat" line and now Anon@2:14 has mocked a strawman. What's next on the checklist?
JJ-
We still have to reopen the "Should we have invaded Iraq?" debate.
OK, I'm glad everybody who has never maintained a Web server has opined on the spooky significance of what they think they see.
However, as someone who writes robots.txt files regularly, this is pretty trivially obvious: they are blocking some bad robot that was sticking /iraq on the end of every request, probably attempting to scoop up damning evidence of administration policies in a really stupid way. I looked at several of the /iraq directories, and they returned 404s (that's file not found for the rest of you). The parent directories seem to exist, as do the text-only versions that are blocked.
Geez, that took me 5 minutes. It's amazing that nobody at Democrats.org took the time to do such a basic check or think on the problem for longer than a second. Neither did the one tin-foil blogger who boggled at a change in the robots.txt file on Sept. 1, 2001 and implied that this was some sort of scary evidence of foreknowledge (as opposed to, say, implementing a new CMS or something--that's "Content Management System," of which Movable Type is an example, for the rest of you).
Joe, tin-foil is appropriate when ignoramuses are making up conspiracy theories about things that have very prosaic explanations that fit the facts as well or better. But then I'm part of the Majestic team sent here to discredit you and prevent you from finding out about the aliens.
You know, the ones in Wal-mart.
PS - to correct some misapprehensions I've seen in stories/comments about this: the robots.txt file does not prevent a spider, such as Google, from collecting info from pages with the word "iraq" on them. It only prevents them from going into directories named "iraq", which, as far as I can tell, don't exist in the first place.
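The path-prefix behavior described above can be checked with a short sketch. It uses Python's standard `urllib.robotparser` as a stand-in for a well-behaved spider; the two Disallow lines are taken from the ones quoted earlier in the thread, while the `User-agent: *` line, the agent name `ExampleBot`, and the test paths are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Two of the Disallow lines quoted in this thread. The "User-agent: *"
# line is an assumption about how the real file was headed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /911/iraq
Disallow: /energy/iraq
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a rule-abiding robot may fetch this path."""
    return parser.can_fetch(agent, path)


# Hypothetical paths: only the ones under a disallowed prefix are refused.
for path in ("/911/index.html",
             "/911/iraq/index.html",
             "/news/releases/iraq-speech.html"):
    print(path, allowed(path))
```

The rules match on the start of the URL path, not on words in the page, so a page that merely mentions Iraq (or has "iraq" elsewhere in its name) is still fair game for the spider.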
This is such a non-story, and I come to this conclusion in spite of my support for Dean, primarily over the Iraq issue.
someone mentioning how clinton used to change files all the time and no liberals complained?
in all seriousness, what the hell is the payoff for something like this? unless there's a crafty master plan in action, it's just further proof what a bunch of fuckwits this group is.
This is NOT a tinfoil-hat situation. 'Tinfoil-hat type' refers to a person who is paranoid for no reason. (The phrase alludes to the classic paranoid delusion that someone is using a ray gun to control your thoughts, so if you line your hat with tinfoil your thoughts will remain secure.) If the government is rewriting history on a daily basis, I'd say that is grounds for justifiable paranoia. We need a new phrase, one used to connote righteous paranoia. Suggestions, anyone?
FWIW, another blogger has claimed that about 75 of the listed directories do, in fact, exist. I have no idea what the motivation for the change was, just thought it was worth noting since folk seem to be talking about it.
another blogger has claimed that about 75 of the listed directories do, in fact, exist
All of the ones ending in /text appear to exist. Didn't count, but that could easily be 75. They are the text-only versions of the parent directories.
I went through and tested as many as my patience could handle. You'll forgive me if I left out /tball/iraq, especially after /easter/iraq didn't exist.
Only two existed: /iraq (which defaults to /issues/iraq) and /issues/iraq.
Now, I can think of two explanations. 1) It is a clever ploy to ensure that no iraq materials get spidered if they are in a directory labelled /iraq (but not if they are under some other name, such as /powell-slides) with all the other entries in there to obfuscate those two that I mentioned. The fiends! 2) Some robot has been coming along and putting "/iraq" at the end of every directory it can find and they told a flunky to disallow all those /iraq requests that are flooding the error logs, and said flunky forgot to check for the two directories that actually end in /iraq.
Since everyone in government is a brilliant propagandist and devastatingly competent in their chosen profession, I can only conclude it was 1). After all, Amtrak runs perfectly, and the Post Office is a model of machine-like efficiency. Oh, and the military never hits targets it doesn't intend to.
Julian: not knocking you for noting the babble, just responding to said babble.
correction: that's not /issues/iraq but /infocus/iraq.
Robots, spiders, aliens, tinfoil hats? Must be halloween season or Ziggy Stardust is planning a reunion tour.
The tinfoil hat crack was addressed to the usual suspects who run every single belch and gurgle from the White House through their BUSH LIED!!! filter.
Claiming that your particular paranoia is justified (unlike all those OTHER people muttering to themselves in corners) really doesn't do much to convince those of us who have seen one conspiracy "scandal" after another disappear into so much fluff.
If the government is rewriting history on a daily basis, I'd say that is grounds for justifiable paranoia.
Way to assume your conclusion, Jennifer.
Sandy;
I haven't maintained a web server, and I'm really curious what reason whitehouse.gov's webmaster offers for this, but your explanation doesn't make sense to me for a few reasons.
You write:
"they are blocking some bad robot that was sticking /iraq on the end of every request, probably attempting to scoop up damning evidence of administration policies in a really stupid way."
Okay, there are two lines that would be put in a robots.txt file to attempt to stop that bad robot:
user-agent: bad_robot
disallow: /
rather than adding hundreds of instances of "iraq" to the file one by one.
If you were to say "well, maybe bad_robot won't pay attention to that," I would say "why would it pay attention to all the other robots.txt exclusions then?"
Or they could use some other means, such as mod_rewrite and .htaccess, to deny the robot access to the server.
Again, that would mean a couple lines for one robot that doesn't exclude every other search engine from the directories that really do exist.
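As a rough illustration of the two-line approach, here is a sketch using Python's standard `urllib.robotparser` to play the part of a robot that honors robots.txt. The agent name `bad_robot` comes from the example above; `Googlebot` is used only as a familiar example of another agent name, and the path is one of the directories discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the comment above: shut out one named robot
# and say nothing about anyone else.
ROBOTS_TXT = """\
user-agent: bad_robot
disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

path = "/infocus/iraq/"

# The named robot is refused everywhere on the site.
print("bad_robot:", parser.can_fetch("bad_robot", path))

# Any other rule-abiding agent is unaffected by that record.
print("Googlebot:", parser.can_fetch("Googlebot", path))
```

This only shows what a compliant robot would conclude from the file. A robot that ignores robots.txt is not stopped by either version, which is the point about server-side measures above.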
You also wrote:
"I looked at several of the /iraq directories, and they returned 404s (that's file not found for the rest of you)."
That means only that there is no index page in that directory -- Not that there are no pages at all in that directory. The web page writing about this says that there are many false/nonexistent directories (like the "barney" and "eggroll" ones), but that there are about 75 existent directories in that robots.txt file.
http://www.bway.net/~keith/whrobots/disdirs.html
Some of them really DO have index files and do not report 404s even from entering just with the directory URL, such as:
http://www.whitehouse.gov/infocus/iraq/photoessay/essay5/
http://www.whitehouse.gov/news/releases/2003/09/iraq/
http://www.whitehouse.gov/news/releases/2002/12/iraq/
http://www.whitehouse.gov/infocus/iraq/
http://www.whitehouse.gov/infocus/iraq/news/
etc. Those directories and others have index files, do not 404, and are excluded from search by all external engines obeying robots.txt.
Finally, returning to you writing "I looked at several of the /iraq directories, and they returned 404s (that's file not found for the rest of you)."
As I said that means only that there is no index.html file there. Not that the directory is empty. For example, this directory from the list linked to above returns a 404 error:
http://www.whitehouse.gov/infocus/iraq/100days
Good enough, there is no index file there.
but the directory has the following files in it:
http://www.whitehouse.gov/infocus/iraq/100days/100days.pdf
http://www.whitehouse.gov/infocus/iraq/100days/introduction.html
http://www.whitehouse.gov/infocus/iraq/100days/part1.html
http://www.whitehouse.gov/infocus/iraq/100days/part2.html
http://www.whitehouse.gov/infocus/iraq/100days/part3.html
http://www.whitehouse.gov/infocus/iraq/100days/part4.html
http://www.whitehouse.gov/infocus/iraq/100days/part5.html
http://www.whitehouse.gov/infocus/iraq/100days/part6.html
http://www.whitehouse.gov/infocus/iraq/100days/part7.html
http://www.whitehouse.gov/infocus/iraq/100days/part8.html
http://www.whitehouse.gov/infocus/iraq/100days/part9.html
http://www.whitehouse.gov/infocus/iraq/100days/part10.html
None of them 404 but they are in an excluded directory and are not crawled by external engines.
So that doesn't hold water to me either.
I'd be interested in the explanation offered by the webmaster.
K
R.C.:
Your comment "Way to justify your conclusion" seems to imply that paranoia would NOT be a justifiable response to what our government is doing these days. So how should one respond? I ask merely for information.
I have this mental picture of an ancient Roman in the last days of the Empire. Should this person get paranoid about the fact that Alaric the Goth is coming, or should he just get drunk and say "Wow, this decline and fall of an empire stuff sure is interesting?"
Actually, I prefer drunkenness, but that's because I have no children so I need not worry about what kind of country they'll be inheriting.
Jennifer, you are assuming that what the government is doing with this website is the nefarious rewriting of history, rather than engaging in routine website maintenance.
I agree that, IF the website is being reengineered to rewrite history, that is a bad thing. But that conclusion has not been proven, despite your willingness to believe it.
R. C.:
Of course I am willing to believe the worst of the current administration! Why should I not? Consider this: the other day my boyfriend and I were discussing a party we'd attended, and he made a misstatement, claiming that person A told him something when I know it was actually person B. Since my boyfriend is generally an honest man, I assumed this was a simple mistake on his part and thought no more of it.
However, one of my previous boyfriends was an habitual liar with the morality of a crack whore. If he had made the same person A/person B comment, I would have assumed he was a liar, because he gave me no reason to believe otherwise. It all boils down to credibility. What has this current administration done to deserve the benefit of the doubt?
Assuming the Bush administration sucks may be a useful heuristic, but if you use that assumption in place of actual facts in reaching your conclusion, then you are back at the top of this page (tinfoil hat).
Jennifer,
Whereas the Clinton Administration...
OK, one clarification: putting something into the robots.txt file doesn't "block" anything. It's just a (strong) suggestion to indexing robots to not check those pages. Nothing nefarious there. Also, anyone looking to "rewrite history" would simply remove the pages. (Nice try, though, R.C. Dean.)
The most probable explanation come up with at *the* geek blog (slashdot.org) is that the WH wanted other sites to drift to the top of the search engine results; i.e., WH doesn't want to be seen as being the top search site for news/info. about Iraq.
Move along folks. Nothing to see here.
K:
I didn't say the whitehouse.gov webmaster was being smart about it, but that's what the attempt looks like to me. As the one blog you quote maintains, it looks like somebody took emacs or something and took every line in the previous file and added /iraq to it.
What do we know? We know that the file looks like /iraq was blindly added to every directory, no matter what. We also know there are Iraq-related materials that aren't in directories that end in /iraq.
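To make the "blindly added" reading concrete, here is a minimal sketch of the kind of mechanical edit that would produce such a file. It is a guess at the process, not a reconstruction of what the webmaster actually ran. The function name `append_suffix` is hypothetical, and the list `previous_paths` is an assumption: it is simply the four Disallow lines quoted earlier in the thread with the trailing /iraq removed.

```python
def append_suffix(paths, suffix="/iraq"):
    """Tack the same suffix onto every path, whether or not the result exists."""
    return [f"Disallow: {path.rstrip('/')}{suffix}" for path in paths]


# Assumed parent directories, derived from the lines quoted in this thread.
previous_paths = ["/911/911day", "/911", "/911/patriotism", "/energy"]

for line in append_suffix(previous_paths):
    print(line)
```

Nothing in a step like this checks whether a directory such as /energy/iraq exists on the server, which would be consistent with most of the listed paths returning 404s.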
You're right that several 'bots don't obey the Robot Exclusion Standard. However, there are commercial spiders that you can put on a Windows PC and spider a given site, and the user-agent isn't always the same. They do, however, not being open source, obey the RES. A couple of "hacktivists" could do such a thing, and disallowing user-agent: foo_bar wouldn't cut it.
But that is theoretical.
However, the "75 directories" aren't explicitly mentioned in the robots.txt file. It's linked from that page, take a look. In fact, one directory:
http://www.whitehouse.gov/news/releases/2003/02/iraq/powell-slides
is not actually blocked in the robots.txt file (it doesn't have anything handling the directory default, so it gets a 404).
But this is simple. If you're concerned about the memory hole, write a Perl script that will spider the site and doesn't obey the RES. If you do it infrequently enough and slow enough you'll slip under any monitoring stats they're running on their logs. Do it once every week or so, archive the copies, and voila, instant archive.
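The comment suggests Perl; the same idea is sketched here in Python, under stated assumptions. The function name `archive_slowly`, the 30-second default delay, and the injected `fetch` callable are all hypothetical choices for illustration. The sketch covers only the "slow and infrequent" part: it takes a fixed list of URLs and does not discover links on its own.

```python
import time


def archive_slowly(urls, fetch, delay_seconds=30.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests, and keep a timestamped copy.

    `fetch` is any callable that takes a URL and returns the page body.
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    snapshots = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # space requests out to keep the load light
        snapshots[url] = (time.time(), fetch(url))
    return snapshots
```

One possible `fetch` is `lambda url: urllib.request.urlopen(url).read()`, after importing `urllib.request`. Running the function weekly and storing each result under its date would give the series of copies needed to spot later changes.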
The point is, either way you cut it, this doesn't look like the work of evil geniuses. I think "mistake" takes precedence over "nefarious plot," even with this administration.
And, for the benefit of any white house flunky that reads this, here's the right way to kill bad bots:
http://diveintomark.org/archives/2003/02/26/how_to_block_spambots_ban_spybots_and_tell_unwanted_robots_to_go_to_hell
Now if you can think of a nefarious explanation that makes MORE sense than the "webmaster stupidity" hypothesis, I'm all ears. So far, all of the ones put forth are less likely and require even greater stupidity.
But hey, I thought the Amiga would catch on, too.
All right! It only took six minutes from Julian's post for the first "tinfoil hat" dismissal to pop up. How Pavlovian!
For all they dish out, bloggers sure can't take it...
I guess it takes one to know one eh?
LOL!
Hmmmm.... It's harder to dig up info that makes a politician look bad, and some people suspect it is deliberate and politically motivated.
Yeah, R.C. Dean, that's really X Files stuff, all right.
I point out that some people reflexively drool every time someone rings the BUSH LIED!!! bell, and somehow that makes MY reaction "Pavlovian"?
kevin: it's a conspiracy!!!!!!