Amateur Beats Gov't at Digitizing Newspapers: Tom Tryniski's Weird, Wonderful Website
A retiree with a scanner builds one of the world's largest historic newspaper sites while tax-funded projects stall.
One computer expert working alone has built a historic newspaper site that's orders of magnitude bigger and more popular than one created by a federal bureaucracy with millions of dollars to spend. Armed only with a few PCs and a cheap microfilm scanner, Tom Tryniski has played David to the Library of Congress' Goliath.
Tryniski's site, which he created in his living room in upstate New York, has grown into one of the largest historic newspaper databases in the world, with 22 million newspaper pages. By contrast, the Library of Congress' historic newspaper site, Chronicling America, has 5 million newspaper pages on its site while costing taxpayers about $3 per page.[*] In January, visitors to Fultonhistory.com accessed just over 6 million pages while Chronicling America pulled fewer than 3 million views.
Fourteen years ago, Tryniski, a retired engineer, launched his website after a friend loaned him a collection of old postcards of Fulton, New York, the town where he's lived all his life. He decided to scan and share them online with his neighbors.
Fulton fell on hard times in the mid-1970s, and Tryniski is nostalgic for the thriving factory town in which he grew up. He relishes in particular the small details of life in old Fulton—wedding announcements, obituaries, school events, society gossip—the sort of information that's the bread and butter of local newspapers.
So after the postcards, he digitized the entire run of the Oswego Valley News, which is the paper of record for Fulton and its surrounding county. It took about a year to finish scanning by hand the entire run of the paper, which began publishing in 1946.
Fultonhistory.com really got going in 2003, when Tryniski, a high school graduate, bought a scanner that handles microfilm for $3500 in a fire sale. That meant he didn't need access to the original newspaper copies and he could work quickly because microfilm scanners are largely automated. He installed a keyword recognition program, set up a network of PCs to do the heavy processing, and began uploading his scans to a server that's located in a gazebo on his front deck. He never bothered to change the original name of his website.
Tryniski pays all expenses for the site himself. The only significant costs are bandwidth, for which he pays $630 per month, and hard drives, which run him about $200 per month. He gets his microfilm at no cost from small libraries and historical societies. In exchange, he gives them a copy of all the scanned images analyzed for keyword recognition. Most of the papers Tryniski has digitized are from New York, but he's rapidly expanding his coverage to other states as well. He is adding new content at a rate of about a quarter-million pages per month with no plans to slow down.
The biggest digital newspaper site on the Internet is the for-profit Newspaperarchive.com, with 130 million pages. Newspapers.com, a subsidiary of genealogy titan Ancestry.com, has 34 million newspaper pages. Both companies have approached Tryniski with partnership deals, but he turned them down in order to keep his site free.
"I think it's just fascinating that technology has made it possible for a guy in a house with a server to create a pretty cool experience," says Brian Hansen, the general manager of Ancestry's Newspapers.com.
Chronicling America, the Library of Congress site, is financed by the National Endowment for the Humanities (NEH). To date, the NEH has spent just over $22 million on the site. A major reason for the sky-high price tag is that the NEH breaks up the money into tiny grants to individual libraries and historical societies, instead of simply paying the Library of Congress directly to complete the job. So far, the NEH has awarded 72 grants worth about $300,000 each. Each award recipient is responsible for digitizing about 100,000 newspaper pages. The majority of grant recipients hire a company called iArchives, Inc., a subsidiary of Ancestry.com, to do the actual scanning and analysis.
Hansen, who is also the general manager of iArchives, Inc., says if the Library of Congress hired his company to do the job in bulk he could offer a better rate.
Asked for the rationale behind this byzantine system, a spokesperson for the NEH denied that breaking up the funding into small grants drives up costs, adding that the goal is partially to teach small libraries how to digitize newspapers in accordance with the Library of Congress' "high technical" standards. That way they'll be able to take that know-how and apply it to other projects.
But Hansen says the Library of Congress' detailed specifications for analyzing each newspaper page are of questionable value to users and a major reason his firm has to charge so much.
"Why not use the money for a lighter index to get more pages online? It would be interesting to sit down with the Library of Congress and the NEH and have a conversation about what's the best thing we can do for consumers," says Hansen.
Even so, less than one-third of the funding goes to the actual scanning and indexing by firms like iArchives. The NEH says the remaining money (more than $2 per newspaper page) goes for "identification and selection of the files to be digitized, metadata creation, cataloguing, reviewing files for quality control, and scholarship on the scope, content and significance on each digitized newspaper title, and in some cases specialized language expertise."
Another competitor of Tryniski's is the Brooklyn Public Library, which maintains a free online database of the Brooklyn Daily Eagle. In its heyday as the paper of record for America's third-largest city, the Daily Eagle was among the most widely read and influential newspapers in the nation. Poet Walt Whitman wrote more than 800 items for the paper and served as its editor from 1846 to 1848. During the Civil War, the Daily Eagle had the largest circulation of any evening paper in the United States.
The Brooklyn Public Library spent two years and about $400,000 digitizing just the first 62 years of the Daily Eagle's run, which comes to about 150,000 pages. (A little more than half the funding was provided by a federal grant.) That was back in 2003. For the last decade, the library has been trying to raise money to finish the job.
In the meantime, Tom Tryniski digitized the entire 115-year run of the newspaper, which amounts to almost 750,000 pages.
In its January 2013 strategic plan, the Brooklyn Public Library promises that it will finish digitizing the Daily Eagle (along with 63 Brooklyn community newspapers) by 2015. In an interview, Library Director Richard Reyes-Gavilan said the institution hasn't raised any money yet towards that goal, but that it will be a major priority once other "monstrous projects" are out of the way.
Librarian Joy Holland, who oversees the Brooklyn Public Library's Daily Eagle site, says she's "immensely grateful" for what Tryniski has done, and directs researchers to his site all the time. She also thinks the library's site, while far more limited in scope, is "more suitable for use in educational environments."
The library's site has fewer keyword recognition errors than Fultonhistory.com, and for casual users, it's easier to search. Fultonhistory.com also has a bizarre interface that includes swimming fish and the occasional live video stream of squirrels eating corn on Tryniski's front deck. Perhaps the strangest detail is a moving graphic in the left-hand corner of the screen that shows Tryniski's head grafted on top of the body of a spider.
Tryniski, who has never altered the site's original graphic design, says he's emphasizing content over style.
"I could spend all my time on the interface, or I could spend my time on the digitization and data processing," says Tryniski. "Once you hit the search button the interface disappears and you get to see the newspapers."
Tryniski has had discussions with the New York State Library about eventually donating his archive, but talks have stalled. He gets emails all the time from users sharing discoveries they've made on his site and thanking him for making it all possible.
"I just get a lot of satisfaction helping people find information," says Tryniski. "It's just really nice looking back in time and reading about what was going on."
Video written, produced, shot and edited by Jim Epstein, who also narrates.
Approximately 5 minutes.
[*]: About 2 million pages that the NEH has awarded grants for haven't yet made it online, so the $3 per page estimate was arrived at by dividing $22 million (total grant funding as of 2012) by 7,271,000 pages (total paid for by grants as of 2012).
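The footnote's arithmetic can be checked in a few lines. The following is a minimal sketch in Python that uses only the figures quoted in the article. The variable names are illustrative, and the grant-based cross-check relies on the article's approximate "about $300,000 each" average, so it is a sanity check and not an exact accounting.

```python
# Figures quoted in the article (as of 2012).
total_grant_funding = 22_000_000   # dollars; the article says "just over $22 million"
pages_paid_for = 7_271_000         # pages covered by NEH grants
grants_awarded = 72
approx_grant_size = 300_000        # dollars; the article says "about $300,000 each"

# The footnote's estimate: total funding divided by pages paid for.
cost_per_page = total_grant_funding / pages_paid_for

# Cross-checks against the other figures in the article.
approx_total_from_grants = grants_awarded * approx_grant_size
pages_per_grant = pages_paid_for / grants_awarded

print(f"Cost per page: ${cost_per_page:.2f}")
print(f"72 grants at about $300,000 each: ${approx_total_from_grants:,}")
print(f"Pages per grant: {pages_per_grant:,.0f}")
```

The result, roughly $3.03 per page, matches the "about $3" figure. The grant count times the average grant size comes to $21.6 million, close to the stated $22 million total, and the pages-per-grant figure is consistent with the "about 100,000" pages each recipient is responsible for.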
"which he created in his living room in upstate New York"
Because, with the internet, that matters at all.
Well, the fact that he has managed to survive in the hellish post-apocalyptic landscape of upstate New York and avoid being killed in a crossfire by Bloomberg's private army is commendable.
Fair Enough.
"a friend loaned him a collection of old postcards of Fulton, New York, the town where he's lived all his life. He decided to scan and share them online with his neighbors."
How many separate copyright violations is that?
Depends on how old they were.
This must be a hoax. Nobody could do this but the government.
Not legally, at least.
Somebody ought to inform the post office! When they're done suing Lance Armstrong, they can get the rest of their budget shortfall from this guy.
"Why not use the money for a lighter index to get more pages online? It would be interesting to sit down with the Library of Congress and the NEH and have a conversation about what's the best thing we can do for consumers,"
Oh, I'm sure it would be quite interesting.
you beat me to it
him: "What would be the best thing we can do for consumers?"
them: "What is this 'consumer' you keep mentioning? Sounds like right-wing extremist talk. You stay right there, sir, until the Homeland Security officer gets here."
But the government can still build ROADZ because, well, the market is not willing to do that. Everybody knows that.
Yeah! He didn't build that!
Psst, hey! Hansen! This is government we're talking about here. Government. There are no consumers to speak about, just saps that pay for it.
I'm sure that the people who build the website are not allowed to even consider interaction with the people who actually use it.
My guess is that there is a special government committee that meets once a month to decide what to add to the site. All work must be authorized by the committee. Any work that is not authorized must be undone. So those millions of dollars go to pay people to sit around and wait for the committee to give them something to do.
At least that's how my "job" works anyway.
I've experienced this in the business world. Sometimes when you have really smart people who have nobody in authority over them they can't get out of their own way. One of the owners of my last employer was just such a man. He was so smart that he kept thinking up new hypothetical problems and then new solutions for the hypothetical problem to add to our development projects.
The net result was that the requirements kept changing right before we were ready to deploy, so nothing ever got deployed. Once he changed our scope every week for 60 consecutive weeks, then complained loudly to the board that we hadn't finished his project in over a year. So I trotted out 60 signed off work orders with changes of scope. He and the board agreed that they didn't want excuses, they expected results.
And now you know why they pissed away a billion dollar company. And you also know where I learned my new business strategy - never work for crazy people.
Yes, but his website's layout is god awful and I mean the worst.
The intro screen is silly, yes -- but the search results page works just fine. It's a bit ugly, but perfectly functional.
This is literally the worst website I have ever looked at or tried to use.
In the interest of accuracy, the site is not "orders of magnitude bigger and more popular".
An order of magnitude is a factor of 10. Two orders of magnitude is a factor of 100.
It looks like this site is about four times bigger and twice as popular.
In general, orders of magnitude are measured in powers of 10.
But powers of 2 can be used.
So can 31.6 (Richter scale)
So can 2.5 (stellar magnitudes)
Gaaaahhhh!!! My eyes!!! It's missing an "Under Construction" GIF and a flying toaster or two. The fish are a nice touch.
Yeah, cool graphics are what we want. Forget about information.
I wish he could get the funding to get a decent search function and the bandwidth to go with higher resolution images. Some nineteenth-century newspapers routinely set entire pages in 5-to-6-point type and his scans of those are too low-res to read.
I've used this site many times in the past and search and navigation are really difficult. It's a great resource but painful to look at and hard to browse through.
But that said, he has content that is simply inaccessible anywhere else and he's making it available free of charge. So my hat is off to him. And even if it has problems, I'm not convinced a well-funded government-run alternative would actually look or function any better.
The proof is in the pudding. A very well funded government alternative isn't anywhere near as good.
But I'd bet that a reasonably funded private alternative that was designed to serve a specific purpose, rather than simply an archive of some sort, would fucking rule.
What this site needs is an editor (think digital editions of manuscripts and the like - they are useful not because they contain pictures of said manuscripts, but because they are edited to serve a specific function).
How does Reason think up the subjects for their video? I could never have come up with the idea to do a video on this guy...
ProQuest has 2.2 billion pages in microfilm and 125 billion digital pages going back 500 years. So, the numbers in this article are chump change.
http://www.proquest.com/
Unfortunately, the company I work for was too stupid to leverage this and the industrial scanners made by another division. So our moron former CEO tried to split off that part of the company and failed spectacularly. They are lucky to still be in business.
"Fultonhistory.com really got going in 2003, when Tryniski, a high school graduate, bought a scanner that handles microfilm for $3500 in a fire sale. That meant he didn't need access to the original newspaper copies and he could work quickly because microfilm scanners are largely automated."
So, buying a scanner at a fire sale means the original newspapers appeared out of nowhere? Missing information alert! Where did he get the images, and if he doesn't need the actual newspapers, what, exactly, is he scanning?
(Someone, please fill me in. I am recalling a Games article from the early '80s that carried on about the lost art of sign painting, but nowhere did the article mention if they were metal signs to be attached to posts, or signs painted directly onto brick walls, or signs painted onto awnings, etc.)