Amateur Beats Gov't at Digitizing Newspapers: Tom Tryniski's Weird, Wonderful Website
A retiree with a scanner builds one of the world's largest historic newspaper sites while tax-funded projects stall.
One computer expert working alone has built a historic newspaper site that's orders of magnitude bigger and more popular than one created by a federal bureaucracy with millions of dollars to spend. Armed only with a few PCs and a cheap microfilm scanner, Tom Tryniski has played David to the Library of Congress' Goliath.
Tryniski's site, which he created in his living room in upstate New York, has grown into one of the largest historic newspaper databases in the world, with 22 million newspaper pages. By contrast, the Library of Congress' historic newspaper site, Chronicling America, has 5 million newspaper pages on its site while costing taxpayers about $3 per page.[*] In January, visitors to Fultonhistory.com accessed just over 6 million pages while Chronicling America pulled fewer than 3 million views.
Fourteen years ago, Tryniski, a retired engineer, launched his website after a friend loaned him a collection of old postcards of Fulton, New York, the town where he's lived all his life. He decided to scan and share them online with his neighbors.
Fulton fell on hard times in the mid-1970s, and Tryniski is nostalgic for the thriving factory town in which he grew up. He relishes in particular the small details of life in old Fulton—wedding announcements, obituaries, school events, society gossip—the sort of information that's the bread and butter of local newspapers.
So after the postcards, he digitized the entire run of the Oswego Valley News, which is the paper of record for Fulton and its surrounding county. It took about a year to finish scanning by hand the entire run of the paper, which began publishing in 1946.
Fultonhistory.com really got going in 2003, when Tryniski, a high school graduate, bought a scanner that handles microfilm for $3,500 in a fire sale. That meant he didn't need access to the original newspaper copies, and he could work quickly because microfilm scanners are largely automated. He installed a keyword recognition program, set up a network of PCs to do the heavy processing, and began uploading his scans to a server that's located in a gazebo on his front deck. He never bothered to change the original name of his website.
Tryniski pays all expenses for the site himself. The only significant costs are bandwidth, for which he pays $630 per month, and hard drives, which run him about $200 per month. He gets his microfilm at no cost from small libraries and historical societies. In exchange, he gives them a copy of all the scanned images analyzed for keyword recognition. Most of the papers Tryniski has digitized are from New York, but he's rapidly expanding his coverage to other states as well. He is adding new content at a rate of about a quarter-million pages per month with no plans to slow down.
The biggest digital newspaper site on the Internet is the for-profit Newspaperarchive.com, with 130 million pages. Newspapers.com, a subsidiary of genealogy-titan Ancestry.com, has 34 million newspaper pages. Both companies have approached Tryniski with partnership deals, but he turned them down in order to keep his site free.
"I think it's just fascinating that technology has made it possible for a guy in a house with a server to create a pretty cool experience," says Brian Hansen, the general manager of Ancestry's Newspapers.com.
Chronicling America, the Library of Congress site, is financed by the National Endowment for the Humanities (NEH). To date, the NEH has spent just over $22 million on the site. A major reason for the sky-high price tag is that the NEH breaks up the money into tiny grants to individual libraries and historical societies, instead of simply paying the Library of Congress directly to complete the job. So far, the NEH has awarded 72 grants worth about $300,000 each. Each award recipient is responsible for digitizing about 100,000 newspaper pages. The majority of grant recipients hire a company called iArchives, Inc., a subsidiary of Ancestry.com, to do the actual scanning and analysis.
Hansen, who is also the general manager of iArchives, Inc., says if the Library of Congress hired his company to do the job in bulk he could offer a better rate.
Asked for the rationale behind this byzantine system, a spokesperson for the NEH denied that breaking up the funding into small grants drives up costs, adding that the goal is partially to teach small libraries how to digitize newspapers in accordance with the Library of Congress' "high technical" standards. That way they'll be able to take that know-how and apply it to other projects.
But Hansen says the Library of Congress' detailed specifications for analyzing each newspaper page are of questionable value to users and a major reason his firm has to charge so much.
"Why not use the money for a lighter index to get more pages online? It would be interesting to sit down with the Library of Congress and the NEH and have a conversation about what's the best thing we can do for consumers," says Hansen.
Even so, less than one-third of the funding goes to the actual scanning and indexing by firms like iArchives. The NEH says the remaining money, more than $2 per newspaper page, goes for "identification and selection of the files to be digitized, metadata creation, cataloguing, reviewing files for quality control, and scholarship on the scope, content and significance on each digitized newspaper title, and in some cases specialized language expertise."
Another competitor of Tryniski's is the Brooklyn Public Library, which maintains a free online database of the Brooklyn Daily Eagle. In its heyday as the paper of record for America's third largest city, the Daily Eagle was among the most widely read and influential newspapers in the nation. Poet Walt Whitman wrote more than 800 items for the paper and served as its editor from 1846 to 1848. During the Civil War, the Daily Eagle had the largest circulation of any evening paper in the United States.
The Brooklyn Public Library spent two years and about $400,000 digitizing just the first 62 years of the Daily Eagle's run, which comes to about 150,000 pages. (A little more than half the funding was provided by a federal grant.) That was back in 2003. For the last decade, the library has been trying to raise money to finish the job.
In the meantime, Tom Tryniski digitized the entire 115-year run of the newspaper, which amounts to almost 750,000 pages.
In its January 2013 strategic plan, the Brooklyn Public Library promises that it will finish digitizing the Daily Eagle (along with 63 Brooklyn community newspapers) by 2015. In an interview, Library Director Richard Reyes-Gavilan said the institution hasn't raised any money yet towards that goal, but that it will be a major priority once other "monstrous projects" are out of the way.
Librarian Joy Holland, who oversees the Brooklyn Public Library's Daily Eagle site, says she's "immensely grateful" for what Tryniski has done, and directs researchers to the site all the time. She also thinks the library's site, while far more limited in scope, is "more suitable for use in educational environments."
The library's site has fewer keyword recognition errors than Fultonhistory.com, and for casual users, it's easier to search. Fultonhistory.com also has a bizarre interface that includes swimming fish and the occasional live video stream of squirrels eating corn on Tryniski's front deck. Perhaps the strangest detail is a moving graphic in the left-hand corner of the screen that shows Tryniski's head grafted on top of the body of a spider.
Tryniski, who has never altered the site's original graphic design, says he's emphasizing content over style.
"I could spend all my time on the interface, or I could spend my time on the digitization and data processing," says Tryniski. "Once you hit the search button the interface disappears and you get to see the newspapers."
Tryniski's had discussions with the New York State Library to donate his archive eventually, but talks have stalled. He gets emails all the time from users sharing discoveries they've made on his site and thanking him for making it all possible.
"I just get a lot of satisfaction helping people find information," says Tryniski. "It's just really nice looking back in time and reading about what was going on."
Video written, produced, shot and edited by Jim Epstein, who also narrates.
Approximately 5 minutes.
[*]: About 2 million pages that the NEH has awarded grants for haven't yet made it online, so the $3 per page estimate was arrived at by dividing $22 million (total grant funding as of 2012) by 7,271,000 pages (total paid for by grants as of 2012).
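For readers who want to check the footnote's arithmetic, here is a minimal sketch in Python. It uses only the two figures given in the footnote; the function and variable names are illustrative and not part of any NEH or Library of Congress tooling. The last step assumes the "less than one-third" scanning share reported in the article, so the remainder it prints is a lower bound, not an exact figure.

```python
# Figures from the footnote (as of 2012).
TOTAL_GRANT_FUNDING_USD = 22_000_000
PAGES_PAID_FOR = 7_271_000


def cost_per_page(total_usd: float, pages: int) -> float:
    """Return the average cost per page in dollars."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_usd / pages


neh_cost = cost_per_page(TOTAL_GRANT_FUNDING_USD, PAGES_PAID_FOR)

# The article says less than one-third of funding goes to scanning and
# indexing, so at least two-thirds of the per-page cost goes elsewhere.
non_scanning_floor = neh_cost * (2 / 3)

print(f"NEH cost per page: ${neh_cost:.2f}")                        # $3.03
print(f"Non-scanning share, at least: ${non_scanning_floor:.2f}")  # $2.02
```

The result rounds to the "about $3 per page" cited in the article, and the two-thirds remainder is consistent with the NEH's "more than $2 per newspaper page" figure.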