One computer expert working alone has built a historic newspaper site that's orders of magnitude bigger and more popular than one created by a federal bureaucracy with millions of dollars to spend. Armed only with a few PCs and a cheap microfilm scanner, Tom Tryniski has played David to the Library of Congress’ Goliath.
Tryniski's site, which he created in his living room in upstate New York, has grown into one of the largest historic newspaper databases in the world, with 22 million newspaper pages. By contrast, the Library of Congress' historic newspaper site, Chronicling America, has 5 million newspaper pages on its site while costing taxpayers about $3 per page.[*] In January, visitors to Fultonhistory.com accessed just over 6 million pages while Chronicling America pulled fewer than 3 million views.
Fourteen years ago, Tryniski, a retired engineer, launched his website after a friend loaned him a collection of old postcards of Fulton, New York, the town where he's lived all his life. He decided to scan and share them online with his neighbors.
Fulton fell on hard times in the mid-1970s, and Tryniski is nostalgic for the thriving factory town in which he grew up. He relishes in particular the small details of life in old Fulton, such as wedding announcements, obituaries, school events, and society gossip: the sort of information that's the bread and butter of local newspapers. So after the postcards, he digitized the entire run of the Oswego Valley News, which is the paper of record for Fulton and its surrounding county. It took about a year to finish scanning by hand the entire run of the paper, which began publishing in 1946.
Fultonhistory.com really got going in 2003, when Tryniski, a high school graduate, bought a microfilm scanner for $3,500 in a fire sale. That meant he didn't need access to the original newspaper copies, and he could work quickly because microfilm scanners are largely automated. He installed a keyword recognition program, set up a network of PCs to do the heavy processing, and began uploading his scans to a server that's located in a gazebo on his front deck. He never bothered to change the original name of his website.
Tryniski pays all expenses for the site himself. The only significant costs are bandwidth, for which he pays $630 per month, and hard drives, which run him about $200 per month. He gets his microfilm at no cost from small libraries and historical societies. In exchange, he gives them a copy of all the scanned images analyzed for keyword recognition. Most of the papers Tryniski has digitized are from New York, but he’s rapidly expanding his coverage to other states as well. He is adding new content at a rate of about a quarter-million pages per month with no plans to slow down.
The biggest digital newspaper site on the Internet is the for-profit Newspaperarchive.com, with 130 million pages. Newspapers.com, a subsidiary of genealogy-titan Ancestry.com, has 34 million newspaper pages. Both companies have approached Tryniski with partnership deals, but he turned them down in order to keep his site free.
"I think it's just fascinating that technology has made it possible for a guy in a house with a server to create a pretty cool experience," says Brian Hansen, the general manager of Ancestry's Newspapers.com.
Chronicling America, the Library of Congress site, is financed by the National Endowment for the Humanities (NEH). To date, the NEH has spent just over $22 million on the site. A major reason for the sky-high price tag is that the NEH breaks up the money into tiny grants to individual libraries and historical societies, instead of simply paying the Library of Congress directly to complete the job. So far, the NEH has awarded 72 grants worth about $300,000 each. Each award recipient is responsible for digitizing about 100,000 newspaper pages. The majority of grant recipients hire a company called iArchives, Inc., a subsidiary of Ancestry.com, to do the actual scanning and analysis.
Hansen, who is also the general manager of iArchives, Inc., says if the Library of Congress hired his company to do the job in bulk he could offer a better rate.
Asked for the rationale behind this byzantine system, a spokesperson for the NEH denied that breaking up the funding into small grants drives up costs, adding that the goal is partially to teach small libraries how to digitize newspapers in accordance with the Library of Congress' "high technical" standards. That way they’ll be able to take that know-how and apply it to other projects.
But Hansen says the Library of Congress' detailed specifications for analyzing each newspaper page are of questionable value to users and a major reason his firm has to charge so much.
"Why not use the money for a lighter index to get more pages online? It would be interesting to sit down with the Library of Congress and the NEH and have a conversation about what's the best thing we can do for consumers," says Hansen.
Even so, less than one-third of the funding goes to the actual scanning and indexing by firms like iArchives. The NEH says the remaining money, more than $2 per newspaper page, goes for "identification and selection of the files to be digitized, metadata creation, cataloguing, reviewing files for quality control, and scholarship on the scope, content and significance on each digitized newspaper title, and in some cases specialized language expertise."
Another competitor of Tryniski's is the Brooklyn Public Library, which maintains a free online database of the Brooklyn Daily Eagle. In its heyday as the paper of record for America’s third largest city, the Daily Eagle was among the most widely read and influential newspapers in the nation. Poet Walt Whitman wrote more than 800 items for the paper and served as its editor from 1846 to 1848. During the Civil War, the Daily Eagle had the largest circulation of any evening paper in the United States.