Monday, December 20, 2010

Database of Google Books shows the half-life of celebrity

Is it possible to study something as ill-defined as culture in a quantitative manner? Researchers from Harvard have collaborated with Google and some traditional publishers to answer that question with a qualified "yes." By leveraging a portion of Google's massive library of digitized books, the team has created what they call a "culturome," with which they can track the use of language and terms across hundreds of years. This reveals not only trends in language and usage, but also the rise and fall of celebrities and historic events in the books of many eras. And, thanks to Google, the underlying data has been exposed via a Web interface, allowing others to perform their own analyses.

The authors didn't work with the full complement of Google's digitized texts, but the amount of material they did use is staggering: over 5 million books. They estimate that's about four percent of the books ever published. Google has about three times as many works scanned, but the scan quality and metadata for these (date and location of publication, etc.) aren't uniformly good, so the research has focused on the material with the best quality. The earliest works date from 1500, and the collection includes significant contributions in seven languages. The authors estimate that it would take someone 80 years to read it all, assuming said individual didn't eat or sleep.



