About Me

PhD Candidate at Purdue University, Computer Science.

Saturday, November 22, 2008

Archiving Scalability in NARA

Today I read this interesting article. It describes a typical problem that I have been thinking about for a long time.
Archiving projects are at the core of the Bibliotheca Alexandrina's strength, but so far we have not faced the huge amount of data described in the article.
Although we have built archives for famous Egyptian figures like Boutros Boutros-Ghali, Gamal Abdel Nasser, and Anwar el-Sadat, as well as a collection on the "History of Modern Egypt", none of these collections exceeds a few hundred thousand documents.
I have many thoughts about how to deal with this much data. The key is to design a scalable system based on clusters, and not to wait until a president hands over his records at the end of his term. You should invest your time in building a "generic" system capable of absorbing whatever you throw at it, given the key performance points you care about.
Isn't it surprising to learn that the President of the USA will get billions of messages?
I would propose Hadoop as a starting point, a good first step toward the cluster you want to create. But of course you may need to add some enhancements based on your system requirements.
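To make the idea a bit more concrete, here is a minimal sketch of the map/shuffle/reduce pattern that Hadoop is built around, written in plain Python so it runs on a single machine. It is only an illustration under my own assumptions: the message format (a dict with a "sender" field) and the function names map_message, shuffle, reduce_counts and run_job are hypothetical and are not part of Hadoop's API. On a real cluster the framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_message(message):
    """Map step: emit one (sender, 1) pair per archived message.

    The 'sender' field is a hypothetical record layout, used only for illustration.
    """
    yield (message["sender"].lower(), 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one sender."""
    return key, sum(values)


def run_job(messages):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = chain.from_iterable(map_message(m) for m in messages)
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [
        {"sender": "alice@example.org"},
        {"sender": "bob@example.org"},
        {"sender": "Alice@example.org"},
    ]
    print(run_job(sample))
```

The point of the pattern is that map_message looks at one message at a time and reduce_counts looks at one key at a time, so the same logic could in principle be spread over as many machines as the collection requires.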
It would be a great challenge if we came across such a huge collection in one of our projects.
