Last year, Ronald Bowes talked about the privacy issues of Facebook Directory, which has links to all the public Facebook profiles. He scraped the contents of the directory and published the results (his post is Return of the Facebook Snatchers).
The post made me think, not about the privacy issues, but about how much information there was to process. How long would it take to scrape the Directory? How should the URLs be stored? Ron provided a Ruby script that reads a text file with one URL per line, downloads each document, and extracts the links to other pages in the directory. But there is still no script to iterate over those results and call the first script again to continue downloading. There is also no code to find the first and last name of the user behind a given URL. So I decided to write my own scraper for Facebook Directory.
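To make the missing piece concrete, here is a minimal sketch of the loop described above: download a page, extract its links, and feed the new ones back into the queue until nothing is left. It is not Ron's script nor the scraper I ended up with, only an illustration in Python. The names `LinkParser`, `extract_links`, `fetch`, and `crawl`, the `prefix` filter, and the `limit` parameter are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html, prefix):
    """Return absolute links found in html that start with prefix."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urljoin(base_url, href) for href in parser.links)
    return [url for url in absolute if url.startswith(prefix)]


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, prefix, fetch_page=fetch, limit=100):
    """Breadth-first crawl: visit each URL once, queueing newly found links.

    fetch_page is injectable so the loop can be tried without network access.
    limit caps the number of pages downloaded (an arbitrary safety value).
    """
    seeds = list(seed_urls)
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = fetch_page(url)
        visited.append(url)
        for link in extract_links(url, html, prefix):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `seen` set is what replaces the manual step of re-running a script on its own output: every link is queued at most once, so the loop ends when the directory has been exhausted or the limit is reached. Keeping the queue in memory only works for small crawls; a directory of this size would need the pending URLs stored on disk or in a database.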



