How to Scrape Expired Web 2.0s

Buying expired domains is fine and dandy, but they cost money to register, and you really need multiple hosting accounts so they each get a different C-class IP. That kind of defeats the marketing-on-a-budget part. So instead we want to find Web 2.0s that are expired, deleted, or otherwise free to register. Finding expired Web 2.0s is made easy with ScrapeBox's Vanity Name Checker addon. Unfortunately it only supports a handful of platforms: Tumblr, WordPress, Blogspot, LiveJournal, WikiDot, OverBlog, and a few others that are not useful for finding expired names. WordPress does not let you register previously used names, and LiveJournal charges $15 to rename an account to an expired one. That makes WordPress totally useless, and with LiveJournal it is a pain to register multiple accounts and then pay to change each name. That said, I have found it easier to find high PR LiveJournal sites than any other Web 2.0, and I currently hold a couple of LiveJournal sites with very high PR.

One last thing before we get into the tutorial: the PR of a Web 2.0 doesn't always stick. I have noticed this mainly with Tumblr, but occasionally on other sites too. When researching Tumblr sites you need to look at the backlinks. If you only see reblogged links (or likes), you will probably lose the authority once you take the name over; at least that's what I have noticed. You want to find links that are genuine, not reblogs or likes.
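If you want to rough-check a Tumblr's link profile programmatically, here is a minimal Python sketch. It assumes you have exported the linking URLs to a plain text file (one URL per line) from a backlink tool like Open Site Explorer; the file name, the reblog marker strings, and the idea that a marker means a reblog are all my own assumptions, so treat any result as a hint to inspect manually, not a verdict.

```python
import requests

# Assumption: backlinks.txt holds one linking URL per line,
# exported from a backlink tool such as Open Site Explorer.
BACKLINKS_FILE = "backlinks.txt"

# Assumption: these strings often appear on Tumblr reblog pages.
# They are heuristics, not an official marker.
REBLOG_MARKERS = ["reblogged from", "reblogged-from", "via:"]

def looks_like_reblog(url: str) -> bool:
    """Fetch a linking page and guess whether the link is just a reblog."""
    try:
        html = requests.get(url, timeout=10).text.lower()
    except requests.RequestException:
        return False  # unreachable pages are skipped, not counted
    return any(marker in html for marker in REBLOG_MARKERS)

with open(BACKLINKS_FILE) as f:
    urls = [line.strip() for line in f if line.strip()]

reblogs = sum(looks_like_reblog(u) for u in urls)
print(f"{reblogs}/{len(urls)} linking pages look like reblogs")
# A high ratio suggests the authority may not stick after re-registration.
```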

Scrape Those Expired Web 2.0s

Scraping Web 2.0s is stupid easy. You are going to need to create a footprint file for ScrapeBox. These are a few that I use:
site:.tumblr.com
site:.overblog.com
site:.blogspot.com
I use the Keyword Scraper to create a long list of relevant keywords in the niche I am working in. I append A-Z, set the level to 2, and select all the sources; this usually produces a big list of semi-relevant keywords. Send the keywords back to the harvester and click the M button to select the footprint file you just created, which merges all your keywords with the footprints. Let ScrapeBox scrape until you hit a million results or it finishes all the keywords. Once it's finished harvesting, remove duplicate domains and then trim to root. Hopefully you will end up with a list of 150,000 to 250,000 domains.
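If you ever want to do that merge outside ScrapeBox, it is a few lines of Python. This is just an illustration of what the M button does; the file names (footprints.txt, keywords.txt, queries.txt) are my own placeholders, and ScrapeBox handles all of this for you.

```python
# Merge every footprint with every keyword, like ScrapeBox's M button.
# File names are placeholders, not anything ScrapeBox requires.
with open("footprints.txt") as f:
    footprints = [line.strip() for line in f if line.strip()]
with open("keywords.txt") as f:
    keywords = [line.strip() for line in f if line.strip()]

with open("queries.txt", "w") as out:
    for footprint in footprints:
        for keyword in keywords:
            out.write(f'{footprint} "{keyword}"\n')

# e.g. footprint site:.tumblr.com + keyword "dog training"
# produces the query: site:.tumblr.com "dog training"
```

The trim-to-root step afterwards is equally simple in code: split each harvested URL and keep only scheme plus domain, then de-duplicate the set.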

Those footprints give you two methods to find expired Blogspot and Tumblr sites. I said I wasn't going to give my footprints away, but here you go!

Vanity Check the Web 2.0s

Open up the Vanity Name Checker addon in ScrapeBox and import your harvested URLs. I use 20 semi-dedicated private proxies for checking the names and set the connections to 40 or 50, depending on how much load my proxies are under. The Vanity Name Checker tends to crash for me in ScrapeBox version 1 when I import more than 250,000 URLs; ScrapeBox 2 (the 64-bit version) seems to handle large lists fine. After the first run completes, I remove all the taken names and rerun the name checker in case any of the failed ones can be rechecked. You may need to do this several times to check as many of the names as possible. Hopefully you will end up with around 2,000 to 3,000 available names, but at minimum you should get 200 or 300. Save the available names to a text file so you can process them.
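For a rough idea of what the addon is doing, here is a minimal Python sketch that checks Tumblr names by HTTP status and reruns failures, mirroring the recheck passes above. The assumptions are mine: that a 404 on name.tumblr.com suggests the name may be available, and that names.txt holds one harvested subdomain name per line. The real addon checks each platform properly and is far more reliable.

```python
import requests

# Assumption: names.txt holds one Tumblr subdomain name per line.
with open("names.txt") as f:
    pending = [line.strip() for line in f if line.strip()]

available = []

# Rerun failed names a few times, like rerunning the Vanity Name Checker.
for _ in range(3):
    failed = []
    for name in pending:
        try:
            r = requests.get(f"https://{name}.tumblr.com", timeout=10)
        except requests.RequestException:
            failed.append(name)       # network error: recheck next pass
            continue
        if r.status_code == 404:      # assumption: 404 means likely available
            available.append(name)
    pending = failed
    if not pending:
        break

with open("available.txt", "w") as out:
    out.write("\n".join(available))
```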

Find the Gold

Import the file of available names into ScrapeBox to check the PR. I take all PR 2 sites and above and analyze them with SEOmoz metrics through the Page Authority addon. Any site that meets my basic criteria at this point I go ahead and register before someone else does. You may want to be more thorough and inspect the backlinks first, but since registering takes very little effort, I just grab them ASAP. You would be surprised how often a name gets taken while you are researching it.

So now we have grabbed the sites with PR, but what about all the rest? Don't toss them; plenty of them will have authority, they just might not show PR at the moment for various reasons. Run all of the sites through the Page Authority addon. I keep anything with a PA above 30 and save those sites for further review; you may have different metrics you want to use as a filter. Don't get excited when you see a DA above 90, though, since these sites are obviously subdomains of high-authority sites. You can manually check the links in Open Site Explorer and look for high PR backlinks. That is about all I do to decide whether a site is worth registering. Because we do not need to pay for a domain name or hosting, there is not much to lose by registering one; however, if you buy or write a lot of content for the site and then find out it has been spammed to death, you could lose some time and money.
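If you export the checked names with their metrics to a CSV, this triage step can be scripted. A minimal sketch, assuming a hypothetical metrics.csv with url, pr, and pa columns (the export format and column names are my assumptions, not ScrapeBox's); it splits names into register-now and review-later buckets using the thresholds from this section.

```python
import csv

# Assumption: metrics.csv has columns url, pr, pa (format is hypothetical).
register_now, review_later = [], []

with open("metrics.csv", newline="") as f:
    for row in csv.DictReader(f):
        pr = int(row["pr"] or 0)
        pa = float(row["pa"] or 0)
        if pr >= 2:
            register_now.append(row["url"])   # PR 2+: grab it ASAP
        elif pa >= 30:
            review_later.append(row["url"])   # no PR, but PA 30+: worth a look

print("register now:", len(register_now))
print("review later:", len(review_later))
```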

What Do I Do Now?

So, you have some high PR Web 2.0s, but what should you do with them? There are a few options at this point:

  1. Create or buy original content. This will always be the best solution.
  2. Try to recover the site's old content from archive.org (see the sketch after this list). This can lead to copyright violations or various other headaches, and a copyright complaint will more than likely get the site taken away from you. It's also a big pain to pull all the old content and get it up on the new site, so I almost never use this method.
  3. Place some PLR articles or spun content on the site. I manually spin articles for other projects and will reuse my manually spun articles after checking them for grammar mistakes. I always make sure they are perfectly readable.
  4. Use Wicked Article Creator (WAC) to generate readable articles for the Web 2.0s. This is what I mainly do, and it works quite well. You will need to spend a few minutes correcting grammar errors in the final article.
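For option 2, the Wayback Machine has a public availability endpoint that returns the closest archived snapshot of a URL. This sketch only looks up the snapshot; actually pulling and reposting the content (with the copyright risks mentioned above) is up to you.

```python
import requests

def closest_snapshot(url):
    """Ask the Wayback Machine for the closest archived copy of a URL."""
    r = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url},
        timeout=10,
    )
    snap = r.json().get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

# Hypothetical example name; prints a web.archive.org URL, or None
# if the page was never archived.
print(closest_snapshot("example.tumblr.com"))
```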


Final Thoughts

You should now be able to find high PR Web 2.0s easily, but you should keep improving the process for your own needs. I found several methods for finding expired Web 2.0s and modified them to get better results; you should always be trying to one-up any method you find.
