Using Scrapebox to find expired domains
Scraping expired domains requires a small budget, but aside from the price of Scrapebox itself it can also be done for free. Scrapebox is required here because its custom harvester lets us modify Google's query string to search in the past. You could also do this by going to Google and collecting all the results manually, but that is highly inefficient. Any other software that can edit the search query string should also work; I was not able to figure out a way to change it with the free version of GScraper. I highly recommend using some semi-private proxies to speed up the scraping process, but you can read Scraping on a Budget to get a bunch of free proxies to scrape with. I use 20 semi-dedicated proxies from BuyProxies and set my number of connections to 3 or 4. If you use public proxies, you can set the connections between 50 and 250 depending on how many Google-passed ones you have.
Time to modify the custom harvester
Go into Scrapebox's settings and make sure you have the custom harvester selected for use, then click on the custom harvester's settings.
Once inside the settings, you will need to update the end of the query string with &tbs=cdr:1,cd_min:2002,cd_max:2008. This sets the minimum and maximum dates to search: 2002 is the starting year and 2008 the ending year. You can adjust the start and end years to get better results; I sometimes set the end year to 2012 depending on which footprints I am using.
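To make the date-range parameter concrete, here is a minimal Python sketch (not part of Scrapebox, just an illustration with function names of my own) that builds a Google search URL carrying the same tbs restriction the custom harvester appends:

```python
from urllib.parse import urlencode

def build_dated_query_url(query, year_min=2002, year_max=2008):
    """Build a Google search URL restricted to a custom date range.

    The tbs=cdr:1,cd_min:...,cd_max:... parameter is the same one the
    custom harvester setting above tacks onto every query.
    """
    params = {
        "q": query,
        "num": 100,  # ask for 100 results per page
        "tbs": "cdr:1,cd_min:{},cd_max:{}".format(year_min, year_max),
    }
    return "https://www.google.com/search?" + urlencode(params)

print(build_dated_query_url("blogroll links"))
```

Swapping the year arguments is equivalent to changing the min and max years in the harvester setting.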
Scraping by hand
You will need to go to Google and set a custom date range for your expired domain searching. Click on Search tools and then Any time. You will see a Custom range tab, which lets you set your date range.
I also like to set Google to display 100 search results. Then open a new tab, head over to Google's web store, and install Check My Links. This installs a Chrome plugin that will check any links on the page you are viewing. I think it may also be available for Firefox, but you can find a similar tool if need be. Now you're all set up to find some expired domains manually.
What about the footprints
Unfortunately, you are going to have to find your own footprints for scraping. Think about the date range you are scraping: what platforms were available during that time? You can start out by doing searches for blogroll, links, resources, friends, and blog links. Search the sites that come back and see if you can find any footprints. Clicking the Check My Links button in your browser will check every link on the page and flag any that return 404 errors (this is what we want). Alternatively, Scrapebox's alive check addon can test whether any of your scraped results give a 404. However, finding expired domains like this will be rare. What we really want to do is find each site's "blogroll" page and check that page for 404s.
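As a rough illustration of what Check My Links and the alive check addon are doing under the hood, here is a standard-library Python sketch that pulls external links out of a blogroll page's HTML and tests each one for a 404. The function names are my own invention:

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen

class LinkExtractor(HTMLParser):
    """Collect href values from every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(html, own_domain):
    """Return absolute links that point outside own_domain."""
    parser = LinkExtractor()
    parser.feed(html)
    result = []
    for link in parser.links:
        host = urlparse(link).netloc
        if host and own_domain not in host:
            result.append(link)
    return result

def is_dead(url, timeout=10):
    """True only if the URL returns a 404 -- the signal we want."""
    try:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        urlopen(req, timeout=timeout)
        return False
    except HTTPError as err:
        return err.code == 404
    except URLError:
        # Unreachable host: could be expired, but verify by hand.
        return False
```

Feed the blogroll page's HTML to external_links, run is_dead over the results, and keep the URLs that come back True for your availability check later.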
Process those 404’s
Scrapebox's link extractor can find all the external links on "blogroll"-type pages, or you can go to each "blogroll" page manually and let Check My Links show you all of those beautiful 404 errors. You need to make a list of all the sites that give 404 errors. Once you have built up a good list, check whether any of the domains are available for registration. I like using Moniker's bulk domain search, which lets you check up to 1,000 domains in a single run. Save all the domain names that are available to a separate file. Then you need to check whether the domains you have are worth registering; you don't want to register a domain someone has spammed to death and gotten penalized or deindexed. You should check the whois history and PA/DA, and manually look at some of the backlinks still pointing to the domain. I am not going to go into too much detail about this part, but here are a few pointers.
- Make sure the domain hasn’t been registered more than 3 or 4 times in the past 6 or 7 years. Domains changing hands multiple times can be a sign something is wrong with it.
- Use SeoMoz to check the domain's PA and DA. Scrapebox can check this easily with the page authority addon.
- Check the backlinks using open site explorer. Tons of blog comments or other spammy links could mean the domain was penalized or deindexed at some point.
- Check the domain on Archive.org. This will tell you what was previously on the domain.
- Don’t worry about Page Rank. I do check the PR but only to prioritize the domains that I will analyze first.
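The Archive.org check in particular is easy to script. The Wayback Machine exposes a public availability endpoint at archive.org/wayback/available; here is a hedged Python sketch that fetches a domain's most recent snapshot, assuming the API still returns its usual JSON shape (verify against the live service before relying on it):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def parse_snapshot(data):
    """Pull the closest archived snapshot out of the API's JSON reply."""
    snap = data.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return {"url": snap["url"], "timestamp": snap["timestamp"]}
    return None  # never archived: be extra careful with this domain

def wayback_snapshot(domain, timeout=10):
    """Ask the Wayback Machine for the domain's latest snapshot."""
    url = "https://archive.org/wayback/available?" + urlencode({"url": domain})
    with urlopen(url, timeout=timeout) as resp:
        return parse_snapshot(json.load(resp))
```

A snapshot timestamp gives you a quick link to what the site used to be, so you can spot domains that spent their past life as spam before you spend money registering them.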