Scraping on a Budget

You need a scraper!

The one thing you are going to need is a scraper to help you speed things up. I highly recommend ScrapeBox if you do not already own it. Since my blog is mainly about Internet marketing on a budget, you can also use the free version of GScraper. The free version of GScraper is pretty awesome [at least for the price ;)]. It will also remove duplicates as you go, while ScrapeBox version 1.x cannot (version 2 can). The good thing about ScrapeBox is that it has a built-in proxy scraper. Trying to scrape Google without proxies is just going to land you in ban land.

The easy way to get set up for scraping:

Open up ScrapeBox and scrape some public proxies. Once you have some Google-passed proxies, take 5 or 10 of them and paste them into the keyword section of the harvester as search terms. Set the time range to the past 24 hours and scrape. This should give you a small list of URLs, since any page that mentions those proxies is probably a fresh proxy list. Remove duplicate URLs (not duplicate domains) and then export the list as a normal text file.

Now open up the proxy harvester and click on Harvest Proxies. Click on add source and import the list of URLs that you just harvested. Now hit start and wait until it is finished. Once it’s finished, remove any sources that found 0 results. I usually repeat this step every couple of days when scraping with public proxies.

When I check these proxies I put the connections to max, and I only check for anonymous to start with. Chances are you will have a huge list of proxies to check, and only checking for anonymous will save a great deal of time. Once you have a list of verified anonymous proxies, save them to a file. Keeping a master list of anonymous proxies will pay off in the future. Most people churn and burn proxies, but the ones they throw away often end up being good again days or weeks down the road.

Check the proxies against Google now. I actually retest the failed proxies a second time against Google after the first run. This almost always gives me more G-passed proxies. Now you are set up and ready to scrape. Depending on how many proxies you have, you should be able to set the maximum connections for scraping to around 250.
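
If you ever want to script the check-and-retest routine yourself, here is a rough Python sketch of the idea. This is not how ScrapeBox does it internally. The `check` argument is a hypothetical stand-in for whatever test you run (anonymous or Google passed), and the function names are my own.

```python
from typing import Callable, Iterable, List, Set


def two_pass_check(proxies: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Test every proxy once, then give the failures a second chance."""
    passed, failed = [], []
    for proxy in proxies:
        (passed if check(proxy) else failed).append(proxy)
    # Public proxies are flaky, so a second run often recovers a few more.
    recovered = [proxy for proxy in failed if check(proxy)]
    return passed + recovered


def merge_into_master(master: Set[str], verified: Iterable[str]) -> Set[str]:
    """Add newly verified proxies to the master list, skipping duplicates and blanks."""
    return master | {p.strip() for p in verified if p.strip()}
```

Because the master list is a set, you can merge in new results after every run without the file filling up with repeats.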

The hard but free way to scrape:

You are going to need the free version of GScraper. You can download it here: http://blog.gscraper.com/index.php/2013/01/29/gscraper-basicfree-version-updated-now-support-windows-8/

You are also going to need some free proxies.

Gather Proxy is very easy to use, but I think it is limited to 2000 proxies. ProxyFire is free and works better than any other proxy scraper. However, ProxyFire is fairly complicated to get set up and running. Unfortunately, setting up ProxyFire would require a huge tutorial and I do not have the time right now. You can find tutorials for ProxyFire here: http://www.proxyfire.net/forum/showthread.php?t=3374

Once you get your proxies you’re pretty much ready to go. You will need to fire up GScraper and put your proxies in the proxy section. Don’t forget to set any options you may need. I always set it to remove duplicate URLs while scraping. This is the one feature I really like (although it appears to be in ScrapeBox version 2.x as well). You will need to figure out how many connections to use based on your proxies, PC, and/or internet speed.
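
To show what removing duplicates while scraping means, and how duplicate URLs differ from duplicate domains, here is a small Python sketch. It only illustrates the idea and is not GScraper's or ScrapeBox's actual code. The function names are made up for this example.

```python
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def unique_urls(urls: Iterable[str]) -> Iterator[str]:
    """Yield each URL the first time it appears, dropping exact repeats."""
    seen = set()
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            yield url


def unique_domains(urls: Iterable[str]) -> Iterator[str]:
    """Yield only the first URL seen for each host name."""
    seen = set()
    for url in urls:
        host = urlsplit(url.strip()).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            yield url.strip()
```

Both functions hand results back as they go instead of waiting for the whole list, which is the same idea as deduplicating during the scrape. Removing duplicate domains throws away a lot more, so only do that when you really want one URL per site.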

The key to scraping is proxies:

I personally use semi-dedicated private proxies from BuyProxies for scraping. I run very few threads when scraping with these proxies but get much better results than with public proxies. Since we are focusing on having a small budget or no budget, here are some programs that you can scrape public proxies with.

http://www.gatherproxy.com/gptool
The free version is limited but you can get around 2000 proxies with it.
http://www.proxyfire.net/
This is complicated to use but it beats every proxy scraper out there.
http://www.project2025.com/charon.php
This is more of a proxy checker, but works better than any other proxy checker.
http://www40.zippyshare.com/v/69499165/file.html
This will let you import a list of urls and scrape the sites for proxies.
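
If you are curious what scraping a site for proxies boils down to, it is mostly pulling ip:port pairs out of the page text. Here is a minimal Python sketch of that step. I do not know how the tools above do it internally, so treat the pattern and the `extract_proxies` name as my own assumptions. Downloading the pages is left out; you would pass in the text of each page you fetched.

```python
import re
from typing import List

# Matches things that look like 1.2.3.4:8080 (four number groups, then a port).
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")


def extract_proxies(text: str) -> List[str]:
    """Return unique ip:port strings found in text, in order of appearance."""
    found = []
    seen = set()
    for ip, port in PROXY_RE.findall(text):
        # The pattern is loose, so reject impossible values like 999.1.1.1.
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split("."))
        port_ok = 1 <= int(port) <= 65535
        proxy = f"{ip}:{port}"
        if octets_ok and port_ok and proxy not in seen:
            seen.add(proxy)
            found.append(proxy)
    return found
```

Everything this returns still needs to be checked, since most of what you find on public lists is already dead.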

The First Post!

I have contemplated building a blog like this for quite a while. I have been creating sites and online businesses for a long time now. Most of the things I have learned have come from doing things on my own. I have found that forums are full of misinformation and the majority of “helpful” people only want your money. It also seems that if one person with a high post count says something, everyone else will repeat the same thing even if it’s not correct. I have found this to be mostly true at forums like Warrior Forum and other places like Digital Point. This even rings true for BHW, although it’s not as noticeable there as on the mainstream forums. I’ll even admit to doing the same thing if I thought it would lead to a sale through my signature link. There are very few places to get real information that can help you create a full-time career as an internet marketer. The number one thing to remember is that everyone wants to make a sale off of you one way or another. Even the places where I have found reliable info still want to make a buck from you.

Note: Stay away from the Warrior Forum’s WSO section. It is very possible to find some great items in there, but 90% are rehashed crap you can find anywhere on the interwebs. The few things I purchase from the WSO forum are software-related items, and that’s because they are usually cheaper in the WSO forum than anywhere else. I have seen so many beginners fall into a habit of buying every new WSO that comes out and never doing anything with it.

I am going to dispel as much bad information as I can and also point you in the right direction to really help you along in your IM career. Since this is my first post and I don’t have anything of value on my blog yet, I am going to point you to a couple of places that offer real information.

Matthew Woodward: I truly respect this guy and his blog. Keep in mind the purpose of his blog is to make affiliate commissions, but the quality of the content he provides is among the best. He has honest reviews of software tools that you will need at some point. There are some things I do not agree with him about, but he is mostly right on. Most of his stuff/reviews revolve around “black hat” tools. Once you’ve been in this industry long enough you will realize there is no color to the hats people wear. It’s all about making money! I will say this though: don’t go crazy buying everything he recommends. There are a lot of tools you will not need unless you are a full-time SEO. However, GSA SER comes highly recommended by me. Do not, I repeat do NOT buy Ultimate Demon; GSA SER runs circles around it for a fraction of the price. Do check out his tutorials on blogging and tiered linking; they are some of the best ones out right now.

Jacob King has some great content on his blog, but it is getting a little outdated. The ScrapeBox stuff he has is probably all you will ever need. I’m pretty sure it has been rehashed into a guide and sold in various places. His other articles are definitely worth a read, but just keep an eye on the dates.

There are quite a few more places I like, but I will save those for another post later on about where you can find relevant and solid information about internet marketing.