Search engine crawlers, or spiders, are small automated software programs that are sometimes also referred to as “robots.” These programs continually roam the World Wide Web searching for new, updated, or changed web pages, and they help search engines index, or “catalog,” each web site correctly so that search engine users get the best possible results. Some websites are “crawled” daily, while others are visited far less often.
When a search engine spider arrives at your web page, it first looks for a robots.txt file. This is a plain text file, placed in the root directory of your site, that tells spiders/robots which areas of your site are off-limits and shouldn’t be cataloged. Typical examples are pages that would be a waste of the spider’s time, such as pages built entirely in Flash. A robots.txt file steers spiders away from these types of pages.
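As a rough illustration, here is how a well-behaved crawler might consult a robots.txt file before fetching a page. This is only a sketch using Python’s built-in urllib.robotparser module; the robots.txt rules and the example.com URLs are made up for the example.

```python
# A minimal sketch of how a crawler might honor robots.txt, using Python's
# standard urllib.robotparser. The rules and URLs below are hypothetical.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /flash-intro/
Disallow: /print/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/index.html",
            "http://example.com/flash-intro/home.swf"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawl" if allowed else "skip")
```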
Search engine crawlers also collect the outbound links on the page. These links will eventually be followed to other pages; spiders move from one page to the next by following links. How often a site is visited varies from one search engine to another, because each search engine maintains its own database and crawling schedule.
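The link-collection step can be pictured with a short sketch: parse a page’s HTML and pull out the outbound links a spider would queue up for later visits. The sample HTML and example.com links here are invented for illustration, and a real crawler does far more than this.

```python
# A simplified sketch of how a spider collects outbound links from a page.
# Uses only Python's standard library; the sample page is hypothetical.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag encountered.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

SAMPLE_PAGE = """
<html><body>
  <a href="http://example.com/about.html">About us</a>
  <a href="http://example.com/products.html">Products</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_PAGE)
print(collector.links)   # links the crawler would follow next
```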
As a web site owner, you should regularly check which pages of your site the crawlers and spiders have visited. Look at your server log reports or the results from your log statistics program. (If you don’t have one, upgrade your web hosting service; VectorInter.Net provides these tools free with every web site hosting contract.) Most spiders/robots are easily identified by their “user agent” names. Google’s robot is named “Googlebot”. Other spiders have funny names, such as “Slurp”, the name of Inktomi’s robot.
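For example, a quick script can scan a server access log for hits from known spiders and report which pages each one requested. This is only a sketch; the log lines and the list of bot names are illustrative, and in practice you would read your own access log or rely on your host’s log statistics tool.

```python
# A rough sketch of checking which pages the spiders have visited, by scanning
# web server access-log lines for known crawler user-agent strings.
# The sample log lines below are made up for illustration.
import re
from collections import defaultdict

KNOWN_BOTS = ("Googlebot", "Slurp")

SAMPLE_LOG = [
    '66.249.66.1 - - [10/May/2009:12:01:02] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '72.30.142.7 - - [10/May/2009:12:05:44] "GET /products.html HTTP/1.1" 200 8230 "-" "Slurp/3.0"',
    '192.168.0.5 - - [10/May/2009:12:06:10] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

visits = defaultdict(list)
for line in SAMPLE_LOG:
    match = re.search(r'"GET (\S+) HTTP', line)
    if not match:
        continue
    page = match.group(1)
    for bot in KNOWN_BOTS:
        if bot in line:
            visits[bot].append(page)

for bot, pages in visits.items():
    print(bot, "crawled:", ", ".join(pages))
```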
For helpful ranking factors and tips on how to make your website more visible to crawlers, check out http://www.thebestmedia.com/blog/google-ranking-factors-that-you-should-know/