Archive for the 'Search engines' Category

Finding bot names to exclude from your robots file.

You might have noticed an increase in the number of bots trawling the Web these days. Some are good, some are not. Good bots obey the robots.txt file, but unfortunately most bad bots don’t.
In fact, bad bots not only ignore the robots.txt file, they also scrape information or download entire pages from your website. [...]

Using a robots.txt file to prevent search engine spidering

When a search engine spider visits a site, say http://www.YourSite.com/, it first checks for http://www.YourSite.com/robots.txt. If the robots.txt file exists (that is, you actually created one), the spider reads the rules in it. For example, the following rules tell every spider to stay out of the entire site:
User-agent: *
Disallow: /
Sometimes we don’t want search engines to spider certain pages, such as:

sales pages
site rules
disclaimers
privacy policies
private pages
contact pages (to prevent spamming)
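
To see how a well-behaved spider would interpret such rules, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the page names (/privacy-policy.html, /contact.html) and the bot name ExampleBot are made up for illustration and are not taken from any real site. Keep in mind that this only models bots that choose to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every spider away from two example pages only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privacy-policy.html
Disallow: /contact.html
"""

# The rules from the post: keep every spider away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is an invented user-agent name.
    for path in ("/index.html", "/contact.html"):
        url = "http://www.YourSite.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

With the page-specific rules, only the listed pages are off limits, whereas Disallow: / closes the whole site to any spider that respects the file.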

SearchMash, a cool search engine from Google.

I have been using this search engine for a while and I’ll give it the thumbs up. SearchMash is a search engine owned and operated by Google and is basically a test bed where Google tries out new interfaces and other user-friendly features. True to Google style, it is very plain and appears [...]