<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Modern Street &#187; Search engines</title>
	<atom:link href="http://www.modernstreet.com/category/search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.modernstreet.com</link>
	<description>A Blog on and about the Web</description>
	<lastBuildDate>Wed, 25 Aug 2010 13:50:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>What do you think of Bing?</title>
		<link>http://www.modernstreet.com/search-engines/what-do-you-think-of-bing/</link>
		<comments>http://www.modernstreet.com/search-engines/what-do-you-think-of-bing/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 08:26:12 +0000</pubDate>
		<dc:creator>DarrinW</dc:creator>
				<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Bing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[popularity]]></category>
		<category><![CDATA[webmasters]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://www.modernstreet.com/?p=731</guid>
		<description><![CDATA[So Microsoft officially changed their Live search to Bing a few days ago. I was never one to use MSN Live to search (ever), but when I put it through its paces, I have to say, it&#8217;s not bad at all! Although the name sounds a little campy to me, what matters is performance. Unlike [...]]]></description>
			<content:encoded><![CDATA[<p>So Microsoft officially changed their Live search to <strong><a href="http://www.bing.com/">Bing</a></strong> a few days ago. I was never one to use MSN Live to search (ever), but when I put it through its paces, I <em>have</em> to say, it&#8217;s not bad at all!</p>
<p>Although the name sounds a little campy to me, what matters is performance. Unlike <strong><a title="Cuil" href="http://www.modernstreet.com/search-engines/first-impressions-of-cuil/">Cuil</a></strong>, which has already been largely dismissed as a lame search engine, Bing looks and behaves remarkably like Google, except with a little more bling added (pardon the pun).<span id="more-731"></span></p>
<p>Bing has a better image and video search, which kinda reminds me of the old Google <strong><a title="Searchmash" href="http://www.modernstreet.com/search-engines/searchmash/">Searchmash</a></strong>. If you look in the Extras section in the top right corner of Bing, there are a few extra features that you might not have noticed, which vary <strong>depending on where you say you are from</strong>. To change your geo-location, just click on the country at the top right and you will be presented with a whole list of country and language options.</p>
<p><img class="alignnone size-full wp-image-734" title="bing" src="http://www.modernstreet.com/wp-content/uploads/2009/06/bing.jpg" alt="bing" width="400" height="246" /></p>
<p>Most of the features are currently <strong>limited to US-based surfers</strong>. One of them is a Cashback savings option, meaning you can earn money back by buying from participating merchants, whose advertising fees are &#8220;passed back&#8221; down to you. I do not know how Microsoft intends to manage this whole cashback thing efficiently enough to credit everyone properly (should it really get popular), but it looks like there was a Cashback option at the old Live.com (which obviously never took off).</p>
<p>And then there is a &#8220;Webmaster Center&#8221;, which also seems to apply only to US folks. I submitted a site of mine to Bing using their &#8220;Webmaster Tools&#8221; by uploading an XML file given by Bing. It is very similar to Google&#8217;s Webmaster Tools, except much more basic of course. There is also a (basic) Blog option which isn&#8217;t available for all countries. But like Yahoo 360°, which is closing down shortly, I don&#8217;t see this going anywhere.</p>
<p><img class="alignnone size-full wp-image-735" style="border: 1px solid black;" title="bing-webmaster-tools" src="http://www.modernstreet.com/wp-content/uploads/2009/06/bing-webmaster-tools.jpg" alt="bing-webmaster-tools" width="400" height="193" /></p>
<p><img class="alignnone size-full wp-image-736" style="border: 1px solid black;" title="bing-blog-community" src="http://www.modernstreet.com/wp-content/uploads/2009/06/bing-blog-community.jpg" alt="bing-blog-community" width="400" height="242" /></p>
<p>Notwithstanding all these quibbles, Bing is performing well, and has these points in its favor:</p>
<ul class="unIndentedList">
<li>its search results are strikingly similar to Google&#8217;s (I wonder why?!)</li>
<li>automatically set as the default search engine on IE 6 and 7</li>
<li>curiosity factor</li>
<li>good video and image search</li>
<li>the text search look and feel (and sponsored links) are nothing short of a Google clone</li>
<li>mousing over the results gives additional info about the site or page</li>
<li>nice wallpaper backgrounds which change every day and come with &#8220;hidden&#8221; Easter eggs</li>
</ul>
<p><img class="alignnone size-full wp-image-737" title="bing-background-image" src="http://www.modernstreet.com/wp-content/uploads/2009/06/bing-background-image.jpg" alt="bing-background-image" width="400" height="247" /></p>
<p>The <a href="http://blogs.zdnet.com/BTL/?p=19189">reactions from Yahoo</a> and <a href="http://www.techcrunch.com/2009/06/05/did-bing-just-leapfrog-yahoo-search/">the public</a> are as expected. I do not think Yahoo is going to be overly concerned with Bing, because of its strong position in social media. But if <a href="http://www.phoneplusmag.com/hotnews/china-blocks-twitter-tiananmen-anniversary.html">China&#8217;s ban on Bing</a> is any gauge of its effectiveness, I think this &#8220;new upstart&#8221; should pat itself on the back.</p>
<p>While it is still very premature to think that Bing will become a strong contender to Google, judging from its results, it looks like it could become a Google junior. <strong>B</strong>ut <strong>I</strong>t&#8217;s <strong>N</strong>ot <strong>G</strong>oogle, definitely.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.modernstreet.com/search-engines/what-do-you-think-of-bing/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>First impressions of Cuil</title>
		<link>http://www.modernstreet.com/search-engines/first-impressions-of-cuil/</link>
		<comments>http://www.modernstreet.com/search-engines/first-impressions-of-cuil/#comments</comments>
		<pubDate>Thu, 31 Jul 2008 20:59:35 +0000</pubDate>
		<dc:creator>DarrinW</dc:creator>
				<category><![CDATA[Search engines]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Cuil]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[indexed]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[searchmash]]></category>
		<category><![CDATA[sites]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.modernstreet.com/?p=428</guid>
		<description><![CDATA[A new search engine on the Web called Cuil launched recently amid a lot of hype generated by thousands of blogs picking up on the story. The huge traffic flocking to try out the new search engine eventually led to a meltdown of the site&#8217;s servers. Even when I tried to use it a [...]]]></description>
			<content:encoded><![CDATA[<p>A new search engine on the Web called <strong><a title="Cuil search engine" href="http://www.cuil.com">Cuil</a></strong> launched recently amid a lot of hype generated by thousands of blogs picking up on the story. The huge traffic flocking to try out the new search engine eventually led to a meltdown of the site&#8217;s servers. Even when I tried to use it a few days ago to search for a few terms, it crashed.</p>
<p>Now that things have cooled down a little, I just took a second look at Cuil, and found it lacking in many areas. My main gripe is the irrelevance of the search results, especially given that Cuil, as a new search engine, claims to index even more pages than Google. More than 120 billion indexed web pages is a lot indeed, but the main problem is <strong>relevancy</strong>.<span id="more-428"></span></p>
<p>Although created by ex-Google engineers, Cuil is not prepared to handle the spam problem. If Google is already having enough problems with spam in its index, can a new startup search engine overcome this issue?</p>
<p>Many people are complaining that the images Cuil places next to the results are not related to the site at all. I found it somewhat amusing to see pages from free webhosting sites, comprising nothing more than a keyword spammed multiple times, showing up on the front page.</p>
<p>Cuil claims that site content is the number one factor for its algorithm, but this leaves it vulnerable to keyword spamming. Also, because it does not filter out &#8220;Bad Neighborhoods&#8221; the way Google does, almost any kind of garbage site/page can end up on the front page.</p>
<p>OK, Cuil is a new startup, and shouldn&#8217;t be expected to be even close to any of the Big Three (Google, Yahoo, MSN) for now. I somewhat like the layout of Cuil though. But hey, have you ever checked out Google&#8217;s AJAX-powered <strong><a title="SearchMash" href="http://www.modernstreet.com/search-engines/searchmash/">SearchMash</a></strong>? That&#8217;s the Google engine, albeit with a different look and layout <img src='http://www.modernstreet.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.modernstreet.com/search-engines/first-impressions-of-cuil/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding bot names to exclude from your robots file.</title>
		<link>http://www.modernstreet.com/search-engines/finding-bot-names-to-exclude-from-your-robots-file/</link>
		<comments>http://www.modernstreet.com/search-engines/finding-bot-names-to-exclude-from-your-robots-file/#comments</comments>
		<pubDate>Tue, 17 Jul 2007 15:16:00 +0000</pubDate>
		<dc:creator>DarrinW</dc:creator>
				<category><![CDATA[Search engines]]></category>

		<guid isPermaLink="false">http://www.modernstreet.com/search-engines/finding-bot-names-to-exclude-from-your-robots-file/</guid>
		<description><![CDATA[You might have noticed an increase in the number of bots trawling the Web these days. Some are good, some are not. Good bots obey the robots.txt file, but unfortunately most bad bots don&#8217;t. In fact bad bots not only don&#8217;t obey the robots.txt file, they also steal information or download entire pages off your [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.modernstreet.com/wp-content/uploads/2007/08/robot.jpg" title="robot.jpg" alt="robot.jpg" align="left" hspace="5" />You might have noticed an increase in the number of bots trawling the Web these days. Some are good, some are not. Good bots obey the <a href="http://www.robotstxt.org/" title="Robots txt info" target="_blank">robots.txt</a> file, but unfortunately most bad bots don&#8217;t.</p>
<p>In fact, bad bots not only don&#8217;t obey the robots.txt file, they also steal information or download entire pages off your website. Bad bots add to your bandwidth consumption and choke up your spam dustbin. Every day, unknown bots crawl your sites without identifying themselves.</p>
<p>The problem is, how do we know which are good bots and which are bad bots? A wrongly used robots.txt file will hinder spidering, which is what we don&#8217;t want.<span id="more-53"></span></p>
<p>Perhaps we should take a leaf out of Wikipedia&#8217;s book. As the premier content site on the Web, it is safe to assume that it is the daily target of content-eating bots and all other kinds of nefarious bots.</p>
<p>On <a href="http://en.wikipedia.org/robots.txt" title="Wikipedia robots text" target="_blank">Wikipedia.org/robots.txt</a> we probably have a useful list of bots to exclude from our sites. I&#8217;m not 100% sure about this, but I feel that we can very safely exclude some of the bots listed there, especially those branded as trouble by Wikipedia! It&#8217;s a personal choice, but we can safely assume Wikipedia to be one of the most heavily hit sites on the Web, by both humans AND robots. I would like to hear any comments on this.</p>
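<p>To make the idea concrete, here is a minimal Python sketch that reads the text of a robots.txt file and lists the bots it shuts out completely (a group containing <code>Disallow: /</code>). The sample text and the bot names in it are made up for illustration and are not taken from Wikipedia&#8217;s real file; the function name <code>fully_blocked_bots</code> is also just an assumption of this sketch.</p>

```python
# Made-up sample in the style of a large site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
# Bots that misbehave are shut out entirely
User-agent: ExampleScraper
Disallow: /

User-agent: ExampleDownloader
User-agent: ExampleMirror
Disallow: /

User-agent: *
Disallow: /private/
"""


def fully_blocked_bots(robots_txt):
    """Return the user-agent names whose group contains 'Disallow: /'."""
    blocked = []
    group_agents = []        # user-agents of the group currently being read
    reading_agents = False   # True while consecutive User-agent lines continue
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                group_agents = []  # a new group starts here
            group_agents.append(value)
            reading_agents = True
        else:
            reading_agents = False
            if field == "disallow" and value == "/":
                for agent in group_agents:
                    # "*" means every bot, so it is not a name worth listing
                    if agent != "*" and agent not in blocked:
                        blocked.append(agent)
    return blocked


if __name__ == "__main__":
    for name in fully_blocked_bots(SAMPLE_ROBOTS_TXT):
        print(name)
```

<p>You could paste the contents of any big site&#8217;s robots.txt into the function in place of the sample and review the resulting names before copying any of them into your own file.</p>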
<p><strong>Since bad bots never obey the rules, why bother? </strong></p>
<p>There is also the opposite school of thought amongst webmasters, which is to just leave the robots.txt as sparse as possible. You should experiment and see if it makes a difference.</p>
<p><strong>Two things about the robots.txt file: </strong></p>
<ul>
<li>Don&#8217;t think of password protecting your robots.txt file. That goes against its very purpose. If you password protect your robots.txt file, then bots will not be able to read it and therefore won&#8217;t know which pages are forbidden. Accept that the robots.txt file should be human readable.</li>
<li>Listing something in your robots.txt is no guarantee that it will be excluded. There are many bots that disobey the rules. The robots.txt file is really quite limited in its usefulness, as I implied earlier.</li>
</ul>
<p>Google, eBay and Wikipedia all employ the robots.txt file on their respective sites, so it might be quite useful to take a look at these mega sites&#8217; robots.txt files for their educational value.</p>
<p>What about the bots that blatantly disobey the robots.txt file? Can we actually do something about them? Yes, we can, but it does involve a fair deal of technical know-how. Here is a very good but technical <a href="http://www.rubyrobot.org/article/protect-your-web-server-from-spambots" title="Drastic measures" target="_blank">write up</a> on it.</p>
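<p>As a rough illustration of one possible approach (not necessarily the one in the write-up linked above), here is a small Python sketch of a &#8220;trap&#8221; check. It assumes you have disallowed a path in robots.txt that no human visitor would ever request, and that your server writes access logs in the common log format. The path <code>/bot-trap/</code>, the sample log lines and the function name are all hypothetical.</p>

```python
import re

# Hypothetical trap path: disallowed in robots.txt and not linked visibly.
TRAP_PATH = "/bot-trap/"

# Captures the client address and the requested path from a log line
# written in the common log format.
LOG_LINE = re.compile(r'^(\S+) .*?"(?:GET|POST|HEAD) (\S+)')


def offenders(log_lines, trap_path=TRAP_PATH):
    """Return the set of client addresses that requested the trap path."""
    found = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_path):
            found.add(match.group(1))
    return found


if __name__ == "__main__":
    # Made-up log lines using documentation-only IP addresses.
    sample = [
        '198.51.100.4 - - [17/Jul/2007:15:16:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
        '203.0.113.7 - - [17/Jul/2007:15:16:02 +0000] "GET /bot-trap/ HTTP/1.1" 200 512',
    ]
    print(offenders(sample))
```

<p>Any address this turns up ignored your robots.txt, so it is a candidate for blocking at the server level. Check the list by hand first, since a curious human could also have typed in the trap address.</p>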
]]></content:encoded>
			<wfw:commentRss>http://www.modernstreet.com/search-engines/finding-bot-names-to-exclude-from-your-robots-file/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Using robots.txt file to prevent search engine spidering</title>
		<link>http://www.modernstreet.com/search-engines/using-robots-txt/</link>
		<comments>http://www.modernstreet.com/search-engines/using-robots-txt/#comments</comments>
		<pubDate>Fri, 06 Apr 2007 11:21:07 +0000</pubDate>
		<dc:creator>DarrinW</dc:creator>
				<category><![CDATA[Search engines]]></category>

		<guid isPermaLink="false">http://www.modernstreet.com/search-engines/using-robots-txt/</guid>
		<description><![CDATA[When a search engine spider visits a site, say http://www.YourSite.com/, first of all, it checks for YourSite.com/robots.txt. If the robots.txt file exists (you actually created one) it will look for this code. User-agent: * Disallow: / Sometimes, for certain reasons such as: sales pages site rules disclaimers privacy policies private pages contact pages (prevent spamming) [...]]]></description>
			<content:encoded><![CDATA[<p>When a search engine spider visits a site, say <font color="#000000">http://www.YourSite.com/</font>, first of all, it checks for <font color="#000000">YourSite.com/robots.txt</font>. If the robots.txt file exists (you actually created one), it reads the rules in it. For example, this code tells all spiders to stay out of the entire site:</p>
<p><font color="#800000">User-agent: *<br />
Disallow: /</font></p>
<p>Sometimes there are pages we don&#8217;t want search engines to spider, such as:</p>
<ul>
<li>sales pages</li>
<li>site rules</li>
<li>disclaimers</li>
<li>privacy policies</li>
<li>private pages</li>
<li>contact pages (prevent spamming)</li>
</ul>
<p><span id="more-11"></span>We certainly don&#8217;t want our &#8220;thank you&#8221; sales pages to show up in the search engines, since everyone could then come and download our stuff for free. I made the mistake of not adding a robots.txt to one of my sites and now the contact page has been spidered by Google, so maybe some spam is headed for my mailbox.</p>
<p>We add a robots.txt file to the public_html directory and specify which pages or directories should <strong>NOT</strong> be indexed. Just create a plain text file in Notepad, specify which pages should not be spidered, save it as robots.txt and upload it. So if your thank you page is located at <font color="#000000">yoursite.com/thankyou</font>, then you can specify this code in your robots.txt file.</p>
<p><font color="#800000">User-agent: *<br />
Disallow: /thankyou</font></p>
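<p>If you want to check what a rule like this actually covers before uploading it, Python&#8217;s standard <code>urllib.robotparser</code> module can evaluate it the way a well-behaved spider would. This is only a sketch: the helper name <code>is_allowed</code>, the bot name &#8220;AnyBot&#8221; and the extra paths are assumptions made for the example.</p>

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thankyou
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a rule-abiding bot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/thankyou", "/thankyou/download.html", "/index.html"):
        url = "http://www.yoursite.com" + path
        print(path, is_allowed(ROBOTS_TXT, "AnyBot", url))
```

<p>Note that the rule is matched as a prefix, so it covers everything whose path starts with /thankyou, not just that one page.</p>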
<p>If you want just one particular HTML page to be NOT indexable, the easiest way to stop search engine spidering is to add a robots meta tag somewhere within the &lt;head&gt; tags of that page. If you don&#8217;t want the page to be indexed, use this code:</p>
<p><font color="#800000"><span class="postbody"> &lt;META NAME=&quot;ROBOTS&quot; CONTENT=&quot;NOINDEX&quot;&gt;</span></font></p>
<p>If you only want to stop the links from being followed, use this meta tag:</p>
<p><font color="#800000"><span class="postbody"> &lt;META NAME=&quot;ROBOTS&quot; CONTENT=&quot;NOFOLLOW&quot;&gt;</span></font></p>
<p>If you don&#8217;t want indexing AND link following, use this meta tag:</p>
<p><span class="postbody"><font color="#800000">&lt;META NAME=&quot;ROBOTS&quot; CONTENT=&quot;NOINDEX,NOFOLLOW&quot;&gt;</font></span></p>
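<p>To double check that a page really carries the tag you intended, a short Python sketch using the standard <code>html.parser</code> module can pull out the robots directives. The class and function names here are made up for this example, and it only looks at <code>name="robots"</code> tags, not at tags aimed at one specific search engine.</p>

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every robots meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META and meta
        # are treated the same; attribute values keep their original case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head></html>'
    print(sorted(robots_directives(page)))
```

<p>An empty result means the page has no robots meta tag at all, in which case spiders fall back to their default behaviour of indexing the page and following its links.</p>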
<p>These meta tags prevent that <strong>particular</strong> page from being spidered in the ways you specified. Some spiders or robots do not obey the robots.txt file commands, and these are usually &#8220;bad&#8221; robots, like spam bots or content scrapers. So, having a robots.txt file helps in identifying the bad robots and blacklisting them: any bot that requests a page you have disallowed gives itself away in your server logs. And here&#8217;s how to exclude particular robots/spiders in the robots.txt file from your entire site:</p>
<p><font color="#800000">User-agent: BadRobotX<br />
Disallow: /</font></p>
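<p>If you end up with more than a handful of bots to shut out, you could generate these blocks instead of typing them. Here is a minimal Python sketch, assuming you keep your own list of bot names; &#8220;BadRobotY&#8221; and &#8220;GoodBot&#8221; are placeholders like BadRobotX, and the function names are made up. The standard <code>urllib.robotparser</code> module is used to confirm the result.</p>

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_bots):
    """Build robots.txt text that shuts each named bot out of the whole site."""
    blocks = [f"User-agent: {name}\nDisallow: /\n" for name in blocked_bots]
    return "\n".join(blocks)


def bot_may_fetch(robots_txt, user_agent, url):
    """Return True if the named bot is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(["BadRobotX", "BadRobotY"])
    print(text)
    home = "http://www.yoursite.com/"
    print("BadRobotX allowed:", bot_may_fetch(text, "BadRobotX", home))
    print("GoodBot allowed:", bot_may_fetch(text, "GoodBot", home))
```

<p>Bots that are not named in the file stay unaffected, which is the point: you exclude the troublemakers without hindering the search engines you do want.</p>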
]]></content:encoded>
			<wfw:commentRss>http://www.modernstreet.com/search-engines/using-robots-txt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SearchMash, a cool search engine from Google.</title>
		<link>http://www.modernstreet.com/search-engines/searchmash/</link>
		<comments>http://www.modernstreet.com/search-engines/searchmash/#comments</comments>
		<pubDate>Tue, 03 Apr 2007 14:20:32 +0000</pubDate>
		<dc:creator>DarrinW</dc:creator>
				<category><![CDATA[Search engines]]></category>

		<guid isPermaLink="false">http://www.modernstreet.com/search-engines/searchmash/</guid>
		<description><![CDATA[I have been using this search engine for a while and I&#8217;ll give it the thumbs up. SearchMash is a search engine owned and operated by Google and is basically a test bed for Google to try out new interfaces and other user-friendly features. True to Google style, it is very plain and appears [...]]]></description>
			<content:encoded><![CDATA[<p>I have been using this search engine for a while and I&#8217;ll give it the thumbs up. <a title="Search Mash" href="http://www.searchmash.com/">SearchMash</a> is a search engine owned and operated by Google and is basically a test bed for Google to try out new interfaces and other user-friendly features. True to Google style, it is very plain and appears to be no frills at the start.</p>
<p>But there are a few nifty things about it that would probably appeal to lazy people (I probably belong somewhere there too), such as not having to place your cursor in the search box whenever you wish to look something up. Just type anything, and your term will appear in the search box as the current search term.<span id="more-5"></span></p>
<p>Another is the &#8220;more results&#8221; option at the bottom of the page: just hit the space bar and SearchMash automatically shows more search results for you.</p>
<p><img class="alignleft" style="margin: 5px 7px; float: left;" title="SearchMash" src="http://modernstreet.com/wp-content/uploads/2007/04/searchmash.jpg" alt="SearchMash" hspace="6" width="363" height="212" align="center" /></p>
<p><img style="margin-left: 6px; margin-right: 6px;" title="SearchMash search tabs" src="http://modernstreet.com/wp-content/uploads/2007/04/searchmash_tabs.jpg" alt="SearchMash search tabs" hspace="6" align="center" /></p>
<p>For certain queries, SearchMash will display some extra tabs to help you narrow down your search further. At the moment these tabs only appear for queries relating to health conditions, certain popular brands and celebrities. But I believe this feature will be expanded to cover more search terms as well.</p>
<p>As well as attempting to correct your query should you make a spelling mistake (just like the normal Google search), SearchMash will ask whether you wanted to search for a related keyword whenever you query a broad, popular term. Example: Search &#8220;Asia&#8221; and SearchMash asks you if you were looking for &#8220;Asia map&#8221;.</p>
<p><img style="margin-left: 6px; margin-right: 6px;" title="SearchMash options" src="http://modernstreet.com/wp-content/uploads/2007/04/searchmash_options.jpg" alt="SearchMash options" hspace="6" align="center" /></p>
<p>However, what I like about SearchMash is the fact that the images it comes up with are usually of a higher quality compared to the normal Google Search. Also, it allows for blog, video, Wikipedia and image searching, and even Google Maps (for countries), all in one convenient spot in the upper right sidebar, just by clicking the thumbnails. As SearchMash is new, I find there are far fewer spam sites appearing in the search results than in the normal Google search, maybe due to a different, slimmed-down algorithm that isn&#8217;t targeted by search engine spammers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.modernstreet.com/search-engines/searchmash/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
