Advanced Search

Sometimes a simple search turns up rather strange results. In the basic search exercise we saw how quotes could be helpful in limiting the results to useful information. We now introduce some powerful operators that will help you save tremendous amounts of time when searching the Web.

The basic search exercise introduced the Yahoo! search engine. When you performed an open text search in that exercise, the engine first searched the Yahoo! subject tree, showed you Yahoo!'s results, and then passed along the search to AltaVista and showed you AltaVista's results. The Yahoo! subject tree contains all of the pages that the Yahoo! editors have had time to categorize. And while they have categorized a lot, the Yahoo! subject tree is far from complete. By contrast, AltaVista uses a robot program to search the Web and index documents by keywords. AltaVista's robots work tirelessly day and night, and so they are able to visit millions of pages on the Web and index all the keywords that they find. We present AltaVista in this exercise because it complements Yahoo! and because it is quite fast.

Pluses and Minuses

To search the Web most effectively, you need to be able to specify what you want and what you don't want. In AltaVista, a plus sign means that the word must appear on a web page, and a minus sign means that it must not. So, +inflation -currency gives you all pages that mention inflation but not currency. Conversely, -inflation +currency gives you all pages that mention currency but not inflation. Ready? Start up AltaVista by entering its address in your web browser: www.altavista.com. You should see a welcome page similar to this:

Figure 10 - AltaVista has come up with a much greater number of related web sites with an open text search, but more is not always better. If you know what you are looking for, it would be more productive to fine tune your results through a category search.

Your Turn! Enter a search string in the box and then click the Search button. Let's try our inflation and currency example. Be sure to include a space between the words, or AltaVista will think you are searching for one long word.

You Type: +inflation -currency
It Means: Find all pages that have information on inflation, but not on currency.
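
To make the plus and minus rules concrete, here is a minimal Python sketch of how such a filter could work. It is our own illustration, not AltaVista's actual code: the function names parse_query and page_matches are hypothetical, and for simplicity the sketch ignores letter case and treats a page as a plain set of words.

```python
import re


def parse_query(query):
    """Split a query into required (+word), excluded (-word), and other terms."""
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return required, excluded, optional


def page_matches(query, page_text):
    """Return True if the page has every +word and none of the -words.

    Assumption: words without a sign do not filter pages here; a real
    engine would more likely use them for ranking.
    """
    required, excluded, _optional = parse_query(query)
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    has_all_required = all(word in words for word in required)
    has_any_excluded = any(word in words for word in excluded)
    return has_all_required and not has_any_excluded
```

Calling page_matches("+inflation -currency", text) on each page in a collection would keep only the pages that mention inflation and never mention currency.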

About how many documents were returned by the query? As you can see, with tens of millions of websites covered, AltaVista is going to find a lot of references. Some will be right on target, but most won't be useful to you. AltaVista tries to position the most relevant documents at the top of the list. So how does it determine what's relevant? Generally, AltaVista uses the following criteria to rank a document's relevance:

  • the query words or phrases are found in the first few words of the document
  • the query words or phrases are found close to one another in the document
  • the document contains more than one instance of the query word or phrase
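
The three criteria above can be combined into a toy scoring function. The sketch below is only an assumption about how such a ranking might be put together: the function name relevance, the lead_size of ten words, and the weights are all invented for illustration, and AltaVista's real formula is not described in this exercise.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query_words, document, lead_size=10):
    """Score a document with three hypothetical signals.

    early     - query words found within the first `lead_size` words
    closeness - query words found near one another
    repeats   - extra occurrences of each query word
    """
    words = tokenize(document)
    terms = [term.lower() for term in query_words]
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    if any(not hits for hits in positions.values()):
        return 0.0  # at least one query word is missing entirely

    early = sum(1 for t in terms if positions[t][0] < lead_size)

    closeness = 0.0
    for a, b in zip(terms, terms[1:]):
        gap = min(abs(i - j) for i in positions[a] for j in positions[b])
        closeness += 1.0 / max(gap, 1)

    repeats = sum(len(hits) - 1 for hits in positions.values())
    return early + closeness + 0.1 * repeats
```

Sorting documents by this score, highest first, would put pages where the query words appear early and close together at the top of the list.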

Unfortunately, unscrupulous website designers have caught on to these conventions, and some sites are overpopulated with keywords that appear on the first page. For example, we once came across a car dealer that repeated Toyota Toyota Toyota in order to earn a higher relevance rating and thus attract surfers looking for a Toyota site. To filter out these false relevance ratings, the better search engines might require, for example, that duplicate instances of keywords be separated by at least seven words.
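
Here is one way the seven-word rule could be sketched in Python. This is a hypothetical illustration, not any search engine's real filter: the function name count_spaced_occurrences is ours, and we assume "separated by at least seven words" means at least seven other words sit between two counted instances.

```python
import re


def count_spaced_occurrences(keyword, text, min_gap=7):
    """Count a keyword, ignoring repeats that follow too closely.

    An occurrence is counted only if at least `min_gap` words lie between
    it and the previously counted occurrence (assumed interpretation).
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    keyword = keyword.lower()
    count = 0
    last_counted = None
    for index, word in enumerate(words):
        if word != keyword:
            continue
        if last_counted is None or index - last_counted - 1 >= min_gap:
            count += 1
            last_counted = index
    return count
```

With this rule, a page that simply repeats Toyota Toyota Toyota earns credit for only one instance of the keyword.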

Keep it Together; Keep it in Quotes

As we said, AltaVista places pages higher on its search results list when the search words appear close together on the webpage. Sometimes you want the search words right next to each other with no words in between. To accomplish this we put the words in quotes.

Your Turn! Let's say we are interested in pages on Canadian marketing. We'll try typing the words with and without quotes.

You Type: Canadian marketing
It Means: Find all pages that have the words Canadian and marketing.

About how many documents did the query find?

Now let's try it with the quotes to force the words to appear right next to each other.

You Type: "Canadian marketing"
It Means: Find all pages that have the words "Canadian" and "marketing" where those words are next to each other.

About how many documents did the query find?

Finally, we'll limit further to just those sites that mention the Nielsen Survey.

You Type: "Canadian marketing" +"nielsen survey"
It Means: Find all pages that have the words "Canadian" and "marketing" where those words are next to each other. Those same sites must also have the words "nielsen" and "survey" right next to each other.
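
A quoted phrase can be thought of as a run of words that must appear in order with nothing in between. The Python sketch below illustrates that idea under our own assumptions: the helper names split_query and contains_phrase are hypothetical, letter case is ignored, and punctuation is stripped before comparing.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_query(query):
    """Split a query into tokens, keeping each quoted phrase in one piece."""
    return re.findall(r'[+-]?"[^"]*"|\S+', query)


def contains_phrase(phrase, page_text):
    """Return True if the words of `phrase` appear side by side, in order."""
    target = tokenize(phrase)
    words = tokenize(page_text)
    size = len(target)
    if size == 0:
        return False
    return any(words[i:i + size] == target for i in range(len(words) - size + 1))
```

For the last query above, split_query yields two pieces, "Canadian marketing" and +"nielsen survey", and a page would need to pass contains_phrase for both.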

About how many documents did the query find?

You can get very specific!

When in Doubt, Use Lowercase

You may have noticed that we typed nielsen survey in lowercase letters, even though Nielsen is a proper name. The reason is that lowercase search strings match both lowercase and uppercase text, but search strings that contain uppercase letters match only text with the same capitalization. So, if some Webmaster forgets to capitalize Nielsen, you'll still find that site.
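
The case rule can be expressed in a few lines of Python. This is a sketch of the rule as described here, not AltaVista's code; the function name term_matches is hypothetical, and we assume that any capital letter in the search word forces an exact-case comparison.

```python
def term_matches(query_term, page_word):
    """Apply the case rule to one search word and one word on a page.

    An all-lowercase query term matches any capitalization; a query term
    containing capitals must match the page word exactly (assumption).
    """
    if query_term == query_term.lower():
        return query_term == page_word.lower()
    return query_term == page_word
```

So term_matches("nielsen", "Nielsen") succeeds, while term_matches("Nielsen", "nielsen") does not.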

A Star for the Wildcard

What if you're looking for information on European free trade zones? It's reasonable to assume that Europe free trade zones might also produce good results. Rather than run two searches—European free trade zones and Europe free trade zones—we can use the wildcard notation to match all words that start with Europe (that is, both Europe and European) by typing europe*.

Your Turn! Try the following example:

You Type: europe* +"free trade zones"
It Means: Find all pages that contain a word beginning with europe. Those same pages must also have the words "free trade zones" right next to each other.
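
The wildcard is simply a "starts with" test applied to each word on a page. The short Python sketch below shows the idea; the function name wildcard_matches is our own, and the sketch assumes the star appears only at the end of the search word.

```python
import re


def wildcard_matches(pattern, page_text):
    """Return the distinct words on the page that match the pattern.

    A trailing * matches any word that starts with the letters before it;
    without a star, only the exact word matches. Case is ignored.
    """
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    pattern = pattern.lower()
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return sorted({word for word in words if word.startswith(prefix)})
    return sorted({word for word in words if word == pattern})
```

Note that europe* also picks up words such as Europeans, so a wildcard can cast a wider net than you first expect.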

About how many documents did the query find?

Who is Linked to Us?

On the Web, more is better when it comes to site visitors: more visitors means more potential business. Visitors can reach a site in many ways:

  1. by deducing or guessing a site name (for example, www.coca-cola.com)
  2. by searching for the site using a search engine
  3. by copying the address from an advertisement or other media
  4. by following a "hot link" from another site

Coopers and Lybrand Consulting found that 39% of Web users learn about sites through other media, 44% through word of mouth, 32% through browsing, and 10% through "hot links" (source: www.cyberatlasinternet.com).

We now focus on following a hot link from one site to another and begin with a fun exercise to determine which sites on the Web have a link to our site. The sincerest form of flattery on the Web is to put a free link on a homepage to another site. The more folks that do it, the more traffic is directed to the hot-linked site. What if we could find all the sites in the world that have a link to us? Well, we can.

Your Turn! Let's see which sites link to the Air Canada homepage. The address for Air Canada is www.aircanada.ca. For the first query we want to find all sites that link to Air Canada, ordered by the number of times that they mention Air Canada on their homepage.

You Type: link:www.aircanada.ca
It Means: Find all pages with a link to Air Canada.

How many sites link to Air Canada?

Go to the first site on the list. Can you find the link to Air Canada? Where was it? Interestingly, some search engines, but not AltaVista, actually use the number of links to a site to help compute its relevancy ranking. Not a bad idea since the Web community is effectively voting with its links.
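
If you are curious how a link: query and link "voting" might work behind the scenes, the Python sketch below shows one possibility using only the standard library. It is an illustration under our own assumptions: the names LinkCollector, pages_linking_to, and inbound_link_counts are hypothetical, and the pages are supplied as a small dictionary of address to HTML text rather than fetched from the Web.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_hosts(html):
    """Return the set of host names that a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = {urlparse(href).netloc.lower() for href in collector.hrefs}
    hosts.discard("")  # relative links have no host name
    return hosts


def pages_linking_to(host, pages):
    """Return the addresses of pages that contain a link to `host`."""
    return sorted(url for url, html in pages.items()
                  if host.lower() in linked_hosts(html))


def inbound_link_counts(pages):
    """Count, for each host, how many pages link to it (one vote per page)."""
    counts = Counter()
    for html in pages.values():
        counts.update(linked_hosts(html))
    return counts
```

Here pages_linking_to("www.aircanada.ca", pages) plays the role of the link: query, and inbound_link_counts gives the kind of tally a search engine could feed into its ranking.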

Nevertheless, links from other sites are not always welcome. Ticketmaster sued Microsoft for providing a link to its site. The gripe? The Microsoft link did not point to the Ticketmaster homepage but rather to a subpage. Ticketmaster wants to communicate with users through the "front door," perhaps so that they will see a full menu of products, promotions, and advertising.


