Top SEM and SEO Tips    

Search Results

The Invisible Web - Is My Site Getting Indexed?

Wednesday, September 14th, 2005

Did you know that the great majority of web pages are not indexed and so are not visible through search engines? It is estimated that an enormous number of pages are never indexed. One estimate states that only 0.03 percent of all web pages have been indexed by search engines!

Is My Site Getting Indexed?

Getting your site indexed by the search engines is important. But it can be challenging! The key is to make sure your whole site is designed with the search engine spiders in mind. So if you are planning a new site, make sure your web designer is clued up on designing for users and also for search engines. Unfortunately, many websites have been designed for the end user but not for search engines, and the result is often indexing problems.

How do you check whether your site has been indexed? In Google, use the site: operator as a search query. The query site:yourdomain lists the pages that have been indexed for that domain. For example, site:www.yoursite.com will show you how many pages of your site have been indexed.

What does it mean if your site has not been indexed?

It could be a brand new site that has never been indexed by the search engines. It could also be a site that has been banned by the search engines because of spamming or rule violations. Or there could be an existing problem that is actually stopping the search engines from indexing your site.

If you have a brand new site the key is to get links to your site from other sites. Forget about these offers of ‘We will submit your site to xxxxx thousand search engines for $xxx.’ These offers can often do more harm than good, and it is speculated that search engines may penalize your site for using them. So start building links to your site from other reputable sites. Quick ways to get links include optimized press releases, published articles, directories, and blog entries.

If you are banned it means either you have been persistently violating search engine rules or you have employed someone who is involved in this. It is your site and your responsibility to know what your staff or consultants are doing with your site. Make sure they are not involved in using spamming techniques as it requires a lot of effort to get re-instated. It can of course also adversely affect your business.

Can The Spiders Index Your Site?

To find out whether the search engine spiders are able to index your site, take a look at your log files and look for visits by Googlebot, MSNbot and Slurp (the Yahoo spider). If there are no visits from the spiders it could suggest that there are no links to your site (or the links you do have are bad) and the search engines are unaware that your site exists. It could also mean there is some type of spider trap that is preventing spiders from crawling your site.
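As a rough illustration of that log check, here is a minimal Python sketch. It assumes your server writes the visitor's user agent into each log line (as the common "combined" log format does) and it simply looks for the three spider names as substrings. The function name and the sample lines are made up for the example.

```python
# Substrings to look for in each log line. Matching on the user-agent text
# is an assumption: it only works if your log format records user agents.
SPIDER_TOKENS = {
    "Googlebot": "googlebot",
    "MSNbot": "msnbot",
    "Slurp (Yahoo)": "slurp",
}


def count_spider_visits(log_lines):
    """Return a dict of spider name -> number of log lines mentioning it."""
    counts = {name: 0 for name in SPIDER_TOKENS}
    for line in log_lines:
        lowered = line.lower()
        for name, token in SPIDER_TOKENS.items():
            if token in lowered:
                counts[name] += 1
    return counts


# Hypothetical sample lines, standing in for a real access log.
sample_log = [
    '1.2.3.4 - - [14/Sep/2005:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [14/Sep/2005:10:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(count_spider_visits(sample_log))
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields one line at a time. A count of zero for every spider is the warning sign described above.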

A spider trap usually takes the form of some sort of technical issue. Your site may display well enough in your browser while still having issues within its coding or design. These problems can be caused by pop-up windows, Flash or frame sites, JavaScript navigation, ineffective redirects, visitor passwords, wrong robot controls, or problems with dynamic URLs. You may want to talk with your designer regarding these issues or consult with an SEO expert.

Sitemaps!

The key to making your site accessible to spiders is to build site maps linking to all your site pages and to make sure text links exist to aid navigation between pages. Site maps are basically pages that include descriptive text links to all the other pages on your site. They make it easy for the spider to find all the pages on your site. If your site map has over 100 links it is advisable to have a multi-page site map – make sure the pages are all interlinked. Remember to optimize these pages too, as the descriptive links include lots of relevant information about the nature of your business.
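The multi-page site map idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the page file names (sitemap1.html and so on), the function name, and the bare-bones HTML are all invented for the example, and a real site map would sit inside your normal page template.

```python
from html import escape


def build_site_map_pages(links, per_page=100):
    """Split (url, descriptive text) pairs into interlinked site map pages.

    Returns a list of HTML fragments, one per site map page.
    """
    chunks = [links[i:i + per_page] for i in range(0, len(links), per_page)] or [[]]
    # Hypothetical file names for the site map pages.
    names = [f"sitemap{n + 1}.html" for n in range(len(chunks))]
    pages = []
    for index, chunk in enumerate(chunks):
        items = "\n".join(
            f'<li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
            for url, text in chunk
        )
        # Link each site map page to all the others so a spider landing
        # on one of them can reach the rest.
        nav = " ".join(
            f'<a href="{name}">Site map page {n + 1}</a>'
            for n, name in enumerate(names)
            if n != index
        )
        pages.append(f"<ul>\n{items}\n</ul>\n<p>{nav}</p>")
    return pages


example_links = [(f"/page{i}.html", f"Page {i}") for i in range(250)]
print(len(build_site_map_pages(example_links)))  # 250 links -> 3 pages
```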

Bottom Line

Unless search engine spiders can find your site there is no way for human visitors to find it using search engines either! Indexing is always the key first step in organic search engine optimization. Make sure this is in place and you are ready to move forward.

Do not add to the invisible web statistics!



Indexing Obstacles for Dynamic Web Sites

Tuesday, May 10th, 2005

URL parameters, session IDs, reserved characters and deeply nested web pages can make it harder for search engines to fully index your website. If your site uses any of the following you may find it hard to get indexed by the search engines:

Parameters

If your URLs include parameters (they end with something like ?a=1&b=2) then the search engines may not index these pages. This is because the spider can get caught in an infinite loop, indexing the same page hundreds of times with exactly the same content.

It used to be that no search engine would index pages with parameters. The situation is now much improved. However, to make sure your site is indexed by all the search engine spiders, limit URLs to a maximum of two parameters, and if possible use none.
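If you have a list of your site's URLs, a quick way to spot the ones that break the two-parameter guideline is to count the query parameters in each. The sketch below uses Python's standard urllib.parse module. The function names and the example URLs are invented for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def count_parameters(url):
    """Count the query-string parameters in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def exceeds_parameter_limit(url, limit=2):
    """True if the URL has more parameters than the suggested maximum."""
    return count_parameters(url) > limit


for url in (
    "http://www.yoursite.com/products.php",
    "http://www.yoursite.com/products.php?a=1&b=2",
    "http://www.yoursite.com/products.php?a=1&b=2&c=3",
):
    print(url, count_parameters(url), exceeds_parameter_limit(url))
```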

Session IDs

Question - What’s worse than a search engine not indexing a URL with a session ID?
Answer - A search engine that does index a URL with a session ID

If a search engine indexed pages with session IDs the following could happen:

  1. Visitors coming from that search using the same session ID share a shopping cart, exposing order and shipping history, and potentially credit card information.
  2. The search engines index the same page with different session IDs, resulting in a duplicate content penalty.
  3. The search engines ignore the page, thus ensuring your pages are not indexed.

Sometimes this problem can be extremely hard to see. The site may only put the session ID in the URL when a visitor doesn’t accept cookies. Spiders do not accept cookies, so they would always see the session ID. To check this properly, turn off cookies in your browser, clear your cookie cache, and then access your site.
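Once you have browsed the site with cookies turned off, you can run the URLs you ended up on through a small check like the one below. It is a sketch only: the list of session parameter names is an assumption based on names commonly used by web platforms, so adjust it to whatever your own site uses. The function name is made up.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed parameter names; your platform may use something different.
SESSION_PARAM_NAMES = {"phpsessid", "jsessionid", "sessionid", "session_id", "sid"}


def has_session_id(url):
    """Guess whether a URL carries a session ID."""
    parts = urlsplit(url)
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in SESSION_PARAM_NAMES:
            return True
    # Some Java-based sites append ";jsessionid=..." to the path instead
    # of using a normal query parameter.
    return ";jsessionid=" in parts.path.lower()


print(has_session_id("http://www.yoursite.com/cart.php?PHPSESSID=abc123&item=4"))
print(has_session_id("http://www.yoursite.com/cart.php?item=4"))
```

The first call prints True and the second prints False. Any True result on a cookie-less visit means spiders are probably seeing session IDs too.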

Reserved Characters

Moving away from standard alphanumeric characters in a URL can cause issues. HTML and URLs reserve certain characters for special uses. An example of this is the & character, which in URLs is used to divide parameters. If your site uses these non-alphanumeric characters you need to encode them whenever they’re listed. If someone links to you from their own site they could mis-enter the URL, causing the link to point to the wrong page.

The # character (pound sign or hash sign) is used for accessing anchors on pages. It can be kept when used for this purpose, but avoid filenames that contain it.

Spaces also need special handling, being rendered as %20 in URLs. Instead of a space in a URL, use a hyphen (‘-’).
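To see what encoding does to a filename, and what the hyphen alternative looks like, here is a short Python sketch using the standard urllib.parse.quote function. The two helper names and the sample filename are invented for the example.

```python
import re
from urllib.parse import quote


def encode_path_segment(segment):
    """Percent-encode one piece of a URL path, including &, # and spaces."""
    return quote(segment, safe="")


def hyphenate_filename(name):
    """Replace each run of whitespace with a single hyphen."""
    return re.sub(r"\s+", "-", name.strip())


print(encode_path_segment("seo tips & tricks.html"))  # seo%20tips%20%26%20tricks.html
print(hyphenate_filename("seo tips.html"))            # seo-tips.html
```

The hyphenated name needs no encoding at all, which is why it is the safer choice when you control the filename.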

Nested Too Deep

While not really about the syntax for URLs, if your page is more than three levels deep on your site the search engines may deem it irrelevant and not index it.

Depth should be measured from the home page of your site. Count the number of clicks it takes to get to the destination page, add one (so you include home) and that’s your level. Keep pages at most three levels deep. If you can’t, consider adding a search engine friendly site map.
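The level calculation described above can be automated if you have a map of which pages link to which. The following Python sketch does a breadth-first walk from the home page, so each page gets the smallest number of clicks needed to reach it, plus one. The link map and page names are made up, and building that map from a real site (by crawling it) is left out.

```python
from collections import deque


def page_levels(links, home):
    """Return a dict of page -> level, where home is level 1.

    links maps each page to the list of pages it links to.
    """
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest route
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# Hypothetical site structure.
site_links = {
    "/": ["/services", "/articles"],
    "/articles": ["/articles/2005"],
    "/articles/2005": ["/articles/2005/indexing"],
}
levels = page_levels(site_links, "/")
too_deep = [page for page, level in levels.items() if level > 3]
print(too_deep)  # ['/articles/2005/indexing']
```

Pages that never show up in the result are not reachable from the home page at all, which is worth checking as well.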

Links

If you’re still having problems getting your site indexed it might just be lack of incoming links. Arrange non-reciprocated and reciprocated links from good quality sites. Instead of arranging for all your links to go to your home page, try to arrange deep links to your internal pages - some of the directories allow deep linking to internal pages as well as listing your home page.



Has Google Reached Its Limit?

Monday, September 6th, 2004

By Dylan Downhill

Google went public last month but that’s not the big news for Google. Nor is the settlement of the PPC patent fight with Overture. The big news is that Google may have hit the limit on the number of pages it can store. In a nutshell, the number of pages Google says it has indexed (currently it reads ‘Searching 4,285,199,774 web pages’) is not that far from the largest number an integer in Unix/Linux can handle - around 10 million off - see the full article here: http://www.w3reports.com/index.php?itemid=549. Whether they really can’t add new pages without deleting old ones or whether this is bologna (after all, they are a bunch of uber-techies - surely they increased the size of the index when they saw this coming), only Google knows.
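For anyone who wants to check the ‘around 10 million off’ figure, the arithmetic is below. It assumes the integer in question is a 32-bit unsigned integer, which is a guess on my part about what the linked article means. The variable names are just for the example.

```python
# Largest value a 32-bit unsigned integer can hold (assumed integer type).
MAX_UINT32 = 2**32 - 1

# Page count shown on Google's home page, as quoted above.
reported_pages = 4_285_199_774

remaining = MAX_UINT32 - reported_pages
print(f"{MAX_UINT32:,} - {reported_pages:,} = {remaining:,}")
```

This prints 4,294,967,295 - 4,285,199,774 = 9,767,521, so just under 10 million pages of headroom under that assumption.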

Why did I mention this? Well, we’ve been noticing a definite lag time for new sites to get fully indexed. Older sites are still being indexed fine, and we’ve noticed that adding a new page with a link from the home page will get it indexed extremely quickly (although the Page Rank takes a long time to catch up). We have also noticed that older sites that rename the URLs of hundreds of pages all at once (such as during a redesign) are not being re-indexed quickly either.

We have noticed that Yahoo search results have been getting better, and with the lag that Google has introduced we find ourselves heading to Yahoo more often than we used to. With the toning down of the advertising, the overall experience at Yahoo is also more pleasing than it was (take note Ask Jeeves!).

In terms of results we have noticed that Ask Jeeves/Teoma will show ranking improvements quickly, then Yahoo follows soon afterwards. Google is taking a long time to respond to site changes.

To ensure your site gets fully indexed we recommend that you add a good sitemap linking to all the pages you want Googlebot (the Google spider) to find. In fact, the process of building a sitemap can help you find orphaned pages (pages no longer linked to from anywhere on the site). I added one recently to Elixir and found 2 pages that were orphaned.



Publishing Through an RSS Feed - Quick Guide

Tuesday, August 17th, 2004

This is a quick guide to publishing your web content through an RSS feed. It is not meant to be extensive. It is meant to get your foot in the door of publishing content using RSS in the quickest time possible.

RSS File Format

An RSS feed is simply an XML file containing information on pages within your site. The RSS file format is as follows:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>Title Text </title>
<link>Link to site’s home page </link>
<description>Description of the feed</description>

<item>
<title>Page Title</title>
<description>Page Description</description>
<link>Page Link</link>
<author>Email to Contact You On</author>
<pubDate>Published Date</pubDate>
</item>

</channel>
</rss>

Where the contents from <item> to </item> are repeated for all the content you want to publish through the RSS file. An example feed would look like:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>Search Engine Optimization Tips</title>
<link>http://www.elixirsystems.com</link>
<description>
Search engine optimization tips from Elixir Systems. Helpful advice covering
all aspects of the search engine optimization process.
</description>

<item>
<title>Monitor and Tweak Your Way to Great Search Engine Rankings - Part 1</title>
<description>
The final stage of the search engine optimization loop, monitoring your site’s rankings and
tweaking the site as necessary to bring up any keywords that are not ranked well.
</description>
<link>http://www.elixirsystems.com/articles/a040709.php</link>
<author>customerservice@elixirsystems.com</author>
<pubDate>24 Jul 04 15:00:00 +0700</pubDate>
</item>

</channel>
</rss>

Information on RSS date format (external link).
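If you would rather generate the file than type it by hand, the layout above can be produced with a short script. The sketch below uses Python's standard library only. The function name and the dictionary keys for each item are my own choices for the example, and email.utils.format_datetime is used to write the published date in the RFC 822 style that RSS expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Build an RSS 2.0 document as a string.

    items is a list of dicts with the keys title, description, link,
    author and published (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "description", "link", "author"):
            ET.SubElement(node, tag).text = item[tag]
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    # The XML declaration is added by hand to match the layout shown above.
    return '<?xml version="1.0" ?>\n' + ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Search Engine Optimization Tips",
    "http://www.elixirsystems.com",
    "Search engine optimization tips from Elixir Systems.",
    [{
        "title": "Monitor and Tweak Your Way to Great Search Engine Rankings - Part 1",
        "description": "The final stage of the search engine optimization loop.",
        "link": "http://www.elixirsystems.com/articles/a040709.php",
        "author": "customerservice@elixirsystems.com",
        "published": datetime(2004, 7, 24, 15, 0, 0, tzinfo=timezone.utc),
    }],
)
print(feed)
```

Special characters such as & in titles and descriptions are escaped automatically, which is one less thing for the validator to complain about.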

RSS Validation

Now that you have your RSS file set up and uploaded to your website you’ll want to run it through an RSS validator. This ensures the content meets the RSS specification and that anyone who wants to use your content can. To find suitable tools type ‘rss validator’ into your favorite search engine. The one I use and prefer, due to its helpful error messages, is Feed Validator at http://feedvalidator.org/

If the validator comes back with the error ‘Feeds should not be served with the “text/plain” media type’ you will need to map the .rss file extension to an RSS media type. If you’re running Apache the easiest way to do this is to create a .htaccess file in your website’s root directory and add the following lines:

AddType application/rdf+xml rdf
AddType application/rss+xml rss
AddType application/atom+xml atom
AddType application/xml xml

These lines will take care of your current and future RSS/XML/Atom needs. On Microsoft IIS the same task can be completed using the IIS MMC.

Provide RSS Links

Provide the feed as both a hot link and as a URL that can be cut and pasted into the visitor’s favorite RSS reader. Example RSS links can be found here.

RSS feeds should also be declared in the <head> section of the HTML for the pages on your site. Example code might look like:

<link rel="alternate" type="application/rss+xml" title="Newsletter" href="http://www.elixirsystems.com/newsletter/archive/newsletter.rss" />

<link rel="alternate" type="application/rss+xml" title="Press Releases" href="http://www.elixirsystems.com/press_releases/press_releases.rss" />

This will then cause the latest browsers (IE7, Mozilla 2) to offer the visitor the option to sign up for one of your feeds.
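To confirm that a page really declares its feeds, you can scan its HTML for link tags like the ones above. Here is a minimal Python sketch using the standard html.parser module. The class and function names are invented, and it only looks for the application/rss+xml type used in this guide.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs for RSS feeds declared with <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        if "alternate" in rel and media_type == "application/rss+xml":
            self.feeds.append((attributes.get("title"), attributes.get("href")))


def find_feeds(html_text):
    """Return the RSS feeds declared in a page's HTML."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


page = """<html><head>
<link rel="stylesheet" type="text/css" href="site.css" />
<link rel="alternate" type="application/rss+xml" title="Newsletter"
      href="http://www.elixirsystems.com/newsletter/archive/newsletter.rss" />
</head><body></body></html>"""
print(find_feeds(page))
```

An empty list means the page has no feed declaration that a browser or feed reader could pick up.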

Get Your Content Indexed and Listed

One of the easiest ways to get your content known by the search engines is to load it into the MyYahoo part of the Yahoo! search engine.

  • Log in to http://my.yahoo.com using your Yahoo! login
  • Click the ‘Add/Delete Pages’ button
  • Click the ‘Make my own..’ option (towards end of the page)
  • Tick ‘RSS Headlines (BETA)’ and click ‘Finished’ at the bottom of the page.
  • The next page will ask for your RSS feed URL - cut and paste it in and press ‘Add’

Use Your Feed

You’ve gone to all the trouble of setting up an RSS feed, so you might as well use it. On this site we use it for the index pages for the documents, and also to provide cross reference information (see the SEO Tips index page, where the top half is local content and the bottom half is useful information from the articles directory). The feed is also used in the sitemap to ensure it always stays up to date with the changing content on the site.

Further Reading

Once you have your RSS data feed up and running you might want to add to it. This document describes the RSS 2.0 specification in an easy to read format.