Top SEM and SEO Tips    

Archive for the ‘Search Engine Optimization’ Category

Indexing Obstacles for Dynamic Web Sites

Tuesday, May 10th, 2005

URL parameters, session IDs, reserved characters and deeply nested web pages can make it harder for search engines to fully index your website. If your site uses any of the following you may find it hard to get indexed by the search engines:

Parameters

If your URLs include parameters (ending with something like ?a=1&b=2) then the search engines may not index these pages. This is because the spider can get caught in an infinite loop, indexing the same page hundreds of times with exactly the same content.

It used to be that no search engines would index pages with parameters. The situation is now much improved; however, to ensure your site is indexed by all the search engine spiders, limit your URLs to a maximum of two parameters and, if possible, use none.

Session IDs

Question - What’s worse than a search engine not indexing a URL with a session ID?
Answer - A search engine that does index a URL with a session ID

If a search engine indexed pages with session IDs the following could happen:

  1. Visitors coming from that search using the same session id share a shopping cart, exposing order and shipping history, and potentially credit card information
  2. The search engines index the same page with different session ids resulting in a duplicate content penalty
  3. The search engines ignore the page thus ensuring your pages are not indexed

This problem can be extremely hard to spot. The site may only put the session ID in the URL when a visitor doesn’t accept cookies, and since spiders generally don’t accept cookies they would always see the session ID. To check this properly, turn off cookies in your browser, clear any existing cookies, and then access your site.
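As a rough illustration of spotting and removing session IDs from URLs, here is a minimal Python sketch. The parameter names in SESSION_PARAMS are assumptions chosen for the example, not a definitive list; substitute whatever your own platform appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names - replace with the ones your platform really uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid", "cfid", "cftoken"}


def has_session_id(url):
    """Return True if the URL's query string carries a session ID parameter."""
    query = urlsplit(url).query
    return any(name.lower() in SESSION_PARAMS
               for name, _ in parse_qsl(query, keep_blank_values=True))


def strip_session_id(url):
    """Return the URL with any session ID parameters removed."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Running has_session_id over the links a cookie-less visitor sees is one quick way to find out whether spiders are being handed session IDs.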

Reserved Characters

Moving away from standard alphanumeric characters in a URL can cause issues. HTML and URLs reserve certain characters for special uses. An example of this is the & character, which in URLs is used to separate parameters. If your site uses these non-alphanumeric characters you need to encode them whenever they’re listed. If someone links to you from their own site they could mistype the URL, causing the link to point to the wrong page.

The # character (pound sign or hash sign) is used for accessing anchors on pages. It can be kept when used for this purpose, but avoid filenames that contain it.

Spaces are not allowed in URLs either and have to be encoded as %20. To separate words in a URL, use a hyphen (‘-’) instead of a space.
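The two options above (encoding awkward characters, or avoiding them with hyphens) can be sketched in Python. The function names are made up for this example, and the hyphenation rule is only one reasonable choice.

```python
import re
from urllib.parse import quote


def encode_path_segment(text):
    """Percent-encode every character that is not safe inside a URL path segment."""
    return quote(text, safe="")


def hyphenate(text):
    """Build a URL-friendly name: lower case, with runs of other characters turned into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


print(encode_path_segment("Tom & Jerry #1.htm"))  # Tom%20%26%20Jerry%20%231.htm
print(hyphenate("Top SEM & SEO Tips"))            # top-sem-seo-tips
```

The hyphenated form is easier to read and to type, which makes it harder for someone linking to you to mis-enter the URL.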

Nested Too Deep

While not really about the syntax for URLs, if your page is more than three levels deep on your site the search engines may deem it irrelevant and not index it.

Depth should be measured from the home page of your site: count the number of clicks it takes to get to the destination page, add one (so you include the home page) and that’s your level. Keep pages at most three levels deep; if you can’t, consider adding a search engine friendly site map.
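The level calculation described above can be sketched as a breadth-first walk over your site's internal links. This is only an illustration: the link map below is hypothetical and would normally come from crawling your own site.

```python
from collections import deque


def page_levels(links, home):
    """Return {page: level}, where level = clicks from the home page + 1."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest click path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# Hypothetical site structure: page -> pages it links to.
site = {
    "/": ["/products/", "/about.htm"],
    "/products/": ["/products/widgets/"],
    "/products/widgets/": ["/products/widgets/blue.htm"],
}

levels = page_levels(site, "/")
too_deep = [page for page, level in levels.items() if level > 3]
print(too_deep)  # ['/products/widgets/blue.htm']
```

Any page reported in too_deep is a candidate for a link from a shallower page or from a site map.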

Links

If you’re still having problems getting your site indexed it might just be lack of incoming links. Arrange non-reciprocated and reciprocated links from good quality sites. Instead of arranging for all your links to go to your home page, try to arrange deep links to your internal pages - some of the directories allow deep linking to internal pages as well as listing your home page.



Search Engine Friendly Redirects - File Level

Tuesday, March 15th, 2005

There are three articles dealing with redirects to handle redirecting one file at a time, redirecting one directory at a time, and redirecting multiple pages easily.

If you are going to move a page you will likely want to redirect visitors from the old page to the new in such a way that the search engines don’t get confused. Some of the ways they can get confused include:

  • Bringing up two copies of the same page. This is likely to trip a duplicate content penalty.
  • Using a temporary redirect. This means ‘the page has moved but will be back shortly - don’t update your index’.

A 301 permanent redirect is the redirection method recommended by the major search engines. Using a 301 redirect you are in effect telling the search engines the page has moved and to update their index. It also has the nice side benefit of redirecting the benefit of inbound links to the new page.

Implementing a 301 permanent redirect is different depending on the operating system and/or programming language you are using on your server:

IIS Redirect

  • In internet services manager, right click on /old-file.htm
  • Select the radio titled “a redirection to a URL”.
  • Enter the redirection page.
  • Check “The exact url entered above” and the “A permanent redirection for this resource”
  • Click on ‘Apply’

Apache Redirect

Create a file called .htaccess in your root directory and add the following line:

Redirect 301 /old-file.htm http://www.mywebsite.com/new-file.htm

ColdFusion Redirect

Edit the file /old-file.htm and put the following code:

<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.mywebsite.com/new-file.htm">

PHP Redirect

Edit the file /old-file.htm and put the following code:

<?php
header( "HTTP/1.1 301 Moved Permanently" );
header( "Location: http://www.mywebsite.com/new-file.htm" );
?>

ASP Redirect

Edit the file /old-file.htm and put the following code:

<%@ Language=VBScript %>
<%
Response.Status="301 Moved Permanently"
Response.AddHeader "Location", "http://www.mywebsite.com/new-file.htm"
%>

ASP .NET Redirect

Edit the file /old-file.htm and put the following code:

<script runat="server">
private void Page_Load(object sender, System.EventArgs e) {
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location","http://www.mywebsite.com/new-file.htm");
}
</script>

HTML Redirect

Edit the file /old-file.htm and put the following code. Note that a meta refresh does not send a 301 status code, so use it only when none of the server-side methods above are available:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Your Page Title</title>
<meta http-equiv="REFRESH" content="0;url=http://www.mywebsite.com/new-file.htm">
</head>
<body>Optional page text here.
</body>
</html>

Related Articles

Search Engine Friendly Redirect - Directory Level
Search Engine Friendly Redirect - Page Level
Search Engine Friendly Redirect - Custom 404



Search Engine Friendly Redirects - Directory Level

Tuesday, March 15th, 2005

There are three articles dealing with redirects to handle redirecting one file at a time, redirecting one directory at a time, and redirecting multiple pages easily.

If you are going to move a page you will likely want to redirect visitors from the old page to the new in such a way that the search engines don’t get confused. Some of the ways they can get confused include:

  • Bringing up two copies of the same page. This is likely to trip a duplicate content penalty.
  • Using a temporary redirect. This means ‘the page has moved but will be back shortly - don’t update your index’.

A 301 permanent redirect is the redirection method recommended by the major search engines. Using a 301 redirect you are in effect telling the search engines the page has moved and to update their index. It also has the nice side benefit of redirecting the benefit of inbound links to the new page.

Implementing a 301 permanent redirect is different depending on the operating system you are using on your server:

IIS Redirect

  • In internet services manager, right click on /old-directory
  • Select the radio titled “a redirection to a URL”.
  • Enter the redirection page.
  • Check “The exact url entered above” and the “A permanent redirection for this resource”
  • Click on ‘Apply’

Apache Redirect

Create a file called .htaccess in your root directory and add the following line:

Redirect 301 /old-directory/ http://www.mywebsite.com/new-directory/




Search Engine Friendly Redirects - Custom 404s

Tuesday, March 15th, 2005

There are three articles dealing with redirects to handle redirecting one file at a time, redirecting one directory at a time, and redirecting multiple pages easily.

If you are going to move a page you will likely want to redirect visitors from the old page to the new in such a method that the search engines don’t get confused. Some of the ways they can get confused include:

  • Bringing up two copies of the same page. This is likely to trip a duplicate content penalty.
  • Using a temporary redirect. This means ‘the page has moved but will be back shortly - don’t update your index’.

A 301 permanent redirect is the redirection method recommended by the major search engines. Using a 301 redirect you are in effect telling the search engines the page has moved and to update their index. It also has the nice side benefit of redirecting the benefit of inbound links to the new page.

Detailed below is how to use a custom 404 redirect to handle moving pages. A 404 error is produced when the server cannot find the file requested by a visitor. A custom 404 redirect is useful if you can’t use the normal 301 redirect methods, such as when you move to a different CMS, the whole of your site’s file layout changes, etc. I have used these techniques when switching from static .htm pages to dynamic .asp pages, which necessitated changing all filenames. You can also use this method to make the redirects database driven.

These 404 based redirection techniques rely on programming. When set up you will need to check that the server has overridden the 404 error code with a 301 code using a header checking tool (there are plenty available on the net).

IIS 404 Redirect

Using the IIS MMC as follows:

  • Right click on the website or directory that you want the 404 to apply to.
  • Click ‘Properties’.
  • Click on ‘Custom Errors’
  • Scroll down to 404 and highlight. Click the ‘Edit’ button.
  • In the drop down, select URL (this is important - doesn’t work otherwise). Then enter a URL on your site to use for the programming.
  • Click ‘OK’ to save this change.

On the custom 404 page itself you can use vbscript or any other programming language to read the server variables to decide what page to display, or where to redirect the user.

Apache 404 Redirect

Create a file called .htaccess in your root directory and add the following line:

ErrorDocument 404 /errors/404.php

On the custom 404 page itself you can use PHP or any other programming language to read the server variables to decide what page to display, or where to redirect the user.
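The decision logic of such a custom 404 page can be sketched as follows. This is an illustration in Python rather than PHP or VBScript, and the REDIRECTS table is a hypothetical stand-in for whatever mapping (hard coded or database driven) you keep of old paths to new URLs.

```python
# Hypothetical mapping of old paths to their new locations.
REDIRECTS = {
    "/old-file.htm": "http://www.mywebsite.com/new-file.asp",
    "/products/widgets.htm": "http://www.mywebsite.com/products.asp?cat=widgets",
}


def resolve_missing_page(requested_path):
    """Decide what the custom 404 page should send back for a missing path.

    Returns (status_line, headers). A known old path overrides the 404
    with a 301 and a Location header; anything else stays a real 404.
    """
    target = REDIRECTS.get(requested_path.lower())
    if target:
        return "301 Moved Permanently", [("Location", target)]
    return "404 Not Found", [("Content-Type", "text/html")]


print(resolve_missing_page("/old-file.htm")[0])   # 301 Moved Permanently
print(resolve_missing_page("/never-existed")[0])  # 404 Not Found
```

Keeping the genuine 404 for unknown paths matters: redirecting everything would hide real broken links from both visitors and spiders.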




Emulate Crontab Using ColdFusion

Monday, March 7th, 2005

By: Nathan Johnson

Setting up automated scripts on Windows can be difficult. The built-in scheduler is hard to set up, especially if you don’t have console access. An easier way is to set up the site so that the first person to navigate to the site each morning causes the script to run. This is actually really simple to do, and can be completely seamless for the end user. Simply use the script page as the source of a little 1px by 1px image hidden somewhere at the bottom of your site’s footer:

	<img src="http://www.mysite.com/dbupdate.cfm" width="1" height="1" border="0">

Even though the SRC of this “image” is not a real image file, the server doesn’t know that, so it still runs the page to accommodate the request, and doesn’t require your user to navigate through the page or require you to dump large blocks of code into each of your site’s pages. In essence, this works almost like an include file, but doesn’t require the server to parse and run the template prior to loading the page. It’s also advantageous in that the script won’t die or stop if the user navigates away from the page before it’s completed.

There are a couple of problems that are now presented - first, how to ensure that the first person to navigate the site actually triggers the script? By simply putting this image tag at the bottom of each page in the entire site (whether through an include file or by hard coding it), any page on the site will trigger the script. This could cause another problem though, as we now can’t prevent the user from navigating to another page in the site and causing the script to trigger multiple times. It is important that we build the logic to accommodate these requests quickly (to save server resources), and make sure that multiple downloads don’t occur. Two quick IF/THEN statements will help us. First, let’s discuss the multiple download issue. By creating a file on the server that acts as an alarm that a download is already in progress, we can avoid multiple downloads. I’ll show you how to create the file later in the script, but for now here’s how I am checking for its existence:

	<!--- PATH INFO FOR LOCK FILE --->
	<CFSET MLSLockFile = "C:\Inetpub\wwwroot\mysite.com\database_dir\crontab.lck">

	<!--- CHECK FOR EXISTING LOCK FILE --->
	<CFIF FileExists(MLSLockFile)>

	<!--- IF LOCK FILE EXISTS, REPORT THAT THE DOWNLOAD IS IN PROGRESS --->
	<h3>UPDATE IS ALREADY IN PROGRESS!</h3>

This works in the same way that Macromedia’s DreamWeaver MX check out principle works, in that if this file is present it triggers an alert. In this case, the alert is caught by the IF/THEN statement and the rest of the script is avoided. Since this only takes a millisecond and doesn’t hog any server resources (probably less load on your server than even passing a 1×1.gif image), we can safely call this script from any page of our site - even with decent web traffic - and the download won’t be triggered more than once. Next, we need to make sure that we only do the download when it’s required. For this example, I am restricting downloads to once per day. This is done by obtaining the modified date on the text file we previously FTP downloaded. It’s not much more complicated to get the modified timestamp as well, if you need to do multiple updates in a single day:

	<!--- LOCK FILE DOES NOT EXIST --->
	<cfelse>

	<!--- GET TIMESTAMP FROM LAST MLS UPDATE (CODE) --->
	<cfscript>
	function FileDateLastModified(path)
	{
	  Var fso  = CreateObject("COM", "Scripting.FileSystemObject");
	  Var theFile = fso.GetFile(path);
	  Return theFile.DateLastModified;
	}
	</cfscript>

	<!--- LAST MLS UPDATE (TRIGGER & FILE REFERENCE) --->
	<CFSET TheFile = "C:\Inetpub\wwwroot\mysite.com\database_dir\Listings.txt">

	<!--- LAST MLS UPDATE (LOGICAL OPERATOR) --->
	<!--- DateCompare RETURNS -1 WHEN THE FILE DATE IS EARLIER THAN TODAY (COMPARED BY DAY) --->
	<cfif DateCompare(FileDateLastModified(TheFile), Now(), "d") lt 0>

The “TheFile” variable sets up the reference to the text file, and a simple IF/THEN statement checks whether the modified date of the existing file is earlier than the current date (the comparison returns a value less than 0 in that case). If this is the case (indicating that an update is needed), then the lock file will be generated and the download/update script will be run:

	<!--- CREATE LOCK FILE --->
	<cffile action="write"
		   file="C:\Inetpub\wwwroot\mysite.com\database_dir\crontab.lck"
		   output="101010">

Note that the output is needed to write the file, but isn’t used for anything. Here’s where all the above mentioned FTP and database updating code goes. Once the script is complete, we will destroy the temporary lock file:

	<!--- REMOVE LOCK FILE --->
	<cffile action="delete"	file="C:\Inetpub\wwwroot\mysite.com\database_dir\crontab.lck">

...and now we’ve got a complete picture. To sum up, the script will now do the following tasks for us:

  1. Check for a lock file (preventing multiple downloads).
  2. If there is no lock file, check the timestamp to see whether a download and update is needed (keep updates appropriately periodic).
  3. If an update is needed, create a lock file to prevent multiple downloads and FTP the information from a remote location to our local server.
  4. Update the database with the new information.
  5. Remove the lock file (when complete) to allow the update to run again.
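For readers who think more easily in another language, the five steps above can be sketched in Python. This is not the ColdFusion script itself: the function name and return values are invented for the example, and the lock file is created with an exclusive open (a small change from the FileExists check) so that two simultaneous requests cannot both create it.

```python
import os
from datetime import date


def run_daily_update(lock_path, data_path, update, today=None):
    """Run update(data_path) at most once per day, guarded by a lock file."""
    today = today or date.today()

    # Step 1: a lock file means an update is already in progress.
    if os.path.exists(lock_path):
        return "in progress"

    # Step 2: skip the update if the data file was already refreshed today.
    if os.path.exists(data_path):
        modified = date.fromtimestamp(os.path.getmtime(data_path))
        if modified >= today:
            return "up to date"

    # Step 3: create the lock file; O_EXCL fails if another request got there first.
    try:
        handle = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "in progress"
    os.close(handle)

    try:
        # Steps 3-4: download the file and update the database.
        update(data_path)
    finally:
        # Step 5: always remove the lock so the next update can run.
        os.remove(lock_path)
    return "updated"
```

Removing the lock in a finally block means a failed download does not leave the site permanently reporting that an update is in progress.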


Automating Database Updates Using ColdFusion

Monday, March 7th, 2005

By: Nathan Johnson

Recently, I was asked to help automate some time consuming database update tasks for one of our Real Estate clients. They needed to ensure that their MLS listings were always up to date, but also couldn’t afford to take their site down while they were manually updating the database (imagine the data entry headache brought on by 1000 new MLS listings every day!). To further complicate matters, their hosting only supports ColdFusion, and can’t parse ASP or PHP pages. Thankfully, the concepts presented here are relatively straightforward and translate to PHP and ASP more easily than the other way around.

Simply put, I needed to integrate their site with a program that automatically and seamlessly runs on their ColdFusion server, updating their database once daily. Here’s how I did it!

First, I need to FTP download the updated database, which is hosted offsite. Here’s my FTP code:

	<!--- BEGIN DOWNLOADING DATA FILE --->

	Downloading MLS Listings...
	<cfset thread = CreateObject("java", "java.lang.Thread")>
	<cfflush>
	<cfset thread.sleep(100)>

		<!--- OPEN FTP CONNECTION TO MLS LISTING FTP SERVER --->
		<cfftp action = "open"
			username = "USERNAME"
			password = "PASSWORD"
			server = "TEST.DATABASE.COM"
			connection="myFtpConnection"
			stopOnError = "Yes"
			passive="yes">

		<!--- NAVIGATE TO THE DIRECTORY THAT CONTAINS THE MLS LISTING TEXT FILE --->
		<cfftp action="changedir"
			connection="myFtpConnection"
			directory="/SOURCEFOLDER"
			passive="yes">

		<!--- FTP DOWNLOAD THE MLS LISTING TEXT FILE (Listings.txt) --->
		<cfftp action="getfile"
			connection="myFtpConnection"
			remotefile="Listings.txt"
			localfile="C:\Inetpub\wwwroot\mysite.com\database_dir\Listings.txt"
			failifexists="no"
			passive="yes">

		<!--- CLOSE THE FTP CONNECTION --->
		<cfftp action = "close"
			connection = "myFtpConnection"
			passive="yes">

	DONE!<br>
	<br>
	Updating Database...
	<cfset thread = CreateObject("java", "java.lang.Thread")>
	<cfflush>
	<cfset thread.sleep(100)>

Note the use of the following code lines throughout the page:

	<cfset thread = CreateObject("java", "java.lang.Thread")>
	<cfflush>
	<cfset thread.sleep(100)>

Since this is a program that runs on the server, the server will not automatically print the status messages out while the program is running. As such, the page appears to be frozen while loading when in fact it is waiting for the download to complete. With the above lines, cfflush sends the output generated so far to the browser and the server then pauses for 100 milliseconds, long enough for the status messages written on the lines above to be printed, but not long enough to cause a noticeable delay in the update process.

Another point of interest from above is the failifexists="no" attribute of the cfftp tag. This is important to ensure that the local file is overwritten each day as this update is run.

Now that we’ve downloaded the updated database listings to a local file, I need to input the data into the database. This site uses a Microsoft Access .MDB database file, so I first need to clear out the existing rows of data in that file to prevent any old listings from showing up after they’ve been removed from the updated MLS database:

	<!--- SQL COMMAND TO CLEAR THE DATABASE DATA, WILL BE REWRITTEN --->
	<cfquery name="qryInsert" datasource="MLS">
		  DELETE * FROM listing_table
	</cfquery>

Also note that I had already set up the datasource name of “MLS” to refer to my database file located on the server, which only involved a quick call to the web hosting company.

Now, I need to parse the new text file into usable chunks of data and input that information into the database. To accomplish this, I used a CFLOOP tag that loops through each line of the text file as it is read by the server. Also, the source text file is tab delimited and I don’t want to have to refer to tabs when parsing my data (just because I’m picky!), so I will replace all the tabs with vertical line characters (“|”):

	<!--- OPEN THE LOCAL COPY OF THE LISTINGS TEXT FILE --->
	<cffile action="read"
		file="C:\Inetpub\wwwroot\mysite.com\database_dir\Listings.txt"
		variable="txtFile">

	<!--- SET COUNT AT 0 TO SKIP FIRST LINE OF DATA FROM LISTINGS TEXT FILE (COLUMN HEADINGS) --->
	<cfset CountVar = 0>

	<!---
	SET UP INDIVIDUAL LINES OF DATA TO INPUT TO THE SQL COMMAND:
		FIX SINGLE QUOTE PROBLEMS
		REMOVE ANY VERT LINES '|'
		REPLACE ALL TABS WITH VERT LINE DELIMITERS
	--->
	<cfloop
		index="record"
		list="#Replace(Replace(Replace(txtFile,'''','''''','all'),'|','','all'),chr(9),'| ','all')#"
		delimiters="#chr(13)##chr(10)#">

Note that by using the delimiters “#chr(13)##chr(10)#”, the text file is read one line at a time. The ASCII characters carriage return (chr(13)) and line feed (chr(10)) are the standard way of denoting new lines in Windows .txt files. One of the tricky things I discovered was that the text source file that holds the updated MLS data has a first line for the column headings. Obviously there wasn’t a house listing for “$L,IST,PRI.CE” (haha), so I needed to skip the first line while reading the file’s information. To get around this, I set up an independent variable that is incremented on each pass through the loop. With a simple IF/THEN set, I avoid inserting the line of invalid data into the database:

	<!--- SKIP DATA IF THE FIRST LINE (COLUMN HEADINGS) --->
	<cfif (CountVar gt 0)>

	<!--- SETUP THE VALUES FOR THE SQL COMMAND --->
	<cfif trim(listgetat(record,1,'|')) is ''>
		<cfset MLSNumber = ' '>
	<cfelse>
		<cfset MLSNumber = '#trim(listgetat(record,1,'|'))#'>
	</cfif>
	<cfif trim(listgetat(record,2,'|')) is ''>
		<cfset PropertyAddress = ' '>
	<cfelse>
		<cfset PropertyAddress = '#trim(listgetat(record,2,'|'))#'>
	</cfif>
	<cfif trim(listgetat(record,3,'|')) is ''>
		<cfset ListingPrice = '0'>
	<cfelse>
		<cfset ListingPrice = '#trim(listgetat(record,3,'|'))#'>
	</cfif>

Here, I assign variables to the various chunks of data, while I validate them against being blank or invalid entries. The ListGetAt function is a handy way to refer to various chunks of the current line of data, and the IF/THEN statements simply check to make sure that the data being read isn’t blank. If it is, the variable is set to a space character, which takes care of ColdFusion’s built-in “skip the field if it’s blank” attitude - if you’re new to this, all you need to know is that ColdFusion’s list functions will ignore that a field exists if the data is blank. For instance, a list that contains “1,2,3,,5” is treated as the list “1,2,3,5”. Since the value in the 4th column is blank, the server automatically moves the next value into that column! DOH! As a side note - IMHO, space characters are the easiest NULL values to deal with, since the word NULL can’t simply be trimmed out of a query response.

Also note that required numerical fields are entered with a “0” instead of a space, as a blank value is invalid for a numerical database field in Access. During development, it is also handy to output the information that is being built on the fly, since it’s easier to read through an output than to try and troubleshoot database error messages. Here’s one of the print lines I used during development:
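To make the skipped-field problem concrete, here is a small Python emulation. The cf_list_get_at function is written for this example to mimic how ColdFusion's list functions ignore empty elements; it is not a real library call.

```python
def cf_list_get_at(text, position, delimiter=","):
    """Mimic ColdFusion's ListGetAt: empty elements are ignored, positions start at 1."""
    elements = [element for element in text.split(delimiter) if element != ""]
    return elements[position - 1]


# The blank 4th field disappears, so position 4 returns the 5th value.
print(cf_list_get_at("1,2,3,,5", 4))  # 5

# Replacing each tab with a vertical line plus a space keeps blank fields
# in place as a single space, which can then be trimmed and tested.
record = "1\t2\t3\t\t5".replace("\t", "| ")
print(repr(cf_list_get_at(record, 4, "|")))  # ' '
print(cf_list_get_at(record, 5, "|").strip())  # 5
```

With the padded record, every column stays at its own position and a blank shows up as a space that trims down to an empty string.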

	<p><b>Line Number <cfoutput>#CountVar#</cfoutput>:</b>
		<cfoutput>INSERT INTO listing_table (MLSNumber, PropertyAddress, ListingPrice)
		VALUES ('#MLSNumber#', '#PropertyAddress#', '#ListingPrice#')</cfoutput></p>

This simply prints out the variables along with a handy line number (very useful when trying to figure out whether a problem in validation is due to the source information or the page’s coding!). Now, I simply plug the info into my database and end my IF/THEN statement that ignores the column headings:

	<!--- SQL COMMAND TO INPUT THE LISTINGS DATA FROM THE TEXT FILE TO THE ACCESS DATABASE --->
	<cfquery name="qryInsert" datasource="MLS">
		  INSERT INTO listing_table (MLSNumber, PropertyAddress, ListingPrice)
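		  VALUES ('#MLSNumber#', '#PropertyAddress#', '#ListingPrice#')
	</cfquery>

	<!--- END OF THE IF/THEN THAT SKIPS THE COLUMN HEADINGS --->
	</cfif>

Finally, I increment the counter and close the loop, so that the next line of the file is read and entered in the database, and so on until the end of the file: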

	<!--- INCREMENT THE COUNTER TO SKIP THE FIRST LINE (COLUMN HEADINGS) --->
	<cfset CountVar = CountVar + 1>

	</cfloop>

	DONE!<br>
	<cfset thread = CreateObject("java", "java.lang.Thread")>
	<cfflush>
	<cfset thread.sleep(100)>
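For comparison, here is how the same clear-and-reload loop might look in Python. Several things are assumptions made for the sketch: SQLite stands in for the Access database, the three column names are taken from the INSERT statement above, and parameterized queries replace the manual single-quote fix.

```python
import sqlite3


def parse_listings(text):
    """Turn the tab-delimited listings text into rows, skipping the heading line."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # the first line holds the column headings
        fields = line.split("\t")
        fields += [""] * (3 - len(fields))  # pad short lines
        mls, address, price = (field.strip() for field in fields[:3])
        # Blank text fields become a space, blank numeric fields become 0.
        rows.append((mls or " ", address or " ", price or "0"))
    return rows


def reload_table(conn, rows):
    """Clear out the old listings and insert the new ones in one transaction."""
    with conn:
        conn.execute("DELETE FROM listing_table")
        conn.executemany(
            "INSERT INTO listing_table (MLSNumber, PropertyAddress, ListingPrice) "
            "VALUES (?, ?, ?)",
            rows,
        )
```

Because the placeholders pass the values separately from the SQL text, an address such as 12 O'Neil St needs no quote doubling.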

That’s the download and update program. Now there’s the problem of how to make the server run the script. You can either set up something server side (neither fun nor easy on Windows IIS), or you can simply make sure that the first person to navigate the site each morning causes the script to run. For more details see the ‘Emulate Crontab Using ColdFusion’ article.




Avoiding Duplicate Content Penalties

Tuesday, January 18th, 2005

Duplicate Content Penalties

By: James Peggie and Dylan Downhill
Originally Published: Jan 18, 2005
Updated: Jan 30, 2005

Fact: Google penalizes page rank when it determines that content is duplicated by other sites.

If your rankings have slipped then it’s possible that your page contents have been duplicated causing a duplicate content penalty. Google doesn’t want multiple copies of the same content cluttering their results pages so they will devalue all but one of the copies of the content based on the age of the page.

Don’t Let Other People Benefit From Your Hard Work

Writing good quality content for your site is hard work! If someone has not paid you for your work (either with money or with a reciprocal link or other agreed payment) it is stolen!

Checking For Duplicate Content

It is a chore to check for duplicated content but luckily someone has come to the rescue - www.copyscape.com. Just by entering the URL of the page you want to check, Copyscape will return a list of pages in the Google index that contain text also present on your site, or for more detail you can subscribe to their Copysentry service.

No one knows how much duplication results in a penalty: whether it’s 10 words, 20, a paragraph or a whole page. You will need to make a decision on whether you believe you have a problem. A lot of the time your marketing text will appear as a description for a link to your site - this probably won’t be counted as duplicate content. If your site’s position in the search engines has recently plummeted then duplicate content might be the cause, and if a competitor’s page is found for text or an article you wrote then you may also have an issue.

What To Do With Duplicate Content?

If you are hosting someone else’s content and you’re seeing duplicate content, put a ‘robots’ meta tag (for example <meta name="robots" content="noindex">) in the head section to stop the search engine spiders indexing that page. If you’re tempted to modify the content then you will need to get the original author’s permission. A much better option would be to take the central idea of the article and write a completely new article using your own text.

If you’re concerned that someone has duplicated your content, write to the owner of the website that has published it and request that they remove the offending text. You can mention that you will report the matter to Google under their DMCA guidelines.

If the email does not elicit a response or your content is still visible then report the duplicate content issue to Google under the DMCA guidelines they provide at http://www.google.com/dmca.html

If all else fails, change your copy of the duplicated text. Keeping your copy fresh is essential so make the best of a bad situation and write even better copy.

Tips To Ensure You Avoid Duplicated Content

  • Put a copyright notice on the bottom of the page and warn that you check for duplicated content.
  • If you have multiple domains that point to the same site content, take advantage of permanent redirection (a 301 status code). This informs the spider of the redirection so it understands you are not putting up duplicate content.
  • When you have an article to be republished on other sites, send it in a text format. (Articles are a great way to get quality incoming links, by the way!) This ensures that when the article is republished it will be reformatted and viewed by the search spiders as original.



Search Engine Friendly Site Map

Sunday, August 22nd, 2004

Date created: 22 August 2004

If you’re having problems getting your whole site indexed then you should add a search engine friendly site map to your site. You’ll need a site map when:

  • Your site menu is flash or pure javascript (most rollover scripts are OK if they use ‘<a href’ links).
  • Your menu structure is more than 3 levels deep.
  • Some of your content is not getting indexed and it is publicly readable i.e. not behind a firewall, login isn’t required, etc.
  • You want to provide some ‘Spider Bait’ i.e. optimized link text.

Whatever your reason (or for no reason) a site map will not hurt your search engine position and may help.

A site map consists of anchor links (<a href>) pointing to every page in your site. If you’re an eCommerce site this would include all product pages and all category pages; if you’re a general information site then every article would be included. It might be necessary to set up more than one site map if your site is large enough.

The text of the link should be the search phrase you want the target page found for. Use the product name, a key search phrase, etc.

Once your site map is built you will want to link to it from all pages using a standard anchor link (<a href>). This link is usually put at the bottom of the page as it’s not designed to be part of the main navigation. The reason you want it on all pages is to provide visibility to the site map from any landing page, so that it’s useful to your human visitors as well as the automated variety.
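If your pages are listed in a database or a file, a site map of this kind can be generated rather than hand written. The following Python sketch assumes you can supply a list of (URL, link text) pairs; the function name and the sample pages are invented for the example.

```python
from html import escape


def build_site_map(pages):
    """Return an HTML list of plain anchor links, one per (url, link_text) pair."""
    items = [
        '<li><a href="{0}">{1}</a></li>'.format(escape(url, quote=True), escape(text))
        for url, text in pages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical pages; the link text is the phrase each page should be found for.
print(build_site_map([
    ("/widgets/blue.htm", "Blue Widgets & Gadgets"),
    ("/articles/site-map.htm", "Search Engine Friendly Site Map"),
]))
```

The output uses only standard anchor tags, so it stays readable to spiders that ignore Flash or JavaScript menus.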

 



Publishing Through A RSS Feed - Quick Guide

Tuesday, August 17th, 2004

This is a quick guide to publishing your web content through an RSS feed. It is not meant to be extensive; it is meant to get your foot in the door of publishing content using RSS in the quickest time possible.

RSS File Format

An RSS feed is simply an XML file containing information on pages within your site. The RSS file format is as follows:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>Title Text </title>
<link>Link to site’s home page </link>
<description>Description of the feed</description>

<item>
<title>Page Title</title>
<description>Page Description</description>
<link>Page Link</link>
<author>Email to Contact You On</author>
<pubDate>Published Date</pubDate>
</item>

</channel>
</rss>

Where the contents from <item> to </item> are repeated for all the content you want to publish through the RSS file. An example feed would look like:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>
<title>Search Engine Optimization Tips</title>
<link>http://www.elixirsystems.com</link>
<description>
Search engine optimization tips from Elixir Systems. Helpful advice covering
all aspects of the search engine optimization process.
</description>

<item>
<title>Monitor and Tweak Your Way to Great Search Engine Rankings - Part 1</title>
<description>
The final stage of the search engine optimization loop, monitoring your site’s rankings and
tweaking the site as necessary to bring up any keywords that are not ranked well.
</description>
<link>http://www.elixirsystems.com/articles/a040709.php</link>
<author>customerservice@elixirsystems.com</author>
<pubDate>Sat, 24 Jul 2004 15:00:00 +0700</pubDate>
</item>

</channel>
</rss>

Information on RSS date format (external link).
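If your content lives in a database, the feed can be generated instead of typed by hand. Below is a minimal Python sketch using only the standard library; the build_feed function and the dictionary keys are assumptions for this example, and the optional author element is left out for brevity. The format_datetime call produces the date layout that pubDate expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_feed(title, link, description, items):
    """Build an RSS 2.0 document as a string from a list of item dictionaries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "description").text = item["description"]
        ET.SubElement(node, "link").text = item["link"]
        # pubDate needs a timezone-aware datetime to get a proper offset.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


published = datetime(2004, 7, 24, 15, 0, 0, tzinfo=timezone(timedelta(hours=7)))
print(format_datetime(published))  # Sat, 24 Jul 2004 15:00:00 +0700
```

ElementTree escapes characters such as & and < in titles and descriptions, which removes one common cause of validation errors.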

RSS Validation

Now you have your RSS file set up and uploaded to your website you’ll want to run it through a RSS Validator - this ensures the contents meets the RSS specification and ensures anyone who wants to use your content can. To find suitable tools type ‘rss validator’ into your favorite search engine, the one I use and prefer due to it’s helpful error messages is Feed Validator at http://feedvalidator.org/

If the validator reports the error ‘Feeds should not be served with the “text/plain” media type’, your web server needs to be told to serve files with the .rss extension using the RSS media type. If you’re running Apache the easiest way to do this is to create an .htaccess file in your website’s root directory and add the following lines:

AddType application/rdf+xml rdf
AddType application/rss+xml rss
AddType application/atom+xml atom
AddType application/xml xml

These lines will take care of your current and future RSS, XML and Atom needs. On Microsoft IIS the same task can be completed using the IIS MMC.

Provide RSS Links

Provide the feed as both a hot link and as a URL that can be cut and pasted into the visitor’s favorite RSS reader. Example RSS links can be found here.

RSS feeds should also be declared in the <head> section of the HTML for the pages on your site. Example code might look like:

<link rel="alternate" type="application/rss+xml" title="Newsletter" href="http://www.elixirsystems.com/newsletter/archive/newsletter.rss" />

<link rel="alternate" type="application/rss+xml" title="Press Releases" href="http://www.elixirsystems.com/press_releases/press_releases.rss" />

This will then cause the latest browsers (IE7, Mozilla 2) to offer the visitor the option to sign up for one of your feeds.
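
To see roughly what a browser or feed reader does with these declarations, the Python sketch below scans a page for <link> elements with rel="alternate" and the RSS media type, then lists the feeds it finds. It is an illustration under simple assumptions rather than a description of how any particular browser works. The class name FeedLinkFinder and the sample page are made up.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs from RSS autodiscovery <link> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rel
                and attributes.get("type") == "application/rss+xml"):
            self.feeds.append((attributes.get("title"), attributes.get("href")))


PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Newsletter" href="http://www.elixirsystems.com/newsletter/archive/newsletter.rss" />
<link rel="stylesheet" type="text/css" href="site.css" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
for title, href in finder.feeds:
    print(title, "->", href)
```

Running the same check against your own pages is a quick way to confirm the declarations are in the <head> and carry the right type attribute.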

Get Your Content Indexed and Listed

One of the easiest ways to get your content known by the search engines is to load it into the MyYahoo part of the Yahoo! search engine.

  • Log in to http://my.yahoo.com using your Yahoo! login
  • Click the ‘Add/Delete Pages’ button
  • Click the ‘Make my own..’ option (towards end of the page)
  • Tick ‘RSS Headlines (BETA)’ and click ‘Finished’ at the bottom of the page.
  • The next page will ask for your RSS feed URL - cut and paste it in and press ‘Add’

Use Your Feed

You’ve gone to all the trouble of setting up an RSS feed, so you might as well use it. On this site we use it to build the index pages for the documents, and also to provide cross reference information (see the SEO Tips index page, where the top half is local content and the bottom half is useful information from the articles directory). The feed is also used in the sitemap to ensure it always stays up to date with the changing content on the site.

Further Reading

Once you have your RSS data feed up and running you might want to add to it. This document describes the RSS 2.0 specification in an easy to read format.

 



Displaying A RSS Feed On Your Website - Quick Guide

Tuesday, August 17th, 2004

Updated: 10 May 2005
By Dylan Downhill

This is a quick guide to displaying an RSS feed on your website. It is not meant to be extensive; it is meant to get your foot in the door of displaying content using RSS in the quickest time possible.

There is also a Quick Guide to Publishing RSS Feed, which shows how to make your content available through an RSS feed.

Displaying RSS Data

The two main ways to display RSS data on a website are client side JavaScript and server side scripting. The advantage of client side JavaScript is that it offloads the processing to the site visitor; the disadvantage is that the search engines don’t run client side code, so none of your syndicated RSS content will be indexed. From a search engine optimization point of view it is best to render the RSS data feed using server side scripts.

RSS Data Display Using Server Side Scripts

There are many RSS parsers available for free on the internet. The two major PHP based RSS parsers available for use are CaRP and MagpieRSS. I personally found the documentation for CaRP lacking and I felt I didn’t have enough control over the way the data was output. As this is a quick guide I will walk through using MagpieRSS to display an RSS data feed on your site.

The first thing to do is download the latest copy of MagpieRSS. When you’ve downloaded it, if you’re using WinZip you’ll hit an issue with the TAR file having an unrecognized extension. The way around this is to extract the TAR file to a temporary location and rename the extension to ‘.tar’ , then double-click on the file and WinZip will correctly recognize the TAR file and give you the ability to extract the files.

Extract the MagpieRSS code into a directory on your PC and upload to your web server (remember to keep the folder structure as you find it in the TAR file).

Make sure the page you want the feed displayed on is a PHP file (usually designated with a .php extension). Then include the following code:

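<?php
require_once '../magpierss/rss_fetch.inc';

$url = 'http://www.elixirsystems.com/seo_tips/seo_tips.rss';
$rss = fetch_rss($url);

// fetch_rss() returns false on failure, so check before reading the channel
if ($rss and !$rss->ERROR) {
    echo 'Site: ', $rss->channel['title'], '<br />';
    foreach ($rss->items as $item) {
        // Array keys are quoted strings; MagpieRSS lower-cases element names, hence 'pubdate'
        echo '<p><a href="' . $item['link'] . '">' . $item['title'] . '</a><br />';
        echo 'Publish Date: ' . $item['pubdate'] . '<br />';
        echo $item['description'] . '</p>';
    }
} else {
    // When the fetch itself failed there is no object, so ask MagpieRSS for the message
    echo 'RSS Error: ' . ($rss ? $rss->ERROR : magpie_error()) . '<br /><br />';
}
?>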

The resultant page should look like the SEO Tips index page on this site.

If the file was loaded locally then your code would change slightly as follows:

<?php
require_once '../magpierss/rss_fetch.inc';

$rss_file = '../seo_tips/seo_tips.rss';
// Read the whole local file into a string, then parse it directly
$rss_string = file_get_contents($rss_file);
$rss = new MagpieRSS($rss_string);

if ($rss and !$rss->ERROR) {
    echo 'Site: ', $rss->channel['title'], '<br />';
    foreach ($rss->items as $item) {
        // Array keys are quoted strings; MagpieRSS lower-cases element names, hence 'pubdate'
        echo '<p><a href="' . $item['link'] . '">' . $item['title'] . '</a><br />';
        echo 'Publish Date: ' . $item['pubdate'] . '<br />';
        echo $item['description'] . '</p>';
    }
} else {
    echo 'RSS Error: ' . $rss->ERROR . '<br /><br />';
}
?>

Note - the lines changed from the previous example are the three that set $rss_file, $rss_string and $rss.

That’s all you need to do. If you want to change the format of the output, change the foreach loop. On this site I use the full output for the index pages, and a partial output on the site map with the published date and the description removed.
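
For readers not using PHP, the same server side idea can be sketched in Python: parse the feed on the server and emit plain HTML, so the search engines see the content. This is an illustration under simple assumptions and has nothing to do with how MagpieRSS works internally. The function name render_feed, its include_details flag and the sample feed are made up. Unlike the PHP examples above, the sketch escapes every value before output, which is the safer choice when you do not control the feed.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_feed(xml_text: str, include_details: bool = True) -> str:
    """Turn RSS 2.0 text into an HTML fragment, one <p> per item."""
    channel = ET.fromstring(xml_text).find("channel")
    lines = []
    for item in channel.findall("item"):
        title = escape(item.findtext("title", default=""))
        link = escape(item.findtext("link", default=""), quote=True)
        html = f'<p><a href="{link}">{title}</a>'
        if include_details:
            # Full output: add the publish date and the description.
            html += f'<br />Publish Date: {escape(item.findtext("pubDate", default=""))}'
            html += f'<br />{escape(item.findtext("description", default=""))}'
        lines.append(html + "</p>")
    return "\n".join(lines)


FEED = """<rss version="2.0"><channel><title>Tips</title>
<item><title>Tags &amp; Titles</title><link>http://example.com/a?x=1&amp;y=2</link>
<pubDate>Sat, 24 Jul 2004 15:00:00 +0700</pubDate><description>Short summary.</description></item>
</channel></rss>"""

print(render_feed(FEED))                         # full output, as on an index page
print(render_feed(FEED, include_details=False))  # partial output, as on a site map
```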

RSS Data Display Using Client Side Scripts

The Jawfish application uses client side JavaScript to display RSS feeds in the visitor’s browser. As mentioned above, the data from these feeds will not be indexed by the search engines, so that part of your page will appear blank to them. Jawfish can be downloaded here.

Syndicating Content

Our content available for syndication using RSS can be found here.

If you find these instructions helpful or if you find an error let me know.

Updates

10 May 2005 - Thanks to Gav (from Mini Tutorials) for making it XHTML compliant.