Tuesday, February 21, 2012

Have ‘/sitecore/content’ in your URLs? Time to fix it.


Alright, I’ve been wanting to blog about this for years now, and some recent posts on Twitter finally inspired me. If you follow ‘sitecore’ on Twitter, you may see a few tweets that apparently have nothing to do with Sitecore, its products, or its services. Look at the shortened URL in such a tweet and you will see something like this:

http://www.site.com/en/sitecore/content/site1/news/blabla.aspx

The main problem with this, besides the obvious aesthetic reasons, is that your news article ends up with a very long URL, which is pretty bad for SEO.

Such URLs are most common in multi-site environments, but not exclusively.
You will have the same problem in a single-site setup if your home item was renamed to something else and the configuration was never adjusted.

The important thing is that it does not have to be this way.
The good news is that this is most likely caused by a few configuration issues. Your SEO person should be happy to know that your URLs could look something like this. Now wouldn’t that be sweet?

http://www.site1.com/news/blabla

It’s 2012 people, great time to clean up your URLs, let’s get it started!

Before we dive into the details, let’s take a quick look at the URL anatomy. The dynamic linking mechanism is described in detail in the official documentation; the information below is given for context.

http://<hostname>/<languageCode>/<contentTreePath>.<extension>

where:

hostname – when the request gets past your web server and reaches Sitecore, the hostname is how Sitecore determines which ‘site’ should respond to the request. This is one of the essential pieces of the multi-site capabilities.

As related reading, check out this blog post by Mark Ursino about fully qualified Sitecore URLs.

languageCode – whether the language code is included depends on how the LinkProvider is configured in web.config, specifically the “languageEmbedding” setting, which can be “asNeeded”, “always” or “never”:

<add name="sitecore" 
     type="Sitecore.Links.LinkProvider, Sitecore.Kernel"
     addAspxExtension="true"
     alwaysIncludeServerUrl="false"
     encodeNames="true"
     languageEmbedding="asNeeded"
     languageLocation="filePath"
     shortenUrls="true"
     useDisplayName="false" />

For the record, I am really not sure why “asNeeded” exists, as it seems like this setting will never be needed (pun intended). Either “never” or “always” should be your choice. Anyway, if you are running in single-language mode and are not planning to go multilingual, set this to “never”.

extension – since we are on the topic of LinkProvider, the setting to control the extension is called: addAspxExtension

contentTreePath – this is the part of the content tree where your pages are stored:

[image: content tree]

For those of you who are not familiar with the inner workings of Sitecore CMS, the URLs are handled dynamically by what we call the “ItemResolver”. That means there is no static file hosted by the web server (/en/sitecore/content/site1/news/blabla.aspx); rather, there is an item called ‘blabla’ stored under /sitecore/content/site1/news, as shown above.

If you would like to understand the linking mechanism in more detail, check out this doc on SDN.
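As a rough sketch of the two LinkProvider settings discussed above, assuming a single-language site that also wants extensionless URLs, the provider entry could be adjusted like this. Only languageEmbedding and addAspxExtension differ from the listing shown earlier; every other attribute is copied unchanged. Depending on your IIS version and pipeline mode, extensionless URLs may need additional server configuration, so treat this as a starting point rather than a drop-in change.

```xml
<!-- Sketch only: the same provider entry as in the post, with two values changed.
     languageEmbedding="never"  : no /en/ style prefix in generated links
     addAspxExtension="false"   : no .aspx suffix in generated links -->
<add name="sitecore"
     type="Sitecore.Links.LinkProvider, Sitecore.Kernel"
     addAspxExtension="false"
     alwaysIncludeServerUrl="false"
     encodeNames="true"
     languageEmbedding="never"
     languageLocation="filePath"
     shortenUrls="true"
     useDisplayName="false" />
```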

Now let’s see how to fix this. Here are the things you need to make sure are configured:

- Site1 in this example should have its own “site” definition in web.config. Add it before ‘website’:

<site name="site1" hostName="site1.local.com"
      virtualFolder="/"
      physicalFolder="/"
      rootPath="/sitecore/content"
      startItem="/site1" ... />
     
<site name="website"
      virtualFolder="/"
      physicalFolder="/"
      rootPath="/sitecore/content"
      startItem="/home" ... />

Notice how “startItem” points to /site1 instead of /home. The site’s start path = rootPath + startItem, so here it resolves to /sitecore/content/site1.

Even if you have only one site, you will need to adjust the “startItem” setting. I’d also recommend creating a dedicated site instead of modifying the “website” in this case. Separate apples and oranges.

- For the cross-site linking to work properly, the following settings should be set:

- ensure that Rendering.SiteResolving = true (if I remember correctly, this may have been ‘false’ in earlier versions):

 <setting name="Rendering.SiteResolving" value="true" />

- add targetHostName (case sensitive) to your “site1” definition:

<site name="site1"
      hostName="site1.local.com"
      targetHostName="site1.local.com" />

More about both settings in section 2.1.1 of this doc on SDN.
As the document explains, hostName alone can be sufficient if it contains no pipes or asterisks. To be on the safe side, I would recommend having targetHostName in there anyway. You never know who will be changing your configuration.
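If you would rather not edit web.config directly, the same changes can probably be expressed as a Sitecore include (patch) file. This is a different delivery mechanism from the one used in this post, and the sketch below assumes your Sitecore version supports include files under App_Config/Include. The file name is hypothetical, and the attributes elided with “...” in the site definitions above are not filled in here; copy them from your existing ‘website’ definition.

```xml
<!-- Hypothetical file: App_Config/Include/Site1.config (name is an assumption) -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Enable cross-site link resolution -->
      <setting name="Rendering.SiteResolving">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
    <sites>
      <!-- patch:before places this definition ahead of 'website'.
           Attributes elided in the post are intentionally not shown;
           copy them from your own 'website' definition. -->
      <site name="site1"
            patch:before="site[@name='website']"
            hostName="site1.local.com"
            targetHostName="site1.local.com"
            virtualFolder="/"
            physicalFolder="/"
            rootPath="/sitecore/content"
            startItem="/site1" />
    </sites>
  </sitecore>
</configuration>
```

Keeping the site definition in its own file also makes it harder for someone to accidentally reorder or overwrite it while editing web.config.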

As a result, here is what the link from “Home/About” to “/Site1/News/blabla” looks like in such a configuration (languageEmbedding="never" and addAspxExtension="false"):

[images: the resulting link]

Now that everything is fixed, you have a very interesting problem to deal with: what to do with the old URLs that have /sitecore/content/site1 in them?

One option is a permanent (301) redirect, implemented either at the server level or managed within Sitecore.
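For the server-based variant, here is a minimal sketch using the IIS URL Rewrite module. It assumes the module is installed, that the old and new URLs live on the same hostname, and that the new links use no language prefix and no .aspx extension. The rule name is made up, and the pattern should be adjusted to your own site name and language codes.

```xml
<!-- Sketch only: goes into web.config and requires the IIS URL Rewrite module. -->
<system.webServer>
  <rewrite>
    <rules>
      <!-- Hypothetical rule name.
           Matches e.g. en/sitecore/content/site1/news/blabla.aspx
           and redirects permanently to /news/blabla -->
      <rule name="Old site1 content URLs" stopProcessing="true">
        <match url="^(?:[a-z]{2}(?:-[a-z]{2})?/)?sitecore/content/site1/(.+?)(?:\.aspx)?$" />
        <action type="Redirect" url="/{R:1}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

The pattern is deliberately limited to /sitecore/content/site1 so that it does not touch other paths under /sitecore, such as the Sitecore client itself.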

Another option is to map a “dummy” site to a non-existent item in the content tree. This site will intercept all requests with /sitecore/content in them and return Sitecore’s “item not found” response.

<!-- as always, the order of 'dummy' site placement is important -->
<site name="modules_website" ... />
 
<site name="dummy"
      virtualFolder="/sitecore/content"
      physicalFolder="/"
      rootPath="/sitecore/content/doesnt exist"
      startItem="" />

That’s all, folks! Please leave a comment below.

Hope this helps.

3 comments:

Gerard said...

What would be the benefit of your last advice to throw an item not found? Not very user focussed is my first thought.

Alex Shyba said...

Hi Gerard,

This is mainly for Google to understand that the page does not exist. One customer of ours wanted this done, but I agree that there may be other, better ways to tell Google what's up, maybe by adding a canonical URL.

-alex

Sean said...

Alex, how do these settings affect the Sitecore SEO Module?
http://marketplace.sitecore.net/en/Modules/SEO-friendly_URL_module.aspx

I have just installed that module and when running it on the home page i am getting an error:

"Server Error in '/' Application.

The resource cannot be found.

Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly.

Requested URL: /sitecore/content/Homeowners.aspx
"


How can I correct this?

Thank you,

Sean.