Redirects

How to force https, www and a trailing slash with one redirect

Warning:

If you are implementing HSTS on your website and using the www subdomain, your site will not be eligible for the HSTS preload list if you use one redirect. You can either use two redirects or use the root domain as your primary site.

You can learn more about HSTS and the www subdomain implementation here.

The rampant misuse of the .htaccess file is without a doubt my greatest pet peeve concerning .htaccess. The destruction caused by poorly written rewrite rules to correct duplicate content is a close second.

Let me explain the problem... and my problem with the typical solutions.

You can use the .htaccess file to force your site to only use https:// or only use www.. This can help ensure that your site does not have duplicate pages. Every search engine optimization (SEO) specialist knows that duplicate content is bad.

The important thing to understand is that almost every out-of-the-box CMS creates duplicate content. In fact, most can create eight different versions of any single page if your site does not have the correct rewrite rules.

The duplicate content problem

Let’s say you are the webmaster for a website and it has an SSL (Secure Sockets Layer) certificate installed. If you have not created any rewrite rules, all of the following URLs probably return a valid 200 response.

  1. http://example.com/blog
  2. http://example.com/blog/
  3. http://www.example.com/blog
  4. http://www.example.com/blog/
  5. https://example.com/blog
  6. https://example.com/blog/
  7. https://www.example.com/blog
  8. https://www.example.com/blog/

The problem here is that you can have http:// or https://, on top of www. or not, and the trailing / or not. It should be noted that Google does not treat the trailing slash on the root domain as a separate page.
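To make the count concrete, here is a small Python sketch that enumerates the combinations. The function name `url_variants` is my own, and `example.com/blog` is just the placeholder page from the list above.

```python
from itertools import product

def url_variants(host, path):
    """Return every scheme / www / trailing-slash combination for one page."""
    schemes = ("http", "https")
    prefixes = ("", "www.")
    slashes = ("", "/")
    return [
        f"{scheme}://{prefix}{host}/{path}{slash}"
        for scheme, prefix, slash in product(schemes, prefixes, slashes)
    ]

variants = url_variants("example.com", "blog")
for number, url in enumerate(variants, start=1):
    print(f"{number}. {url}")
```

Two protocols, two hostnames and two slash options multiply out to the eight URLs listed above.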

Google's John Mueller clarified what counts as duplicate content in this tweet.

How to fix the problem and kill your site.

You can find lots of helpful articles and forum posts on how to force https:// or how to force www. and even how to force the trailing /. But if we follow most of them we will have a new problem!

Here is a common way of fixing the problem with the .htaccess file.

## Turn on rewrite engine
RewriteEngine on

## Force WWW
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301,NC]

## Force HTTPS
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

## Remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ http://www.example.com/$1 [L,R=301]

Perfect solution... right? Wrong!

Just to show you the destruction this causes let's try it on my own poor site.

Example: Redirect Nightmare on DanielMorell.com

I will start by creating a .htaccess file in the site's root directory with the following code.

## Turn on rewrite engine
RewriteEngine on

## Force WWW
RewriteCond %{HTTP_HOST} ^danielmorell\.com [NC]
RewriteRule ^(.*)$ http://www.danielmorell.com/$1 [L,R=301,NC]

## Force HTTPS
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

## Remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ http://www.danielmorell.com/$1 [L,R=301]

Now that I have assaulted my poor server with this monstrosity, I will test it. To test it I will enter the root domain of my website without https:// or www. and with the /. The test will be to see if it changes correctly.

I type in the URL looking like this.

Image of web browser with http://danielmorell.com/ in the URL bar.

When I hit the Enter key, the page loads and the URL looks like this.

Image of web browser with https://www.danielmorell.com in the URL bar.

You are probably wondering what the big deal is. I wanted to force https://www. and get rid of the trailing /. If that was the result, why am I not happy?

The problem is what you don't see.

If we take a closer look at the network requests, we will find something that should be disturbing. Here are the first three file responses from the server.

Image of danielmorell.com 301 redirects caused by poorly written .htaccess file

As you can clearly see we were redirected twice. We need to take a closer look at what happened.

Image of HTTP headers showing first of multiple redirects caused by bad .htaccess rewrite rules.

You can see that the HTTP response headers give us a clear idea of what the problem is. When we hit Enter after entering http://danielmorell.com/ we are 301 redirected to http://www.danielmorell.com/.

Image of HTTP headers showing second of multiple redirects caused by bad .htaccess rewrite rules.

When the browser requests the "new location" http://www.danielmorell.com/ it is 301 redirected again to https://www.danielmorell.com/.

Image of HTTP headers showing 200 response header.

Finally, the browser tries our third location https://www.danielmorell.com/ and gets the 200 OK response.

Why was https://www.danielmorell.com/ not redirected to https://www.danielmorell.com? The reason is that it is the domain root directory. The rewrite condition for the trailing slash checks to ensure that it is not a directory. If it is a directory the last rewrite rule is ignored.

The trailing slash is removed from the address bar by the browser, since the response is an index.php file that has been rewritten to the domain name.

If you have been paying attention you know we have a couple of problems in our example. You know that a chain of redirects is generally bad.
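If you want to reproduce the chain without touching a live server, the three stacked rules can be modeled in a few lines of Python. This is only a rough sketch: it uses the generic example.com version of the rules, ignores query strings, and assumes the root is the only real directory on the site. The names `naive_redirect` and `follow` are mine.

```python
from urllib.parse import urlsplit

# Assumption: the only real directory on this hypothetical site is the root.
DIRECTORIES = {"/"}

def naive_redirect(url):
    """Return the Location the three stacked rules would send, or None for a 200."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    relative = path[1:]  # .htaccess rules match the path without its leading slash

    # Force WWW (note the hard-coded http:// target)
    if host.startswith("example.com"):
        return f"http://www.example.com/{relative}"
    # Force HTTPS
    if parts.scheme != "https":
        return f"https://{host}{path}"
    # Remove trailing slash unless the path is a directory
    if path not in DIRECTORIES and relative.endswith("/"):
        return f"http://www.example.com/{relative[:-1]}"
    return None

def follow(url, max_hops=10):
    """Collect every URL visited until the rules stop redirecting."""
    chain = [url]
    while len(chain) <= max_hops:
        target = naive_redirect(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain

chain = follow("http://example.com/")
print(" -> ".join(chain))
print(f"{len(chain) - 1} redirects")
```

Feeding the model a page URL such as http://example.com/contact/ is worse still, because the slash rule sends the visitor back to http:// and the HTTPS rule has to fire a second time.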

Redirects and PageRank

Google's Matt Cutts stated in 2013 that about 15% of PageRank is lost in a 301 redirect. This is based on Google’s concern that people would use 301s instead of standard links so that they would pass more PageRank.

In The Anatomy of a Large-Scale Hypertextual Web Search Engine, the 0.85 PageRank damping factor on links was based on the probability of a “random surfer” following a link. From the onset, Google’s goal has been to rank web pages with an algorithm that mirrors human behavior. This is why Google updated their “random surfer” model to the “reasonable surfer” model.
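As a back-of-the-envelope illustration, here is what a chain of redirects would cost if you take the 2013 figure at face value and assume every hop loses the same 15%. That compounding is my own simplification, not something Google has published, and the function name `retained_share` is made up for this sketch.

```python
def retained_share(hops, loss_per_hop=0.15):
    """Share of PageRank left after `hops` redirects, if each hop lost a fixed fraction."""
    return (1 - loss_per_hop) ** hops

for hops in range(1, 6):
    print(f"{hops} redirect(s): {retained_share(hops):.1%} retained")
```

Under that assumption a four-redirect chain would pass on only about half of the original value.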

It is therefore no surprise that in time Google determined that 301 redirects won’t lose PageRank. A single redirect has one origin and one destination. It is not reasonable to believe that a person will browse away from a redirect once the URL that is being redirected has been requested.

It is therefore logical that in 2016 Google's Gary Illyes tweeted "30x redirects don't lose PageRank anymore."

What does this mean for redirects and SEO? Can we have as many redirects as we want? The answer is, no. Let's not kid ourselves. PageRank is not the only ranking signal that matters. If “pumping link juice” is the only thing we care about we should get out of the SEO business.

We should look at this in terms of crawlability, indexability and quality not just PageRank. Google has stated that they won’t crawl more than five redirects.

If we were to be honest, we would admit that we move content and create 301s for other reasons. Instead of the normal one or two, we could be looking at five or six redirects if we are not pointing to the right protocol, hostname and so on.

Example: Redirect chain in the wild

Here is an example of a careless redirect chain I found out on the wild wild web.

Requested URL: http://example.com/contact/
First 301: http://www.example.com/contact/
Second 301: https://www.example.com/contact/
Third 301: http://www.example.com/contact
Fourth 301: https://www.example.com/contact
Final URL: https://www.example.com/contact

This is enough to make a good webmaster or SEO cry.
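A quick way to see where a chain like this goes wrong is to compare each hop with the one before it. The sketch below does that for the recorded chain. The helper name `describe_hop` is hypothetical, and the limit constant simply restates the five-redirect figure mentioned earlier.

```python
from urllib.parse import urlsplit

# Limit quoted earlier in this article: Google won't crawl more than five redirects.
MAX_CRAWLED_REDIRECTS = 5

def describe_hop(source, target):
    """List which parts of the URL a single redirect changed."""
    a, b = urlsplit(source), urlsplit(target)
    changes = []
    if a.scheme != b.scheme:
        changes.append(f"scheme {a.scheme} -> {b.scheme}")
    if a.netloc != b.netloc:
        changes.append(f"host {a.netloc} -> {b.netloc}")
    if a.path != b.path:
        changes.append(f"path {a.path} -> {b.path}")
    return changes

chain = [
    "http://example.com/contact/",
    "http://www.example.com/contact/",
    "https://www.example.com/contact/",
    "http://www.example.com/contact",
    "https://www.example.com/contact",
]

for source, target in zip(chain, chain[1:]):
    print("301:", ", ".join(describe_hop(source, target)))

redirects = len(chain) - 1
print(f"{redirects} redirects, limit {MAX_CRAWLED_REDIRECTS}")
```

The third hop is the culprit: it fixes the slash but downgrades the protocol, so the HTTPS redirect has to run again. One more moved page on top of this and the chain would hit the limit.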

We obviously have a problem. We need a solution.

https, www, and trailing slash with a single redirect.

The way to fix this is a little advanced. We will need to change the .htaccess rewrite rules. Remember that .htaccess files, like other server configuration files, are read top to bottom. Our previous .htaccess rewrite rules checked first for www.. If that was not present, they added it with a redirect. After that redirect, a second redirect added https://. The final redirect removed the trailing slash.

The right way is to make the .htaccess check for the / then check for www. and https://. If any of our desired URL parameters are incorrect we use a single RewriteRule to change the URL. This method results in only one 301 redirect.
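The idea is easier to see outside of .htaccess syntax. In the Python sketch below, the final URL is worked out in one pass, so there is only ever one place to send the visitor. It is an illustration of the goal, not of how Apache evaluates rules. `canonical_url` and the `directories` set are hypothetical, and the sketch assumes directories are listed with their trailing slash.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"  # assumption: the host used in this article's examples

def canonical_url(url, directories=frozenset({"/"})):
    """Compute the final https://www URL directly, so one redirect is enough."""
    path = urlsplit(url).path or "/"
    # Directories keep their trailing slash; everything else loses it.
    if path not in directories and path.endswith("/"):
        path = path[:-1]
    return f"https://{CANONICAL_HOST}{path}"

def needs_redirect(url, directories=frozenset({"/"})):
    """True when the requested URL differs from its canonical form."""
    return url != canonical_url(url, directories)

print(canonical_url("http://example.com/contact/"))
print(needs_redirect("https://www.example.com/contact"))
```

Every wrong variant maps straight to the same destination, which is exactly what the rewrite rules have to achieve.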

The difficulty is that .htaccess files are not scripting files. Because of that we are limited in the simplicity of our rules.

To make our redirect work properly we also must make several adjustments to the way we check for each issue.

Step 1: We will check to ensure we are not looking at a directory.

RewriteCond %{REQUEST_FILENAME} !-d

This is important since both servers and browsers by default place a trailing / at the end of directory URLs and not at the end of files. We want to continue following that standard.

We also want to determine if the URL ends in a trailing /.

RewriteCond %{REQUEST_URI} (.+)/$

If both conditions are true we implement the following rewrite rule.

RewriteRule ^ - [S=2]

This rule will skip the next two rewrite rules. The reason we do this is that our next two rules only apply to URLs whose trailing slash is already correct: directories (which should have the trailing slash) and files without one.

Step 2: We need to check to see if www. is included in the requested URL and https:// is the protocol used.

RewriteCond %{HTTP_HOST} !^www\.(.*)$ [OR,NC]
RewriteCond %{HTTPS} off

It is important to use the no case [NC] and [OR] flags. The [OR] flag lets the rule fire when either condition is true, instead of requiring both. Domains are not case sensitive. If you do not use the no case flag it may redirect on Www. since it does not match.
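The effect of the no case flag can be checked with the same pattern in Python. This assumes Python's `re` module and Apache's regex engine treat this simple pattern the same way, which I believe is the case here. The function name `lacks_www` is mine.

```python
import re

WWW_PATTERN = r"^www\.(.*)$"  # the pattern from the RewriteCond above

def lacks_www(host, ignore_case=True):
    """Mimic the negated host condition: True when the host does not start with www."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(WWW_PATTERN, host, flags) is None

print(lacks_www("Www.example.com"))                     # with [NC]: no redirect needed
print(lacks_www("Www.example.com", ignore_case=False))  # without [NC]: treated as missing www
```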

Step 3: We redirect all URLs that matched the conditions from Step 2 using the following redirect rule. Note that this redirect rule is skipped if the requested URL is not a directory and has a trailing /.

RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

This rewrite rule leaves the path as requested, so directories keep their trailing /.

Step 4: We need to create a rewrite rule to remove the trailing slash from all URLs that were matched in step 1. Remember that for those URLs we skipped the last rewrite rule, and we will skip the next one as well. For every other request, we need to determine whether the requested URL is a directory.

RewriteCond %{REQUEST_FILENAME} -d

If it is a directory we will need to skip the final rewrite rule. We do that with a skip flag.

RewriteRule ^ - [S=1]
RewriteRule ^(.*)/$ https://www.example.com/$1 [R=301,L]

The final rewrite rule enforces no trailing / on URLs. Requests that met the conditions from step 1 skip directly to this rule.

Putting it all together. It should look like this.

#### Force HTTPS://WWW and remove trailing / from files ####
## Turn on rewrite engine
RewriteEngine on

## Check if not a directory and ends in /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} (.+)/$
## If so, skip the next two RewriteRules
RewriteRule ^ - [S=2]

## Check if not WWW or not HTTPS
RewriteCond %{HTTP_HOST} !^www\.(.*)$ [OR,NC]
RewriteCond %{HTTPS} off
## This RewriteRule is skipped if the URI was not a directory and ended in /
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

## If the URI is a directory, skip the final RewriteRule
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [S=1]
## Remove the trailing / (reached directly when the first rule skips)
RewriteRule ^(.*)/$ https://www.example.com/$1 [R=301,L]
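Before trying the rules on a server, it can help to walk through the skip logic in a small model. The Python sketch below mirrors the four RewriteRules in order and counts the hops for every variant of a page. It is a simplification under stated assumptions: the site is hypothetical, only / and /docs/ are real directories, query strings are ignored, and the server's own behavior of adding a slash to directory URLs is not modeled. The names `single_redirect` and `count_redirects` are mine.

```python
from itertools import product
from urllib.parse import urlsplit

# Assumption: a hypothetical site where only / and /docs/ are real directories.
DIRECTORIES = {"/", "/docs/"}
TARGET = "https://www.example.com/"

def single_redirect(url):
    """Walk the four RewriteRules in order; return the Location or None for a 200."""
    parts = urlsplit(url)
    path = parts.path or "/"
    relative = path[1:]  # per-directory rules see the path without its leading slash
    is_dir = path.rstrip("/") + "/" in DIRECTORIES
    skip = 0

    # Rule 1: not a directory and ends in "/" -> skip the next two rules (S=2)
    if not is_dir and len(path) > 1 and path.endswith("/"):
        skip = 2

    # Rule 2: missing www or not https -> redirect with the path unchanged
    if skip:
        skip -= 1
    elif not parts.netloc.lower().startswith("www.") or parts.scheme != "https":
        return TARGET + relative

    # Rule 3: directory -> skip the final rule (S=1)
    if skip:
        skip -= 1
    elif is_dir:
        skip = 1

    # Rule 4: strip the trailing slash and redirect
    if skip:
        skip -= 1
    elif relative.endswith("/"):
        return TARGET + relative[:-1]
    return None

def count_redirects(url, max_hops=10):
    """Follow redirects until a 200; return the final URL and the number of hops."""
    hops = 0
    while hops < max_hops:
        target = single_redirect(url)
        if target is None:
            break
        url, hops = target, hops + 1
    return url, hops

for scheme, prefix, slash in product(("http", "https"), ("", "www."), ("", "/")):
    start = f"{scheme}://{prefix}example.com/blog{slash}"
    final, hops = count_redirects(start)
    print(f"{start} -> {final} ({hops} redirect(s))")
```

In this model none of the eight variants of /blog needs more than one hop, and a directory such as /docs/ keeps its trailing slash.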

Let's use this on my site to see what happens.

Image of HTTP headers showing only one redirect achieved by well-written rewrite rules in .htaccess.

As you can see there is only one 301 redirect! This is better for your site than the redirect stack that many people use.

Never seen that before? You are not alone!

At the time of writing this guide some big-name websites have this problem. They include neilpatel.com, hubspot.com and searchenginewatch.com. You are in good company.

Most sites are at least using the two-redirect method. It is slightly better than the three-redirect method. It looks something like this.

#### Force HTTPS://WWW and remove trailing / from files ####

RewriteEngine on

## Force https and www
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

## Remove trailing slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} (.+)/$
RewriteRule ^(.*)/$ https://www.example.com/$1 [R=301,L]

This method results in two redirects. This is not terrible, but why create two when you could use one?

There are other ways to reach a single redirect. You can store environment variables and use other methods. However, this method seems to work best with most configurations.

Do you need a different configuration?

You can get .htaccess configuration code samples for each variation (HTTP vs. HTTPS and www vs. no www) in Sample Code.

Daniel Morell

I am a web developer and SEO with a focus on creative design, a passion for perfection, and an organic marketing green thumb.

© 2018 Daniel Morell.