Redirects

Introduction to .htaccess Redirects

Know your redirects!

This is one of the most important chapters in this guide. If you read nothing else, understand and grasp the contents of this chapter.

Your job title does not matter. You can be an SEO, developer, system admin, or consultant. You need to understand redirects.

Why?

Redirects impact server load, site speed, indexability, crawlability, UX, and just about everything else that could break a website.

What we will cover


  1. URL mapping first
  2. mod_alias vs. mod_rewrite
  3. mod_alias basics
  4. mod_rewrite basics
  5. Redirect regular expressions and variables

URL mapping first


Before we can talk about how to create redirects we need to talk about URL mapping.

This is a guide about using the .htaccess file for SEO so we will be talking about how the Apache web server works.

What is URL mapping?

URL mapping is the process of matching a uniform resource locator (URL) to a resource (file) in the server filesystem.

Back when websites were built with static HTML pages this was simple. Today URL mapping is much more complex.

URL mapping basics

When your browser requests a URL, the hostname is first resolved by DNS servers. Once the DNS lookup has been completed, the request is passed to the web server.

The web server then needs to determine what response should be served for that request. There are a number of inputs (like HTTP headers and cookies) but for simplicity, we will just look at the URL.

The most basic way for the Apache server to map URLs is to match the path and file to a directory and file in the DocumentRoot.

Example: Apache URL mapping

Your website uses the domain example.com.

The DocumentRoot for example.com on your server is set to the path /var/www/html.

A request for http://example.com/somepath/file.html would be matched to /var/www/html/somepath/file.html in the server filesystem.
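For reference, the setup described above could look roughly like this in the server configuration. This is only a sketch: DocumentRoot is set in the main server or virtual host configuration, not in the .htaccess file, and the port and layout here are assumptions.

```apache
# Server configuration (not .htaccess): a minimal, assumed virtual host
<VirtualHost *:80>
    ServerName example.com
    # /somepath/file.html is looked up as /var/www/html/somepath/file.html
    DocumentRoot "/var/www/html"
</VirtualHost>
```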

This is all pretty simple so far.

It gets more complicated when resources are stored outside the DocumentRoot or when you don't want a direct URL to resource match.

For instance, what happens if you want http://example.com/somepath/file.html to look like http://example.com/somepath/file i.e. minus the .html?

How do CMSs like WordPress and Joomla serve URLs that match nothing in the server filesystem?

The answer to these questions can be found in two Apache server modules: mod_alias and mod_rewrite.
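To make the CMS question concrete, here is a minimal sketch similar to the rule block WordPress writes by default. The script name index.php is an assumption, and the directives it uses are explained in the mod_rewrite part of this chapter.

```apache
RewriteEngine on

# Requests for real files and directories are served as usual.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Everything else is handed to one script, which decides what to show.
RewriteRule . /index.php [L]
```

The URL never has to exist in the filesystem. The script reads the requested path and builds the page itself.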

mod_alias vs. mod_rewrite


Apache uses two modules to map URLs to resources that are not direct filesystem matches to the requested URL. These modules are mod_alias and mod_rewrite.

Each one has its advantages and uses.

Picture of Ford Mustang representing mod_rewrite and Yugo representing mod_alias

Ok, maybe I'm a little harsh on mod_alias.

mod_alias is good at basic redirects and mapping URLs to aliases. However, mod_rewrite holds all the bells and whistles you might want.

mod_alias basics


What can you do with mod_alias?

Although mod_rewrite is superior in almost every way, mod_alias has a few things that it does really well. The first is simple redirects.

If you want to redirect one URL to another, mod_alias is going to be your best option.

Let's look at some of the directives that belong to mod_alias.

Redirect Directive

### Redirect Directive Syntax
# Redirect [status] [URL-path] URL

### Example
Redirect 301 "/old-url.html" "/new-url.html"

The redirect directive syntax is as follows...

  • Redirect is the directive.
  • [status] identifies the HTTP status code to be served.
  • [URL-path] is the path of the URL to be redirected.
  • URL is the new URL to be served.

Each of these items after the Redirect directive is called an argument. Let's take a closer look at each of these arguments.

1. The status argument

Things to remember about the status argument.

The status argument is optional.

You can include the status if you want. You can also choose not to include it. However, it should be noted that the default status for the Redirect directive is 302. This means that if you do not specify an HTTP status, Apache will serve a status of 302.

You can also specify the status using a keyword status argument instead of a numeric status argument.

There are four keyword status arguments that can be used.

  • permanent is the same as 301
  • temp is the same as 302
  • seeother is the same as 303
  • gone is the same as 410

The status argument accepts more than just these four options. However, depending on what status you are declaring, the URL-path and URL arguments may or may not be required. See the section about those arguments for more information.
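As a short illustration of the keyword forms, the following sketch uses hypothetical paths. Each keyword behaves like its numeric equivalent from the list above.

```apache
# Same as: Redirect 301 "/old-page.html" "/new-page.html"
Redirect permanent "/old-page.html" "/new-page.html"

# Same as: Redirect 303 "/form-submitted" "/thank-you.html"
Redirect seeother "/form-submitted" "/thank-you.html"

# Same as: Redirect 410 "/discontinued-product.html"
# 410 is not a 3xx status, so no target URL is given.
Redirect gone "/discontinued-product.html"
```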

I always specify the status, even if it is 302. It is clearer and more maintainable for future SEOs and developers.

For example, these four redirect statements will have the same effect.

### These redirects are all the same
Redirect "/old-path" "/new-path"
Redirect 302 "/old-path" "/new-path"
Redirect temp "/old-path" "/new-path"
RedirectTemp "/old-path" "/new-path"

I will explain the RedirectTemp directive later in this chapter.

2. The URL-path argument

The URL-path cannot be a relative URL.

This means that you must include the slash at the beginning of the URL.

The URL-path cannot include the scheme and hostname.

### Correct
Redirect "/old-url.html" "/new-url.html"

### Incorrect
Redirect "old-url.html" "/new-url.html"
Redirect "http://www.example/old-url.html" "/new-url.html"

3. The URL argument

When the URL argument is given as a path, it must start with a slash, just like the URL-path argument.

### Correct
Redirect 302 "/old-path" "/new-path"

### Incorrect
Redirect 302 "/old-path" "new-path"

The URL argument can include the scheme and host.

Redirect "/old-url.html" "http://www.example/new-url.html"

If the status argument is between 300 and 399, the URL argument must also be present.

If the status argument is NOT between 300 and 399, the URL argument must be omitted.

### Correct
Redirect 301 "/old-path" "/new-path"
Redirect 302 "/old-path" "/new-path"
Redirect 403 "/old-path"

### Incorrect
Redirect 301 "/old-path"
Redirect "/old-path"
Redirect 403 "/old-path" "/new-path"

Redirect "/old-path" is incorrect because the default status is 302. Therefore it needs the URL argument.

RedirectMatch Directive

### RedirectMatch Directive Syntax
# RedirectMatch [status] regex URL

### Example
RedirectMatch 301 "(.*)\.pdf$" "$1.html"

The RedirectMatch directive syntax is as follows...

  • RedirectMatch is the directive.
  • [status] identifies the HTTP status code to be served.
  • regex is the URL-path matched using a regular expression (Regex).
  • URL is the new URL to be served.

The above example will create a 301 redirect for all PDF documents to an HTML document with the same path and filename on the host.

It should be noted that RedirectMatch and Redirect are the same except for two things.

First, as already stated, RedirectMatch uses regex to match the URL-path argument.

Second, the URL argument may include a backreference to the regex to capture groups from the URL-path argument.

A backreference is $0 through $9, and it points back to what is captured in a group surrounded by parentheses.

Let's look at the example again.

RedirectMatch 301 "(.*)\.pdf$" "$1.html"

(.*) is the first capture group. This means that it can be referenced with the $1 backreference.

We can use multiple capture groups and backreferences. However, remember that they are referenced sequentially from left to right.

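RedirectMatch 301 "^/media/([^/]+)/([^/]+)$" "https://cdn.example.com/images/$1/$2"

This will redirect /media/168/file-icon.png to https://cdn.example.com/images/168/file-icon.png.

Note that RedirectMatch, like Redirect, is matched against the URL-path only. The query string is not part of what the regex sees, so a URL such as /media.php?id=168&file=file-icon.png cannot be matched on its parameters with mod_alias. That requires mod_rewrite and the %{QUERY_STRING} variable.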

RedirectPermanent and RedirectTemp

There are two additional directives in mod_alias I want to address. They are RedirectPermanent and RedirectTemp.

RedirectPermanent is an exact equivalent to Redirect permanent.

RedirectTemp is an exact equivalent to Redirect temp.

I personally don't use either the keyword redirect method, RedirectPermanent, or RedirectTemp. I prefer to simply include the numeric status code. It looks cleaner and seems more intuitive.

mod_alias has some other important directives. However, they cannot be set in the context of the .htaccess file.

mod_rewrite basics


We were able to do a lot with Redirect and RedirectMatch in mod_alias. However, once you see what we can do with mod_rewrite you will begin to look at mod_alias a little like a Yugo.

mod_rewrite is more than a module to create redirects. It is a powerful and capable URL manipulation module. This means that it has the ability to change how URLs are mapped within the Apache web server.

mod_alias can do a lot with its Alias and AliasMatch directives, but they cannot be set in the .htaccess context.

This means that there are a host of things we can do with mod_rewrite that we cannot do with mod_alias.

How mod_rewrite works

Although most SEOs may not need to know all the details of how mod_rewrite works internally, that knowledge is invaluable for troubleshooting errors. It is also wise to grasp the fundamentals of how it works.

Because of the complexity of the system, I will not write out the entire process. However, I have created this flowchart to illustrate the rewrite ruleset processing.

Flowchart showing how rewrite rules are processed by mod_rewrite.

Note: the only two possible outcomes are the eventual serving of a resource, or redirecting to or proxying an external resource. This, of course, assumes that there were no errors. The end result could be a 404 error. In fact, it often is.

Interesting Factoid:

.htaccess files are executed from top to bottom. However, here is an exception to that general rule. You may have noticed it if you looked closely at the chart above.

Before mod_rewrite checks for a RewriteCond match, the URL is matched against the RewriteRule Pattern argument. If the RewriteRule Pattern matches, mod_rewrite looks above the RewriteRule for RewriteCond directives. If all the RewriteCond tests are true, the RewriteRule is processed.

Directive order in the .htaccess file.

RewriteCond %{REQUEST_URI} !^/update\.html
RewriteCond %{REQUEST_URI} ^(.*)$
RewriteRule ^(.*)$ https://www.example.com/update.html?path=%1 [R=302,L]

Order directives are processed.

RewriteRule ^(.*)$ https://www.example.com/update.html?path=%1 [R=302,L]
RewriteCond %{REQUEST_URI} !^/update\.html
RewriteCond %{REQUEST_URI} ^(.*)$

You can share this little gem at your next SEO office party... you're welcome!

This interesting factoid does have significance. Because the RewriteRule is evaluated prior to any rewrite conditions you can use a backreference from the RewriteRule in a RewriteCond.
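Here is a minimal sketch of that idea. It assumes the .htaccess file sits in the DocumentRoot, and it also shows one way to serve /somepath/file from file.html, as asked earlier in this chapter.

```apache
RewriteEngine on

# $1 is captured by the RewriteRule Pattern below, yet it can be used here
# because the Pattern is matched before the condition is tested.
# -f is true when the TestString is the path of an existing regular file.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.+)$ $1.html [L]
```

A request for /somepath/file is only rewritten when /somepath/file.html exists on disk. All other requests pass through untouched.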

Basics and Syntax

Because mod_rewrite is as complex and powerful as it is, we will need to explain a few things first.

The syntax used by the rewrite module, although more complex than the alias module, is straightforward.

There are five directives you will want to know about. They are...

  • RewriteEngine
  • RewriteBase
  • RewriteOptions
  • RewriteCond
  • RewriteRule

There is an additional directive, RewriteMap. It is one of my favorites. However, it cannot be used in the .htaccess file context.

Out of all of these directives, we will spend the majority of our time looking at RewriteCond and RewriteRule. This is because they do most of the heavy lifting, and are the most customizable.

RewriteEngine directive

The first step to using mod_rewrite is to enable the runtime rewrite engine with the RewriteEngine directive. This is very simple.

### Syntax
# RewriteEngine on/off

# Turn on RewriteEngine
RewriteEngine on

# Turn off RewriteEngine
RewriteEngine off

Hint:

Don't forget that you can also turn off the RewriteEngine by setting the on/off argument to off. This can be handy if you have a large block of rewrite rules you want to disable. It is much faster than commenting out each line with #.

RewriteBase directive

It is generally good practice to declare the RewriteBase in your .htaccess files. The RewriteBase directive defines the URL prefix to be used for relative URLs.

This is important since mod_rewrite, unlike mod_alias, can accept relative URLs (URLs that don't begin with a slash).

### Syntax
# RewriteBase URL-path

### Example
RewriteBase /

The above RewriteBase would set the path for relative URLs to the DocumentRoot. In the latest Apache versions, this is almost always unnecessary. However, I still do it.
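RewriteBase is easier to picture in a subdirectory. The sketch below assumes a hypothetical .htaccess file stored in the /blog/ directory of the site. The paths are made up for illustration.

```apache
# Hypothetical .htaccess stored in the /blog/ directory
RewriteEngine on
RewriteBase /blog/

# "new-post" does not begin with a slash, so it is resolved against
# RewriteBase and the visitor is redirected to /blog/new-post.
RewriteRule ^old-post$ new-post [R=302,L]
```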

RewriteOptions Directive

To be honest, if you need this directive, you should get a server that gives you root access. It is a little beyond your typical .htaccess file.

It allows you to change the configuration of the mod_rewrite module.

### Syntax
# RewriteOptions Options

### Example
RewriteOptions InheritDownBefore

The Options argument can accept one of a number of predefined options. For instance, the above example uses InheritDownBefore. This will cause the current rewrite rules to be applied to child configurations before the child rewrite rules.

RewriteOptions can be helpful in creating maintainable and less verbose configurations. However, it is not likely that you will need to use this directive.

RewriteCond Directive

The RewriteCond directive specifies a condition under which the RewriteRule that follows it will be executed.

### Syntax
# RewriteCond TestString CondPattern [flags]

### Example
RewriteCond %{HTTP_HOST} ^example\.com [NC]

The RewriteCond syntax is as follows...

  • RewriteCond is the directive.
  • TestString identifies the string that will be tested.
  • CondPattern is the pattern that the TestString will be matched against.
  • [flags] are options that define how the RewriteCond should be processed.

The above example uses the HTTP hostname as the TestString. (In the URL https://www.example.com/path/file.html the hostname would be www.example.com.)

Because our example CondPattern uses ^ to anchor the beginning of the pattern, www.example.com would not match since it begins with www., not example.com.

Our example also uses the [NC] (no-case) flag. This means that it will match both example.com and Example.com.

RewriteRule Directive

The RewriteRule is the foundational element of mod_rewrite. It is what is used to manipulate URLs and create redirects.

It is similar to the RedirectMatch directive. However, it has a few major differences that will be evident as we go through it.

### Syntax
# RewriteRule Pattern Substitution [flags]

### Example
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

The RewriteRule syntax is as follows...

  • RewriteRule is the directive.
  • Pattern is a Perl compatible regular expression (PCRE) used to match against the URL-path. In the context of a .htaccess file, the Pattern is matched against a partial path, based on where the .htaccess file is located.
  • Substitution is the string that replaces the URL-path that Pattern was matched against.
  • [flags] are options that define how the RewriteRule should be processed.

I have chosen to include our previous RewriteCond in this example.

In this example, the RewriteCond tests each requested hostname to determine whether it begins with the string example.com. If the test is positive, the RewriteRule is processed.

This RewriteRule uses the regex capture group ^(.*)$ to capture the entire URL-path minus the possible initial slash. The captured URL-path is then appended to the string http://www.example.com/ using the backreference $1.

The last part of our RewriteRule is the two flags. L is the last flag. This tells mod_rewrite to stop processing the ruleset after this RewriteRule instead of continuing on to the next RewriteRule. R is the redirect flag. This tells the server to redirect the client from the old URL to the new URL, not just serve the new URL's resource at the old URL. The =301 parameter on the R flag identifies the type of redirect to use.

The end result of our example is that all requests to example.com will be redirected to the canonical host www.example.com. This is not the best way to accomplish this. However, it serves as a good example of how RewriteCond and RewriteRule work.

Rewrite flags

The last argument for both RewriteCond and RewriteRule is [flags]. A flag is surrounded by brackets e.g. [R]. You can include more than one flag by separating them with a comma e.g. [R,L].

There are a number of common flags you will use.

[E] or [env]

You can use the [E] flag to set an environment variable. The syntax is as follows...

### Syntax
# [E=VAR:VAL]
# [E=!VAR]

### Example
RewriteRule (.*)\.(pdf)$ - [E=doc:$1]

This example would create an environment variable called doc if a PDF is requested. If white-paper.pdf was requested the doc variable would contain the value white-paper.

You can unset the doc variable with the flag [E=!doc].

[END]

The [END] flag terminates the current round of rewrite processing as well as any subsequent processing in the .htaccess file.
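A small sketch of where this matters, assuming Apache 2.4 and a hypothetical public/ directory next to the .htaccess file. Written with [L], a catch-all rule like this would typically match its own output on the next round and loop. With [END], no further rewrite round takes place.

```apache
RewriteEngine on

# /css/site.css is served from public/css/site.css.
# [END] prevents the rewritten path from being run through the rules again.
RewriteRule ^(.*)$ public/$1 [END]
```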

[F] Forbidden

The [F] flag will cause the server to return a 403 forbidden response code. This can be used to restrict access to sensitive files.

It should be noted that when using the [F] flag the [L] is implied.
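For example, a sketch that blocks a few file types. The extensions are assumptions, so adjust them to whatever you consider sensitive. The - Substitution means "do not change the URL".

```apache
RewriteEngine on

# Return 403 Forbidden for backup, log, and ini files.
RewriteRule \.(bak|log|ini)$ - [F,NC]
```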

[L] Last

The [L] (last) flag is used to stop processing the ruleset and return the current results.

[N] Next

The [N] flag will cause the RewriteRule to execute again until the Pattern no longer returns a match.

RewriteRule "(.*)A(.*)" "$1a$2" [N]

This example will run again each time it finds an A and will replace it with an a.

You can also specify the maximum number of times an [N] flag can loop like this [N=8]. The rule will then loop at most 8 times.

[NC] No Case

The [NC] flag processes the rule or condition as case insensitive. This is important any time you are handling %{HTTP_HOST}.

If you include the [NC] flag, www and Www, as well as example.com and Example.com, will be handled the same. If you don't include the [NC] flag, the CondPattern ^www\. will not match the TestString %{HTTP_HOST} if Www. is how the requested host begins.

[QSA] QS Append

The [QSA] flag will cause the query string from the request to be appended to the end of the new query string in the Substitution. This is only needed if your Substitution string contains a query string.

If you do not include the [QSA] flag and your Substitution string contains a query string, the query string from the request will be dropped.
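A minimal sketch, using a hypothetical product.php script, shows the difference.

```apache
RewriteEngine on

# With [QSA]:    /product/42?ref=mail is mapped to /product.php?id=42&ref=mail
# Without [QSA]: /product/42?ref=mail is mapped to /product.php?id=42
RewriteRule ^product/([0-9]+)$ product.php?id=$1 [QSA,L]
```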

[QSD] QS Discard

The [QSD] flag is used to remove the request query string from the target URI when the Substitution doesn't contain a query string.
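As a sketch, assuming Apache 2.4 and hypothetical paths, this redirect drops tracking parameters from the old URL.

```apache
RewriteEngine on

# /old-page?utm_source=newsletter is redirected to /new-page
# Without [QSD] the target would be /new-page?utm_source=newsletter
RewriteRule ^old-page$ /new-page [QSD,R=301,L]
```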

[R] Redirect

The [R] flag will cause the rewrite rule to be processed as a redirect. If it is not supplied, the URL will be mapped to the new location without redirecting.

You can specify the response code by using the syntax [R=NUM]. The status code may be any valid status code. For instance, you could use [R=404]. If the status code is not a 3XX status code, the Substitution string will be dropped, and the [L] flag will be implied.
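The three cases described above can be sketched side by side. All paths here are hypothetical.

```apache
RewriteEngine on

# No [R]: /summer is served from /seasons/summer.html internally.
# The address in the browser does not change.
RewriteRule ^summer$ /seasons/summer.html [L]

# [R=301]: the browser is told to request /seasons/winter.html itself.
RewriteRule ^winter$ /seasons/winter.html [R=301,L]

# Non-3XX status: the Substitution is dropped, so - is used as a placeholder.
RewriteRule ^retired-page$ - [R=410]
```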

[S] Skip

The [S] flag can be used to skip a specified number of RewriteRule directives. [S=3] will skip the next three RewriteRule directives.
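One common use is a simple "if" block. In this sketch, with hypothetical paths, the two rewrite rules are skipped whenever the request already maps to an existing file.

```apache
RewriteEngine on

# If the request maps to an existing file, skip the next two rules.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [S=2]

# These two rules only run for requests that did not match a file.
RewriteRule ^(.*)\.pdf$ $1.html [L]
RewriteRule ^docs/(.*)$ archive/$1 [L]
```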

Redirect regular expressions and variables


One of the most powerful aspects of the rewrite module is the ability to use Perl compatible regular expressions (PCRE) and environment variables.

Understanding the meaning of regex characters and variables is important to writing rulesets.

Common regex characters

Character | Meaning | Example
--- | --- | ---
. | Any single character. | .ish would match fish, dish, wish, etc. but not wash.
+ | Repeats the previous construct one or more times. | ^o+$ would match o, oo, ooo, etc. but not book.
* | Repeats the previous construct zero or more times. | ^o*$ would match an empty string, o, oo, ooo, etc. but not book.
? | Makes the previous construct optional. | colou?r would match colour and color because the u is optional.
\ | Escapes the following character. | \. would match a literal . instead of any character.
^ | This anchor defines the beginning of the string. | ^o would match any string that begins with o.
$ | This anchor defines the end of the string. | o$ would match any string that ends with o.
( ) | Matches a group of characters. It also creates a capture group for backreferences. | .*(oo).* would match book, took, oops or any other string containing oo.
[ ] | Matches any character from this Character Set. | [dlf]og would match dog, log, fog, but not cog.
[^ ] | Matches any character not in this Negated Set. | [^c]og would match dog, log, fog, but not cog.
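Here is a sketch that combines several of these characters in one rule. The paths are hypothetical, and the ranges 0-9 and a-z inside the sets are shorthand for "any digit" and "any lowercase letter".

```apache
RewriteEngine on

# ^ and $ anchor the whole path.
# ([0-9]+) captures one or more digits as $1.
# ([a-z-]+) captures lowercase letters and hyphens as $2.
# \. matches a literal dot, and /? makes a trailing slash optional.
RewriteRule ^news/([0-9]+)/([a-z-]+)\.html/?$ /archive/$1/$2 [R=301,L]
```

With this rule, /news/2018/site-launch.html would be redirected to /archive/2018/site-launch.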

Common variables

Server variables are wrapped in %{}. For example, the HTTP_HOST variable is written as %{HTTP_HOST}.

Variable | Value
--- | ---
HTTP_HOST | The HTTP hostname e.g. www.example.com.
HTTP_REFERER | The HTTP referer from the HTTP request header e.g. https://otherwebsite.com/somepage
HTTPS | A value of on if the connection is using SSL/TLS; otherwise it is off.
REQUEST_URI | The path of the requested URL, i.e. everything after the domain and before the query string or fragment.
REQUEST_SCHEME | The scheme of the request. Usually http or https.
REQUEST_FILENAME | The full local filesystem path of the file or script matching the request, if the server has already determined it when the variable is referenced. This is the case in the .htaccess context. Otherwise, such as in rules placed in the virtual host context, it is the same as the REQUEST_URI.
QUERY_STRING | This is the query string from a requested URL.
REMOTE_ADDR | The IP of the remote host. This is usually the IP of the visitor.

There are many more variables. However, this is a good introduction. If you need more information about Apache variables you can find that information in the Apache documentation.
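As an illustration of variables at work, the widely used sketch below sends plain HTTP requests to HTTPS. It assumes the site is already reachable over HTTPS on the same hostname.

```apache
RewriteEngine on

# %{HTTPS} is off for plain HTTP requests.
RewriteCond %{HTTPS} off

# Rebuild the URL from the requested host and path, now with https.
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```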

Hint:

It is valuable to point out that a RewriteRule is only passed the URL-path to compare against the Pattern. The scheme, host, and query string are not included. However, you can test the variables %{REQUEST_SCHEME}, %{HTTP_HOST}, and %{QUERY_STRING} in a RewriteCond to make a rule depend on them.

How backreference works

In the .htaccess file, a RewriteRule Pattern, as well as a RewriteCond CondPattern, can contain capture groups. A capture group is anything in a pair of parentheses. These capture groups can be referenced to include the captured strings in the Substitution or TestString.

Backreferences to a CondPattern capture group are created with %1 through %9.

Backreferences to a Pattern capture group are created with $1 through $9.

Backreferences to a capture group within the regular expression are created with \1 through \9.

Image showing two examples of the three kinds of backreferences in rewrite rules.

In this example, the first rewrite ruleset would redirect http://www.example.com to https://www.mydomain.com/example-com. In fact, it will redirect any requested root domain to https://www.mydomain.com/ followed by the requested domain with the dot before the TLD changed to a hyphen.

This could be handy if you managed a website that sold domains and you wanted to redirect all domain requests to an HTML page without creating individual redirects for each domain.

In the example, the second ruleset would redirect all requests that map to a file that doesn't exist or is empty to an error page.
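As a separate, smaller sketch of the % and $ backreferences working together, the ruleset below moves subdomains into folders on the www host. The hostnames are hypothetical.

```apache
RewriteEngine on

# Leave the www host alone so the redirect cannot loop.
RewriteCond %{HTTP_HOST} !^www\. [NC]

# %1 will hold the subdomain captured by this CondPattern.
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]

# $1 holds the path captured by the RewriteRule Pattern.
RewriteRule ^(.*)$ https://www.example.com/%1/$1 [R=302,L]
```

A request for http://shop.example.com/cart would be redirected to https://www.example.com/shop/cart.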

Pro Tip:

mod_rewrite only allows backreferences to other regular expressions in TestString and Substitution. This means that %1 and $1 cannot be used in either Pattern or CondPattern.

You can get around this by passing the contents of %1 or $1 along with your TestString forward with an internal backreference and a delimiter.

RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-zA-Z0-9-]+)\.([a-zA-Z]+)$ [NC]
RewriteCond %2-%3::%{REQUEST_URI} !^(.*?)::/\1?
RewriteRule ^(.*)$ https://www.mydomain.com/%2-%3 [R]

In this example, the backreferences %2 and %3 are called in the TestString. They are then captured using the (.*?) in the CondPattern. We then use \1 in the CondPattern to backreference the (.*?) capture group, which captured %2-%3.

This ultimately allows the %{REQUEST_URI} to be compared with %2-%3 which is not possible otherwise.

There is more


We have only scratched the surface of the world of redirects. The power and sophistication of mod_rewrite is a wonderful thing. It is used often in this guide to solve SEO problems.

I hope this has helped you understand the fundamentals of how redirects work in the .htaccess file.

Daniel Morell

I am a web developer and SEO with a focus on creative design, a passion for perfection, and an organic marketing green thumb.

© 2018 Daniel Morell.