Channel: DeveloperSide.NET » WAMP Developer Server

Auto Correct Broken and Mis-Encoded Links to Your Site

Some of the links to your site that get posted around the internet will contain common typing or encoding mistakes, and following one of them lands the visitor on a 404 “Not Found” page.

Using Apache’s mod_rewrite module and its RewriteRule directive, you can automatically correct most of these badly formed incoming links. Each rule strips the broken part of the requested URL and sends the visitor a permanent (301) redirect to the corrected URL.

A normal link to your website on another website looks like this:

<p>Some text with a <a href="http://www.example.com/page">link anchor text here</a>.</p>

There are several ways this link can be malformed.

Incorrect HTML character encoding/escaping:

http://www.example.com/page&quot;&gt;...

Two links combined:

http://www.example.com/pagehttp://other.website/path

Dots, commas, quotes, parentheses, or angle brackets at the end:

http://www.example.com/page.

http://www.example.com/page,

http://www.example.com/page"

http://www.example.com/page'

http://www.example.com/page)

http://www.example.com/page(

http://www.example.com/page<

http://www.example.com/page>

Whitespace at the end:

http://www.example.com/page <-- a space here

Link, line-break, paragraph, or list-item tags at the end:

http://www.example.com/page</a>

http://www.example.com/page<br>

http://www.example.com/page</p>

http://www.example.com/page</li>

Variations of the above:

http://www.example.com/page<a>

http://www.example.com/page<a/>

http://www.example.com/page<a

http://www.example.com/page</br>

http://www.example.com/page<br />

http://www.example.com/page<p>

http://www.example.com/page<p/>

http://www.example.com/page<li>

...
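
The first mishap in the list is a stray `">` that was HTML-escaped along with the URL. One way this can happen is when a page escapes a pasted `<a href="...">` fragment and then auto-links the address inside it. That scenario is an illustrative assumption, not something stated above. A minimal Python sketch using the standard library's `html.escape` reproduces the broken form. The sample string is made up:

```python
import html

# A link that accidentally kept the closing quote and bracket of its href attribute
raw = 'http://www.example.com/page">link anchor text'

# html.escape() (quote=True by default) turns " into &quot; and > into &gt;
escaped = html.escape(raw)
print(escaped)  # http://www.example.com/page&quot;&gt;link anchor text
```

Once the entities are part of the posted link, the browser requests them literally as part of the URL path.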

Fix Broken Incoming Links

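To automatically correct these common link mishaps, place the following rules in either the website’s VirtualHost block or its .htaccess file. The mod_rewrite module must be loaded. An .htaccess file is only honored for these directives if the directory allows it (AllowOverride FileInfo or All).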

# mod_rewrite has to be switched on for this VirtualHost or directory
RewriteEngine On

# match on some common link mishaps: link"> escaped as link&quot;&gt; followed by any text
RewriteRule ^(.*)\s*(&quot;)+(&gt;)* $1 [R=permanent,L]

# match on some common link mishaps: two links merged
RewriteRule ^(.*)\s*https?:// $1 [R=permanent,L]

# match on some common link mishaps: ending tags and variations such as <br> <br/> <br /> </br> ... <a <a> <a > </a </a> ...
RewriteRule (.*)\s*</?a\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?br\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?li\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?p\ ?/?>?$ $1 [R=permanent,L]

# match on some common link mishaps: links ending with . , " ' ) ( > < or a whitespace character (one such character is stripped per redirect)
RewriteRule (.*)[\.,"'\)\(><\s]+$ $1 [R=permanent,L]

# match on some common link mishaps: two or more ending forward slashes (one extra slash is removed per redirect)
RewriteRule (.*)//+$ $1/ [R=permanent,L]
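
Before deploying the rules, it can help to check what each one does to a given path. The following is a minimal Python sketch, not part of Apache or WAMP Developer Server. It replays the rules above in order and follows the resulting redirects the way a browser would. It makes two assumptions. First, the rules sit in a VirtualHost, so the matched path starts with `/` and is already percent-decoded. Second, Python's `re` module treats these particular patterns the same way Apache's PCRE engine does. `apply_rules` and `follow_redirects` are hypothetical helper names:

```python
import re

# The RewriteRule patterns and substitutions from the block above, in order.
# "$1" in a substitution stands for capture group 1, as in mod_rewrite.
RULES = [
    (r"^(.*)\s*(&quot;)+(&gt;)*", "$1"),   # link&quot;&gt;...
    (r"^(.*)\s*https?://", "$1"),          # two links merged
    (r"(.*)\s*</?a\ ?/?>?$", "$1"),        # trailing <a> variants
    (r"(.*)\s*</?br\ ?/?>?$", "$1"),       # trailing <br> variants
    (r"(.*)\s*</?li\ ?/?>?$", "$1"),       # trailing <li> variants
    (r"(.*)\s*</?p\ ?/?>?$", "$1"),        # trailing <p> variants
    (r"(.*)[\.,\"'\)\(><\s]+$", "$1"),     # trailing punctuation/whitespace
    (r"(.*)//+$", "$1/"),                  # repeated trailing slashes
]


def apply_rules(path):
    """Return the redirect target for one request, or None if no rule matches.

    Like mod_rewrite, each pattern is searched for anywhere in the path, the
    first matching rule wins ([L]), and the whole path is replaced by the
    substitution, which [R] then sends back to the browser as a redirect.
    """
    for pattern, substitution in RULES:
        match = re.search(pattern, path)
        if match:
            return substitution.replace("$1", match.group(1))
    return None


def follow_redirects(path, max_hops=10):
    """Follow the 301 redirects until no rule matches; return (path, hops)."""
    hops = 0
    target = apply_rules(path)
    while target is not None:
        hops += 1
        if hops > max_hops:
            raise RuntimeError("more than %d redirects for %r" % (max_hops, path))
        path = target
        target = apply_rules(path)
    return path, hops


if __name__ == "__main__":
    for broken in ["/page&quot;&gt;link anchor text",
                   "/pagehttp://other.website/path",
                   "/page<br />", "/page.)", "/page///"]:
        print(repr(broken), "->", follow_redirects(broken))
```

In this simulation, every example from the list of mishaps reaches `/page` after a single redirect. Stacked characters take longer. Because `(.*)` is greedy, `/page.)` and `/page///` each need two redirects, one per surplus character, before settling on `/page` and `/page/`. Making the group lazy, `(.*?)`, should let one redirect strip the whole run. That tweak is a suggestion, not part of the original rules.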
