Some percentage of the links to your site posted around the internet will contain common typing mistakes that land the visitor on a 404 / “Not Found” page.
Using Apache’s mod_rewrite and its RewriteRule directive, you can auto-correct the majority of these badly formed incoming links by stripping the broken part and then redirecting the visitor to the correct URL.
A normal link to your website on another website looks like this:
<p>Some text with a <a href="http://www.example.com/page">link anchor text here</a>.</p>
There are several ways this link can be malformed.
Incorrect HTML character encoding/escaping:
http://www.example.com/page">...
Two links combined:
http://www.example.com/pagehttp://other.website/path
Dots, commas, quotes, parentheses, angle brackets at end:
http://www.example.com/page.
http://www.example.com/page,
http://www.example.com/page"
http://www.example.com/page'
http://www.example.com/page)
http://www.example.com/page(
http://www.example.com/page<
http://www.example.com/page>
Whitespace at end:
http://www.example.com/page <-- a space here
Link, line break, paragraph, list tags at end:
http://www.example.com/page</a>
http://www.example.com/page<br>
http://www.example.com/page</p>
http://www.example.com/page</li>
Variations of the above:
http://www.example.com/page<a>
http://www.example.com/page<a/>
http://www.example.com/page<a
http://www.example.com/page</br>
http://www.example.com/page<br />
http://www.example.com/page<p>
http://www.example.com/page<p/>
http://www.example.com/page<li>
...
Fix Broken Incoming Links
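To automatically correct the common link mishaps above, place the following rules in either the website’s VirtualHost block or its .htaccess file. They require mod_rewrite to be loaded and the rewrite engine to be switched on with RewriteEngine On.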
# match on some common link mishaps: badly escaped HTML leaking into the link, e.g. link">abcdefg
RewriteRule ^(.*)\s*(")+(>)* $1 [R=permanent,L]
# match on some common link mishaps: two links merged
RewriteRule ^(.*)\s*https?:// $1 [R=permanent,L]
# match on some common link mishaps: ending tags and variations such as <br> <br/> <br /> </br> ... <a <a> <a > </a </a> ...
RewriteRule (.*)\s*</?a\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?br\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?li\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?p\ ?/?>?$ $1 [R=permanent,L]
# match on some common link mishaps: links ending with one or more of . , " ' ) ( > < or whitespace
RewriteRule (.*)[\.,"'\)\(><\s]+$ $1 [R=permanent,L]
# match on some common link mishaps: multiple ending / (more than 1 ending forward slash)
RewriteRule (.*)//+$ $1/ [R=permanent,L]
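Before deploying the rules, it can help to check offline which malformed paths they would catch. The following is a minimal sketch in Python, not part of Apache. The names RULES and fix_path are hypothetical, and Python's re module stands in for the regex engine mod_rewrite uses. It assumes paths as seen in VirtualHost context (with a leading slash) and does not model Apache's own URL handling, such as percent-decoding or merging of repeated slashes, so treat the results as indicative only.

```python
import re

# Each entry mirrors one RewriteRule above: (pattern, substitution).
# As in mod_rewrite, a match replaces the WHOLE path with the substitution.
RULES = [
    (r'^(.*)\s*(")+(>)*', r'\1'),               # quote / "> leaked into the link
    (r'^(.*)\s*https?://', r'\1'),              # two links merged
    (r'(.*)\s*</?a\ ?/?>?$', r'\1'),            # trailing <a> variants
    (r'(.*)\s*</?br\ ?/?>?$', r'\1'),           # trailing <br> variants
    (r'(.*)\s*</?li\ ?/?>?$', r'\1'),           # trailing <li> variants
    (r'(.*)\s*</?p\ ?/?>?$', r'\1'),            # trailing <p> variants
    (r'''(.*)[\.,"'\)\(><\s]+$''', r'\1'),      # trailing punctuation or whitespace
    (r'(.*)//+$', r'\1/'),                      # more than one trailing slash
]


def fix_path(path, max_redirects=10):
    """Follow the simulated redirects until no rule matches any more."""
    for _ in range(max_redirects):
        for pattern, template in RULES:
            match = re.search(pattern, path)
            if match:
                # First matching rule wins, like [L]; the redirect then
                # triggers a fresh request that starts again at rule one.
                path = match.expand(template)
                break
        else:
            return path  # no rule matched: this is the final URL path
    return path


if __name__ == "__main__":
    samples = ['/page">...', '/pagehttp://other.website/path',
               '/page</a>', '/page<br />', '/page.', '/page ', '/page///']
    for sample in samples:
        print(repr(sample), '->', repr(fix_path(sample)))
```

A path can need more than one pass: in this sketch, /page/// is first rewritten to /page// and only then to /page/, which corresponds to two successive redirects.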