Robots.txt for MVC applications

Robots.txt is a very powerful SEO tool when it's employed properly. I use robots.txt for two purposes:

  • To prevent publicly accessible development sites from being mistakenly indexed by search engine bots.
  • To tell bots it's okay to crawl the site and where my sitemap is.

I don't like the idea of telling bots which folders or pages not to index, because you may unwittingly expose your sensitive URLs not only to search engine bots but also to people with malicious intentions. Instead, I put noindex, nofollow meta tags on those pages and remove all third-party integration scripts (especially Google Analytics!):

<meta name="robots" content="noindex,nofollow">

If a bot respects robots.txt, chances are it also respects the meta tag inside the head. If it doesn't, a Disallow list in robots.txt would only have exposed your URLs to everybody.
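To avoid sprinkling that tag into individual views by hand, the layout can emit it automatically off production. A minimal sketch of a hypothetical HtmlHelper extension, assuming www.anilsezer.com is the production host as in the action further below:

using System.Web;
using System.Web.Mvc;

public static class RobotsMetaHelper
{
    // Emits the noindex/nofollow tag on every host except production,
    // so development copies never rely on hand-edited markup.
    public static IHtmlString RobotsMetaTag(this HtmlHelper html)
    {
        var url = html.ViewContext.HttpContext.Request.Url;
        if (url != null && url.Host == "www.anilsezer.com")
            return MvcHtmlString.Empty; // production: let bots index normally

        return new MvcHtmlString("<meta name=\"robots\" content=\"noindex,nofollow\">");
    }
}

Calling @Html.RobotsMetaTag() from the <head> of _Layout.cshtml then covers every page at once.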

Here is my implementation of dynamic robots.txt in MVC.

Step 1 - Configure Web.config

Since robots.txt looks like a static file, IIS would normally try to serve it from disk and return a 404 when the file doesn't exist. So, I specify that I want the request handled by MVC routing.

<system.webServer>
  <handlers>
    <!-- Hand robots.txt requests over to the ASP.NET pipeline instead of the static file handler -->
    <add name="RobotsTxt" path="robots.txt" verb="GET" type="System.Web.Handlers.TransferRequestHandler" resourceType="Unspecified" preCondition="integratedMode,runtimeVersionv4.0" />
  </handlers>
</system.webServer>

Step 2 - Configure Routing (skip if you use MVC attribute routing)

If you use MVC attribute routing, there is nothing to do here beyond the [Route("robots.txt")] attribute shown in the final step. If you use classic routing, here is the route registration.

routes.MapRoute("Robots.txt",
                "robots.txt",
                new { controller = "Home", action = "RobotsTxt" });
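Note that this route must be registered before the default {controller}/{action} catch-all, or "robots.txt" will be interpreted as a controller name. A sketch of the full RegisterRoutes, assuming the standard MVC 5 project template:

using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Attribute routing alternative: this single call picks up
        // [Route("robots.txt")] and makes the MapRoute below unnecessary.
        // routes.MapMvcAttributeRoutes();

        // Must come before the default route.
        routes.MapRoute("Robots.txt",
                        "robots.txt",
                        new { controller = "Home", action = "RobotsTxt" });

        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional });
    }
}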

Final Step

Implement an MVC action to handle robots.txt requests.

// Requires: using System.Text;
[Route("robots.txt")]                  // used when attribute routing is enabled
[OutputCache(Duration = int.MaxValue)] // the content rarely changes, so cache it
public ContentResult RobotsTxt()
{
    var sb = new StringBuilder();
    sb.AppendLine("User-agent: *");

    // Any host other than production (staging, dev, etc.) is blocked entirely.
    if (Request.Url != null && Request.Url.Host != "www.anilsezer.com")
        sb.AppendLine("Disallow: /");

    sb.AppendLine("Sitemap: http://www.anilsezer.com/sitemap.xml");

    return Content(sb.ToString(), "text/plain");
}

If you want to keep your test environment out of the index, it's important to tell search engine bots so with the Disallow: / directive. I also cache the output, since it doesn't change very often.
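If the hard-coded host name bothers you, the same check can read it from configuration instead. A minimal variation, assuming a hypothetical ProductionHost appSetting:

// A hypothetical "ProductionHost" appSetting in Web.config:
//   <add key="ProductionHost" value="www.anilsezer.com" />
// Requires: using System.Configuration;
var productionHost = ConfigurationManager.AppSettings["ProductionHost"];
if (Request.Url != null && Request.Url.Host != productionHost)
    sb.AppendLine("Disallow: /");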

Result

On the production host (www.anilsezer.com), the action returns:

User-agent: *
Sitemap: http://www.anilsezer.com/sitemap.xml
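
On any other host (a staging copy, for example), the same action blocks everything:

User-agent: *
Disallow: /
Sitemap: http://www.anilsezer.com/sitemap.xml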