Juneum
Elixir, Phoenix, and Javascript

Dynamic Sitemaps in Phoenix

March 24, 2021
trisager

A sitemap is information about a website that makes it easier for a search engine to index its contents. Having a sitemap is not a requirement - if a website has incoming links from other websites, and if the website content is well organised with internal links, a search engine crawler should have no trouble locating its pages.

On the other hand, it may take a while before a web crawler notices a new website that does not yet have many incoming links, and large websites may contain sections whose structure makes them hard for a crawler to discover. Sitemaps can help in these cases.

XML Sitemaps

Sitemaps are typically written with XML, but they can also be in RSS or plain text formats. A simple XML sitemap looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://juneum.com/</loc>
  </url>
  <url>
    <loc>https://juneum.com/articles/working-with-timezones</loc>
    <lastmod>2021-03-21T07:01:07Z</lastmod>
  </url>
  <url>
    <loc>https://juneum.com/articles/dynamic-sitemaps</loc>
    <lastmod>2021-03-28T09:15:12Z</lastmod>
  </url>
</urlset>

The structure of an XML sitemap is simple: It must begin and end with <urlset> ... </urlset> tags. Inside these are a number of <url> elements, with the page URL contained in a <loc> child element.

<url> elements can have other children besides <loc> that provide additional information about the URL:

  • Date (and time if required) of last modification to the page at this URL (<lastmod>)
  • Information about how frequently the page at this URL can be expected to change (<changefreq>)
  • Priority of this URL compared to other URLs on the site (<priority>)

Most search engines ignore changefreq and priority, so you can safely omit these from your sitemap. You can also skip lastmod if you prefer to leave it up to the search engine to determine when a page was last updated.

As with any other XML file, all the data values that you put into a sitemap, including URLs, must be properly escaped. See sitemaps.org for the details.
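As a minimal sketch of the escaping step, the five characters that sitemaps.org lists as requiring entity escapes can be replaced before a value is written into the XML. The module and function names here (SitemapEscape, escape/1) are hypothetical and not part of Phoenix or any library:

```elixir
defmodule SitemapEscape do
  # "&" must come first, otherwise the ampersands introduced by the
  # other replacements would be escaped a second time.
  @entities [
    {"&", "&amp;"},
    {"<", "&lt;"},
    {">", "&gt;"},
    {"\"", "&quot;"},
    {"'", "&apos;"}
  ]

  # Replaces each special character in `value` with its XML entity.
  def escape(value) when is_binary(value) do
    Enum.reduce(@entities, value, fn {char, entity}, acc ->
      String.replace(acc, char, entity)
    end)
  end
end

IO.puts(SitemapEscape.escape("https://example.com/search?q=elixir&page=2"))
```

This only covers XML entity escaping. A URL that contains spaces or non-ASCII characters also needs to be URL encoded before it is escaped.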

Creating a Dynamic XML Sitemap with Phoenix

Your sitemap could be a simple XML file served as a static asset. This has the advantage of having very little overhead when serving a large sitemap, but it requires some way of modifying the XML file when pages are created, modified, or deleted.

For smaller sites, the simplest approach is to create and serve the sitemap the same way you would serve HTML pages with Phoenix, i.e. by adding an appropriate route to the router and then creating view and controller modules and a template file. The router could look like this:

pipeline :xml do
  plug :accepts, ["xml"]
  plug :put_layout, {MyAppWeb.LayoutView, false}
end

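scope "/", MyAppWeb do
  pipe_through :xml

  # The scope already prefixes controllers with MyAppWeb,
  # so the controller is referenced without the prefix here.
  get "/sitemap.xml", XMLController, :show
end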

In the code above we create a new pipeline for routes that serve XML. We use two plugs in our pipeline:

  • plug :accepts, invoking Phoenix.Controller.accepts/2 to negotiate the response content type with the client
  • plug :put_layout, invoking Phoenix.Controller.put_layout/2 with the conn and a tuple with our layout module and the name of the layout file we want to use

In our case we specify the "xml" content type (which in Phoenix defaults to the "text/xml" MIME type [1]). We don't need a layout file, so we use false in place of the layout filename.

All that remains is to create a MyAppWeb.XMLView module, a sitemap.xml.eex template, and a MyAppWeb.XMLController. The controller handles the show action by getting whatever data we need from the database and rendering it using our XML template. This process is no different from rendering HTML, so we won't cover it here.
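For orientation only, here is a rough sketch of what such a template could contain. It is evaluated with plain EEx so that it runs outside a Phoenix project. The page list and the field names loc and updated_at are assumptions made for illustration; in a real application they would come from your own schema.

```elixir
# Hypothetical page data; in the controller this would be loaded from the database.
pages = [
  %{loc: "https://example.com/", updated_at: ~U[2021-03-21 07:01:07Z]},
  %{loc: "https://example.com/articles/dynamic-sitemaps", updated_at: ~U[2021-03-28 09:15:12Z]}
]

# Roughly what a sitemap.xml.eex template might look like.
# Plain EEx does not escape anything, so the values are assumed to be
# XML-escaped already.
template = """
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<%= for page <- pages do %>
  <url>
    <loc><%= page.loc %></loc>
    <lastmod><%= DateTime.to_iso8601(page.updated_at) %></lastmod>
  </url>
<% end %>
</urlset>
"""

xml = EEx.eval_string(template, pages: pages)
IO.puts(xml)
```

The output contains some blank lines around each <url> element, which are harmless in XML.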

When you generate data for the lastmod tag (and other datetime values), you must ensure that they are encoded using the W3C Datetime format. Some valid examples:

  • Date only: 2021-03-23
  • Date and time in UTC: 2021-03-23T07:00:00Z
  • Date and time with time zone offset: 2021-03-23T07:00:00-05:00
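Elixir's standard library can produce these formats directly. The sketch below uses a hypothetical helper module (SitemapTime is not part of Phoenix) and assumes that naive datetimes are stored in UTC, which is the usual Ecto convention but worth checking for your own data:

```elixir
defmodule SitemapTime do
  # Date only, e.g. "2021-03-23"
  def lastmod(%Date{} = date) do
    Date.to_iso8601(date)
  end

  # Date and time with "Z" or a numeric offset, depending on the time zone
  # of the DateTime. Fractional seconds are dropped to keep the value short.
  def lastmod(%DateTime{} = datetime) do
    datetime
    |> DateTime.truncate(:second)
    |> DateTime.to_iso8601()
  end

  # A NaiveDateTime carries no zone information, so we assume it is UTC.
  def lastmod(%NaiveDateTime{} = naive) do
    naive
    |> DateTime.from_naive!("Etc/UTC")
    |> lastmod()
  end
end

IO.puts(SitemapTime.lastmod(~D[2021-03-23]))
IO.puts(SitemapTime.lastmod(~U[2021-03-23 07:00:00Z]))
IO.puts(SitemapTime.lastmod(~N[2021-03-23 07:00:00]))
```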

Submitting a Sitemap to a Search Engine

Once a sitemap has been created, the page discovery and indexing process can be sped up by submitting the sitemap to a search engine using tools such as Google Search Console and Bing Webmaster Tools. You can also do this programmatically, by sending an HTTP request to

<searchengine_URL>/ping?sitemap=sitemap_url

The URL in the sitemap= query string must be URL encoded. For example, to notify Google about a sitemap located at example.com/sitemap.xml:

curl "https://google.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml"

You will get a 200 OK response if the sitemap submission is successful.
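If you build the ping URL from Elixir, the encoding step can be done with URI.encode_www_form/1 from the standard library. This is only a sketch of constructing the URL; the variable names are illustrative, and sending the request is left to whichever HTTP client you already use:

```elixir
sitemap_url = "https://example.com/sitemap.xml"

# URI.encode_www_form/1 percent-encodes reserved characters such as ":" and "/"
ping_url = "https://google.com/ping?sitemap=" <> URI.encode_www_form(sitemap_url)

IO.puts(ping_url)
# prints https://google.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml
```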

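(You can send your sitemap to Bing at bing.com/webmaster/ping.aspx?sitemap=). Both Google and Bing have since retired these unauthenticated ping endpoints, so check each search engine's current documentation before relying on them.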

For a small site, that is all there is to it. However, a single sitemap is limited to a maximum of 50MB (uncompressed) and 50,000 URLs. Beyond these limits you will have to split your sitemap into several smaller sitemaps. You can then link the individual sitemaps using a sitemap index file.
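A rough sketch of the splitting step: the URLs are chunked into groups of at most 50,000, and an index file is generated that points at one sitemap per chunk. The module name, function names, and sitemap file URLs are hypothetical, and the size limit in bytes is not checked here:

```elixir
defmodule SitemapIndex do
  @max_urls 50_000

  # Splits a list of page URLs into chunks that each fit in one sitemap.
  def chunk(urls), do: Enum.chunk_every(urls, @max_urls)

  # Builds a sitemap index document from the URLs of the individual sitemaps.
  # The URLs are assumed to be XML-escaped already.
  def index_xml(sitemap_urls) do
    entries =
      Enum.map(sitemap_urls, fn url ->
        "  <sitemap>\n    <loc>#{url}</loc>\n  </sitemap>\n"
      end)

    IO.iodata_to_binary([
      ~s(<?xml version="1.0" encoding="UTF-8"?>\n),
      ~s(<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n),
      entries,
      "</sitemapindex>\n"
    ])
  end
end

sitemap_urls = [
  "https://example.com/sitemap-1.xml",
  "https://example.com/sitemap-2.xml"
]

IO.puts(SitemapIndex.index_xml(sitemap_urls))
```

Each chunk returned by chunk/1 would then be rendered as its own sitemap and served at the URL listed in the index.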


[1]. There are two commonly used XML MIME types, "text/xml" and "application/xml". According to RFC 7303 these types are identical: "text/xml" can be considered an alias of "application/xml".

If, for some reason, you need to support clients that will only accept "application/xml", you can change the "xml" in the pipeline to mean "application/xml" by setting a custom MIME type in your config/config.exs:

config :mime, :types, %{
  "application/xml" => ["xml"]
}

You should then run mix deps.clean --build mime to force a recompile.
