XML Sitemap

What is an XML Sitemap?

An XML (Extensible Markup Language) sitemap is a text file used to list and provide details on your website’s URLs that you want Google to read. Put very simply, XML sitemaps are tables of contents for a website: they tell Google what pages are on your site and where to find them.

An XML sitemap won’t directly impact your rankings. What it will do is help Google find its way around your site and learn more about it. In fact, Google likes using sitemaps to crawl a site so much that the sitemap is often the first thing it accesses when it lands on a website. This makes XML sitemaps an important part of technical SEO.

A simple sitemap for a basic, one-page website would look like this:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <url>
         <loc>https://www.example.com</loc>
         <lastmod>2017-10-06</lastmod>
         <changefreq>weekly</changefreq>
         <priority>0.9</priority>
         <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com"/>
         <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr"/>
      </url>
   </urlset>

A basic sitemap glossary of terms

Here’s a breakdown of what each of those tags means:

  • `<urlset>`: The sitemap opens and closes with this tag. Its `xmlns` attribute references the current protocol standard.
  • `<url>`: This is the parent tag for each URL entry.
  • `<loc>`: This tag contains the absolute URL, or the locator, of the page.
  • `<lastmod>`: The date the page was last modified. It should be in YYYY-MM-DD format.
  • `<changefreq>`: How often the page is likely to change. Valid values are always, hourly, daily, weekly, monthly, yearly and never.
  • `<priority>`: The page’s importance relative to other pages on your site. The value ranges from 0.0 to 1.0, and the default is 0.5.
  • `<xhtml:link>`: Links to alternate versions of the page in other languages, in this case French. Note that the entry also includes an hreflang link to the page itself. All pages using hreflang must include self-referential links.
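If you would rather generate this file than write it by hand, the sketch below shows one way to do it with Python’s standard-library `xml.etree.ElementTree`. The function name `build_sitemap` and its dict-based input are hypothetical choices for illustration; the tag names and namespaces are the ones used in the example sitemap above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Register prefixes so the output uses the sitemap namespace as the default
# xmlns and writes hreflang links as "xhtml:link".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(entries):
    """Build a <urlset> document from a list of dicts (hypothetical format).

    Each dict needs "loc"; "lastmod", "changefreq", "priority" and
    "alternates" (a dict of hreflang code -> URL) are optional.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # Optional tags are only written when present.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
        # One <xhtml:link> per language version, including the page itself.
        for lang, href in entry.get("alternates", {}).items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([{
    "loc": "https://www.example.com",
    "lastmod": "2017-10-06",
    "changefreq": "weekly",
    "priority": 0.9,
    "alternates": {"en": "https://www.example.com",
                   "fr": "https://www.example.com/fr"},
}]))
```

Registering the empty prefix is what makes ElementTree emit `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"` on `<urlset>` instead of an auto-generated `ns0:` prefix.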

The full schema for XML sitemaps is published on sitemaps.org.

XML sitemap guidelines

Some notes and important things to remember:

  1. The `<loc>` tag is compulsory, while the `<lastmod>`, `<changefreq>` and `<priority>` tags are optional.
  2. Page priority is relative to the rest of your site, not the rest of the web, so you can’t make them all 0.9 and expect that to do anything. Remember, when everything’s a priority, nothing is.
  3. Ideally, an XML sitemap should be added to the root directory of the website, since a sitemap can only list URLs in its own folder and below.
  4. URLs listed in sitemaps must be canonical URLs, so no redirects or error pages. All URLs should follow the same conventions in terms of HTTP vs. HTTPS, www vs. non-www and capitalization.
  5. URLs must be shorter than 2,048 characters.
  6. Be accurate with your `<changefreq>` declaration, no matter how often you want your site crawled. Inaccurate `<changefreq>` values will be ignored.
  7. All URLs in the sitemap must come from the same host.
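The guidelines above are easy to check automatically before you publish. Here is a minimal Python sketch, assuming each entry is a dict of tag names to values; `check_entry` and `is_yyyy_mm_dd` are hypothetical helpers, not part of any tool. The date check only accepts the date-only YYYY-MM-DD form recommended here, although the protocol also allows full W3C Datetime timestamps.

```python
import re
from datetime import date
from urllib.parse import urlsplit

# The seven <changefreq> values defined by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}


def is_yyyy_mm_dd(value):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. month 13 or February 30
        return False
    return True


def check_entry(entry, sitemap_url):
    """Return a list of guideline problems for one sitemap entry."""
    loc = entry.get("loc")
    if not loc:
        return ["missing <loc>"]
    problems = []
    parts, base = urlsplit(loc), urlsplit(sitemap_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("<loc> is not an absolute URL")
    elif (parts.scheme, parts.netloc) != (base.scheme, base.netloc):
        # Catches other hosts as well as HTTP/HTTPS and www/non-www mismatches.
        problems.append("URL uses a different protocol or host than the sitemap")
    if len(loc) >= 2048:
        problems.append("URL is 2,048 characters or longer")
    if "priority" in entry and not 0.0 <= float(entry["priority"]) <= 1.0:
        problems.append("<priority> must be between 0.0 and 1.0")
    if "changefreq" in entry and entry["changefreq"] not in CHANGEFREQ_VALUES:
        problems.append("unknown <changefreq> value")
    if "lastmod" in entry and not is_yyyy_mm_dd(entry["lastmod"]):
        problems.append("<lastmod> is not a valid YYYY-MM-DD date")
    return problems


print(check_entry({"loc": "http://example.com/page", "priority": 1.5},
                  "https://www.example.com/sitemap.xml"))
```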

Once your sitemap has been created and added to your site, add a reference to it in your robots.txt file. It’s as simple as adding one line of code to the end of your robots.txt:

   Sitemap: https://www.example.com/sitemap.xml
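To confirm that the `Sitemap:` line can be read, you can parse your robots.txt with Python’s standard-library `urllib.robotparser`, whose `site_maps()` method (Python 3.8 and later) returns the declared sitemap URLs, or `None` if there are none. The robots.txt content and the helper name `sitemaps_from_robots` below are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


robots_txt = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""
print(sitemaps_from_robots(robots_txt))
```

Because the `Sitemap:` directive is independent of `User-agent` groups, the parser picks it up wherever it appears in the file.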

What is a Sitemap Index File?

If your website is huge, with more than 50,000 URLs or a sitemap larger than 50MB uncompressed, you’ll need to break your sitemap up into several smaller sitemaps. The same goes if your website has more than one type of sitemap (more on the different types of sitemaps below).

If you do this, you’ll need to create and add something known as a sitemap index file. Here’s what a sitemap index file looks like:

   <?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
         <loc>https://www.example.com/sitemap1.gz</loc>
         <lastmod>2017-12-31</lastmod>
      </sitemap>
      <sitemap>
         <loc>https://www.example.com/sitemap2.gz</loc>
         <lastmod>2017-10-01</lastmod>
      </sitemap>
   </sitemapindex>

As you can see, a sitemap index file is a list of your different sitemaps with their locations and date last modified. They’re basically sitemaps of sitemaps.

The main differences are that `<sitemapindex>` replaces `<urlset>` and `<sitemap>` replaces `<url>`. The full schema for sitemap index files is published on sitemaps.org.

Notice the file extension for the sitemap URLs: .gz. These are files that have been compressed using GNU zip (gzip) compression. Compressing your sitemap is a good idea to save on bandwidth when Google downloads it. However, your sitemap’s size still needs to be 50MB (52,428,800 bytes) or less after it’s been unzipped.

Sitemap index file guidelines

Just as sitemap index file schemas are similar to regular sitemap schemas, the index file guidelines are similar too:

  • Up to 50,000 sitemaps in each sitemap index file
  • Max size of 50MB after decompression
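Splitting a large URL list and compressing the pieces can be sketched in a few lines of Python. This assumes the limits described above (50,000 URLs and 50MB uncompressed per sitemap); the `sitemapN.xml.gz` file names and the helper functions are hypothetical, and the index it builds would itself need to stay under 50,000 entries.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000                 # per sitemap
MAX_BYTES = 50 * 1024 * 1024      # 52,428,800 bytes, uncompressed
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def chunk(items, size):
    """Split a list into consecutive lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_urlset(urls):
    rows = [f"  <url><loc>{escape(u)}</loc></url>" for u in urls]
    doc = [XML_DECL, f'<urlset xmlns="{SITEMAP_NS}">', *rows, "</urlset>"]
    return "\n".join(doc).encode("utf-8")


def build_index(sitemaps):
    """sitemaps: list of (loc, lastmod) pairs -> sitemap index bytes."""
    rows = [f"  <sitemap><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></sitemap>"
            for loc, lastmod in sitemaps]
    doc = [XML_DECL, f'<sitemapindex xmlns="{SITEMAP_NS}">', *rows, "</sitemapindex>"]
    return "\n".join(doc).encode("utf-8")


def split_sitemaps(urls, base_url, lastmod):
    """Return ({file name: gzipped sitemap bytes}, sitemap index bytes)."""
    files, entries = {}, []
    for n, group in enumerate(chunk(urls, MAX_URLS), start=1):
        body = build_urlset(group)
        # The size limit applies to the uncompressed file.
        if len(body) > MAX_BYTES:
            raise ValueError(f"sitemap {n} exceeds 50MB uncompressed")
        name = f"sitemap{n}.xml.gz"  # hypothetical naming scheme
        files[name] = gzip.compress(body)
        entries.append((f"{base_url}/{name}", lastmod))
    return files, build_index(entries)
```

Each gzipped file would then be uploaded to the listed location, and only the index file needs to be referenced in robots.txt or submitted to Google.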

What is an Image Sitemap?

If you have a site that uses a lot of images, it makes absolute sense to guide search engines to your image URLs by means of an image sitemap.

Image sitemaps look a lot like regular sitemaps. They are lists of every URL on a website, along with some extra information about those URLs. However, with image sitemaps it’s information about specific graphics, not pages. Here’s a sample of an image sitemap.

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
      <url>
         <loc>http://www.example.com/sample-page</loc>
         <image:image>
            <image:loc>http://www.example.com/image.jpg</image:loc>
         </image:image>
         <image:image>
            <image:loc>http://www.example.com/image2.jpg</image:loc>
         </image:image>
      </url>
   </urlset>

Image sitemap glossary of terms

Image sitemaps use `<url>` and `<loc>` tags the same way as regular sitemaps. They define the start and end of each entry and the canonical version of each URL.

  • `<image:image>`: This tag notes that the information will be about an image.
  • `<image:loc>`: The URL of the image. Note that this isn’t the URL of the page that the image is on — that’s the `<loc>` tag above it. This is the URL created by your CMS to host the image. See it in action by right clicking on an image and clicking “Copy Image Address”.

Like regular sitemaps, image sitemaps also have optional tags you can use for additional information:

  • `<image:caption>`: A short caption for your image.
  • `<image:geo_location>`: The geographical location of the image. This should be the name of the place, such as a city or a landmark, not GPS coordinates.
  • `<image:title>`: Your image’s title, if it has one.
  • `<image:license>`: The URL that houses your image’s license.

Google publishes the full schema for image sitemaps.
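An image sitemap like the sample above can be generated the same way as a regular one, with the image namespace added. This Python sketch assumes you already have a mapping from each page URL to the image URLs found on it; `build_image_sitemap` is a hypothetical name, and it writes only the `<image:image>` and `<image:loc>` tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Default namespace for sitemap tags, "image:" prefix for image tags.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps each page URL to the list of image URLs on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the page; each <image:loc> is an image on that page.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_image_sitemap({
    "http://www.example.com/sample-page": [
        "http://www.example.com/image.jpg",
        "http://www.example.com/image2.jpg",
    ],
}))
```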

What is a Video Sitemap?

There are image sitemaps, so of course there are sitemaps specifically for your videos. Like the other types of sitemaps, video sitemaps list locations of the videos on your website (whether they’re hosted there or not) and some data about them.

An entry for a video sitemap looks like this:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
      <url>
         <loc>https://www.example.com/sample-page</loc>
         <video:video>
            <video:thumbnail_loc>https://www.example.com/thumbnails/video1.jpg</video:thumbnail_loc>
            <video:title>Sample Video</video:title>
            <video:description>A short description of your video. No more than 2048 characters.</video:description>
            <video:content_loc>https://www.example.com/video/sample-video.mov</video:content_loc>
            <video:duration>10</video:duration>
         </video:video>
      </url>
   </urlset>

Video sitemap glossary of terms

Most of the tags listed above are required for each entry:

  • `<video:thumbnail_loc>`: The URL of the video thumbnail.
  • `<video:content_loc>`: The location of the actual video file on your website. It’s like the `<image:loc>` tag in the image sitemap. If you host your video externally (on YouTube or Vimeo, for example), use `<video:player_loc>` instead to point to the video player. Check your host’s embed code for this URL. Each entry needs at least one of these two tags.
  • `<video:title>`: The title of the video. Simple.
  • `<video:description>`: A short description of what the video is about.
  • `<video:duration>`: The video’s length in seconds, expressed as a number between 1 and 28800 (8 hours). Technically duration isn’t required, but it’s highly recommended.
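Before submitting a video sitemap, you can check each entry against the requirements above. The sketch below assumes each `<video:video>` entry is represented as a Python dict keyed by tag name without the `video:` prefix; `check_video_entry` is hypothetical and covers only the rules listed here, plus the rule that `<video:content_loc>` and `<video:player_loc>` must differ when both are present.

```python
def check_video_entry(video):
    """Return a list of problems with one video entry (a dict of tag -> value)."""
    problems = []
    for tag in ("thumbnail_loc", "title", "description"):
        if not video.get(tag):
            problems.append(f"missing <video:{tag}>")
    content, player = video.get("content_loc"), video.get("player_loc")
    # At least one location is needed; if both are given they must differ.
    if not content and not player:
        problems.append("need <video:content_loc> or <video:player_loc>")
    elif content and content == player:
        problems.append("content_loc and player_loc must differ")
    if len(video.get("description", "")) > 2048:
        problems.append("description longer than 2,048 characters")
    if "duration" in video and not 1 <= int(video["duration"]) <= 28800:
        problems.append("duration must be 1 to 28,800 seconds")
    return problems


sample = {
    "thumbnail_loc": "https://www.example.com/thumbnails/video1.jpg",
    "title": "Sample Video",
    "description": "A short description of your video.",
    "content_loc": "https://www.example.com/video/sample-video.mov",
    "duration": 10,
}
print(check_video_entry(sample))
```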

However, there is more information you can give about your videos. You can add these tags to each entry if you’d like, though they are optional:

  • `<video:expiration_date>`: The date after which your video will no longer be available. Only use this if the video will actually not be accessible after this date. Put dates in YYYY-MM-DD format and times in hh:mm:ss format.
  • `<video:rating>`: The video’s rating on a scale of 0.0 to 5.0.
  • `<video:view_count>`: The number of times the video has been watched.
  • `<video:publication_date>`: The date the video was first published. Note that this might be different from the date you added it to your site.
  • `<video:family_friendly>`: A yes/no value that determines whether your video can show up when SafeSearch is enabled. If you put “no”, your video will only show up when SafeSearch is turned off.
  • `<video:tag>`: A very short description of a key concept covered in the video. Use a separate `<video:tag>` element for each tag you want to add.
  • `<video:category>`: The broad topical umbrella your video falls under, like “digital marketing” or “fashion”.
  • `<video:restriction relationship="allow|deny">`: A list of countries where your video should or should not be available. Use “allow” to make the video available only in the listed countries. Use “deny” to block it in the listed countries. Leaving this information out will make your video available globally.
  • `<video:gallery_loc>`: The URL for any gallery or collection your video is a part of. Each video entry can have only one gallery location specified.
  • `<video:price currency="">`: The price to download the video, if there is one. If your video has a price, you must also specify a currency using the ISO 4217 code. Videos can have multiple prices based on currency and resolution.
  • `<video:requires_subscription>`: Use “yes” or “no” to indicate whether or not a subscription is required to view the video.
  • `<video:uploader>`: The name of the person or organization that uploaded the video. Its optional `info` attribute can point to a page with more information about the uploader; that URL must be on the same domain as the entry’s `<loc>`.
  • `<video:platform relationship="allow|deny">`: The platforms (web, mobile and TV) on which the video is allowed or denied. Use “allow” to make the list inclusive, or “deny” to make it exclusive.
  • `<video:live>`: Whether or not the video is a live stream. Only “yes” or “no” are accepted.

Google publishes the full video sitemap schema.

What is a Mobile Sitemap?

Technically you can use a `<mobile:mobile/>` tag to indicate that a particular URL contains content for mobile devices.

However, it is strongly recommended that you not create a mobile sitemap or use the mobile tag. These sitemaps were intended for feature phones, meaning phones that don’t have full web browsers (as in, old flip phones). Smartphones, on the other hand, have browsers just as capable as desktop ones.

Ideally, your pages are mobile friendly and therefore already contain content for mobile devices, so your normal sitemap functions as a mobile sitemap.

This advice comes straight from Google, so you can take their word for it.

What is a Google News Sitemap?

Google News sitemaps give Google the location of and information about news and newsworthy content on your website. In order for Google to read this sitemap correctly, you have to ask to be included in Google News first. Technically any website can create and submit a Google News sitemap, but these sitemaps are most effective for sites with dynamic, fresh content that’s accessible within one or two clicks of the homepage.

Essentially, content publishers.

An entry for a News sitemap would look like this:

   <?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
      <url>
         <loc>https://www.example.com/news/sample-category.html</loc>
         <news:news>
            <news:publication>
               <news:name>Example Publisher Website</news:name>
               <news:language>en</news:language>
            </news:publication>
            <news:genres>PressRelease, Blog</news:genres>
            <news:publication_date>2017-12-31</news:publication_date>
            <news:title>Company A is Buying Company B.</news:title>
            <news:keywords>business, acquisition, Company A, Company B</news:keywords>
            <news:stock_tickers>NASDAQ:A, NASDAQ:B</news:stock_tickers>
         </news:news>
      </url>
   </urlset>

Google News sitemap glossary of terms

Like the other types of sitemaps, News sitemaps use the `<urlset>`, `<url>` and `<loc>` tags, and those tags work the same way.

The tags specifically for news articles are:

  • `<news:news>`: This denotes that the following information will be about a news article.
  • `<news:publication>`: This defines where (`<news:name>`) the content was published, and in what language (`<news:language>`). All three of these tags must be used in a News sitemap. Language content must be in the ISO 639 language code of 2 or 3 letters. Note the publication name must match the name that will appear in the Google News results.
  • `<news:publication_date>`: The date that the article was published on your website (not the date it was added to your sitemap). Dates should be written in YYYY-MM-DD format. You can include the exact time using YYYY-MM-DDThh:mm:ssTZD format.
  • `<news:title>`: This is the title of the article. It doesn’t need to be a 100% match — you can shorten it a bit to save space when showing it in search results. But don’t include anything extra beyond what’s on the website.
  • `<news:keywords>`: A comma-separated list that describes the article’s topic. These should be the specific topic of the article, in our example companies A and B, as well as categories — business and acquisitions. The keywords tag is optional, so you don’t have to use it. And while there’s technically no limit on the number of keywords you can use, try to keep it to a minimum.
  • `<news:stock_tickers>`: A comma-separated list of the business entities referenced in the article, expressed as their stock symbols. This tag is optional, so there’s no need to include it if your article isn’t about business or finance. If you do add a stock symbol, put the exchange in front of it (NASDAQ, NYSE, etc.).

Google publishes the full Google News sitemap schema.

Google News sitemap guidelines

While the structure of a News sitemap is very similar to other types of sitemaps we’ve discussed, there are some additional guidelines you must follow for it to be valid:

  • Only include articles that have been published in the last 2 days. Once an article has been published for 2 days, remove it from the News sitemap. But don’t worry! Your content won’t be removed from the News index until the normal 30-day period has passed.
  • Update your News sitemap as you publish articles. Google accesses it every time it crawls your site, so it will help it find your new pages faster.
  • When you add a new article, simply update your existing sitemap. Don’t create a new one with every new article. The same goes for when you remove an article when it reaches that 2-day threshold.
  • Include up to 1,000 articles per sitemap, but no more. If you have more than 1,000 articles, use a sitemap index file.
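The 2-day window and 1,000-article cap lend themselves to a small filtering step. This Python sketch assumes articles are `(url, published_at)` pairs with timezone-aware datetimes; `news_sitemap_groups` is a hypothetical helper that keeps recent articles, newest first, and splits them into groups of at most 1,000 so each group can become its own News sitemap listed in an index file.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(days=2)   # only articles from the last 2 days
MAX_PER_SITEMAP = 1000            # articles allowed per News sitemap


def news_sitemap_groups(articles, now=None):
    """Return groups of at most 1,000 recent (url, published_at) pairs."""
    now = now or datetime.now(timezone.utc)
    # Drop anything older than the window, then sort newest first.
    recent = sorted((a for a in articles if now - a[1] <= NEWS_WINDOW),
                    key=lambda a: a[1], reverse=True)
    return [recent[i:i + MAX_PER_SITEMAP]
            for i in range(0, len(recent), MAX_PER_SITEMAP)]


now = datetime(2018, 1, 2, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://www.example.com/news/a.html", datetime(2018, 1, 2, 9, 0, tzinfo=timezone.utc)),
    ("https://www.example.com/news/old.html", datetime(2017, 12, 29, 9, 0, tzinfo=timezone.utc)),
]
for url, published in news_sitemap_groups(articles, now)[0]:
    print(url, published.isoformat())
```

Calling `isoformat()` on a timezone-aware datetime gives the `YYYY-MM-DDThh:mm:ss+hh:mm` form accepted by `<news:publication_date>`, for example `2018-01-02T09:00:00+00:00`.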

What are the SEO benefits of an XML sitemap?

So if sitemaps aren’t a ranking signal, what’s the point? Why have one?

Sitemaps will benefit your site in ways that will help it rank:

  • XML sitemaps tell Google what pages to crawl and index on your site.
  • XML sitemaps tell Google what sort of pages are on your site and what sort of content it will find.
  • XML sitemaps tell Google how new your content is — and Google likes fresh content.
  • XML sitemaps tell Google what pages on your site are most important and worthy of being crawled.
  • XML sitemaps help Google find pages that it might not find through internal linking.
  • XML sitemaps help Google find pages that it can’t find through external links — a big benefit for new websites.
  • XML sitemaps help Google use its crawl budget more efficiently, so larger websites can be indexed more effectively.
  • XML sitemaps point Google to all the pages on your site, even ones buried deep in the architecture.

Now, can all these things happen without an XML sitemap? Sure, maybe.

But using a sitemap makes them much, much more likely and much, much easier.

How do You Create an XML Sitemap?

There are plenty of tools out there you can use to create an XML sitemap for your website.

CMS plugins for generating XML Sitemaps

Many of the web’s top CMS platforms will create an XML sitemap for you, or let you use a plugin that does it automatically.

Validate your sitemap

Once you have created your website’s XML sitemap, you need to validate it for errors that will prevent Google from accessing your sitemap or its listed URLs.

You can use a third-party sitemap validator to check your sitemap.

Or you can go right to Google and check your sitemap using Google Search Console’s sitemap testing tool. To test your sitemap, go to the Sitemaps report under the Crawl section in GSC. Then click the big red “Add/Test Sitemap” button in the upper right of the screen and enter the path where you’ve uploaded your sitemap.

Google will crawl your sitemap and return a list of errors it finds:

  • URLs not accessible: Google was unable to access the URL for some reason. That could mean it’s disallowed via robots.txt, a meta robots tag or the page just doesn’t work.
  • URLs not followed: Google didn’t follow the URL all the way to its destination because there were too many redirects, redirects that didn’t work properly, or you used relative links in your sitemap.
  • URL not allowed: The URL listed in the sitemap is at a higher or parallel level than where you host your sitemap. By “higher level,” we mean closer to your domain in your URL. So a sitemap hosted at `https://www.example.com/mysite/sitemap.xml` would not be able to contain URLs directly on the domain, such as `https://www.example.com/page.html`. Parallel URLs are pages hosted at the same level as your sitemap’s folder, but in different folders. So a URL like `https://www.example.com/thissite/page.html` would not be valid for your sitemap in the /mysite/ folder. This error will also occur if your sitemap URL and the URLs in the sitemap don’t match on HTTPS vs. HTTP or www vs. non-www.
  • Compression error: Google wasn’t able to fully uncompress your sitemap. Try recompressing it, re-adding it to your site and resubmitting it.
  • Empty sitemap: The file you added as an XML sitemap doesn’t contain anything. It’s empty.
  • Sitemap file size error: Google was unable to open your sitemap because it was larger than 50MB when uncompressed. If you encounter this error, break up your sitemap and use a sitemap index file.
  • Invalid attribute error: One of your entries either uses a tag that isn’t recognized or a tag contains a value that isn’t valid for that tag.
  • Invalid date: One or more entries contain an invalid date. This could be poor formatting (something other than YYYY-MM-DD) or an invalid date itself (for example, one in the future). You don’t have to specify times, but if you do, you must include a timezone.
  • Invalid URL: An entry contains a URL with unsupported or invalid characters, such as spaces or commas, or a URL that is improperly formatted (`htps` vs. `https`).
  • Invalid URL in sitemap index file: Your sitemap index file uses incomplete or relative URLs. Google will look for sitemaps at the same path as the sitemap index file, so if your sitemap index file is at `https://www.example.com/folder/sitemap_index.xml` and lists sitemaps located at `sitemap1.xml`, Google will look for a sitemap at `https://www.example.com/folder/sitemap1.xml`.
  • Invalid XML: too many tags: One or more entries in your sitemap contain duplicate tags.
  • Missing XML attribute: An entry in your sitemap is missing a required attribute.
  • Missing XML tag: One or more URLs are missing a required tag.
  • Missing thumbnail URL: An entry in your video sitemap is missing the required `<video:thumbnail_loc>` tag.
  • Missing video title: An entry in your video sitemap is missing the required `<video:title>` tag.
  • Incorrect sitemap index format: Nested sitemap indexes: Your sitemap index file lists the URL of another sitemap index file, or its own URL. Sitemap index files can only list the URLs of sitemaps.
  • Parsing error: Google couldn’t parse the sitemap’s XML. This is usually caused by unescaped characters in URLs.
  • Temporary error: Something went wrong on Google’s end that prevented it from fully processing the sitemap. Normally you won’t have to resubmit your sitemap — Google will come back and crawl it when the error has been resolved.
  • Too many sitemaps in sitemap index file: Your sitemap index file has more than 50,000 sitemaps.
  • Too many URLs in sitemap: Your sitemap has more than 50,000 URLs.
  • Unsupported format: Your sitemap isn’t in XML.
  • Path mismatch: Missing www: Your sitemap URL contains the www prefix, but the URLs within the sitemap do not.
  • Path mismatch: Includes www: The opposite of the previous error. Your sitemap URL does not contain the www prefix but the URLs within the sitemap do.
  • Incorrect namespace: The namespace URL declared in your `<urlset>` tag contains an error.
  • HTTP error [error code]: Google encountered an HTTP error (such as 404) when trying to download your sitemap.
  • Thumbnail too large: The image specified in the `<video:thumbnail_loc>` tag is too large. Thumbnail images should be 160×120 pixels.
  • Video location and play page location are the same: The URLs listed for `<video:content_loc>` and `<video:player_loc>` are the same. If you use both tags for one entry, the URLs must be different.
  • Video location URL appears to be a play URL: The URL in `<video:content_loc>` appears to be the URL for a video player rather than the video file itself.
  • Googlebot is blocked by robots.txt: Your robots.txt file is disallowing either the sitemap URL or URLs listed in your sitemap. Use the robots.txt tester to determine which applies to you.
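Two of the errors above, “URL not allowed” and the parsing error caused by unescaped characters, can be caught before you submit. This Python sketch is an approximation of those rules, not Google’s actual checker; `url_allowed` and `loc_element` are hypothetical helpers. The escaping follows the sitemaps.org requirement to entity-escape `&`, `'`, `"`, `<` and `>` in data values.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

# escape() always handles &, < and >; these cover the two quote characters.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def url_allowed(page_url, sitemap_url):
    """True if page_url uses the sitemap's protocol and host and sits in
    the sitemap's folder or below it."""
    page, sitemap = urlsplit(page_url), urlsplit(sitemap_url)
    if (page.scheme, page.netloc) != (sitemap.scheme, sitemap.netloc):
        return False  # HTTP/HTTPS or www/non-www mismatch, or another host
    folder = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(folder)


def loc_element(page_url):
    """Render a <loc> element with its URL entity-escaped so the XML parses."""
    return f"<loc>{escape(page_url, EXTRA_ENTITIES)}</loc>"


sitemap = "https://www.example.com/mysite/sitemap.xml"
print(url_allowed("https://www.example.com/mysite/page.html", sitemap))
print(url_allowed("https://www.example.com/thissite/page.html", sitemap))
print(loc_element("https://www.example.com/?a=1&b=2"))
```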

Once you have fixed any errors uncovered by the sitemap tester, use the same tool to submit it directly to Google.

Whenever you update your Sitemap, you can resubmit it to Google using the same Add/Test Sitemap option.

ADVANCED NOTE: You can also submit your Sitemap as an HTTP request. To do this you need to issue your request to the following URL:

   <searchengineURL>/ping?sitemap=<sitemapURL>

Take a look at an example below:

   http://www.google.com/webmasters/tools/ping?sitemap=http://www.yoursite.com/sitemap.xml

URL-encode everything after `ping?sitemap=`:

   http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml
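The encoding step can be done with Python’s `urllib.parse.quote`, passing `safe=''` so that `:` and `/` are encoded too. This sketch only builds the request URL and sends nothing. Google has since announced the retirement of its sitemap ping endpoint, so treat this as an illustration of the encoding rather than a working submission method; `ping_url` is a hypothetical helper name.

```python
from urllib.parse import quote

PING_ENDPOINT = "http://www.google.com/webmasters/tools/ping"


def ping_url(sitemap_url, endpoint=PING_ENDPOINT):
    """Build <searchengineURL>/ping?sitemap=<sitemapURL> with the sitemap URL encoded."""
    # safe='' percent-encodes ':' and '/' as well as other reserved characters.
    return f"{endpoint}?sitemap={quote(sitemap_url, safe='')}"


print(ping_url("http://www.yoursite.com/sitemap.xml"))
```

The printed URL matches the encoded example above.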

Issue the HTTP request using wget, curl or any other method your web developer suggests.

Sitemaps in Google Search Console

You can monitor your sitemaps and sitemap index files using Google Search Console.

This summary of your sitemaps will tell you:

  • The type of sitemap
  • Whether it has been processed and on what date
  • How many URLs it lists
  • How many URLs from the sitemap have been indexed
  • Any issues discovered while processing

You can also use this section to download, test, resubmit or delete any sitemap.

If you click on a sitemap index file in this area, you can see the same stats for each XML sitemap in the index file.

You will more than likely see a discrepancy between URLs submitted and URLs indexed by Google. Don’t worry too much about this. It’s perfectly normal for Google to decide not to index certain pages on your site, no matter how perfect your XML sitemap is.

But if you are struggling to get your pages indexed by Google, submitting your XML sitemap is a great way to address this issue. Not only will it tell Google to start crawling your site, it will also help you uncover any accessibility issues on your website.

So it’s safe to say that while XML sitemaps aren’t used as a ranking signal for Google, they are very important when it comes to getting your site crawled and indexed properly.

If your site already has an XML sitemap, head to Google Search Console and validate it to make sure all of your pages can be crawled. If you haven’t created a sitemap yet, do so, and then submit it to Google to help it find your site.

Google Maps

Google Maps is Google’s web-based mapping application. It plays an important role in local SEO because the top three local search results appear in the Map Pack in local SERPs.

Structured Data

Structured data is on-page markup that helps search engines understand the information in a web page’s content. Structured data allows you to annotate content to denote the context behind contact information, biographical information, company data or product data. Search engines are able to read this markup and better understand the relationship between and context behind the annotated data. Structured data is an important part of using the semantic web for SEO.

Structured data requires a taxonomy that search engines can understand – Schema.org. The most common way of adding schemas to pages is through JSON-LD code.
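To illustrate what JSON-LD looks like in practice, this Python sketch serializes a minimal schema.org `Organization` object into a `<script type="application/ld+json">` tag that could be placed in a page. The company name, URLs and the helper name `organization_jsonld` are hypothetical; `name`, `url` and `logo` are standard schema.org properties of `Organization`.

```python
import json


def organization_jsonld(name, url, logo):
    """Wrap schema.org Organization data in a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


print(organization_jsonld("Example Company", "https://www.example.com",
                          "https://www.example.com/logo.png"))
```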

Cost per mille

Cost per mille (CPM) refers to the price paid for 1,000 ad impressions, that is, for an ad to be shown 1,000 times. CPM can refer to a pricing model and bidding strategy, or a calculated benchmark to determine the relative cost of an ad campaign.

The CPM model differs from the PPC model in that an advertiser pays for each impression (every time the ad is loaded), regardless of whether it is clicked.
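The arithmetic behind CPM works in both directions: from total spend to an effective CPM, and from a CPM rate to a campaign cost. The spend and impression figures below are made up for illustration.

```python
def cpm(total_cost, impressions):
    """Effective cost per 1,000 impressions."""
    return total_cost * 1000 / impressions


def campaign_cost(cpm_rate, impressions):
    """What a campaign costs at a given CPM rate."""
    return cpm_rate * impressions / 1000


print(cpm(50.0, 20_000))            # 2.5
print(campaign_cost(2.5, 20_000))   # 50.0
```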

AdSense

AdSense is Google’s advertising platform for publishers. AdSense allows website owners and publishers to monetize their websites by placing text, banner, video and rich media ads on their pages. AdSense supports PPC and CPM strategies.

While AdSense can be an effective revenue source for publishers, ad placement is a Google ranking factor. Too much AdSense, especially above the fold, makes a page look low quality and can result in lowered search rankings. Egregious abuse can result in harsh algorithmic and manual Google penalties.

Return on investment

ROI is the measurement of the efficiency of an investment. It’s a way to evaluate how profitable your business and marketing activities are. It is calculated by dividing profit by cost. Or, as a formula: ROI = (revenue-cost)/cost. It is usually expressed as a percentage.
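Applied to made-up numbers, the formula looks like this in Python: revenue of 1,500 on a cost of 1,000 gives a 50% ROI.

```python
def roi_percent(revenue, cost):
    """ROI = (revenue - cost) / cost, expressed as a percentage."""
    return (revenue - cost) / cost * 100


print(roi_percent(1500, 1000))  # 50.0
```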

Document type declaration

An instruction that associates a web page with a document type definition. Doctype helps browsers determine how to load the page. It is not a ranking factor.

HTTP

HyperText Transfer Protocol is used by the World Wide Web to determine how messages are formatted and communicated. The protocol also instructs web servers and browsers with various commands on how they should respond to the data sent.

HTTPS

A technology that encrypts the connection between a server and a browser. It’s what allows websites and browsers to securely send payment, login and other personal information. Google sees HTTPS in URLs as a positive ranking signal.