Managing Noindex Pages and Links: An SEO Perspective

Last updated on 14.12.2023

One often overlooked aspect in SEO involves the management of ‘noindex’ pages and their associated links. Let’s delve into this topic, examining various methods and their implications.

What are "Noindex" Pages?

“Noindex” pages are those we instruct search engines not to include in their index, meaning they won’t appear in search results. This is typically done by adding a “noindex” meta tag to the HTML of the relevant pages.

  • Advantage: The “noindex” tag doesn’t affect the user experience, as it is invisible to the visitor.
  • Disadvantage: While it does not directly impact the website’s design or functionality, incorrect usage can hurt the site’s SEO by reducing its visibility in search results.
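
For illustration, a basic “noindex” robots meta tag looks like this; it goes in the <head> of each page you want to keep out of the index:

<meta name="robots" content="noindex">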

Do Links to "Noindex" Pages Affect SEO?

Yes, they can. Even though “noindex” pages won’t appear in search results, they can still consume part of your website’s crawl budget, which is the number of pages a search engine will crawl on your site within a given period. Too many “noindex” pages might prevent some of your important pages from being crawled and indexed, ultimately affecting your website’s visibility in search engine results.

The "Nofollow" Solution

One solution (and, in my opinion, the best one) is to add a “nofollow” attribute to the links leading to the “noindex” pages. This instructs search engines not to follow these links, thereby saving your crawl budget. The “nofollow” value is added to the link’s rel attribute in the HTML, like this:

<a href="http://www.example.com/irrelevant_page" rel="nofollow">Link to irrelevant page</a>

Like the “noindex” tag, the “nofollow” attribute doesn’t alter the user experience as it’s invisible to the visitor.

Using JavaScript Links: A Viable Alternative?

Links that use JavaScript to navigate to different pages or URLs might seem like another solution. However, as Google has evolved, it can now understand and follow JavaScript links, so this approach might not prevent Google from discovering those pages.
In addition, over-reliance on JavaScript may cause issues, as not all search engines can fully render or interact with JavaScript.

Examples of JS links:

Read about how to create a link using JavaScript

<button onclick="window.location.href='https://www.example.com';">Click me</button>

<a href="#" onclick="window.location.href='https://www.example.com'; return false;">Click me</a>

Here are some sources that provide more information on this:

  1. Google Webmaster Central Blog: Understanding web pages better – A post where Google details their improved ability to process JavaScript, including following JavaScript links.
  2. Google Webmaster Central Blog: Rendering AJAX-crawling pages – A post outlining Google’s advancements in handling JavaScript, evidenced by deprecating their AJAX crawling scheme.
  3. Google Search Central Documentation: Fix Search-related JavaScript problems – Developer documentation confirming Google’s ability to handle JavaScript links, offering guidance for optimizing JavaScript websites for Google Search.
  4. Google I/O 2018: Deliver search-friendly JavaScript-powered websites (YouTube Video) – A presentation at Google I/O 2018 discussing Google’s approach to JavaScript websites, with tips for developers.

Should We Use Robots.txt to Block "Noindex" Pages?

While using the robots.txt file to disallow crawling of certain pages can seem like a good idea, it might not be the best solution in this case.

Here’s why: The robots.txt file is a way to request robots (like Google’s crawler, Googlebot) not to crawl certain parts of your site. However, this doesn’t necessarily prevent these pages from being indexed. If other sites link to your disallowed pages, search engines might still index those pages based on the information they can gather without actually crawling them.
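
For context, a typical disallow rule of this kind looks like the following (the “/irrelevant-section/” path is just a placeholder for whatever you would want to block):

User-agent: *
Disallow: /irrelevant-section/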

The “noindex” directive, on the other hand, directly tells search engines not to index a page. Even if other sites link to your “noindex” page, if a search engine respects the “noindex” directive (as Google does), it won’t include that page in its index.

Read more about blocking pages with robots.txt

Furthermore, using robots.txt to disallow crawling of these pages could potentially lead to an issue where Google has less information about your site, because it’s not able to crawl these pages to discover additional information or links.

For these reasons, I would generally recommend using a combination of “noindex, nofollow” directives instead of blocking these pages with robots.txt.

Remember, “noindex, nofollow” will both prevent these pages from being indexed and save your crawl budget by preventing search engines from following links on these pages.
As always, the best course of action depends on your specific situation, and you might want to consult with an SEO professional or do some A/B testing to determine the best approach.

What About Robots.txt + Noindex?

Using a robots.txt file to disallow search engine bots from crawling a page and also adding a “noindex” meta tag to that same page can seem like a strong approach to ensuring these pages are not included in search results. However, it can actually have counterintuitive effects.

A “noindex” tag can only be read by search engine bots if they are able to crawl the page. If you block the pages with robots.txt, the bots will not crawl these pages, and hence will not see or respect the “noindex” meta tag.

So, while your intention might be to doubly ensure these pages aren’t indexed, blocking them in robots.txt can actually prevent search engines from seeing the “noindex” directive. This can lead to a situation where the pages are not crawled because of the robots.txt directive, but could still potentially be indexed if there are links to them from other pages that are crawled and indexed.
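
To make the conflict concrete, here is a hypothetical setup (the page path is a placeholder). The robots.txt rule stops bots from fetching the page, so the “noindex” tag inside it is never read:

# robots.txt
User-agent: *
Disallow: /irrelevant_page.html

<!-- In the <head> of /irrelevant_page.html – never seen, because the page is never crawled -->
<meta name="robots" content="noindex">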

In contrast, using a “noindex, nofollow” meta tag without disallowing these pages in robots.txt ensures that search engine bots can see and respect both the “noindex” and “nofollow” directives. It tells them not to include the pages in their index (noindex), and also not to follow the links on these pages (nofollow), thus saving your crawl budget.

The “noindex, nofollow” approach is generally more effective in ensuring these pages aren’t included in search results and their links aren’t followed, while also allowing search engine bots to see and respect these directives.

So, What is the Best Approach?

In my opinion, the most effective solution is usually a combined “noindex, nofollow” meta tag. This strategy ensures your pages aren’t included in search results and their links aren’t followed, while still allowing search engine bots to see and respect these directives.

Here’s how you can add a “noindex, nofollow” meta tag:

<meta name="robots" content="noindex, nofollow">
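
For completeness, the tag belongs in the <head> section of the page you want to exclude; a minimal (hypothetical) page would look like this:

<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
    <title>Irrelevant page</title>
  </head>
  <body>
    <!-- Page content -->
  </body>
</html>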

Remember, staying informed of the latest best practices and algorithm changes is key. Regular monitoring of your site’s performance in tools like Google Search Console will help identify potential issues or shifts in performance. The trick is to remain flexible and continuously refine your SEO strategies based on data and changing guidelines.
