From time to time, each website owner wonders how to exclude WordPress pages from search results. It’s essential that search engines don’t index any technical pages or personal data – all the information that shouldn’t be publicly available.
Page Indexing And Search Results
Usually, when Google indexes your website, an army of bots starts visiting your pages – studying them and copying new pages into the search engine's databases.
To prevent personal information from being publicly exposed, the bots need to know what they can and can't index. Once pages are indexed, any user can enter keywords in the search bar and get them in the results.
Effective optimization and correct keywords ensure higher ranking positions for the website, and new visitors can become your clients or subscribers.
If not all pages of your website are indexed, check how to fix it.
Why Is It So Important To Limit Page Indexing?
Every website has posts, or even whole categories of material, that should be excluded from search results – for example, information revealing the versions of WordPress, plugins, and themes. This data makes it easier for hackers to gain access to your website. We've already discussed how to improve website security and how to remove JS and CSS versions in WordPress. If there is no authentication or access control on your website, private information can be exposed. And if the search bots are not directed correctly, they start scanning everything on your website. You definitely need to exclude WordPress pages from search results.
Twenty or so years ago, hackers used a search engine to collect credit card details from websites. This vulnerability helped them steal users' details from online stores.
Such incidents damage a brand's reputation and lead to customer outflow and financial losses. That's why the first thing to do is to exclude such pages from search and close them from indexing.
How To Exclude WordPress Pages From Search Results
Use robots.txt
Robots.txt is a set of directives that tells bots which WordPress pages to include in or exclude from search results (indexing). You can also forbid access to and copying of certain data, depending on the values you enter in the file. Once you're done editing, place the file in the root folder on your hosting. We'll show you how a bit later.
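For illustration, here is a minimal sketch of such a file (example.com and the /private-page/ path are hypothetical placeholders); the full template we recommend follows below.

# Located at https://example.com/robots.txt – the file must sit in the site root
User-agent: *            # applies to every bot
Disallow: /private-page/ # a hypothetical page that should stay out of search results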
What Code Should I Enter There?
Here's a list of rules you can copy to robots.txt. You can add them in bulk or individually. The values depend on what you'd like to index or hide.
User-agent: *
Disallow: /cgi-bin # classic...
Disallow: /? # all request parameters on the main page
Disallow: /wp- # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
Disallow: *?s= # search
Disallow: *&s= # search
Disallow: /search # search
Disallow: /author/ # the author archive
Disallow: *?attachment_id= # the attachment page. with redirect...
Disallow: */feed # all feeds
Disallow: */rss # rss feed
Disallow: */embed # all embeds
Disallow: */page/ # all pagination types
Allow: */uploads # open uploads
Allow: /*/*.js # inside /wp- (/*/ - for priority)
Allow: /*/*.css # inside /wp- (/*/ - for priority)
Allow: /wp-*.png # images in plugins, the cache folder, etc.
Allow: /wp-*.jpg # images in plugins, the cache folder, etc.
Allow: /wp-*.jpeg # images in plugins, the cache folder, etc.
Allow: /wp-*.gif # images in plugins, the cache folder, etc.
Allow: /wp-*.svg # images in plugins, the cache folder, etc.
Allow: /wp-*.pdf # files in plugins, the cache folder, etc.
#Disallow: /wp/ # when WP is installed in the /wp/ subdirectory

Sitemap: http://site.ru/sitemap.xml
Sitemap: http://site.ru/sitemap2.xml # another file
#Sitemap: http://site.ru/sitemap.xml.gz # compressed version (.gz)

Host: site.ru # for Yandex and Mail.RU (intersectional)

# Code Version: 1.0
# Don't forget to replace `site.ru` with your website address.
Explanation:
User-agent: * allows search engines to index your website's pages. The '*' symbol means that any bot from any search engine can check your website. If you want to limit indexing to one or a few search engines, list the necessary bots instead. Example:
User-agent: Yandex
User-agent: Googlebot
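If you need different rules for different bots, each User-agent line (or group of User-agent lines) starts its own section. A small sketch, assuming a hypothetical /drafts/ directory you want to hide from Googlebot only:

# Block only Googlebot from the hypothetical /drafts/ directory
User-agent: Googlebot
Disallow: /drafts/

# All other bots may crawl everything
User-agent: *
Disallow: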
Allow: */uploads – this way, all pages with /uploads in the URL will be indexed. This line is very useful, because later we block indexing of all pages starting with /wp-, and /wp- is part of /wp-content/uploads. So the line Allow: */uploads overrides the Disallow: /wp- rule for the uploads folder.
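Here is a short sketch of how that pair of rules behaves for two sample URLs (the file names are made up for illustration):

Disallow: /wp-
Allow: */uploads
# /wp-includes/version.php – matches only Disallow: /wp-, so it stays blocked
# /wp-content/uploads/2024/photo.jpg – also matches /wp-, but the longer Allow: */uploads wins, so it is indexed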
The rest of the code consists of Disallow rules, which keep bots away from particular URLs and are used to exclude WordPress pages from search results:
Disallow: /cgi-bin – close the server's script directory
Disallow: /feed – close the RSS feed
Disallow: /trackback – close notifications
Disallow: ?s= or Disallow: *?s= – close search pages
Disallow: */page/ – close all pagination types
Sitemap: http://site.ru/sitemap.xml tells a bot where to find the XML sitemap (if there is one). If your sitemap lives at a different address, set the path to the necessary document(s), adding each document on its own line.
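For example, a site with several sitemap files could list them like this (the URLs below are placeholders):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml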
Host: site.com shows which address is the main mirror of the website. This is highly important if there are several copies of the website on different domains: the rule tells Yandex (the search engine) which copy to treat as the main one. Why are we talking about the lesser-known Yandex? Because of all search engines, only Yandex understands Host: – Google does not. It's important!
If you've added https to the website, make sure to include it in the URL. Example: Host: https://sitename.com
Add the directive before or after the other rules in robots.txt, separated from them by a blank line, so it is processed correctly.
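A sketch of how the directive could be placed, assuming the site already runs on https (sitename.com is a placeholder):

User-agent: *
Disallow: /wp-
Allow: */uploads

Host: https://sitename.com
Sitemap: https://sitename.com/sitemap.xml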
Important: remember that bots sort the rules before processing them.
Search engines go through the Allow and Disallow directives, but they don't apply the rules from first to last. Bots sort them from the shortest rule to the longest and then apply the last matching one. Suppose you write:
User-agent: *
Allow: */uploads
Disallow: /wp-
The system reads the code as follows:
User-agent: *
Disallow: /wp-
Allow: */uploads
For example, a bot checks the URL /wp-content/uploads/file.jpg. Disallow: /wp- tells the bot not to scan it, while Allow: */uploads opens the page for scanning – and because the Allow rule is longer, it wins. A tip for sorting the rules properly: the more characters a rule in robots.txt has, the higher its priority. If the number of characters is the same, priority is given to Allow.
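To make the sorting visible, here is the same pair of rules with their lengths spelled out in comments (using the URL from the example above):

User-agent: *
Disallow: /wp- # 4 characters after the colon – sorted first, lower priority
Allow: */uploads # 9 characters after the colon – sorted last, higher priority
# /wp-content/uploads/file.jpg matches both rules; the longer Allow: */uploads wins, so the file is scanned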
Use A WordPress Plugin
We'd like to tell you about a plugin that can simplify your work with robots.txt. You no longer need to create the file and move it to the root folder – Clearfy can do that for you. All you need to do is add the necessary directives and create the rules. Download and install the plugin, then go to the admin area and open the plugin settings.
Here’s the path:
Settings => Clearfy menu => SEO
Find the Create right robots.txt option and activate it by switching it to ON.
Once activated, it opens an additional field where you can add rules for robots.txt. You can copy in the code above for correct indexing.
With this plugin, you don't have to upload robots.txt to your hosting and download it again whenever you change something.
Conclusion
In this article, we've shown you how to create rules for robots.txt. Normally you start by creating the file and adding the rules, then place it in the root folder on your hosting. With Clearfy you can skip that part – the optimization plugin does that work for you. However, you will still need to add the rules manually, which means there is no way to exclude pages from search fully automatically.
You'll notice the effect of changing robots.txt within about two months. This delay doesn't apply to brand-new websites, so it's better to start improving your indexing now.