From time to time, each website owner wonders how to exclude WordPress pages from search results. It’s essential that search engines don’t index any technical pages or personal data – all the information that shouldn’t be publicly available.
Page Indexing And Search Results
Usually, when Google indexes your website, an army of bots starts visiting your pages – studying them and copying new pages into the search engine's databases.
To prevent personal information from being publicly exposed, the bots need to know what they can and can't index. Once pages are indexed, any user can enter keywords in the search bar and get them in the results.
Effective optimization and correct keywords ensure higher ranking positions for the website, and new visitors can become your clients or subscribers.
If not all pages of your website are indexed, check how to fix it.
Why Is It So Important To Limit Page Indexing?
Every website has posts, or even whole categories of material, that should be excluded from search results – for example, information revealing the versions of WordPress, plugins, and themes. This data makes it easier for hackers to gain access to your website. We've already discussed how to improve website security and how to remove JS and CSS versions in WordPress. If there is no authentication or access control on your website, private information can be exposed. And if the search bots are not directed correctly, they start scanning everything on your website. You definitely need to exclude WordPress pages from search results.
Twenty or so years ago, hackers used a search engine to collect credit card details from websites. This vulnerability helped them steal users' details from online stores.
Such incidents damage a brand's reputation and lead to customer outflow and financial losses. That's why the first thing to do is to exclude such pages from search and close them from indexing.
How To Exclude WordPress Pages From Search Results
Use robots.txt
Robots.txt is a set of directives that tells bots which WordPress pages to include in or exclude from search results (indexing). You can also forbid access to and copying of certain data, depending on the values you enter in the file. Once you're done editing, place the file in the root folder on your hosting. We'll show you how a bit later.
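For illustration, here is a minimal sketch of such a file (example.com and the /private-page/ path are hypothetical placeholders); the full template we recommend follows below.

# Located at https://example.com/robots.txt – the file must sit in the site root
User-agent: *            # applies to every bot
Disallow: /private-page/ # a hypothetical page that should stay out of search results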
What Code Should I Enter There?
Here's a list of rules you can copy to robots.txt. You can add them in bulk or individually. The values depend on what you'd like to index or hide.
User-agent: *
Disallow: /cgi-bin # classic...
Disallow: /? # all request parameters on the main page
Disallow: /wp- # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
Disallow: *?s= # search
Disallow: *&s= # search
Disallow: /search # search
Disallow: /author/ # the author archive
Disallow: *?attachment_id= # the attachment page. with redirect...
Disallow: */feed # all feeds
Disallow: */rss # rss feed
Disallow: */embed # all embeds
Disallow: */page/ # all pagination types
Allow: */uploads # open uploads
Allow: /*/*.js # inside /wp- (/*/ - for priority)
Allow: /*/*.css # inside /wp- (/*/ - for priority)
Allow: /wp-*.png # images in plugins, the cache folder, etc.
Allow: /wp-*.jpg # images in plugins, the cache folder, etc.
Allow: /wp-*.jpeg # images in plugins, the cache folder, etc.
Allow: /wp-*.gif # images in plugins, the cache folder, etc.
Allow: /wp-*.svg # images in plugins, the cache folder, etc.
Allow: /wp-*.pdf # files in plugins, the cache folder, etc.
#Disallow: /wp/ # when WP is installed in the /wp/ subdirectory

Sitemap: http://site.ru/sitemap.xml
Sitemap: http://site.ru/sitemap2.xml # another file
#Sitemap: http://site.ru/sitemap.xml.gz # compressed version (.gz)

Host: site.ru # for Yandex and Mail.RU (intersectional)

# Code Version: 1.0
# Don't forget to replace `site.ru` with your website address.
Explanation:
User-agent: * allows search engines to index your website's pages. The '*' symbol means that any bot from any search engine can check your website. If you want to limit indexing to one or a few search engines, list the necessary bots instead. Example:
User-agent: Yandex
User-agent: Googlebot
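If you need different rules for different bots, each User-agent line (or group of User-agent lines) starts its own section. A small sketch, assuming a hypothetical /drafts/ directory you want to hide from Googlebot only:

# Block only Googlebot from the hypothetical /drafts/ directory
User-agent: Googlebot
Disallow: /drafts/

# All other bots may crawl everything
User-agent: *
Disallow: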
Allow: */uploads – this way, all pages with /uploads in the URL will be indexed. This line is very useful, because later we block indexing of all pages starting with /wp-, and /wp- is part of /wp-content/uploads. So the line Allow: */uploads overrides the Disallow: /wp- rule for the uploads folder.
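Here is a short sketch of how that pair of rules behaves for two sample URLs (the file names are made up for illustration):

Disallow: /wp-
Allow: */uploads
# /wp-includes/version.php – matches only Disallow: /wp-, so it stays blocked
# /wp-content/uploads/2024/photo.jpg – also matches /wp-, but the longer Allow: */uploads wins, so it is indexed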
The rest of the code consists of Disallow rules, which keep bots away from particular URLs and are used to exclude WordPress pages from search results:
Disallow: /cgi-bin – close the server's script directory
Disallow: /feed – close the RSS feed
Disallow: /trackback – close notifications
Disallow: ?s= or Disallow: *?s= – close search pages
Disallow: */page/ – close all pagination types
Sitemap: http://site.ru/sitemap.xml tells a bot where to find the XML sitemap (if there is one). If your sitemap lives at a different address, set the path to the necessary document(s), adding each document on its own line.
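For example, a site with several sitemap files could list them like this (the URLs below are placeholders):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml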
Host: site.com shows which address is the main mirror of the website. This is highly important if there are several copies of the website on different domains: the rule tells Yandex (the search engine) which copy to treat as the main one. Why are we talking about the lesser-known Yandex? Because of all search engines, only Yandex understands Host: – Google does not. It's important!
If you've added https to the website, make sure to include it in the URL. Example: Host: https://sitename.com
Add the directive before or after the other rules in robots.txt, separated from them by a blank line, so it is processed correctly.
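A sketch of how the directive could be placed, assuming the site already runs on https (sitename.com is a placeholder):

User-agent: *
Disallow: /wp-
Allow: */uploads

Host: https://sitename.com
Sitemap: https://sitename.com/sitemap.xml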
Important: remember that bots sort the rules before processing them.
Search engines go through the Allow and Disallow directives, but they don't apply the rules from first to last. Bots sort them from the shortest rule to the longest and then apply the last matching one. Suppose you write:
User-agent: *
Allow: */uploads
Disallow: /wp-
The system reads the code as follows:
User-agent: *
Disallow: /wp-
Allow: */uploads
For example, a bot checks the URL /wp-content/uploads/file.jpg. Disallow: /wp- tells the bot not to scan it, while Allow: */uploads opens the page for scanning – and because the Allow rule is longer, it wins. A tip for sorting the rules properly: the more characters a rule in robots.txt has, the higher its priority. If the number of characters is the same, priority is given to Allow.
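To make the sorting visible, here is the same pair of rules with their lengths spelled out in comments (using the URL from the example above):

User-agent: *
Disallow: /wp- # 4 characters after the colon – sorted first, lower priority
Allow: */uploads # 9 characters after the colon – sorted last, higher priority
# /wp-content/uploads/file.jpg matches both rules; the longer Allow: */uploads wins, so the file is scanned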
Use A WordPress Plugin
We'd like to tell you about a plugin that can simplify your work with robots.txt. You no longer need to create the file and move it to the root folder – Clearfy can do that for you. All you need to do is add the necessary directives and create the rules. Download and install the plugin, then go to the admin area and open the plugin settings.
Here’s the path:
Settings => Clearfy menu => SEO
Find the Create right robots.txt option and activate it by switching it to ON.
Once activated, it opens an additional field where you can add rules for robots.txt. You can copy in the code above for correct indexing.
With this plugin, you don't have to upload robots.txt to your hosting and download it again whenever you change something.
Conclusion
In this article, we've shown you how to create rules for robots.txt. Normally you start by creating the file and adding the rules, then place it in the root folder on your hosting. With Clearfy you can skip that part – the optimization plugin does that work for you. However, you will still need to add the rules manually, which means there is no way to exclude pages from search fully automatically.
You'll notice the effect of changing robots.txt within about two months. This delay doesn't apply to brand-new websites, so it's better to start improving your indexing now.