What is a robots.txt file?
A robots.txt file is a plain text (ASCII) file that tells search engine crawlers which pages or files the crawler can or can’t request from a site. It is used primarily to manage crawler traffic so that a site is not overloaded with requests.
How do I create a robots.txt file?
Writing a robots.txt file is straightforward. Follow these steps:
- Open Notepad, Microsoft Word or any text editor and save the file as ‘robots’ (all lowercase), making sure to choose .txt as the file extension (in Word, choose ‘Plain Text’).
- Next, add the following two lines of text to your file:
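These are the standard permissive defaults:

```
User-agent: *
Disallow:
```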
‘User-agent’ refers to robots or search engine spiders, and the asterisk (*) means the line applies to all of them. Because no file or folder is listed on the Disallow line, every directory on your site may be accessed. This is the most basic robots.txt file.
- You can also use robots.txt to block search engine spiders from your whole site. To do this, add these two lines to the file:
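A `/` on the Disallow line matches every path on the site:

```
User-agent: *
Disallow: /
```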
- If you’d like to block the spiders from certain areas of your site, your robots.txt might look something like this:
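For example, to block the database and scripts directories:

```
User-agent: *
Disallow: /database/
Disallow: /scripts/
```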
These three lines tell all robots that they are not allowed to access anything in the database and scripts directories or their sub-directories. Keep in mind that only one file or folder can be listed per Disallow line; add as many Disallow lines as you need.
- Be sure to add your search-engine-friendly XML sitemap to the robots.txt file. This ensures that the spiders can find your sitemap and easily index all of your site’s pages. Use this syntax:
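The Sitemap directive takes the full URL of your sitemap file (assuming here that it lives at the root of the example domain used below):

```
Sitemap: https://www.mydomain.com/sitemap.xml
```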
- Once complete, save and upload your robots.txt file to the root directory of your site. For example, if your domain is www.mydomain.com, you will place the file at www.mydomain.com/robots.txt.
How do I find the robots.txt of a website?
The robots.txt file must be located at the root of the website host to which it applies. For instance, to control crawling on all URLs below www.mydomain.com/, the robots.txt file must be located at www.mydomain.com/robots.txt.
How do you check if robots.txt is working?
You can submit a URL to the robots.txt Tester tool. The tool operates as Googlebot would to check your robots.txt file and verifies that your URL has been blocked properly.
Following are the steps:
- Open the tester tool for your site, and scroll through the robots.txt code to locate the highlighted syntax warnings and logic errors. The number of syntax warnings and logic errors is shown immediately below the editor.
- Type in the URL of a page on your site in the text box at the bottom of the page.
- Select the user-agent you want to simulate in the dropdown list to the right of the text box.
- Click the TEST button to test access.
- Check whether the TEST button now reads ACCEPTED or BLOCKED to find out whether the URL you entered is blocked from Google’s web crawlers.
- Edit the file on the page and retest as necessary. Note that changes made on the page are not saved to your site; see the next step.
- Copy your changes to the robots.txt file on your site. The tool does not modify the actual file on your site; it only tests against the copy hosted in the tool.
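As an alternative to the tester tool, you can verify your rules programmatically. A minimal sketch using Python’s standard `urllib.robotparser` (the rules and URLs below are illustrative, reusing the example domain and directories from above):

```python
from urllib.robotparser import RobotFileParser

# Rules to test, given as lines. In practice you could instead call
# set_url("https://www.mydomain.com/robots.txt") followed by read()
# to fetch the live file from your site.
rules = [
    "User-agent: *",
    "Disallow: /database/",
    "Disallow: /scripts/",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) returns True if the URL is allowed
print(parser.can_fetch("*", "https://www.mydomain.com/database/users.db"))  # False
print(parser.can_fetch("*", "https://www.mydomain.com/index.html"))         # True
```

This checks each URL against the same matching logic a well-behaved crawler applies, so it is a quick way to sanity-check a rule set before uploading it.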