Google Indexing Problems

[Image: Google Crawl Stats]

This article contains useful tips and advice to solve common Google indexing and crawling problems.

Firstly, it's common for a low-ranking website to be included in the Google index without actually being visible in the search engine results pages (SERPs), simply because it ranks too low to be found.

You can check whether your website is indexed and cached by Google using a simple query command: type site:www.mydomain.com into a Google search box, replacing "mydomain" with your registered domain name.

If Google returns the message "sorry no information is available for the URL www.mydomain.com", then none of your website's pages are indexed by Google and you may have a Google indexing or crawling problem.

Check Google Cache

The Google site:www.mydomain.com query returns a list of the web pages in your domain that are indexed and cached by Google. If no pages are indexed, this is often because the domain is new or recently launched and does not yet have enough quality backlinks to make it into the Google index.

[Image: Google site: command example]

To fix a Google indexing problem (including partial indexing), first check for website navigation problems that prevent Google from crawling your website.

If no navigation problem is found, we recommend getting more quality links to your website from other websites. You should also create an XML sitemap and submit it to Google.

An XML sitemap lists all the page URLs on your website and can provide additional information such as "Priority" (the relative importance of each page within your website hierarchy) and "Change Frequency" (how often each page's content is updated). This can encourage Googlebot to deep crawl your website and re-cache any recently updated pages.
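A minimal sketch of generating such a file with Python's standard library, following the sitemaps.org protocol. In the XML the two fields are written `<priority>` (a value from 0.0 to 1.0) and `<changefreq>` (for example "daily" or "monthly"); the protocol also defines an optional `<lastmod>` date. The page URLs, dates and values below are hypothetical placeholders for www.mydomain.com.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a str) for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional, as in the sitemap protocol.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Only emit the optional fields that were supplied
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages on www.mydomain.com
pages = [
    {"loc": "http://www.mydomain.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.mydomain.com/services.html", "lastmod": "2010-01-15",
     "changefreq": "monthly", "priority": 0.5},
]

if __name__ == "__main__":
    print(build_sitemap(pages))
```

The output would then be saved as a file such as sitemap.xml on your web server and submitted to Google. Only list pages you actually want crawled.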

Test Your Robots.txt File

Sometimes Google indexing problems are caused by errors in the Robots.txt file. Robots.txt is a small (optional) text file placed in the root directory of a web server to tell search engine robots which web pages and website assets (folders, images) should be excluded from crawling and indexing.

A single syntax error in the Robots.txt file could completely prevent Googlebot (Google's search spider or 'crawler') from crawling and indexing your website. For help, read our article on creating and formatting a Robots.txt file.

Google Webmaster Tools lets you test a Robots.txt file under "Site Configuration" > "Crawler Access". This can help you find Googlebot crawling problems caused by robots.txt exclusions.
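Outside Webmaster Tools, Python's standard `urllib.robotparser` module can give a rough second opinion. The sketch below parses two hypothetical robots.txt files: one where a single stray slash (`Disallow: /`) blocks the whole site, and the intended version that only excludes a hypothetical `/private/` folder. Python's parser is not Google's, and Googlebot supports some extensions (such as wildcards) that this module may not interpret identically, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def can_googlebot_fetch(robots_txt, url):
    """Return True if the given robots.txt text allows Googlebot to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Mistake: "Disallow: /" excludes every URL on the site
BROKEN = """User-agent: *
Disallow: /
"""

# Intended: only the /private/ folder is excluded
INTENDED = """User-agent: *
Disallow: /private/
"""

home = "http://www.mydomain.com/"
secret = "http://www.mydomain.com/private/report.html"

if __name__ == "__main__":
    print("broken, homepage:", can_googlebot_fetch(BROKEN, home))
    print("intended, homepage:", can_googlebot_fetch(INTENDED, home))
    print("intended, private page:", can_googlebot_fetch(INTENDED, secret))
```

If the homepage comes back as not fetchable, the file is blocking far more than intended.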

Adding a URL to the Google Index

If you're wondering how to get a particular page or website indexed by Google when it is not yet in the index, you may wish to try submitting its URL to Google. Google Account holders can now submit URLs for consideration by visiting the Google Add URL page.

To get professional help to solve difficult Google indexing problems, contact KSL Consulting (a reasonable consultancy fee applies).

Significance of Grey Toolbar PageRank

If the Google Toolbar displays grey PageRank for a particular page, the page may in fact still be indexed by Google (so check the page cache using the site: command described above). When grey PageRank shows on the Google Toolbar, a "Google does not rank current page" message will usually appear when you hover the mouse over the PageRank indicator.

The most common causes of greyed-out Google Toolbar PageRank include:

  • Insufficient PageRank Acquired - The page(s) simply might not be getting enough "link juice". This is often the case for pages that sit well down the website navigation hierarchy with no external links pointing to them. Web pages that are many clicks from the homepage may also display grey Toolbar PageRank because they fail to acquire sufficient PageRank from other, more important pages of the site.
  • Duplicate Content - Pages that are very similar to others, or of low quality or little value, may show grey Google PR. Pages lacking any form of keyword focus may also show grey PageRank.
  • Search Engine Exclusions - Pages excluded by Disallow entries in the Robots.txt file may show a greyed-out PageRank indicator. These pages will typically not be indexed by Google.
  • Banned Domains - Sites that have been banned from the Google index or have suffered a Google PageRank penalty may show a grey PageRank indicator. See our Google penalty page for help.
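To illustrate why pages many clicks from the homepage can end up with little PageRank, here is a sketch of the classic PageRank calculation as originally published by Google's founders: every page receives a small base score, and each page passes a damped fraction (0.85 here) of its own score, split equally, to the pages it links to. Google's live ranking is more complex and Toolbar PageRank was only a coarse 0-10 display, so this shows the principle, not real Toolbar values. The site structure is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    `links` maps each page to the pages it links to. This simplified
    version assumes every page has at least one outgoing link
    (no dangling pages), which holds for the example site below.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page gets the same small base score...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # ...plus a damped, equal share of each linking page's score
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: widget-a is three clicks from the homepage
site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "widgets"],
    "widgets": ["home", "widget-a"],
    "widget-a": ["home"],
}

if __name__ == "__main__":
    scores = pagerank(site)
    for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{page:10} {score:.3f}")
```

With these links the homepage scores highest and widget-a, the deepest page, scores lowest, because each extra click dilutes the share of PageRank that reaches it.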

Factors Affecting Google Crawl Rate

The Googlebot crawling and indexing rate is influenced by a number of factors including:

  • Google PageRank - The PageRank (PR) of the website and of individual pages influences crawl rate. Pages with grey PageRank are re-cached far less often than pages with visible Toolbar PageRank, and high-PR pages are re-cached more frequently than low-PR pages. It is common for the Google cache of grey PageRank pages to be updated only once every three or four weeks.
  • Page Update Frequency - Google uses intelligent crawling technology that automatically directs more Googlebot activity to pages that are updated frequently than to pages that are rarely updated.

    Because Google has billions of web pages to index, it aims to make Googlebot crawling more efficient by revisiting less often the pages it has previously learnt rarely change.

    Matt Cutts prepared an interesting video on how Google crawls sites and we'd recommend taking 5 minutes to watch it: Matt Cutts Googlebot crawl method video.

  • Quantity and Quality of Back-Links - Because Googlebot discovers pages by following links between websites on the World Wide Web, the more inbound links a site or an internal page has acquired, the more frequently it will be re-cached. Even a few additional 'deep links' from other websites to the important internal pages of your domain could help increase the crawl rate and frequency of Googlebot visits, so that the Google cache of your website is updated more regularly.
  • Server Response Codes - Googlebot checks the HTTP response code returned for every request: when it tries to fetch a web page, it records the status the hosting web server sends back. Typical responses include "200 OK" (the page exists and its content is returned) and "301 Moved Permanently" (the page has permanently moved to another URL). A 301 typically results in the old URL being removed from the Google index and the new URL being indexed in its place.
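To see which status code your own server returns, you can request a page once without following redirects. A minimal sketch using Python's standard `http.client`; the example URL is a placeholder, and the notes in `indexing_note` simply restate the 200 and 301 behaviour described above rather than any documented Google rule. Your server may also answer Googlebot's user agent differently, so this is only an approximation of what Googlebot sees.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Request `url` once WITHOUT following redirects.

    Returns (status_code, reason_phrase, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Standard reason phrase for a status code, e.g. 301 -> 'Moved Permanently'."""
    return http.client.responses.get(status, "Unknown status")


def indexing_note(status, location=None):
    """Plain-English reading of a status code, per the article's description."""
    if status == 200:
        return "Page is present; it can be crawled and indexed."
    if status == 301:
        return f"Permanent redirect; expect {location} to replace this URL in the index."
    return f"{status} {describe(status)}: check whether this response is intended."


# Example (needs network access; the URL is a placeholder):
#   status, reason, location = fetch_status("http://www.mydomain.com/old-page.html")
#   print(status, reason, indexing_note(status, location))
```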

For more help and advice on acquiring additional inbound links for your website, read our advanced link building and Google SEO strategies articles.

Adjusting Google Crawling Rate in Webmaster Tools

If Googlebot is visiting your site too often or too infrequently, the crawl rate can be adjusted in Google Webmaster Tools under "Settings" > "Crawl Rate" (see below for an example).

Access to Google Webmaster Tools requires verification of site ownership, either by uploading a small verification file to the web server or by adding a meta tag to the homepage HTML.

[Image: Adjusting Google crawl settings in Webmaster Tools]
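If you use the meta tag method, it is worth confirming the tag really appears in the homepage HTML your server delivers. A minimal sketch using Python's standard `html.parser`; the tag name `google-site-verification` is the one Google documents today (early Webmaster Tools accounts used a different name, so copy the exact tag you are given), and the token and homepage HTML below are made-up placeholders.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collects the `content` of every <meta name="..."> tag with a given name."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us;
        # self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = {key: value or "" for key, value in attrs}
        if attributes.get("name", "").lower() == self.target_name:
            self.values.append(attributes.get("content", ""))


def has_verification_tag(html, token, name="google-site-verification"):
    """True if `html` contains a <meta> tag with this name and content token."""
    finder = MetaTagFinder(name)
    finder.feed(html)
    finder.close()
    return token in finder.values


# Hypothetical homepage HTML with a placeholder token
homepage = """<html><head>
<title>My Domain</title>
<meta name="google-site-verification" content="EXAMPLE-TOKEN-123" />
</head><body><p>Welcome</p></body></html>"""

if __name__ == "__main__":
    print(has_verification_tag(homepage, "EXAMPLE-TOKEN-123"))
    print(has_verification_tag(homepage, "WRONG-TOKEN"))
```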

Google Big Daddy Update

Following the "Big Daddy" Google infrastructure update in the spring of 2006, a website's crawl rate is heavily influenced by the number and quality of backlinks it has acquired. For this reason, it is not unusual for a website with few inbound links to receive fewer Googlebot deep crawls.

After Google's Big Daddy update, many websites developed indexing and Googlebot crawl rate problems. Since Big Daddy, Google appears to index fewer web pages, particularly on recently launched domains and low-quality sites.

Partial Google indexing is now common for websites with few inbound links. Often only the top levels of the page hierarchy are crawled and included in the Google index, while deeper internal pages (three or more clicks from the homepage) are only partially indexed or not indexed at all.
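To find which of your pages fall into that deeper tier, you can compute each page's click depth from the homepage with a breadth-first search over your internal links. A sketch assuming you already have the internal link graph (for example exported from a site crawler); the page paths are hypothetical, and the three-click threshold is this section's rule of thumb, not a published Google limit. Pages that cannot be reached from the homepage at all are reported separately, since Googlebot cannot find them through your navigation either.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from `home` to each page.

    Pages that cannot be reached by following links from `home` are omitted.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit = shortest click path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(links, home, threshold=3):
    """Return (pages >= threshold clicks deep, known pages unreachable from home)."""
    depths = click_depths(links, home)
    deep = sorted(page for page, depth in depths.items() if depth >= threshold)
    orphans = sorted(set(links) - set(depths))
    return deep, orphans


# Hypothetical internal link graph
site = {
    "/": ["/about.html", "/products/"],
    "/about.html": ["/"],
    "/products/": ["/products/widgets/"],
    "/products/widgets/": ["/products/widgets/widget-a.html"],
    "/products/widgets/widget-a.html": ["/"],
    "/old-offer.html": ["/"],  # no page links here: an orphan
}

if __name__ == "__main__":
    deep, orphans = deep_pages(site, "/")
    print("3+ clicks deep:", deep)
    print("unreachable:", orphans)
```

Pages flagged here are good candidates for links from higher-level pages, or for 'deep links' from other websites.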

The Big Daddy update problems have long since been resolved, but many domains, even trusted sites, are still left with significant numbers of pages that are not fully indexed by Google. Until Google stopped labelling them in the summer of 2007, these pages would have shown up as Google Supplemental Results.

For more expert help to solve website indexing issues contact us for advice (consultancy fees apply).
