This article contains useful tips and advice to solve common Google indexing and crawling problems.
Firstly, it's common for a low-ranking website to be included in the Google index without actually being visible in the search engine results pages (SERPs).
You can check whether your website is indexed and cached by Google using a simple query command. To do this, type site:www.mydomain.com into a Google search box, replacing "mydomain" with your registered domain name.
If Google returns the message "Sorry, no information is available for the URL www.mydomain.com", then none of your website's pages are in the Google index and you may have a Google indexing or crawling problem.
The Google site:www.mydomain.com query returns a list of all web pages in your domain which are indexed and cached by Google. If no web pages are indexed, this is often due to the domain being new or recently launched, without enough quality backlinks to make it into the Google index.
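For example, assuming your domain is www.mydomain.com, the following queries (typed into a Google search box) are useful diagnostics; the page name shown is a placeholder to replace with one of your own URLs:

    site:www.mydomain.com                  lists every page on the domain held in the Google index
    site:www.mydomain.com/products.html    checks whether one specific page is indexed
    cache:www.mydomain.com                 displays Google's cached copy of the homepage

If the site: query returns some of your pages but not others, you have a partial indexing problem rather than a complete one.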
To fix a Google indexing problem (including partial indexing), first check for website navigation problems which prevent Googlebot from crawling your website. Secondly, consider creating an XML Sitemap.
An XML Sitemap lists all page URLs on your website and can provide additional information such as "Priority" (which defines the relative importance of each page in your website hierarchy) and "Change Frequency" (which specifies how often your content is updated). This helps to encourage Googlebot to deep crawl your website and re-cache any recently updated page URLs.
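As an illustration, a minimal XML Sitemap containing a single page entry might look like the sketch below, following the sitemaps.org protocol; the URL, date, priority and change frequency values are placeholders to adapt for your own site:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <!-- one <url> block per page; repeat for every URL on the site -->
        <loc>http://www.mydomain.com/products.html</loc>
        <lastmod>2007-06-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

Save the file as sitemap.xml in the root directory of your web server and submit it to Google through Google Webmaster Tools.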
Sometimes search engine indexing problems are caused by errors in the robots.txt file. robots.txt is a small (non-mandatory) text file uploaded to the root directory of a web server to tell search engine robots which web pages and website assets (folders, images) should be excluded from search engine indexation.
A simple syntax error in the robots.txt file can completely prevent Googlebot (Google's search spider, or 'crawler') from indexing your website. For help, read creating and formatting a robots.txt file.
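As a guide, a correctly formatted robots.txt file looks like the example below; the folder names are illustrative only and should be adjusted to match your own site structure:

    # Allow all robots to crawl the site, except for two private folders
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /private/

Take particular care with the line "Disallow: /" on its own: a single stray slash after Disallow tells every search engine robot to stay away from the entire website, and is one of the most common causes of a completely non-indexed site.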
Google Webmaster Tools lets you test a robots.txt file under "Site Configuration" > "Crawler Access". This can help to find Googlebot crawling problems caused by robot exclusions.
If you're wondering how to get a particular page or website that is not yet indexed into Google, you may wish to try submitting your URL to Google. Google Account holders can now submit URLs for consideration by visiting the Google Add URL page.
To get professional help to solve difficult Google indexing problems, contact KSL Consulting (a reasonable consultancy fee applies).
If the Google Toolbar displays grey PageRank for a particular page, the page may in fact still be Google indexed (so check the page cache using the site: command described above). When grey PR shows in the Google Toolbar, a "Google does not rank current page" message will usually appear when you hover over the PageRank indicator.
The most common causes of greyed-out Google Toolbar PageRank include recently launched pages which have not yet been assigned PageRank, and pages which have been dropped from (or never made it into) the Google index.
The Googlebot crawling and indexing rate is influenced by a number of factors, including how frequently your content is updated and the number and quality of backlinks pointing to your site (both discussed below).
As Google has some 80 billion web pages to index, it aims to improve the efficiency of the Googlebot crawling process by avoiding pages which it has previously learned are rarely updated.
Matt Cutts prepared an interesting video on how Google crawls sites and we'd recommend taking 5 minutes to watch it: Matt Cutts Googlebot crawl method video.
If Googlebot is visiting your site too often or too infrequently, the crawl rate can be adjusted in Google Webmaster Tools under "Settings" > "Crawl Rate".
Access to Google Webmaster Tools requires verification of site ownership, which is accomplished either by uploading a small verification file to the web server or by adding a meta tag to the homepage HTML code.
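As an example, the meta tag method involves pasting a tag similar to the one below into the head section of your homepage; the content value shown here is a made-up placeholder, as Google generates a unique string for your account during the verification process:

    <head>
      <title>My Domain - Home Page</title>
      <!-- Google Webmaster Tools verification tag (the string is unique to your account) -->
      <meta name="verify-v1" content="your-unique-verification-string-from-google" />
    </head>

Google supplies the exact tag to copy and paste, so there is no need to construct it by hand.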
Following the "Big Daddy" Google infrastructure update in the Spring of 2006, the crawling rate of websites is now heavily influenced by the number and quality of backlinks the site has acquired. For this reason, it is not unusual for a website with few inbound links to experience less Googlebot deep crawls.
After Google's Big Daddy update, many websites developed indexing and Googlebot crawl rate problems. Since Big Daddy, Google seems to index fewer web pages, particularly on recently launched domains and low-quality sites.
Partial Google indexing is now common for websites with few inbound links. This frequently results in only the top hierarchy of pages being crawled and included in the Google index, with deeper internal pages (three or more clicks from the homepage) being only partially indexed or not indexed at all.
The Big Daddy update problems have long since been resolved, but many domains, even trusted sites, are still left with significant numbers of pages relegated to Google's supplemental index rather than the main index. These pages would have shown up as Google Supplemental Results until the labelling of such results was removed in the summer of 2007.
For more expert help solving website indexing issues, contact us for advice (consultancy fees apply).