When looking at a site with your site-analysis or site-audit hat on, there are a number of areas to concentrate on. Some of the areas that should be analysed are listed below.
Crawling and indexing: How many pages have been indexed, and how many pages are there on the site? Is there a gap (small gaps are normal)? Is there a big difference between the indexed and actual pages, and if so, why? An online (or offline) crawler such as seo-browser will show you how your page appears to a search engine, which can be very different from how it renders in a browser. See the blog article for a real-life example of this difference.
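To get a feel for why the crawler's view differs from the browser's view, here is a minimal sketch (the sample page is made up) using Python's standard-library HTML parser to strip markup and scripts, leaving roughly the text a search engine spider has to work with:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

# A made-up page: heavy on scripts, light on actual text
html = """
<html><head><title>Widgets</title>
<script>renderFancyCarousel();</script></head>
<body><h1>Blue Widgets</h1><p>Hand-made blue widgets.</p></body></html>
"""
parser = TextExtractor()
parser.feed(html)
print(parser.parts)  # roughly what a text-only crawler sees: no carousel, just the words
```

A page that looks rich in a browser can boil down to almost nothing once the scripts and styling are stripped away, which is exactly the gap a tool like seo-browser exposes.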
Search engine ranking: What is the current status of your site in the various search engines? Where does it rank, if at all? This is a fairly straightforward exercise: just run the site against some of your chosen keywords for benchmarking purposes.
URL structure: The URLs on your site matter not only to visiting spiders but also to humans. A URL that can be "read" is infinitely better than one generated by a content management system with no instructions to alter it. Keywords in your URL may also give you a small boost in the search results.
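Turning a page title into a readable, keyword-bearing URL segment is usually a one-line job in most CMSs; a minimal sketch of the idea (the example titles are hypothetical):

```python
import re

def slugify(title):
    """Turn a page title into a readable URL segment: lowercase,
    with runs of non-alphanumeric characters collapsed to hyphens."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")

# A URL like /products/blue-widgets is readable by humans and spiders alike,
# unlike something such as /products?id=73&cat=9
print(slugify("Blue Widgets"))
```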
Site structure: As with your URL structure, try to keep your site navigation as simple as possible. By site navigation we mean how easy it is to get from, say, the home page to your checkout page. During that journey, is it easy to tell exactly where you are, and is it easy to move forward and backward? This planning should really be done when the site is being designed, as it's easier to execute at that point than afterwards. It's still very possible to restructure a site after it's been in the wild, but it means using redirects and the like, which can be problematic.
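One of the things that makes post-launch redirects problematic is chaining: an old URL redirecting to another old URL rather than straight to the final page. A hypothetical sketch of a redirect map and a helper that follows it to the destination, counting the hops:

```python
# Hypothetical old-URL -> new-URL map from a site restructure
redirects = {
    "/products.php?id=7": "/products/blue-widgets",
    "/blue-widgets.html": "/products.php?id=7",  # points at another old URL: a chain
}

def resolve(url, redirects, max_hops=5):
    """Follow the redirect map to its final destination, counting hops.
    More than one hop means visitors (and spiders) bounce through a chain."""
    hops = 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url, hops

print(resolve("/blue-widgets.html", redirects))  # ('/products/blue-widgets', 2)
```

Flagging anything with more than one hop and pointing the old URL directly at the final destination keeps the redirects tidy.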
Code: Small code problems are not exactly a deal breaker, but if a search engine comes across badly formed code it may abandon the page altogether. It's best to run your code through a code validator just to make sure it's all above board.
Page speed: This is a big issue at the moment, and according to Google's representative on earth, Matt Cutts, page speed will be one of the 200+ factors used in calculating page rank. The best approach is to check your speed online; I like Pingdom Tools, as it keeps a record of a site's performance so you can see how your efforts are being rewarded. There is a fair amount you can do to improve page speed. Look at the code, and especially your images, as they are the low-hanging fruit. Check that large files loaded early in the page load (such as CSS or JS files) are actually needed at that point, and whether those that are needed could be compressed (for example, GZIP compression of some of the CSS files, if your server allows it).
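GZIP makes a real difference on text assets like CSS and JS because they are so repetitive. A quick sketch using Python's standard gzip module to show the kind of saving involved (the CSS is made up for illustration):

```python
import gzip

# A made-up chunk of repetitive CSS -- exactly the kind of text that compresses well
css = ".nav li { margin: 0; padding: 0; list-style: none; }\n" * 200

raw = css.encode("utf-8")
compressed = gzip.compress(raw)
print(len(raw), "bytes raw ->", len(compressed), "bytes gzipped")
```

In practice the server does this for you (e.g. via its compression module), but the byte counts above illustrate why it's worth switching on.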
Page titles and descriptions: In many search results, the page title and description tell human visitors what the page they might click on is actually about, so they are worth a big effort. There is more information on this particular topic here.
Keyword analysis: We've looked at this here in some detail. Essentially it's a root-and-branch analysis of the site from a keyword point of view.
Sitemaps: There should be two types of sitemap: one for human visitors and another for the spiders. The latter is an XML file, which you can build online for free. Put the XML file in the root directory of the site, and then make sure you submit that file to Google in Google Webmaster Tools. Just be careful that the URLs you use in that file are exactly the same as the ones used on the site, otherwise there may be an issue with duplicate content.
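For reference, a minimal XML sitemap looks like the following (the domain and paths are placeholders; use the exact canonical form of each URL, matching www vs non-www and trailing slashes as used on the site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <loc> must match the on-site URL exactly -->
  <url>
    <loc>http://www.example.com/</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/products/blue-widgets</loc>
  </url>
</urlset>
```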
404s: You can use any of several applications, such as Xenu's Link Sleuth, to make sure that all the links on your site lead somewhere. You will also get a steer in GWT if some of your URLs lead to a dead end. A lot of CMS systems have inbuilt 404 pages: if a page does not exist, they throw up a plain 404 notification. This normally stops spiders in their tracks, as there is no way forward. It's best to build a 404 page yourself, one that gives users some options when they reach a dead link. Many custom 404 pages include site search and similar options; this keeps not only the spiders plugging away, but the end users as well.
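The first step of any link checker like Xenu's Link Sleuth is collecting every link on a page; each collected URL is then requested and its HTTP status checked. A minimal standard-library sketch of the collection step (the page fragment is made up, and the fetch-and-check step is omitted to keep the example self-contained):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Made-up page fragment
page = '<p><a href="/checkout">Checkout</a> and <a href="/old-page.html">an old link</a></p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # each of these would then be requested and its status checked
```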
Robots.txt: This is a small file that tells spiders what they may index and what they should keep away from.
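A minimal robots.txt, placed in the root directory of the site, might look like this (the paths are placeholders for whatever sections you want to keep spiders away from):

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: http://www.example.com/sitemap.xml
```

The Sitemap line is optional but handy: it points crawlers at the XML sitemap mentioned above.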