Friday, July 20, 2012

Technical Diagnosis of issues related to SEO


A Framework for SEO Audits

On-Page

  1. Domains
  2. Navigations
  3. Sections & Categories
  4. Pages Media

Off-Page

  1. Backlinks
  2. Social Media Signals
  3. Cache Dates, crawl frequency, indexed pages
  4. Toolbar PageRank
  5. The Big 4 Factors


URLs
Site Architecture & Navigation
Deep pages (PageRank Dispersion)
Site Latency – speed of your site. This has gotten a lot of play lately.

Some Cool Tools
SEM Rush: look at a site and find the natural keywords that the site is already ranking for.
Use Google Searches: site: + inurl: /intitle:
Lynxlet/SEO-Browser.com
Charles/YSlow
Various Toolbars: Web Developer’s Toolbar, Wave Toolbar, SEOBook Toolbar
Using the command line is a great way to diagnose problems. Adam mentions using Wget, and there’s lots of code on the screen that I don’t understand. My brain is suddenly yelling at me. He also mentions that he uses SEOmoz’s Page Analysis Tool and LinkScape. Because he’s super rad, he wrote a post offering a free log file parsing script that you may want to check out.
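
The site:, inurl: and intitle: operators from the tools list can be combined into one query, which is handy when you are hunting for indexed pages that shouldn't be there. Here is a minimal sketch of building those queries. The function names and the example.com domain are made up for illustration, and the search URL pattern is simply the one you see in a browser's address bar.

```python
from urllib.parse import quote_plus

def site_query(domain, inurl=None, intitle=None):
    """Combine the site:, inurl: and intitle: operators into one query string."""
    parts = [f"site:{domain}"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    return " ".join(parts)

def search_url(query):
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)

# Hypothetical check: are URLs carrying a session ID indexed?
query = site_query("example.com", inurl="sessionid")
print(query)
print(search_url(query))
```

This only builds the query for pasting into a browser. It does not fetch or scrape results.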

Crawling:
Full Crawl: Perform a full crawl of a site to ID each page on the site.
SE Simulation: Run consecutive crawls to simulate each engine.
Browser Crawl: View how a site is being rendered in the browser.
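
To make the full crawl idea concrete, here is a rough sketch of a breadth-first crawl that records how many clicks each page sits from the home page. It assumes a `get_links` callable that returns the links found on a page. In this example that callable reads from a made-up in-memory site (`SITE`) rather than fetching anything over the network.

```python
from collections import deque

def crawl_depths(start, get_links):
    """Breadth-first crawl from start; returns {url: click depth from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link not in depths:  # first visit is the shortest path in BFS
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# Hypothetical site structure: page -> links found on that page.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget", "/"],
    "/products/widget": ["/products/widget/specs"],
    "/about": [],
    "/products/widget/specs": [],
}

depths = crawl_depths("/", lambda url: SITE.get(url, []))
for url, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, url)
```

Pages with a large depth are the "deep pages" that are hard for users and spiders to reach. Swapping in a `get_links` that fetches with a different user agent would be one way to approximate the per-engine crawls.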

Goals:
ID indexation gaps and issues.
ID crawler behavior.
ID opportunities to trim the fat.
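
One way to ID crawler behavior is to count search engine hits in your server logs. The sketch below is not the script mentioned in the session. It is a small illustration that assumes the Apache/Nginx "combined" log format, and the bot names and sample log lines are made up for the example.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that commonly appear in crawler user agents (illustrative, not exhaustive).
BOTS = ("Googlebot", "bingbot", "Slurp")

def bot_hits(lines):
    """Count requests per (bot, path, status) for the crawlers listed in BOTS."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                key = (bot, match.group("path"), int(match.group("status")))
                hits[key] += 1
                break
    return hits

# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [20/Jul/2012:10:00:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [20/Jul/2012:10:05:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [20/Jul/2012:10:06:00 +0000] "GET /old-offer HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '10.0.0.7 - - [20/Jul/2012:10:07:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/14.0"',
]

hits = bot_hits(SAMPLE_LOG)
for (bot, path, status), count in hits.most_common():
    print(bot, path, status, count)
```

Pages the bots never request point to indexation gaps, and bot hits that return 404s are candidates for trimming or redirecting.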

Infrastructure Issues
Template Coding: Templates that aren’t bot friendly or flexible enough to change.
Directory Structure: Everything from parameters being added to dynamic URLs to a nonexistent structure.
File Naming Conventions: Granted this isn’t the end all/be all, but it’s yet another opportunity to establish a theme.
Code Base: Some CMSs generate a bunch of unnecessary code.

Performance Issues
Images, Flash, AJAX
Flash Alternatives
Semantic HTML
Frames
Image Maps
Tables
Page file size – he calls this the ‘big 2010 thing’ people will start to worry about. Again, it’s tied to load time.

Redirect issues:
Expiring products or content. This would include expired special offer pages or season promotions.
Internal 302 redirects or multiple redirects.
Internal JavaScript/page-based redirects
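
Internal 302s and chained redirects are easy to spot once you trace each URL hop by hop. Here is a small sketch of that idea. To keep it self-contained it reads from a made-up table of responses (`RESPONSES`) instead of making HTTP requests, and the function names are just for illustration.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}

def trace_redirects(url, responses, max_hops=10):
    """Follow redirects in a {url: (status, location)} table.

    Returns the list of (url, status) pairs visited, stopping at the first
    non-redirect response, a loop, an unknown URL, or max_hops.
    """
    chain = []
    visited = set()
    while url in responses and url not in visited and len(chain) < max_hops:
        visited.add(url)
        status, location = responses[url]
        chain.append((url, status))
        if status not in REDIRECT_CODES or location is None:
            break
        url = location
    return chain

def redirect_issues(chain):
    """Flag the two problems from the list: multiple hops and 302s."""
    hops = [status for _, status in chain if status in REDIRECT_CODES]
    issues = []
    if len(hops) > 1:
        issues.append("multiple redirects")
    if 302 in hops:
        issues.append("temporary (302) redirect")
    return issues

# Hypothetical responses for an expired seasonal promotion.
RESPONSES = {
    "/summer-sale": (302, "/sale"),
    "/sale": (301, "/offers"),
    "/offers": (200, None),
}

chain = trace_redirects("/summer-sale", RESPONSES)
print(chain)
print(redirect_issues(chain))
```

In a real audit the table would come from your crawl data. JavaScript and meta refresh redirects won't show up in status codes, so they need a separate check of the page source.
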
When prioritizing what to do, you have to consider impact. How will this recommendation impact our client’s business? What about ease? How easily can the client implement it? Readiness – how quickly can they implement it?  If you have an in-house SEO team, you probably have a better sense of what types of resources you’ll need and how to allocate them.

A case study: Ford Motor Company. They had a very Flash-heavy site. They had to do a site migration and deal with CMS troubles. They went in and collaborated with content strategy teams. They rolled out additional content and created a content migration plan.  They made sure the Flash was crawlable. Once all those things occurred, they were able to increase visitors by 66 percent.

Common Issues
Duplicate Content from canonical issues, mirror sites, staging sites, load balancing, pagination, non-localized international content, session IDs.
Navigation Components: Maintaining the user experience. Robots.txt, XML sitemaps, HTML sitemaps.
Rich media & Content Accessibility
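
Duplicate content caused by session IDs and similar URL noise can be found by normalizing every crawled URL and grouping the ones that collapse to the same address. This is a rough sketch. The list of session parameter names is an assumption, and so is the choice to ignore trailing slashes and parameter order, so adjust both to match how your site actually behaves.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; check what your platform really uses.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def normalize(url):
    """Lowercase the host, drop session parameters, sort the rest, trim the trailing slash."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

def duplicate_groups(urls):
    """Return {normalized url: [original urls]} for groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Hypothetical crawl output.
CRAWLED = [
    "http://Example.com/shoes/?sid=abc&color=red",
    "http://example.com/shoes?color=red",
    "http://example.com/hats",
]

for canonical, members in duplicate_groups(CRAWLED).items():
    print(canonical, members)
```

Each group is a candidate for a canonical tag or a redirect to the normalized URL.
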
Brian Ussery is next.

Technical SEO: Images
Use detailed file names
Use keyword-relevant anchor text
Use Alt Text: Used to determine relevancy, by screen readers, people on cell phones, etc.
Place images near relevant text
Don’t place text in images
Provide info about images without spamming
Don’t block images
Use a license via CC
Use quality photos – the bigger the better
Use direct names that describe your photos
Place images above the fold.
Specify width and height.
Provide as much meta data as you can about your images. Use tags, labeling, location info, etc.
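
A couple of the items on this list, alt text and width/height, are easy to check automatically. Below is a minimal sketch using Python's built-in HTML parser. The class name and the sample markup are made up for illustration, and it only covers those two checks.

```python
from html.parser import HTMLParser

class ImageAudit(HTMLParser):
    """Collect <img> tags that lack alt text or explicit dimensions."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        # A valueless attribute comes through as None, so treat it as empty.
        if not (attributes.get("alt") or "").strip():
            self.issues.append((src, "missing alt text"))
        if "width" not in attributes or "height" not in attributes:
            self.issues.append((src, "missing width/height"))

# Hypothetical page fragment.
PAGE = """
<img src="red-running-shoes.jpg" alt="Red running shoes" width="400" height="300">
<img src="IMG_0042.jpg">
"""

audit = ImageAudit()
audit.feed(PAGE)
for src, problem in audit.issues:
    print(src, problem)
```

The second image also shows why descriptive file names matter: IMG_0042.jpg tells nobody, human or spider, what the photo is.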

Image Speed
Use the appropriate optimized image format:
JPEG for photos
Crushed PNG for graphics
Optimized GIFs for small and animated images
Don’t scale images in (X)HTML
Specify dimensions
Use a favicon with expiration to avoid 404s
It’s very difficult for the engines to extract images from Flash. Adobe doesn’t even rank top for [adobe logo] in Google Image Search. From Twitter: Use Flash like you would use cilantro – sparingly and for a single high-impact effect. Nobody wants to eat a whole bowl of cilantro. Heh.

Crawlability Intro
Spiders have limited resources. You control how easy your site is to spider. How important is crawl rate for your Web business? TechCrunch and Mashable are crawled by the second.  Once they publish an article, it already ranks.  If you didn’t hear, Mashable is the new Wikipedia. Think users first, spiders second. Spiders love to crash parties. They’ll come back as much as you want them to.

Are all of your pages created equal? You control which pages are indexed.

Crawlability/Indexability Checklist
Convince spiders to visit often: Create static URLs that are updated frequently with the latest content and point links to them. Show the latest products that have been added. Do the same with your category pages.   Add a blog that is updated DAILY.
Show spiders where to go: Use consistent navigation that points your users and spiders to most important areas. Use breadcrumbs. ID pages that change frequently and link them directly from your top landing pages. Follow Web standards. Think of getting the user to various spots with a couple of clicks.
Block Spiders from less important content: Use Google Webmaster tools to create your robots.txt file. Don’t allow spiders to use their resources on pages that don’t need to rank.
Give the spider a map to your site: XML and user sitemaps provide a list of all URLs on your site. Just another way to help the spider around.
Feed the spider as quickly as possible: Implement server-side caching to reduce real-time database calls. Use a CDN server to increase the number of parallel requests. GZIP the output of all text files. Limit the amount of code surrounding the content by leveraging external CSS and JS files. MINIFY the output of CSS and JS files. Use CSS sprites wherever possible.  You can use YSlow or Page Speed to see how well you’re doing.
Don’t make the spider think too hard: Since the spider is only going to allocate a certain amount of resources to your site, don’t create any bottlenecks. Some common pitfalls: broken links resulting in 404s, long URLs w/ multiple parameters or session IDs, duplicate content, duplicate title/Meta tags, excessive code surrounding the important content, bad use of Title, Meta and H1 tags.
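
Two of the checklist items, blocking spiders from less important content and giving them a map, can be sanity-checked together. This sketch parses a robots.txt with Python's standard `urllib.robotparser` and builds an XML sitemap from only the URLs a spider is allowed to fetch. The robots.txt rules, the URLs, and the function names are all made up for illustration, and leaving blocked URLs out of the sitemap is a design choice made here, not something from the session.

```python
from urllib.robotparser import RobotFileParser
from xml.etree import ElementTree as ET

# Hypothetical robots.txt that keeps spiders out of cart and search pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

def crawlable(urls, robots_txt, agent="Googlebot"):
    """Return the URLs that robots.txt allows the given user agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

def build_sitemap(urls):
    """Build a minimal XML sitemap (loc entries only) as a string."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical list of URLs from a crawl.
URLS = [
    "http://www.example.com/products/widget",
    "http://www.example.com/cart/checkout",
    "http://www.example.com/search?q=shoes",
]

allowed = crawlable(URLS, ROBOTS_TXT)
sitemap = build_sitemap(allowed)
print(allowed)
print(sitemap)
```

Running the same check against your top landing pages is a quick way to catch a robots.txt rule that blocks something you actually want to rank.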
