You’re Invited! 2010 SEOmoz PRO Training Series »

Posted by jennita

Yabba Dabba Doooo! OK, I have no idea what the Flintstones has to do with SEO training, but the point is… I’m excited. It’s that time of year again when we open up registration for the PRO Training Series: Tips, Tricks and Tactics. See! Now do you get why I’m so giddy? As with every year, we have a killer lineup of speakers including Tim Ash, Dan Zarrella, Laura Lippay, Wil Reynolds, Marshall Simmonds and Will Critchlow. Plus we’ve brought back the highly acclaimed "Ice Cream Break" - you wouldn’t want to miss that.


Rand and Dharmesh Shah having an ice cream treat.

Here’s the deal: last year we sold out quickly and, sadly, had to turn away a lot of requests. With a limit of only 310 attendees, the networking alone is going to be amazing, but that also means tickets will go quickly. The sooner you register, the better your chances of learning from top-notch speakers and networking with a unique group of advanced SEOs (Oh… and you get to hang out with JLo. heh.)

So, let’s just jump right into the details… or you can just go register. :D

Details!

Where: Westin Hotel, Seattle (This is where we had it last year as well, and it rocked!)

When: August 30 - 31, plus an optional ½-day tools training on September 1 (65-person limit)

Price: $1149 for general attendees
           $649 for PRO Members (that’s $500 off the regular price!)
           Plus $125 for the optional tools training

Register Today

Speakers!

Remember that killer lineup I mentioned above? Well, here’s the full list. As you can see, the speakers cover a huge spectrum of knowledge and expertise. Plus, don’t forget that tickets are limited, so it’s a lot easier to get one-on-one time with them in this setting.

The Goods!

Check out the agenda for the full two days. We’re covering topics ranging from the Science of Twitter & Google’s Algorithm to Conversion Rate Optimization and Reverse Engineering your Competitors’ Rankings. 

Agenda - Day 1

  • 9:00am - 9:45am
    It’s a Mad, Mad, Mad, Mad SERP
    Speaker: Rand Fishkin

  • 9:45am - 10:30am
    How to Win Rankings in Competitive Local/Maps Results
    Speaker: David Mihm

  • 10:30am - 10:45am: Morning Break

  • 10:45am - 11:45am
    The Science of Twitter Success
    Speaker: Dan Zarrella

  • 11:45am - 12:30pm
    Presentation Off: How to Pitch SEO
    Speakers: Will Critchlow vs. Rand Fishkin

  • 12:30pm - 1:30pm: Lunch

  • 1:30pm - 2:00pm
    Earning Direct ROI on Social Media
    Speaker: Jen Lopez

  • 2:00pm - 2:45pm
    Site Architecture + Technical Best Practices for Big Site SEO
    Speaker: Marshall Simmonds

  • 2:45pm - 3:00pm: Afternoon Break

  • 3:00pm - 4:00pm
    The Science of Google’s Algorithm
    Speakers: Ben Hendrickson + Rand Fishkin

  • 4:00pm - 4:30pm
    Constructing Effective SEO Audits
    Speaker: Lindsay Wassell

  • 4:30pm - 5:30pm
    Conversion Rate Optimization
    Speaker: Tim Ash

Agenda - Day 2

  • 9:00am - 9:45am
    10 Sites That Earned Amazing Links: How They Did It & What We Can Learn
    Speaker: Rand Fishkin

  • 9:45am - 10:30am
    Reverse Engineering Your Competitors’ Rankings
    Speaker: Wil Reynolds

  • 10:30am - 10:45am: Morning Break

  • 10:45am - 11:30am
    Manual Link Building: That’s Right; It Still Works
    Speaker: Rob Ousbey

     
  • 11:50am - 12:10pm
    Top 10 Tips for Blogging
    Speaker: Ian Lurie

  • 12:10pm - 12:30pm
    Top 10 Tips for Paid Search Optimization
    Speaker: Joanna Lord

  • 12:30pm - 1:15pm: Lunch

  • 1:15pm - 2:00pm
    Designing Your SEO Strategy
    Speaker: Laura Lippay

  • 2:00pm - 2:45pm
    Advanced Keyword Selection + Targeting
    Speaker: Tom Critchlow

  • 2:45pm - 3:30pm
    Analytics & Tracking
    Speaker: Joanna Lord

  • 3:30pm - 3:45pm: Ice Cream Break

  • 3:45pm - 4:30pm
    How to Make SEO Data Reporting Sexy
    Speaker: Will Critchlow

  • 4:30pm - 5:30pm
    No More Secrets: SEO Veterans Spill the Goods on Tactics that Work
    Speakers: Ian Lurie, Will Critchlow, Tom Critchlow, Laura Lippay, Wil Reynolds, Marshall Simmonds (moderated by Rand Fishkin)
 

 

Networking!

If you don’t believe me about the amazing networking that happens at the seminar, read How to Network at an SEOmoz Seminar After-Party from audiore. She wrote this after the seminar last year, and it’s a great guide to networking. Plus, where else can you geek out at the computer at a party? (see below)


A group geeking out at the after party

DVDs

If you can’t attend the seminar, or even if you do and just want to relive it forever, you’ll be able to purchase the training on DVD. We’ll have more information about that coming soon. :)

London Seminar - October 25-26

Don’t worry! If you’re wondering about the London Seminar, sign up below to get on the email list to learn more as soon as details are announced.

 

What Past Attendees Have Said

Here are a couple of great posts from attendees last year, which were submitted to YOUmoz. Go Community!

10 Valuable, Actionable, Take-Aways From the SEOmoz Pro from Whitespark

YOUmoz Immersion at the SEOmoz Day Spa from erikellsworth

Register Today

 

6 Ways PRO Can Add Value in 15 Minutes »

Posted by randfish

As many of you who read this blog know, I’m a terrible self-promoter. I actually feel guilty writing about, linking to and promoting the products and services that make payroll for the amazing SEOmoz staff and allow us to conduct cool research, produce awesome guides and build out spiffy office space. But, every few months, I manage to crawl out from under that shell. This time, it’s by request.

I’ve been hearing from a lot of our PRO members that they feel both overwhelmed and confused by all the offerings in PRO. I know it’s tough when there are 30+ pages on which unique types of PRO content exist and even the dashboard doesn’t link to all of them (that’s our fault for bad organization - I promise it’s getting better by the end of summer). Hence, this post is all about what to do in your first 15 minutes inside PRO to get lots of value that can actually move the needle on your SEO actions and search traffic.

Step 1: Find Your Big Missed Opportunities via Top Pages

 Top Pages for TripAdvisor in OSE

When you run a report in Open Site Explorer, click to the "top pages" tab and browse through the list of the most-linked-to pages on your domain. You’re looking for two things: any troubling codes (302, 40x, 50x) and pages that have lots of links but aren’t targeting competitive keywords for relevant search traffic. In the former case, you want to get those pages up and pointing to the right place. In the latter case, you need to run that page through OSE, determine who’s linking to it and with what anchor text (there’s a tab for that, too), then see if you can put together good content to match the links & ranking ability. You can do all that later; for now, just export the list to CSV or make a note to revisit.
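As a rough illustration of the triage described above, the sketch below filters an exported Top Pages list for 302s and 4xx/5xx responses and sorts them by linking root domains, so the most-linked problem pages get fixed first. The column names (`URL`, `HTTP Status Code`, `Linking Root Domains`) and the sample rows are assumptions for illustration, not Open Site Explorer’s actual export format; adjust them to match your file.

```python
import csv
import io

# Hypothetical sample export; the column names are assumptions, not OSE's real headers.
SAMPLE_EXPORT = """URL,HTTP Status Code,Linking Root Domains
http://example.com/,200,1520
http://example.com/old-promo,302,340
http://example.com/gone,404,95
http://example.com/guide,200,410
http://example.com/error,503,12
"""


def is_troubling(status: int) -> bool:
    """Flag temporary redirects (302) and any 4xx or 5xx response."""
    return status == 302 or 400 <= status < 600


def troubling_pages(csv_text: str) -> list[tuple[str, int, int]]:
    """Return (url, status, linking root domains) for flagged pages, most-linked first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    flagged = [
        (row["URL"], int(row["HTTP Status Code"]), int(row["Linking Root Domains"]))
        for row in rows
        if is_troubling(int(row["HTTP Status Code"]))
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for url, status, domains in troubling_pages(SAMPLE_EXPORT):
        print(f"{status}  {domains:>5}  {url}")
```

For a real export, read the file with `open(path, newline="")` and pass its contents in place of `SAMPLE_EXPORT`. Sorting by linking root domains is one reasonable priority order, not the only one.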

Elapsed time: 3 minutes

Step 2: Crawl 3,000 Pages on Your Site and ID Potential Errors

 Custom Crawl Prototype

The new Custom Crawl Prototype will mimic a search engine spider and crawl up to 3,000 pages on any domain, then email you a CSV of the results within 24 hours. It identifies duplicate content issues, HTTP headers, missing titles & meta descriptions, and many more potential SEO pitfalls. Get a report on a site or two and dig into the results tomorrow.
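Because the crawl report arrives as a CSV, a few lines of scripting can pull out the pages that need attention. This sketch assumes hypothetical `URL`, `Title` and `Meta Description` columns (the prototype’s real headers may differ) and flags missing titles, missing meta descriptions and pages that share an identical title. Identical titles are only a cheap proxy for duplicate content, not a substitute for the crawler’s own duplicate-content check.

```python
import csv
import io
from collections import defaultdict

# Hypothetical crawl export; real column names may differ.
SAMPLE_CRAWL = """URL,Title,Meta Description
http://example.com/,Example Home,Welcome to Example
http://example.com/a,Widgets,
http://example.com/b,Widgets,Blue widgets
http://example.com/c,,Red widgets
"""


def audit_crawl(csv_text: str) -> dict[str, list]:
    """Group common on-page problems found in a crawl export."""
    missing_title: list[str] = []
    missing_description: list[str] = []
    urls_by_title: dict[str, list[str]] = defaultdict(list)

    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["URL"]
        title = row["Title"].strip()
        if not title:
            missing_title.append(url)
        else:
            urls_by_title[title].append(url)
        if not row["Meta Description"].strip():
            missing_description.append(url)

    # Any title shared by two or more URLs is a possible duplicate-content signal.
    duplicate_titles = [urls for urls in urls_by_title.values() if len(urls) > 1]
    return {
        "missing_title": missing_title,
        "missing_description": missing_description,
        "duplicate_titles": duplicate_titles,
    }


if __name__ == "__main__":
    for problem, pages in audit_crawl(SAMPLE_CRAWL).items():
        print(problem, pages)
```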

Elapsed time: 3 minutes 30 seconds

Step 3: Run Keyword Difficulty Reports for Your Top 5 Keyword Targets

Keyword Difficulty Tool

How tough, relatively speaking, are the keywords you’re chasing and where might easy opportunities exist? Keyword Difficulty can help answer this question and provides a terrific CSV export of the top 25 sites/pages ranking for any query with metrics for each. Often just a report or two can help you identify keyword targets where small quantities of links or optimization effort can take you a long way. They’re also ideal for showing management/clients exactly how far you have to go to catch up with the competition.

Elapsed time: 7 minutes

Step 4: Uncover Some Easy Link Targets with Link Intersect

Link Intersect Tool

Tom Critchlow and I call the Link Intersect Tool "cheating," because it’s just too easy to find good link opportunities. Plug in your site and at least 2 (up to 5) competing sites (or just sites that you think have relevant/acquirable links) and it spits back a list of sites, pages and metrics that link to 2+ of the competitors but don’t link to you. It’s like shooting links in a barrel! (that’s a thing, right?)
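The core idea behind Link Intersect can be expressed as simple set counting. The sketch below is not the tool’s actual implementation: it assumes you already have, for your site and for each competitor, a set of linking domains (gathered however you like), and it returns domains that link to at least two competitors but not to you, ranked by how many competitors they link to. The function and parameter names are hypothetical; the 2-to-5 competitor range mirrors the limits described above.

```python
from collections import Counter


def link_intersect(
    your_linking_domains: set[str],
    competitor_linking_domains: dict[str, set[str]],
    min_overlap: int = 2,
) -> list[tuple[str, int]]:
    """Return (domain, competitor count) pairs for domains that link to at
    least `min_overlap` competitors but not to your site."""
    if not 2 <= len(competitor_linking_domains) <= 5:
        raise ValueError("compare against between 2 and 5 competitors")
    counts = Counter()
    for domains in competitor_linking_domains.values():
        counts.update(domains)  # sets ensure each competitor counts a domain once
    candidates = [
        (domain, n)
        for domain, n in counts.items()
        if n >= min_overlap and domain not in your_linking_domains
    ]
    # Most widely shared domains first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical linking-domain sets.
competitors = {
    "rival-a.com": {"blog1.com", "news.com", "dir.org"},
    "rival-b.com": {"blog1.com", "news.com", "forum.net"},
    "rival-c.com": {"news.com", "forum.net"},
}
print(link_intersect({"forum.net"}, competitors))
```

Here `forum.net` is dropped because it already links to you, and `dir.org` is dropped because only one competitor has it.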

Elapsed time: 11 minutes

Step 5: Sign Up for a Webinar (or Download a Past Presentation)

PRO Webinars

I’ve personally run a dozen 60-90 minute webinars for our PRO members on topics ranging from "reverse engineering the SERPs" to "competitive link building" to "actionable analytics" and more. The feedback we get on these is overwhelmingly positive, and we’re running two each month (one with a specific content focus and another reviewing members’ sites). The webinar archives contain video+audio downloads of the presentations plus a link to register for upcoming ones. If you like a more interactive/participatory learning environment, these are a great option.

Elapsed time: 12 minutes

Step 6: Track Rankings on a Few Dozen Key Terms/Phrases

Rank Tracker

My recommendation is to Track Rankings for 10-20 key terms you’re targeting, a handful of mid-range "nice-to-haves" and a healthy helping of long-tail keywords to help give a sense of how you’re performing across the keyword demand curve. When traffic fluctuates, it’s great to be able to see if rankings were the cause, or if other factors (demand, downtime, errors, analytics capture problems, etc.) could be the culprit. The best part about the current rank tracking system is the ability to choose between multiple engines on any TLD (and to select "entire subdomain" so it catches any page from your site in the top 50 results).
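Conceptually, the "entire subdomain" option reports the best position held by any URL on your subdomain rather than by one specific page. Here is a minimal sketch of that check over an ordered list of result URLs. How you obtain the results is out of scope, and the function name and the 50-result default (taken from the depth mentioned above) are illustrative assumptions, not how the Rank Tracker is built.

```python
from typing import Optional
from urllib.parse import urlparse


def best_rank_for_subdomain(
    result_urls: list[str], subdomain: str, depth: int = 50
) -> Optional[int]:
    """Return the 1-based position of the first result hosted on `subdomain`
    within the top `depth` results, or None if none appears."""
    target = subdomain.lower()
    for position, url in enumerate(result_urls[:depth], start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == target:
            return position
    return None


# Hypothetical, ordered search results.
serp = [
    "http://www.other.com/",
    "http://blog.example.com/post",
    "http://www.example.com/page",
]
print(best_rank_for_subdomain(serp, "www.example.com"))
```

Matching on the exact hostname keeps `blog.example.com` and `www.example.com` separate, which is what tracking a single subdomain implies.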

Elapsed time: 15 minutes


OK, your quarter-hour is up, but so are your chances for a lot more search traffic in the next few weeks and months. When you’re ready to devote some more time, you can install the mozbar, check if any deals in the Discount Store are relevant/useful, distribute some PRO Guides to your compatriots, give Trifecta a spin, watch some PRO Whiteboard Videos, ask a question in Q+A, review the hundreds of PRO Tips, leverage the Link Acquisition Assistant to find some sexy new link opportunities, dig around in Labs, well… you get the idea.

And, as a tease, here’s an early comp of what we’ve been busy with in 2010:

Summer SEOmoz PRO Comp 

ETA: Late this summer :-)

What is PageRank Good for Anyway? (Statistics Galore) »

Posted by SeanWF

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

This is my first YOUmoz post, and I would greatly appreciate your feedback. I will be actively responding to comments, and I know that we will get a great discussion going. Please comment with any critique, questions, or random thoughts that you may have. If you would rather skip the statistics, feel free to jump ahead to the discussion section.

Introduction

A couple of months ago, SEOmoz explored the relationship between a web page’s PageRank and its position in search results. They concluded:

Google’s PageRank is, indeed, slightly correlated with their rankings (as well as with the rankings of other major search engines). However, other page-level metrics are dramatically better, including link counts from Yahoo and Page Authority.

I was intrigued by the study, and vowed to investigate the metric using my own data set. Because all of my data are at the root domain level, I chose to focus on the homepage PageRank of each domain.

Methods

I averaged three months of data (November 2009 to January 2010), collected on the last day of each month for 1,316 root domains. Using Quantcast Media Planner, I selected websites that had chosen to make their traffic data public. To be included, websites had to have an average of at least 100,000 unique US visitors during this time period.

The domains selected for this study do not approximate a random sample of websites. Because of the way in which they were selected, they will bias in favor of sites with many US visitors, and against sites with very few. There may also be differences between Quantified sites with public traffic data, and non-Quantified websites. For example, Quantified domains are probably more likely to include advertising on their pages than sites without the Quantcast script.

PageRank

PageRank (PR) can only take eleven values (0-10). It is an ordinal variable, meaning that the difference between PR = 8 and PR = 9 is not the same as the difference between PR = 3 and PR = 4. Like mozRank, it probably exists on a log scale.

The median and mode PageRank among websites in this study were PR = 6, with a minimum of PR = 0, and a maximum of PR = 9. However, only ten websites had PR < 3, and only seven had PR = 9.

Frequencies of PageRank Values

Results

SEOmoz Metrics

Using Spearman’s correlation coefficient, I compared PageRank to several SEOmoz root domain metrics. Domain mozRank (linearized) was strongly correlated with PR (r = 0.62)*. This correlation was somewhat smaller than the 0.71 that SEOmoz reported in May 2009. The disparity may be due to differences in methodology; SEOmoz used Pearson’s correlation coefficient and did not linearize mozRank. Additionally, PR data in my study were probably measured over a smaller range of values, potentially weakening the observed dependencies.
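For readers who want to reproduce this kind of comparison, the sketch below computes Spearman’s coefficient in the textbook way: rank each variable (giving tied values the average of their ranks, which matters for an eleven-value metric like PR), then take Pearson’s coefficient of those ranks. The post does not say which software was used, and the sample values and the base-10 "linearization" are hypothetical stand-ins. One property worth noting: because Spearman’s coefficient depends only on ranks, any strictly increasing transformation of a metric, such as undoing a log scale, leaves it unchanged, whereas Pearson’s coefficient does change.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's coefficient computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for seven domains.
pagerank = [3, 4, 5, 5, 6, 7, 8]
mozrank = [3.1, 3.9, 4.6, 5.2, 5.0, 6.3, 7.4]
linearized = [10 ** m for m in mozrank]  # assumed linearization, for illustration only

print(spearman(pagerank, mozrank), spearman(pagerank, linearized))  # identical
print(pearson(pagerank, mozrank), pearson(pagerank, linearized))    # differ
```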

*All reported correlations are significant at p < .01.

MozTrust was also highly correlated with PageRank (r = .62), with Domain Authority somewhat less so (r = .55). The latter has since undergone some major changes, and this result may not reflect the metric as it exists today.

Search Engine Indexing

I performed [site:example.com] queries using Google, Yahoo, and Bing APIs to approximate the number of pages indexed by each search engine. Much to my surprise, PageRank shared the strongest correlation with the number of pages indexed by Bing (r = .52), instead of Google (r = .30), or Yahoo (r = .24). My first thought was that Google might not have reported accurate counts, a phenomenon often noted by SEO professionals. However, there is some evidence that may indicate otherwise.

If Google’s reported indexation numbers are inaccurate, we would expect the metric to have lower correlations with similar metrics. However, indexation numbers reported by Google and Yahoo share a fairly high Pearson’s correlation coefficient (r = 0.38). Both appear to share smaller correlations with Bing: 0.34 and 0.26, respectively. Even more interesting, SEOmoz metrics seem to have much stronger correlations with Bing’s indexed pages than with the numbers reported by Google or Yahoo.

Pearson Correlations - SEOmoz Domain Metrics and Indexed Pages

If Google is failing to accurately report the size of its index, we might expect that similar queries would also return inaccurate data. However, PageRank shares a high Spearman’s correlation coefficient with the number of results returned by a Google [link:example.com] query (r = 0.65). The strength of this relationship appears similar to those between SEOmoz metrics and PR mentioned earlier. PR’s correlation with the results of a Yahoo [linkdomain:example.com -site:example.com] query is somewhat smaller (r = 0.53).

If the number of pages Google reports having indexed is a relatively poor metric, we would also expect to find more variation between months than other search engines. However, I did not find this to be the case. In fact, Bing had by far the highest average percent change in the number of pages indexed, a whopping 355% increase per month. Google averaged an increase of 61%, and Yahoo an increase of only 2%.
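The post does not spell out how the average monthly change was computed. One plausible reading, sketched below with made-up counts, takes the three month-end snapshots per domain, computes the two month-over-month percentage changes, and averages them across all domains. The function names and data are hypothetical. Because a percentage change has no upper bound, a handful of domains whose counts jump sharply can pull such an average up considerably, so the sketch reports the median alongside the mean.

```python
from statistics import mean, median


def monthly_percent_changes(snapshots: list[float]) -> list[float]:
    """Month-over-month percentage changes; pairs with a zero baseline are skipped."""
    return [
        (current - previous) / previous * 100
        for previous, current in zip(snapshots, snapshots[1:])
        if previous > 0
    ]


def summarize_changes(indexed_pages: dict[str, list[float]]) -> tuple[float, float]:
    """Return (mean, median) of every month-over-month change across domains."""
    changes = [
        change
        for snapshots in indexed_pages.values()
        for change in monthly_percent_changes(snapshots)
    ]
    return mean(changes), median(changes)


# Hypothetical November, December and January indexed-page counts.
sample = {"a.example": [1000, 1500, 1200], "b.example": [200, 200, 800]}
print(summarize_changes(sample))
```

With these numbers, one domain’s 300% jump lifts the mean to 82.5% while the median stays at 25%.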

While it is still possible that the number of pages on each domain that Google reports to have indexed is inaccurate, I see another potential explanation. More so than with Yahoo or Google, the number of pages that Bing will index on any given domain is related to the quantity and quality of links to that domain. Perhaps, at least when it comes to indexation, Bing follows more of a traditional PageRank-like algorithm. After all, Google claims that PR is only one of more than 200 signals used for ranking pages. This theory is supported by the results of SEOmoz’s comparison of Google’s and Bing’s ranking factors.

Social Media

PageRank even shares fairly strong correlations with social media metrics, such as how many of a domain’s pages are saved on Delicious (r = 0.49), how many stories it has on Digg (r = 0.38), and even the number of Tweets linking to one of its pages as measured by Topsy (r = .38).

Website Traffic

Last, but certainly not least, PageRank predicts website traffic with somewhat surprising strength. As reported by Quantcast, monthly page views, visits, and unique visitors are all significantly correlated with PR. Google’s little green bar even correlates with visits per unique visitor (r = 0.18), but not page views per visit. However, putting this in context shows the value of a metric like Domain Authority.

Correlations Between PageRank, Domain Authority and Website Traffic

Discussion

So what exactly does all of this mean, and why is it important?

First, despite being a page-level metric, homepage PageRank is actually a fairly good predictor of many important domain-level variables relevant to SEO, social media, and website traffic.

Comparison of PageRank Correlations with Metrics

For instance, on average, websites with a PR = 7 homepage had 2.6 times as many unique visitors as those with a PR = 6 homepage, which in turn had 1.5 times as many unique visitors as those with a PR = 5 homepage.
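Assuming both figures are ratios of group averages, as the wording suggests, they can be chained to compare PR = 7 and PR = 5 homepages directly. Writing U_k for the average number of unique visitors among domains with a PR = k homepage (notation introduced here for illustration):

```latex
\frac{U_7}{U_5}
  = \frac{U_7}{U_6} \cdot \frac{U_6}{U_5}
  \approx 2.6 \times 1.5
  = 3.9
```

Since 2.6 and 1.5 are themselves rounded, the factor of roughly 3.9 should be read as approximate.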

Indexed Pages and Unique Visitors by PageRank

Second, homepage PageRank is sometimes used as a proxy for a hypothetical “domain PageRank.” While technically inaccurate, this study supports the idea that the PR of a website’s homepage provides information about the domain as a whole.

While it may be limited to just eleven possible values, PR is surprisingly good at predicting the relative number of inbound links to a domain reported by Google and Yahoo, as well as the relative number of pages indexed by Bing. The key word here is “relative.” As an ordinal variable, PR cannot be used to predict the actual values of continuous variables.

Finally, this study provides evidence that SEOmoz’s domain-level metrics may be good (and possibly better than PageRank) predictors of variables important to search, social media, and web analytics. This, as well as all of the results of this study should be interpreted within the context of the included domains (high-traffic, US-centric, and publicly Quantified).

I hope you enjoyed reading my post, because I certainly enjoyed writing it. I intend to write many more based on your feedback. If you found this post interesting or valuable, I would greatly appreciate your thumbs up by clicking the icon below.

What if My Competitors Point Spammy Links to My Site? »

Posted by randfish

After last week’s Whiteboard Friday on the penalties paid links can incur, I got several questions about whether paid/spammy links could be used as a weapon to potentially harm someone else’s rankings. In this post, I’ll walk through why this is rarely the case, how you can defend yourself from potential scenarios and why this isn’t a great tactic to employ against your competitors.

Can Paid Links Be Used as Weapons in the SERPs?

The short answer is "almost never." But, as is typical in the SEO world, there’s a lot more in the long version.

In general, it’s very, very hard to bring down a white hat site/page ranking well in the search results. Although Google isn’t perfect at catching spam (e.g., our recent video featuring the success of some very obvious paid links in a well-known network), they seem to be surprisingly excellent (almost prescient) at detecting the intent of links. My suspicion is that sites that buy links to prop up their own rankings have very different patterns than sites whose competitors buy links pointing to them. These patterns exist on the sites themselves, in other sites registered to the owners, in link footprints and in usage/search behavior.

Effect of Spammy/Paid Links on Websites

It could, in fact, be that the "penalties" many SEOs ascribe to paid links are the result of a much more sophisticated analysis by Google that looks at multiple aspects of a site’s presence before determining the link intent. Given that, in nearly 10 years of SEO, I’ve only heard of two reasonably verifiable instances of "Google-bowling" (the process of pointing bad links at a site or page to hurt its rankings) working, my guess is that Google’s webspam team has developed some very impressive methods here.

Many SEOs have also suggested that a certain "bar of trust" can be achieved in Google, after which negative links may be devalued but likely don’t cause penalties or rankings drops. This makes a lot of sense to me (though it’s nearly impossible to prove), since under such a system "Google-bowling" is largely defeated, and even good sites that stray into black/gray hat link building simply find themselves wasting money rather than being removed from the results (which, for many popular brands/sites, would cause a loss of relevance in the results for users).

Thus, if you are trying to wield paid links as a weapon against your ranking competitors, it’s far more likely to work against the new(ish) site ranking #65 for your keywords rather than those who’ve earned their way to the top spots with white hat techniques.

Defending Yourself from Potential Link Attacks

Have you recently broken the heart of a black hat link broker’s son or daughter? Stepped on a link farmer’s superhero cape? Talked smack about a nefarious panelist at an SEO conference not realizing they were just around the corner? The best defense, in this case, is a good defense (don’t go buying and renting links to others; you’re only enriching the spammers).

Many, many SEOs and webmasters worry a tremendous amount about spammy links pointing to their sites and pages. By and large, this isn’t a concern; it happens to every site on the web. Just look at some of the spamtastic links that point to SEOmoz (via this Yahoo! query):

Spammy Links to SEOmoz

If you see a collection of scraper sites filled with pharmaceutical, financial, legal, real estate and other questionable links with surprisingly well-optimized anchor text appearing in Google Alerts or your 24-hour reputation monitoring queries (e.g. http://www.google.com/search?as_q=seomoz&as_qdr=d&num=100, which queries Google for all pages mentioning "seomoz" in the past 24 hours), don’t panic. If you exist on the web, you’re going to attract these types of links, and the search engines will not punish you for them, even if you’re a relatively new, untrusted site.
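The monitoring query in that example is just a Google search URL with three parameters. A minimal Python sketch for building the same kind of URL for another brand term follows. It uses only the parameters that appear in the example URL (`as_q`, `as_qdr=d`, `num`), and the helper name `daily_mentions_url` is hypothetical; whether Google still honors these parameters is an assumption, not something this post establishes.

```python
from urllib.parse import urlencode

def daily_mentions_url(term, num_results=100):
    """Hypothetical helper: Google search URL for pages mentioning `term` in the past day.

    Uses only the parameters from the example URL:
    as_q (the query), as_qdr=d (past 24 hours) and num (results per page).
    """
    params = {"as_q": term, "as_qdr": "d", "num": num_results}
    return "http://www.google.com/search?" + urlencode(params)

print(daily_mentions_url("seomoz"))
# http://www.google.com/search?as_q=seomoz&as_qdr=d&num=100
```

Bookmarking one such URL per brand term and checking it daily is all the "monitoring" amounts to.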

However, if you start acquiring links that look an awful lot like they’re part of an intentional, paid link network (great anchor text, pointing to internal pages on the site, coming from footers and sidebars that contain other irrelevant, anchor-text rich links), there may be some cause for concern. Your best course of action is to submit a spam report to Google from your own, verified, Webmaster Tools account, noting that you have nothing to do with the links and want to make sure Google doesn’t think you’ve created, endorsed or paid for them.

This action is rarely necessary or worthwhile, but if you’re highly concerned about competitive conduct, it’s not a bad route to take. Of course, you’ll want to make sure you don’t actually engage in any black/gray hat activity yourself or it could trigger the wrong kind of review by a webspam team member.

Should I Buy Links to Push Down My Competitors?

Not unless you feel the link brokers of the world are more worthy than your favorite charity.

Seriously, the chances you’ll have a negative impact are far lower than the chances you’ll actually help (again, I refer back to our paid link WB Friday experiment, in which the obvious link network had positive effects, even on the brand new site). The money is far better spent on editorial content, public relations, social media campaigns and white hat SEO efforts for your own site. Bringing someone else down may seem temporarily, emotionally satisfying, but it’s the wrong way to approach SEO (and life in general, if I may be so bold).

Looking forward to the discussion in the comments and happy to talk through the filtration processes and failsafes (or at least, my speculation) Google may employ.

p.s. The new Beginner’s Guide to SEO has more on understanding + recovering from search spam penalties.


Patience is an SEO Virtue »

Posted by Kate Morris

We have all been there once or twice, maybe even a few more times than that. You just launched a site or a project, a few days pass, and you log in to analytics and webmaster tools to see how things are going. Nothing is there.

WAIT. What?!?!?! 

Scenarios start running through your mind, and you check to make sure everything is working right. How could this be?

It doesn’t even have to be a new project. I’ve realized things on clients’ sites that needed fixing: XML sitemaps, link building efforts, title tag duplication, or even 404 redirection. The right changes are made, and a week later, nothing has changed in rankings or in webmaster consoles across the board. You are left thinking "what did I do wrong?"


A few client sites, major sites mind you, have had issues recently, such as 404 redirection problems and toolbar PageRank drops. One even had to change a misplaced setting in Google Webmaster Tools that pointed to the wrong version of their site (www vs. non-www). We fixed it, and their homepage then dropped in the rankings for their own name.

That looks bad. Real bad. Especially to the higher ups. They want answers and the issue fixed now … yesterday really.

Most of these things are being measured for performance and some can even have a major impact on the bottom line. And it is so hard to tell them this, even harder to do, but the changes just take …

Patience

That homepage drop? They called on Friday, and as of Saturday night things were back to normal. The drop most likely lasted 2-3 days, but this is a large site. Another, smaller client had redesigned their entire site. We put in all the correct 301 redirects for the old pages and launched the site. It took Google almost 4 weeks to completely remove the old pages from the index. Some URL edits caused 404 errors; they were fixed within a day, but it took over a week for the fix to be reflected in Google Webmaster Tools.

These are just a few examples where changes were made immediately, but the actions had no immediate return. We live in a society that thrives on the present, on immediate returns. As search marketers, we make C-level executives happy with our ability to show immediate returns on our campaigns. But like the returns on SEO, the reflection of SEO changes takes time.

The recent Mayday and Caffeine updates are sending many sites to the bottom of the rankings because of a lack of original content. Many of them are doing everything "right" in terms of onsite SEO, but now that isn’t enough. They can change their sites all they want, but until there is relevant, good content plus traffic, those rankings are not going to return for long tail terms.

There has also been a recent crackdown on over-optimized local search listings. I have seen a number of accounts suspended or simply not ranking well because they are, in effect, trying too hard. There is such a thing as over-optimizing a site, and too many changes at once can raise a flag with the search engines.

One Month Rule


Here is my rule: make a change, leave it, go do social media/link building, and come back to the issue a month later. It may not take a month; for smaller sites, 2 weeks is a good time to check on the status of a few things. A month is when things should start returning to normal if there have been no other large changes to the site.

We say this all the time with PPC accounts. It’s like statistical analysis: you have to have enough data to work with before you can see results. And when you are waiting for a massive search engine to process your changes, once they do take effect in the system, you then have to give them time to work.

So remember the next time something seems to be not working in Webmaster Tools or SERPs:

  1. If you must, double check the code (although you’ve probably already done this 15 times) to ensure it’s set up correctly. But then,
  2. Stop. Breathe. There is always a logical explanation. (And yes, Google being slow is a logical one.)
  3. When did you last change something to do with the issue?
  4. If it’s less than 2 weeks ago, give it some more time.
  5. Major changes, give it a month. (Think major site redesigns and URL restructuring)


Statistics a Win for SEO »

Posted by bhendrickson

We recently posted some correlation statistics on our blog. We believe these statistics are interesting and provide insight into the ways search engines work (a core principle of our mission here at SEOmoz). As we will continue to make similar statistics available, I’d like to discuss why correlations are interesting, refute the math behind recent criticisms, and reflect on how exciting it is to engage in mathematical discussions where critiques can be definitively rebutted.

I’ve been around SEOmoz for a little while now, but I don’t post a lot. So, as a quick reminder, I designed and built the prototype for SEOmoz’s web index, and I wrote a large portion of the back-end code for the project. We shipped the index with billions of pages nine months after I started on the prototype, and we have continued to improve it since. Recently I built the machine learning models behind Page Authority and Domain Authority, and I am working on some fairly exciting stuff that has not yet shipped. As I’m an engineer and not a regular blogger, I’ll ask for a bit of empathy for my post: it’s a bit technical, but I’ve tried to make it as accessible as possible.

Why does Correlation Matter?

Correlation helps us find causation by measuring how much variables change together. Correlation does not imply causation; variables can be changing together for reasons other than one affecting the other. However, if two variables are correlated and neither is affecting the other, we can conclude that there must be a third variable that is affecting both. This variable is known as a confounding variable. When we see correlations, we do learn that a cause exists — it might just be a confounding variable that we have yet to figure out.
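A small simulation can make the confounding-variable idea concrete. This is an illustrative sketch, not part of our analysis: a hypothetical hidden variable `z` feeds into both `x` and `y`, neither of which affects the other, yet the two still come out correlated (about 0.5 in theory, since each shares half of its variance with `z`).

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_confounder(n=10_000, seed=0):
    """X and Y never influence each other; both depend on a hidden Z."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden confounding variable
        xs.append(z + rng.gauss(0, 1))   # X depends only on Z plus its own noise
        ys.append(z + rng.gauss(0, 1))   # Y depends only on Z plus its own noise
    return pearson(xs, ys)

print(f"correlation between X and Y: {simulate_confounder():.2f}")  # near 0.5
```

Drop `z` from both lines and the correlation falls to roughly zero: the correlation was real, but its cause was the shared input.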

How can we make use of correlation data? Let’s consider a non-SEO example.

There is evidence that women who occasionally drink alcohol during pregnancy give birth to smarter children with better social skills than women who abstain. The correlation is clear, but the causation is not. If the relationship is causal, then light drinking will make the child smarter. If it is due to a confounding variable, light drinking could have no effect or even make the child slightly less intelligent (which is suggested by extrapolating from the data showing that heavy drinking during pregnancy makes children considerably less intelligent).

Although these correlations are interesting, they are not black-and-white proof that behaviors need to change. One needs to consider which explanations are more plausible: the causal ones or the confounding variable ones. To keep the analogy simple, let’s suppose there were only two likely explanations: one causal and one confounding. The causal explanation is that alcohol makes a mother less stressed, which helps the unborn baby. The confounding variable explanation is that women with more relaxed personalities are more likely to drink during pregnancy and less likely to negatively impact their child’s intelligence with stress. Given this, I probably would be more likely to drink during pregnancy because of the correlation evidence, but there is an even bigger take-away: both likely explanations damn stress. So, because of the correlation evidence about drinking, I would work hard to avoid stressful circumstances. *

Was the analogy clear? I am suggesting that as SEOs we approach correlation statistics like pregnant women considering drinking - cautiously, but without too much stress.

* Even though I am a talented programmer and work in the SEO industry, do not take medical advice from me, and note that I construed the likely explanations for the sake of simplicity :-)

Some notes on data and methodology

We have two goals when selecting a methodology to analyze SERPs:

  1. Choose measurements that will communicate the most meaningful data
  2. Use techniques that can be easily understood and reproduced by others

These goals sometimes conflict, but we generally choose the most common method still consistent with our problem. Here is a quick rundown of the major options we had, and how we decided between them for our most recent results:

Machine Learning Models vs. Correlation Data: Machine learning can model and account for complex variable interactions. In the past, we have reported derivatives of our machine learning models. However, these results are difficult to create, they are difficult to understand, and they are difficult to verify. Instead we decided to compute simple correlation statistics.

Pearson’s Correlation vs. Spearman’s Correlation: The most common measure of correlation is Pearson’s Correlation, although it only measures linear correlation. This limitation is important: we have no reason to think interesting correlations to ranking will all be linear. Instead we chose to use Spearman’s correlation. Spearman’s correlation is still pretty common, and it does a reasonable job of measuring any monotonic correlation.

Here is a monotonic example: The count of how many of my coworkers have eaten lunch for the day is perfectly monotonically correlated with the time of day. It is not a straight line and so it isn’t linear correlation, but it is never decreasing, so it is monotonic correlation.

Here is a linear example: assuming I read at a constant rate, the number of pages I can read is linearly correlated with the length of time I spend reading.
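In the spirit of the lunch and reading examples, a short Python sketch shows the difference. The data is made up for illustration: one series grows exponentially with time (monotonic but not linear), the other linearly. Spearman’s correlation is computed here as Pearson’s correlation applied to ranks, which is one standard way to compute it; the `ranks` helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """Rank positions 1..n (assumes no tied values, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

hours = list(range(1, 11))
exponential = [2 ** h for h in hours]   # monotonic but far from linear
linear = [30 * h for h in hours]        # e.g. pages read at a constant rate

print(round(pearson(hours, exponential), 3))   # about 0.8
print(round(spearman(hours, exponential), 3))  # 1.0
print(round(pearson(hours, linear), 3))        # 1.0
```

On the exponential series Pearson’s coefficient lands around 0.8 even though the relationship is perfectly monotonic, while Spearman’s reports a perfect 1.0.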

Mean Correlation Coefficient vs. Pooled Correlation Coefficient: We collected data for 11,000+ queries. For each query, we can measure the correlation of ranking position with a particular metric by computing a correlation coefficient. However, we don’t want to report 11,000+ correlation coefficients; we want to report a single number that reflects how correlated the data was across our dataset, and we want to show how statistically significant that number is. There are two techniques commonly used to do this:

  1. Compute the mean of the correlation coefficients. To show statistical significance, we can report the standard error of the mean.
  2. Pool the results from all SERPs and compute a global correlation coefficient. To show statistical significance, we can compute standard error through a technique known as bootstrapping.

The mean correlation coefficient and the pooled correlation coefficient would both be meaningful statistics to report. However, the bootstrapping needed to show the standard error of the pooled correlation coefficient is less common than using the standard error of the mean. So we went with #1.
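To make the two options concrete, here is a hedged Python sketch computing both on synthetic data. The SERPs are invented (a hypothetical metric that tends to be slightly higher at top positions, plus noise), and `num_queries`, the bootstrap round count and all helper names are illustrative assumptions, not our production code. Option 1 averages per-query Spearman coefficients and reports the standard error of the mean; option 2 pools every (position, metric) pair and bootstraps by resampling whole SERPs. Because pooled positions repeat across queries, the rank helper gives tied values their average rank.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def synthetic_serps(num_queries=300, results_per_query=10, seed=1):
    """Hypothetical SERPs: the metric drifts down with position, plus noise."""
    rng = random.Random(seed)
    serps = []
    for _ in range(num_queries):
        positions = list(range(1, results_per_query + 1))
        metric = [-0.1 * p + rng.gauss(0, 1) for p in positions]
        serps.append((positions, metric))
    return serps

def mean_coefficient_with_sem(serps):
    """Option 1: mean of per-query coefficients and standard error of that mean."""
    coefficients = [spearman(pos, met) for pos, met in serps]
    mean = statistics.mean(coefficients)
    sem = statistics.stdev(coefficients) / math.sqrt(len(coefficients))
    return mean, sem

def pooled_coefficient(serps):
    """Option 2: one coefficient over all (position, metric) pairs."""
    xs = [p for pos, _ in serps for p in pos]
    ys = [m for _, met in serps for m in met]
    return spearman(xs, ys)

def bootstrap_se(serps, rounds=100, seed=2):
    """Bootstrap standard error of the pooled coefficient, resampling whole SERPs."""
    rng = random.Random(seed)
    estimates = [pooled_coefficient(rng.choices(serps, k=len(serps)))
                 for _ in range(rounds)]
    return statistics.stdev(estimates)

serps = synthetic_serps()
mean, sem = mean_coefficient_with_sem(serps)
print(f"option 1: mean = {mean:.3f}, SEM = {sem:.4f}")
print(f"option 2: pooled = {pooled_coefficient(serps):.3f}, "
      f"bootstrap SE = {bootstrap_se(serps):.4f}")
```

Option 1 needs nothing beyond a mean and a standard deviation, which is much of why it is the more familiar choice.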

Fisher Transform vs. No Fisher Transform: When averaging a set of correlation coefficients, instead of computing the mean of the correlation coefficients, one sometimes computes the mean of the Fisher transforms of the coefficients (and then applies the inverse Fisher transform). This would not be appropriate for our problem because:

  1. It will likely fail. The Fisher transform takes the logarithm of (1 + r)/(1 - r), so it explodes when an individual coefficient r is near one (or minus one) and outright fails when a coefficient is exactly one (or minus one). Because we are computing hundreds of thousands of coefficients, each over a small sample, it is quite likely the Fisher transform would fail for our problem. (Of course, we have a large sample of these coefficients to average over, so our final standard error is not large.)
  2. It is unnecessary, for two reasons. First, the advantage of the transform is that it can make the expected average closer to the expected coefficient, and we do nothing that assumes this property. Second, when coefficients are near zero this property approximately holds even without the transform, and our coefficients were not large.
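A short Python sketch shows both points using only the standard library; the helper names are illustrative. The Fisher transform is `atanh(r)`. Averaging in transformed space and mapping back with `tanh` barely changes small coefficients, but a single coefficient of exactly 1.0, which a 10-result SERP can easily produce, makes the whole average fail.

```python
import math

def fisher_z(r):
    """Fisher transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def mean_via_fisher(coefficients):
    """Average in z-space, then map back with the inverse transform (tanh)."""
    zs = [fisher_z(r) for r in coefficients]
    return math.tanh(sum(zs) / len(zs))

def plain_mean(coefficients):
    return sum(coefficients) / len(coefficients)

small = [0.10, -0.05, 0.20, 0.15]
print(plain_mean(small), mean_via_fisher(small))  # nearly identical for small r

print(fisher_z(0.999))  # already large: about 3.8
try:
    mean_via_fisher([0.2, 1.0, 0.1])  # one perfect coefficient breaks the average
except ValueError as err:
    print("Fisher averaging failed:", err)
```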

Rebuttals To Recent Criticisms

Two bloggers, Dr. E. Garcia and Ted Dzubia, have published criticisms of our statistics.

Eight months before his current post, Ted Dzubia wrote an enjoyable and jaunty post lamenting that criticism of SEO every six to eight months was an easy way to generate controversy, noting "it’s been a solid eight months, and somebody kicked the hornet’s nest. Is SEO good or evil? It’s good. It’s great. I <3 SEO." Furthermore, his Twitter feed makes it clear he sometimes trolls for fun. To wit: "Mongrel 2 under the Affero GPL. TROLLED HARD," "Hacker News troll successful," and "mailing lists for different NoSQL servers are ripe for severe trolling." So it is likely we’ve fallen for trolling…

I am going to respond to both of their posts anyway because they have received a fair amount of attention, and because both posts seek to undermine the credibility of the wider SEO industry. SEOmoz works hard to raise the standards of the SEO industry, and protect it from unfair criticisms (like Garcia’s claim that "those conferences are full of speakers promoting a lot of non-sense and SEO myths/hearsays/own crappy ideas" or Dzubia’s claim that, besides our statistics, "everything else in the field is either anecdotal hocus-pocus or a decree from Matt Cutts"). We also plan to create more correlation studies (and more sophisticated analyses using my aforementioned ranking models) and thus want to ensure that those who are employing this research data can feel confident in the methodology employed.

Search engine marketing conferences, like SMX, OMS and SES, are essential to the vitality of our industry. They are an opportunity for new SEO consultants to learn, and for experienced SEOs to compare notes. It can be hard to argue against such subjective and unfair criticism of our industry, but we can definitively rebut their math.

To that end, here are rebuttals for the four major mathematical criticisms made by Dr. E. Garcia, and the two made by Dzubia.

1) Rebuttal to Claim That Mean Correlation Coefficients Are Uncomputable

For our charts, we compute a mean correlation coefficient. The claim is that such a value is impossible to compute.

Dr. E. Garcia : "Evidently Ben and Rand don’t understand statistics at all. Correlation coefficients are not additive. So you cannot compute a mean correlation coefficient, nor you can use such ‘average’ to compute a standard deviation of correlation coefficients."

There are two issues with this claim: a) peer-reviewed papers frequently publish mean correlation coefficients; b) additivity is relevant for determining whether two different meanings of the word "average" will have the same value, not whether the mean is computable. Let’s consider each issue in more detail.

a) Peer Reviewed Articles Frequently Compute A Mean Correlation Coefficient

E. Garcia is claiming that something researchers frequently compute and include in peer-reviewed articles is uncomputable. Here are three significant papers where the researchers compute a mean correlation coefficient:

"The weighted mean correlation coefficient between fitness and genetic diversity for the 34 data sets was moderate, with a mean of 0.432 +/- 0.0577" (Macquarie University - "Correlation between Fitness and Genetic Diversity", Reed, Frankham; Conservation Biology; 2003)

"We observed a progressive change of the mean correlation coefficient over a period of several months as a consequence of the exposure to a viscous force field during each session. The mean correlation coefficient computed during the force-field epochs progressively…" (MIT - F. Gandolfo, et al; "Cortical correlates of learning in monkeys adapting to a new dynamical environment," 2000)

"For the 100 pairs of MT neurons, the mean correlation coefficient was 0.12, a value significantly greater than zero" (Stanford - E Zohary, et al; "Correlated neuronal discharge rate and its implications for psychophysical performance", 1994)

SEOmoz is in a camp with reviewers from the journal Nature, as well as researchers from MIT, Stanford and the authors of 2,400 other academic papers that use the mean correlation coefficient. Our camp is being attacked by Dr. E. Garcia, who argues our camp doesn’t "understand statistics at all." It is fine to take positions outside of the scientific mainstream, but when Dr. E. Garcia takes such a position he should offer more support for it. Given how commonly Dr. E. Garcia uses the pejorative "quack," I suspect he does not mean to take positions this far outside of academic consensus.

b) Additivity Relevant For Determining If Different Meanings Of "Average" Are The Same, Not If Mean Is Computable

Although "mean" is quite precise, "average" is less so. By "average" one might intend "mean", "mode", "median," or something else. Another possible meaning is ‘the value of a function on the union of the inputs’. This last definition of average might seem odd, but it is sometimes used. Consider if someone asked "a car travels 1 mile at 20mph, and 1 mile at 40mph, what was the average mph for the entire trip?" The answer they are looking for is not 30mph, which is the mean of the two measurements, but ~26.7mph, which is the mph for the whole 2 mile trip (2 miles in 4.5 minutes). In this case, the mean of the measurements is different from the colloquial average, which is the function for computing mph applied to the union of the inputs (the whole two miles).
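The car example is quick to check in Python. This sketch (helper names are illustrative) computes the mean of the two speed measurements and the speed for the whole trip, which is total distance over total time. Both are perfectly computable; they simply answer different questions.

```python
def mean_of_speeds(speeds):
    """Plain mean of the individual speed measurements."""
    return sum(speeds) / len(speeds)

def trip_speed(legs):
    """Speed for the whole trip: total distance over total time; legs are (miles, mph)."""
    total_miles = sum(miles for miles, _ in legs)
    total_hours = sum(miles / mph for miles, mph in legs)
    return total_miles / total_hours

legs = [(1, 20), (1, 40)]
print(mean_of_speeds([mph for _, mph in legs]))  # 30.0
print(round(trip_speed(legs), 2))                # 26.67
```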

This may be what has confused Dr. E. Garcia. Elsewhere, when repeating this claim, he cites Statsweb, which makes the point that this other "average" is different from the mean. Additivity is useful in determining whether these averages will differ. But even if another interpretation of average is valid for a problem, and even if that other average is different from the mean, that makes the mean neither uncomputable nor meaningless.

2) Rebuttal to Claim About Standard Error of the Mean vs Standard Error of a Correlation Coefficient

Although he has stated unequivocally that one cannot compute a mean correlation coefficient, Garcia is quite opinionated on how we ought to have computed standard error for it. To wit:

E. Garcia: "Evidently, you don’t know how to calculate the standard error of a correlation coefficient… the standard error of the mean and the standard error of a correlation coefficient are two different things. Moreover, the standard deviation of the mean is not used to calculate the standard error of a correlation coefficient or to compare correlation coefficients or their statistical significance."

He repeats this claim even after making the point above about mean correlation coefficients, so he clearly is aware the correlation coefficients being discussed are mean coefficients and not coefficients computed after pooling data points. So let’s be clear on exactly what his claim implies. We have some measured correlation coefficients, and we take the mean of these measured coefficients. The claim is that we should have used the same formula for standard error of the mean of these measured coefficients that we would have used for only one. Garcia’s claim is incorrect. One would use the formula for the standard error of the mean.

The formula for the mean, and for the standard error of the mean, apply even if there is a way to separately compute standard error for one of the observations the mean was over. If we were computing the mean of the count of apples in barrels, lifespans of people in the 19th century, or correlation coefficients for different SERPs, the same formula for the standard error of this mean applies. Even if we have other ways to measure the standard error of the measurements we are taking the mean over - for instance, our measure of lifespans might only be accurate to the day of death and so could be off by 24 hours - we cannot use how we would compute standard error for an observation to compute standard error of the mean of those observations.

A smaller but related objection is over language. He objects to my use of "standard deviations" to count how far a point is from a mean in units of the mean’s standard error. As Wikipedia notes, the "standard error of the mean (i.e., of using the sample mean as a method of estimating the population mean) is the standard deviation of those sample means". So, according to Wikipedia, counting how many standard errors a number is from the estimate of a mean is counting standard deviations of our mean estimate. Beyond being technically correct, this usage also fit the context, which was the accuracy of the sample mean.

3) Rebuttal to Claim That Non-Linearity Is Not A Valid Reason To Use Spearman’s Correlation

I wrote "Pearson’s correlation is only good at measuring linear correlation, and many of the values we are looking at are not. If something is well exponentially correlated (like link counts generally are), we don’t want to score them unfairly lower.”

E. Garcia responded by citing a source he described as "exactly right": "Rand your (or Ben’s) reasoning for using Spearman correlation instead of Pearson is wrong. The difference between two correlations is not that one describes linear and the other exponential correlation, it is that they differ in the type of variables that they use. Both Spearman and Pearson are trying to find whether two variables correlate through a monotone function, the difference is that they treat different type of variables - Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data."

E. Garcia’s source, and by extension E. Garcia, are incorrect. A desire to measure non-linear correlation, such as exponential correlation, is a valid reason to use Spearman’s over Pearson’s. The point that "Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data" is true in the sense that, to compute Spearman’s correlation, one can convert continuous variables to ranked indices and then apply Pearson’s. However, the variables do not need to be ranked indices to begin with. If they did, Spearman’s would always produce the same results as Pearson’s and there would be no purpose for it.

The point E. Garcia objects to, that Pearson’s only measures linear correlation while Spearman’s can measure other kinds of correlation such as exponential correlation, was entirely correct. A quick look at Wikipedia shows that Spearman’s measures any monotonic correlation (including exponential) while Pearson’s only measures linear correlation.

The Wikipedia article on Pearson’s Correlation starts by noting that it is a "measure of the correlation (linear dependence) between two variables".

The Wikipedia article on Spearman’s Correlation starts with an example in the upper right showing that a "Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. In contrast, this does not give a perfect Pearson correlation."

E. Garcia’s position neither makes sense nor agrees with the literature. I would go into the math in more detail, or quote more authoritative sources, but I’m pretty sure Garcia now knows he is wrong. After E. Garcia made his incorrect claim about the difference between Spearman’s correlation and Pearson’s correlation, and after I corrected E. Garcia’s source (which was in a comment on our blog), E. Garcia has stated the difference between Spearman’s and Pearson’s correctly. However, we want to make sure there’s a good record of the points, and to explain the what and the why.

4) Rebuttal To Claim That PCA Is Not A Linear Method

This example is particularly interesting because it is about Principal Component Analysis (PCA), which is related to PageRank (something many SEOs are familiar with). In PCA one finds principal components, which are eigenvectors. PageRank is also an eigenvector. But I digress; let’s discuss Garcia’s claim.

After Dr. E. Garcia criticized a third party for using Pearson’s Correlation because Pearson’s only shows linear correlations, he criticized us for not using PCA. Like Pearson’s, PCA can only find linear correlations, so I pointed out his contradiction:

Ben: "Given the top of your post criticizes someone else for using Pearson’s because of linearity issues, isn’t it kinda odd to suggest another linear method?"

To which E. Garcia responded: "Ben’s comments about… PCA confirms an incorrect knowledge about statistics" and "Be careful when you, Ben and Rand, talk about linearity in connection with PCA as no assumption needs to be made in PCA about the distribution of the original data. I doubt you guys know about PCA…The linearity assumption is with the basis vectors."

But before we get to the core of the disagreement, let me point out that E. Garcia is close to correct with his actual statement. PCA defines basis vectors such that they are linearly de-correlated, so it does not need to assume that they will be. But this is a minor quibble. The issue with Dr. E. Garcia’s position is the implication that the linear aspect of PCA lies not in the correlations it finds in the source data, as I claimed, but only in the basis vectors.

So, there is the disagreement - analogous to how Pearson’s Correlation only finds linear correlations, does PCA also only find linear correlations? Dr. E. Garcia says no. SEOmoz, and many academic publications, say yes. For instance:

"PCA does not take into account nonlinear correlations among the features" ("Kernel PCA for HMM-Based Cursive Handwriting Recognition"; Andreas Fischer and Horst Bunke 2009)

"PCA identifies only linear correlations between variables" ("Nonlinear Principal Component Analysis Using Autoassociative Neural Networks"; Mark A. Kramer (MIT), AIChE Journal 1991)

However, besides citing authorities, let’s consider why his claim is incorrect. As E. Garcia imprecisely notes, the basis vectors are linearly de-correlated. As the sources he cites point out, PCA tries to represent the source data as linear combinations of these basis vectors. This is how PCA shows us correlations: by creating basis vectors that can be linearly combined to get close to the original data. We can then look at these basis vectors and see how aspects of our source data vary together, but because PCA only combines them linearly, it only shows us linear correlations. Therefore, PCA provides insight into linear correlations, even for non-linear data.
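A small Python sketch illustrates the point. It is not from our analysis: the data is made up and the helpers are hypothetical. It computes the 2x2 covariance matrix that PCA would decompose and its leading eigenvector. For y = 2x, the leading component points along (1, 2), capturing the relationship. For y = x squared, on inputs symmetric around zero, y is completely determined by x, yet the covariance between them is exactly zero, so the principal components are simply the original axes and PCA reports no relationship at all.

```python
import math

def covariance_matrix(xs, ys):
    """Sample variances and covariance (sxx, sxy, syy) of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def first_principal_component(sxx, sxy, syy):
    """Largest eigenvalue and unit eigenvector of [[sxx, sxy], [sxy, syy]]."""
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    if abs(sxy) > 1e-12:
        vec = (lam - syy, sxy)          # solves (A - lam * I) v = 0
    else:                               # diagonal matrix: the axes themselves
        vec = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vec[0], vec[1])
    return lam, (vec[0] / norm, vec[1] / norm)

xs = list(range(-5, 6))                 # symmetric around zero
linear = [2 * x for x in xs]
quadratic = [x * x for x in xs]         # fully determined by x, but not linearly

for name, ys in (("linear", linear), ("quadratic", quadratic)):
    sxx, sxy, syy = covariance_matrix(xs, ys)
    lam, axis = first_principal_component(sxx, sxy, syy)
    print(name, "covariance:", sxy, "first component:", [round(a, 3) for a in axis])
```

Kernel PCA and autoassociative networks, the methods named in the papers above, exist precisely to recover relationships like the quadratic one.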

5) Rebuttal To Claim About Small Correlations Not Being Published

Ted Dzubia suggests that small correlations are not interesting, or at least are not interesting because our dataset is too small. He writes:

Dzubia: "out of all the factors they measured ranking correlation for, nothing was correlated above .35. In most science, correlations this low are not even worth publishing. "

Academic papers frequently publish correlations of this size. On the first page of a Google Scholar search for "mean correlation coefficient" I see:

  1. The Stanford neurology paper I cited above to refute Garcia is reporting a mean correlation coefficient of 0.12.
  2. "Meta-analysis of the relationship between congruence and well-being measures", a paper with over 200 citations whose abstract cites coefficients of 0.06, 0.15, 0.21, and 0.31.
  3. "Do amphibians follow Bergmann’s rule" which notes that "grand mean correlation coefficient is significantly positive (+0.31)."

These papers were not cherry picked from a large number of papers. Contrary to Ted Dzubia’s suggestion, the size of a correlation that is interesting varies considerably with the problem. For our problem, looking at correlations in Google results, one would not expect any single high correlation value from features we were looking at unless one believes Google has a single factor they predominately use to rank results with and one is only interested in that factor. We do not believe that. Google has stated on many occasions that they employ more than 200 features in their ranking algorithm. In our opinion, this makes correlations in the 0.1 - 0.35 range quite interesting.
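A rough simulation shows why many factors push individual correlations down. It rests on a deliberately simple, hypothetical model, not on anything Google has disclosed: a ranking score that is the equal-weight sum of `k` independent factors. Under that assumption the correlation between any single factor and the score is 1/sqrt(k), so with 200 factors each one correlates at only about 0.07, even though every factor matters equally.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def factor_score_correlation(num_factors, num_results=5_000, seed=3):
    """Correlation of one factor with a score that sums num_factors independent factors."""
    rng = random.Random(seed)
    first_factor, scores = [], []
    for _ in range(num_results):
        factors = [rng.gauss(0, 1) for _ in range(num_factors)]
        first_factor.append(factors[0])
        scores.append(sum(factors))
    return pearson(first_factor, scores)

for k in (1, 10, 50, 200):
    simulated = factor_score_correlation(k)
    print(f"{k:>3} factors: simulated r = {simulated:.3f}, "
          f"theory 1/sqrt(k) = {1 / math.sqrt(k):.3f}")
```

Real ranking factors are neither independent nor equally weighted, so this only illustrates the direction of the effect, not its size.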

6) Rebuttal To Claim That Small Correlations Need A Bigger Sample Size

Dzubia: "Also notice that the most negative correlation metric they found was -.18…. Such a small correlation on such a small data set, again, is not even worth publishing."

Our dataset was over 100,000 results across over 11,000 queries, which is much more than sufficient for the size of correlations we found. The risk with small correlations and a small dataset is that it may be hard to tell whether the correlations are just statistical noise. Generally, a result must lie at least 1.96 standard errors from zero to be considered statistically significant at the conventional 95% confidence level. For the particular correlation Dzubia brings up, one can see from the standard error value that the correlation lies 52 standard errors from zero. 52 is substantially more than the 1.96 that is generally considered necessary.
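The significance check above can be sketched in a few lines. This sketch assumes the reported standard error is the standard error of the mean of per-query correlations (sample standard deviation divided by √n). The standard error value in the usage line is hypothetical, chosen only to illustrate the arithmetic. The post's actual value is not reproduced here.

```python
import math


def standard_error_of_mean(values):
    """Sample standard deviation of the values divided by sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def standard_errors_from_zero(estimate, standard_error):
    """Distance of an estimate from zero, measured in standard errors (a z-score)."""
    return abs(estimate) / standard_error


def is_significant(z, critical_value=1.96):
    """Two-sided test at the 95% level: |z| must exceed 1.96."""
    return z > critical_value


# Hypothetical standard error, chosen only to illustrate the arithmetic.
z = standard_errors_from_zero(-0.18, 0.0035)
print(f"z = {z:.1f}, significant: {is_significant(z)}")  # prints "z = 51.4, significant: True"
```

The standard error shrinks with the square root of the number of queries. That is why a large query set can make even a correlation of -0.18 sit dozens of standard errors from zero.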

We used a sample size so much larger than usual because we wanted to make sure the relative differences between correlation coefficients were not misleading. Although we feel this adds value to our results, it is beyond what is generally considered necessary to publish correlation results.

Conclusions

Some folks inside the SEO community have disagreed with our interpretations and opinions regarding what the data means (and where/whether confounding variables exist to explain some points). As Rand carefully noted in our post on correlation data and in his presentation, we certainly want to encourage this. Our opinions about where/why the data exists are just that, opinions, and shouldn’t be ascribed any value beyond their use in informing your own thinking about the data sources. Our goal was to collect data and publish it so that our peers in the industry could review and interpret it.

It is also healthy to have a vigorous debate about how statistics such as these are best computed, and how we can ensure the accuracy of reported results. As our community is just starting to compute these statistics (Sean Weigold Ferguson, for example, recently submitted a post on PageRank using very similar methodologies), it is only natural that there will be some bumbling back and forth as we develop industry best practices. This is healthy, and it is to our industry’s advantage that it occur.

The SEO community is the target of a lot of ad hominem attacks that try to associate all SEOs with the behavior of the worst. Although we can answer such attacks by pointing out great SEOs and great conferences, it is exciting that we’ve been able to elevate some attacks to include mathematical points, because when critics argue math, they can be definitively rebutted. On the six points of mathematical disagreement, the tally is pretty clear: SEO community: six, SEO bashers: zero. Being SEOs doesn’t make us infallible, so surely in the future the tally will not be so lopsided, but our tally today reflects how seriously we take our work and how we as a community can feel good about using data from this type of research to learn more about the operations of search engines.


Whiteboard Friday - We Bought Links and It Worked!! »

Posted by Scott Willoughby

WARNING! This week’s video is pure evil! If you are faint of heart, easily disturbed, care for small children, terrified of slugs, curious about magnets, or fond of licorice, TURN BACK NOW! 

Don't Touch It! It's Evil!

This video provides actual evidence that the diabolical practice of buying links can actually work (and astoundingly well). It also says the practice can get you penalized back to the stone age, but hey, who needs to talk sense; there’s controversy to be courted! So, without further ado (or any more exclamation points), let the heresy commence…

 

Did you avoid the temptation? Did you refuse to watch? Is the curiosity killing you? Okay, okay, I’ll give you the lowdown, but you have to promise you’ll nevereverever use this information for evil. Keep that halo sparkly, champ!

Here’s the deal: Rand snuck out without telling any of us and bought some illicit paid links. They were anchor-text-optimized links, all from the same page on the same site, to minimize confounding factors. He got one link to each of three different sites…

Experiment 1

  • Bought a link for a three word phrase with a Keyword Difficulty Score of 30%
  • Directed it at an SEOmoz blog post with the term in the body, but not in the title tag
  • Ranking before link purchase: #458
  • Ranking after link purchase: #30
  • Time elapsed: 8 days (all links were pulled as soon as changes were observed)

Experiment 2

  • Bought a link for a two-word phrase with a Keyword Difficulty Score of 36%
  • Directed at a page on an established, but low-authority domain with the term at the end of the title tag
  • Ranking before link purchase: #426
  • Ranking after link purchase: #58
  • Time elapsed: 4 days

Experiment 3

  • Bought a link for a three word term with a Keyword Difficulty Score of 26%
  • Directed at a page on a brand new site with less than 10 total links
  • Ranking before link purchase: #198
  • Ranking after link purchase: #4
  • Time elapsed: 4 days

Holy crap, right?! That’s some serious movin’ and shakin’ out of one little link! Here are a few things to note before we discuss why you shouldn’t go smash open your piggy bank and spend your shiny coins on nefarious links: 1) As soon as the links were pulled, the rankings fell back down to where they were before the links, so if you’re renting, don’t get too comfy in that high position; 2) These were very short-term tests, so there wasn’t much time for Google to sniff these links out; 3) This is not a statistically significant sample size or a scientific test, so take these results as anecdotal.

Okay then, why shouldn’t you buy links if they work such splendid voodoo on your rankings? Let’s fight anecdotal "proof" with an anecdotal warning. Some friends of SEOmoz who run a fairly well-established site recently ran into a snag: they vanished from Google. They had ranked in the top two for many moons, raking in the lucrative spoils of their hard-won rankings. Then they got greedy; they thought a couple of paid links (four, to be exact) could secure them the number one spot for all eternity. They wanted to be like the lone Highlander atop his mountain. They bought their links, and it worked for a minute. Then Google beheaded them (to continue the Highlander theme) by abso-friggin-lutely burying their site. Their links were discovered, and now they can’t even rank for their business name or their full title tags. Suffice it to say, this has made business a tad difficult.

Listen, my fellow marketers, to this cautionary tale of penalty and woe. Paid links may reap quick and easy reward, but the repercussions can be dreadful. Besides, everyone knows that the Krampus comes for SEOs who pay for links.

Big thanks to Avi Wilensky of PRO Media Corp for collaboration with us on this study.

 

And now, a very special message…

This week’s episode of Whiteboard Friday is a bittersweet installment for me. After producing this blog feature for over three years, and more than 150 episodes, this is my last.  As Rand mentioned in the video, I’ve decided to bid farewell to the magical world of SEOmoz and pursue my next great adventure.  I’m still weighing opportunities and haven’t decided where I’ll be heading next, but you can rest assured I’ll still be playing in the online marketing sandbox, so bring your shovel and we can build a castle together. It’ll be sweet; we can have towers and a moat…maybe a dragon.  If you’d like to keep in touch, I’m easy to find on Facebook, LinkedIn, and Twitter.

I want to thank everyone in the community for contributing to the truly wonderful experience I’ve had here, and all of the amazing people I’ve had the pleasure to meet online and off.  I hope you’ve all enjoyed watching these videos and reading my posts as much as I’ve enjoyed making them. Most sincere thanks and gratitude to you all for an awesome experience over the last several years. Have fun and I’ll see you around the interwebz!

Best,

Scott


Matt Cutts Movie Marathon »

Posted by Dr. Pete

Dancing Movie Food

This post is the culmination of two of my lifelong dreams: (1) To spend an entire day on YouTube and call it "work", and (2) To Photoshop Matt Cutts’ face on cartoon food. Early in 2009, Matt Cutts, Google’s most visible anti-spam engineer, began releasing a series of short Webmaster Help videos. You’ve probably seen some of these videos, but what you may not know is that there are currently over 200 of them, with more than 70 posted in 2010 alone.

From time to time, I’ve been amazed at the details that slip out during these videos, many of which don’t get much play in the blogosphere. So, I decided to watch all of the 2010 videos and report back on what I learned. This post contains my Top 10 picks along with a few interesting tidbits and one SHOCKING CONSPIRACY.

Obligatory Disclaimers

Let’s get this out of the way, as Matt seems to be a lightning rod for controversy. I’m a nice guy, but if you don’t read this section, don’t expect me to reply to your comments.

I don’t speak for Matt

Other than having played a couple of hands of Search Spam with Matt over the years (I think we’re 1-and-1), I don’t know him and I’m not trying to put words in his mouth. I’ve used the original video titles, for reference, but the rest is paraphrased. I strongly encourage you to watch the originals.

Don’t believe everything you hear

Matt, like everyone, has vested interests, and Google doesn’t have any motivation to tell us every detail about how the algorithm works.

Don’t disbelieve everything, either

I don’t think Matt stays up nights scheming about how to deceive SEOs. I think he’s a smart, decent guy who cares about search quality.

My Top 10 Picks

One quick note, before I reveal my picks (counting down from 10 to 1). If you want to get Matt to answer your questions, it apparently helps to have a cool-sounding name, like "Magico" or "Youser". From now on, I will have my Muppet Intern Yoozer submit all of my help questions.

10. Should I spend time on meta keywords tags? (Apr 19)

Matt says: "I wouldn’t spend even 0 minutes on it, personally".

I know most of you know this, but it’s good to hear it from the source. Google does not use the keywords meta tag for ranking. Meta description still has value for other reasons (Watch the video - 1:21).

9. How does URL structure affect PageRank (Apr 6)

Matt says: "Google doesn’t worry so much about how deep a set of directories is."

This video raises an important distinction: URL structure is not link structure. We see this confusion frequently in Q&A. Let’s say you have a URL like this:

http://www.example.com/year/month/day/topic/blog-post-title

That page isn’t 5 levels deep just because it’s 5 /s behind the root domain in the URL. The depth of the page is determined by your internal architecture and link structure. URL length may affect the power of keywords in the URL and the click-through rate of the URL, but the crawlers don’t really care when it comes to finding your pages. What matters is whether this page is one hop from the home page or 10 hops away (Watch the video - 2:04).
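The difference between URL depth and link depth can be sketched with a small breadth-first search. The link graph and every page except the example URL above are hypothetical. The sketch counts path segments in the URL and compares that number with the minimum number of clicks from the home page.

```python
from collections import deque
from urllib.parse import urlparse


def url_path_depth(url):
    """Count non-empty path segments: the 'depth' people often read off the URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def click_depth(links, home):
    """Breadth-first search over internal links: minimum hops from the home page to each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


home = "http://www.example.com/"
post = "http://www.example.com/year/month/day/topic/blog-post-title"
archive = "http://www.example.com/archive"
old_page = "http://www.example.com/old-page"

# Hypothetical internal link graph: page -> pages it links to.
links = {home: [post, archive], archive: [old_page]}

depths = click_depth(links, home)
for url in (post, old_page):
    print(f"{url}: URL depth {url_path_depth(url)}, click depth {depths[url]}")
```

Here the blog post has five path segments but is only one click from the home page. The short `/old-page` URL sits two clicks away, which is the kind of depth crawlers actually experience.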

Note: SEOmoz correlation data has shown that deeper folder structure may correlate with worse rankings. Deep folder structures can be an indication of other issues, including information architecture problems.

8. Can I make sure Google always uses my meta description tags? (Mar 24)

Matt says: "The short answer is ‘no’."

I hear this complaint a lot. Google will sometimes write its own snippets for relevance. You can block the ODP (Open Directory Project) description with the NOODP meta tag, and you can write relevant, unique meta descriptions, but you can’t completely control what Google does (Watch the video - 1:52).

7. Can having dofollow comments on my blog affect its reputation? (Feb 22)

This is an interesting two-parter. First off, outbound links to spammy sites can have a negative impact on your reputation. Manage your outbound links and nofollow if you have to. Individual, inbound spammy links will typically not harm you, on the other hand, because they’re beyond your control (although, in my experience, a pattern of inbound spammy links can be a different story). Matt has some great comments at the end about the value of commenting on dofollow blogs (Watch the video - 2:35).

6. Is cross-linking websites bad? (Jan 25)

Matt says: "I would ask yourself: are these websites really related in any kind of sense?"

When Matt wants to read cartoons, links to auto insurance and coffee tables make him sad. Cross-linking 3 sites probably isn’t a big deal, but 30 or 300 could likely get you into trouble. Relevance is the key, and footer cross-links are often low-value (Watch the video - 2:00).

5. How can I get Google to index more of my Sitemap URLs? (Mar 23)

Matt says: "I wouldn’t get hung up on just how many pages have been indexed…"

We hear this one from frustrated webmasters every day. Google does not guarantee that pages in your XML sitemap will be indexed. Indexation has a lot to do with your authority and trust – an authoritative site will get more love from the crawlers, plain and simple (Watch the video - 1:31). Check out Rand’s recent post diving deeper into Matt’s comments on the indexation cap.

4. Will changing hosts cause any SEO concerns? (Feb 9)

Matt says: "Most people can switch their IP address and never have any issue whatsoever."

This is a common fear that is usually unfounded. As long as your domain name and hosting country stay the same, switching from one reliable host to another should have no SEO impact. Matt gives a nice briefing on how to change DNS servers and set your TTL that’s worth watching (Watch the video - 1:53).

Note: Although I implied this in the recap, it deserves repeating. If you’re changing your domain name and/or hosting country, that can definitely affect your ranking and is a much more complex issue. Consider the risks and plan accordingly, in those cases.

3. Is Google Analytics data a factor in a page’s ranking? (Feb 2)

Matt says: "I promise you, my team will never ask the analytics team to use their data."

I don’t think you’ll hear a more direct answer from Matt than that. Conspiracy theories abound, but there are 3 separate videos in 2010 where Matt states that the quality team does not use Google Analytics data. Of course, that doesn’t mean that user metrics (click-through rate, etc.) aren’t a factor, but these are more likely coming from other sources, such as SERP tracking (Watch the video - 1:17).

2. Can you give us an update on rankings for long-tail searches? (May 30)

This is a discussion of the so-called "Mayday" update. Matt clearly states that Mayday is a deliberate, algorithmic change to improve the quality of long-tail searches, and it is not temporary. It is not related to Caffeine, although the roll-out timeline overlaps somewhat (Watch the video - 2:39).

1. Should I be obsessing about load times? (May 5)

Matt says: "We have considered in 2010 using page speed…"

There are a couple of important points here. First, Google hadn’t even finalized the decision to use page speed as a ranking factor until this spring*. Second, page speed is just one of over 200 ranking factors. All else being equal, a fast site is good for users and good for search, but an occasional server glitch isn’t going to kill your rankings. If you can speed up your site with a few simple changes, though, why not do it (Watch the video - 2:28)?

*Edit: As Lindsay points out below, Matt’s April 9th blog post does suggest that page speed was incorporated as a ranking factor. One of the issues with the dates on the videos is that they’re often recorded a bit before they’re released. On the May 5th video, Matt suggests that Google hadn’t made a final decision on using page speed, but the reality is that that decision was probably made in March or April.

Honorable Mentions

3. How many bots does Google have? (Feb 30)

This is a nice review of what bots/spiders actually are. They aren’t real robots that come knocking on your door. It’s a good, short primer for new SEOs (Watch the video - 1:30).

2. State of the Index 2009 (Jan 20)

This is a long one, and it’s slightly out of date, but it’s a good review of some of what happened in 2009. It has a solid explanation of rel=canonical, as well as the parameter blocking and fetch as Googlebot features in Webmaster Tools. It ends with a brief explanation of what Caffeine is all about (Watch the video - 25:59).

1. How many search algorithm changes were made in 2009? (Apr 22)

Google makes a change to the algorithm on the order of ONCE PER DAY. These changes may be batched and rolled out in chunks, but another video confirmed a number of roughly 400 algorithm changes in 2009. If you think Mayday and Caffeine are the only things that have happened in 2010, think again. Google is constantly evolving. This video also includes a statement you don’t hear from Matt every day: good content is necessary, but not sufficient (Watch the video - 1:53).

The Shocking Conspiracy

Of course, it wouldn’t be a post about Matt Cutts without a conspiracy. If you watch the 2010 videos, you’ll see a shocking transformation, where Matt goes from having hair to no hair back to hair again almost instantaneously. I’ve graphed this phenomenon below:

Graph of Matt’s Hair

Matt claims this has something to do with the timing of the videos and filming them in batches, blah blah blah, but those of us who are savvy are forced to reach one of two conclusions:

  1. Google has discovered the secret of re-growing hair and refuses to share it.
  2. Matt is, as I’ve often suspected, a cybernetic extension of the Google algorithm.

So, there you have it. My Top 10 picks of 2010 (so far), a few highlight reels, and one shocking conspiracy, as promised. By the way, if you’re a beginner or are interested in general SEO tips like these, make sure to check out our completely revised, free Beginner’s Guide to SEO.


Must-Have SEO Recommendations: Step 7 of the 8-Step SEO Strategy »

Posted by laura

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

You know the client. The one that really needs your help. The one that gets pumped when you explain how keywords work. The one that has an image file for a site. Or maybe the one that insists that if they copy their competitor’s title tags word-for-word, they’ll do better in search results (I had a product manager make his team do that once. Needless to say, it didn’t work, and I was thrilled).

In Step 6 of the SEO Strategy document I noted that this strategy document we’ve been building isn’t a best practices document, and it’s more than a typical SEO audit.  It is a custom set of specific, often product-focused recommendations and strategies for gaining search traffic.  For that reason I recommended linking out to SEO basics and best practices elsewhere (in an intranet or a separate set of documents).

But most of the time you’ll still need to call out some horizontal things that this client must have put in front of them, or else they will be missed completely. SEO/M is your area of expertise, not theirs, so help them make sure they’ve got their bases covered. You can create an additional section for these call-outs, wherever you feel it is appropriate in your document.

WHAT CAN I INCLUDE HERE?

Here are some examples of things you could include if you felt your client needed this brought to their attention:

  1. Press Release optimization and strategy
  2. SEO resources for specific groups in the company:
    1. SEO for business development (linking strategies in partner deals)
    2. SEO for writers/editorial
    3. SEO for designers
  3. SEO for long term results rather than short term fixes
  4. International rollout recommendations
  5. Content management system – how it is impairing their SEO
  6. Risks and avoidances
  7. Anything that you feel should be covered in more detail for this particular client, that wasn’t covered in your strategy in the last step. This is a catchall – a place to make sure you cover all bases.
  8. Nothing, if you don’t feel it’s needed.

If the client really needs a lot of help, you’d want to provide training and best practices, either as separate deliverables along with the strategy document, or, better yet, work on training and best practices with them first, then dive into more specific strategy. You don’t want to end up with a 15-page (or even a 4-page, for that matter) best practices document in your strategy doc. Remember, we’re beyond best practices here, unless there’s something specific that needs to be called out.

If the client needs more than one thing called out, do it.  If it’s several things, consider either adding an appendix, or as I mentioned, creating a separate best practices document.

The reason I recommend best practices as a separate document is because it is really a different project, often for an earlier phase.

EXAMPLE 1:

Let’s say, for example, my client has the type of content the press loves to pick up. They don’t do press releases, mostly because they don’t know exactly how to write them or where to publish them, but they want to. I’ll add a Press Releases section after the strategy, and I might give them these simple tidbits:

  • High level benefit of doing press releases
  • What person or group in the company might be best utilized to manage press releases
  • Examples of what to write press releases about
  • Channels they can publish press releases to
  • Optimization tips
  • References they can go to for more detailed information

EXAMPLE 2:

My client gets it. They’re pretty good at taking on most SEO on their own. This strategy document I’m doing for them is to really dig in and make sure all gaps are closed, and that they’re taking advantage of every opportunity they should.  Additionally, in a few months they are going to roll out the site to several international regions. 

My research into the site and its competitors (and the search engines) for this strategy has all focused on the current site in this country. Because the international rollout hasn’t started yet, I will add a section to my document with specific things they need to keep in mind when doing this rollout.

  • Localized keyword research (rather than using translate tools)
  • ccTLD (country code top level domain) considerations
  • Tagging considerations (like “lang”)
  • Proper use of Google Webmaster Tools for specifying region
  • Potential duplication issues
  • Maybe even a list of popular search engines in those countries
  • Point to more resources or list as a potential future contract project

Make sense?  Use your judgment here. Like we’ve seen in the rest of the steps, this strategy document is your work of art, so paint it how your own creative noggin sees it, Picasso.

Other suggestions for what you might include here? Love it? Hate it? Think this step stinks or mad I didn’t include music to listen to for this one? Let’s hear about it in the comments!


Linkscape Index Update: New Partnerships & API Data »

Posted by randfish

Many of our keen members observed that late last week, Linkscape’s index updated (this is actually our 27th index update since starting the project in 2008). This means new link data in Open Site Explorer and Linkscape Classic, as well as new metric data via the mozbar and in our API.

Index 27 Statistics

For those who are interested, you can follow the Linkscape index update calendar on our API Wiki (as you can see, this update was about a week early).

Although we’ve now crawled many hundreds of billions of pages since launch, we only serve our uber-freshest index. Historical data is something we want to do soon - more on that later. This latest index’s stats feature:

  • Pages - 40,152,060,523
  • Subdomains - 284,336,725
  • Root Domains - 91,539,345
  • Links - 420,049,105,986
  • % of Nofollowed Links - 2.02%
  • % of Nofollows on Internal Links - 58.7%
  • % of Nofollows on External Links - 41.3%
  • % of Pages w/ Rel Canonical - 4.3%

These numbers continue the trend we’ve been seeing for some time, in which internal nofollow usage is declining slightly. Rel canonical is down a bit in this index but up substantially since the start of the year (this likely has more to do with our crawl selection than with sites actually removing canonical URL tags).
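To turn the index percentages into rough absolute counts, here is a small worked calculation. It assumes that the 58.7% internal and 41.3% external figures are shares of the nofollowed links, since they sum to 100%, rather than shares of all internal or all external links. That reading is an interpretation, and the function name is hypothetical.

```python
def nofollow_breakdown(total_links, nofollow_pct, internal_pct_of_nofollows):
    """Split a link count into nofollowed, internal-nofollowed and external-nofollowed links.

    Assumes the internal/external percentages are shares of the nofollowed links
    (they sum to 100%), not shares of all internal or all external links.
    """
    nofollowed = total_links * nofollow_pct / 100
    internal = nofollowed * internal_pct_of_nofollows / 100
    external = nofollowed - internal
    return nofollowed, internal, external


nofollowed, internal, external = nofollow_breakdown(420_049_105_986, 2.02, 58.7)
print(f"nofollowed ≈ {nofollowed / 1e9:.2f}B, internal ≈ {internal / 1e9:.2f}B, external ≈ {external / 1e9:.2f}B")
```

With the Index 27 figures, this works out to roughly 8.48 billion nofollowed links: about 4.98 billion internal and 3.50 billion external.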

Comparing Metrics from Index to Index

One of the biggest requests we get is the ability to track historical information about your metrics from Linkscape. We know this is really important to everyone, and we want to make this happen soon, but we have some technical and practical challenges to overcome. The biggest is that what we crawl changes substantially with each index, due both to our improvements in what to crawl (and what to ignore) and to the web’s massive changes each month (60%+ of pages we fetched 6 months ago are no longer in existence!).

For now, the best advice I can give is to measure yourself against competitors and colleagues rather than against your metrics last month or last year. If you’re improving against the competition, chances are good that your overall footprint is increasing at a higher rate than theirs. You might even "lose" links in a raw count from the index but actually have improved, simply because a few hundred spam/scraper websites weren’t crawled this time around, because we’ve done better URL canonicalization than last round, or because your link rotated out of the top of a popular RSS feed that many sites were reproducing.

OpenSiteExplorer Comparison Report
Measuring against other sites in your niche is a great way to compare from index to index

If you’ve got more questions about comparisons and index modifications over time, feel free to ask in the comments and we’ll try to dive in. For those who are interested, our current thinking around providing historical tracking is to give multiple number sets like - # of links from mR 3+ pages, # of links from mR 1-3 pages, etc. to help show how many "important" links you’re gaining/losing - these fluctuate much less from index to index and may be better benchmarking tools.
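Here is a minimal sketch of the banded counts described above, combined with the advice to benchmark against competitors. It assumes you already have the mozRank of each page linking to you and to a competitor. The band edges at 1 and 3, the function names, and the sample values are all hypothetical. This is not the Linkscape API.

```python
from collections import Counter

BANDS = ("mR <1", "mR 1-3", "mR 3+")


def bucket_links(source_mozranks):
    """Count links by the mozRank of the linking page, using hypothetical band edges at 1 and 3."""
    counts = Counter({band: 0 for band in BANDS})
    for mr in source_mozranks:
        if mr >= 3.0:
            counts["mR 3+"] += 1
        elif mr >= 1.0:
            counts["mR 1-3"] += 1
        else:
            counts["mR <1"] += 1
    return counts


def compare_to_competitor(mine, theirs):
    """Ratio of my link count to a competitor's in each band (None if they have zero)."""
    return {band: (mine[band] / theirs[band] if theirs[band] else None) for band in BANDS}


# Hypothetical mozRank values for the pages linking to each site in one index.
my_links = bucket_links([0.4, 0.9, 1.5, 2.2, 3.1, 4.6])
competitor_links = bucket_links([0.2, 0.3, 0.8, 1.1, 3.8])
print(compare_to_competitor(my_links, competitor_links))
```

Tracking the ratio in the higher bands from one index to the next should be less sensitive to crawl churn among low-value pages than tracking raw link totals.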

Integration with Conductor’s Searchlight Software

SEOmoz is proud to be powering Conductor’s new Searchlight software. I got to take a demo of their toolset 2 weeks ago (anyone can request one here) and was very impressed. See for yourself with a few exclusive screenshots I’ve wrangled up:

Searchlight Screenshot 1/4

Searchlight Screenshot 2/4

Searchlight Screenshot 3/4

Searchlight Screenshot 4/4

Conductor's Seth Besmertnik at the Searchlight Launch Event

And at the bottom of the series is Seth Besmertnik, Conductor’s CEO, during the launch event (note the unbuttoned top button of his shirt with the tie; this indicates Seth is a professional, but he’s still a startup guy at heart). Searchlight already has some impressive customers including Monster.com, Care.com, Siemens, Travelocity, Progressive and more. I think many in the SEO field will agree that moving further into software is a smart move for the Conductor team, and the toolset certainly looks promising.

Conductor’s also releasing some cool free research data on seasonality (request form here). Couldn’t resist sharing a screenshot below of the sample Excel workbook they developed:

Keyword Seasonality Excel Workbook from Conductor

mmm… prepopulated

SEOmoz’s Linkscape index currently powers the link data section of Searchlight via our API and we’re looking forward to helping many other providers of search software in the future. We’re also integrated with Hubspot’s Grader.com and EightFoldLogic’s (formerly Enquisite) Linker, so if you’re seeking to build an app and need link data, you can sign up for free API access and get in touch if/when you need more data.

The Link Juice App for iPhone

We’re also very excited about the popular and growing iPhone app - LinkJuice. They’ve just recently updated the software with a few recommendations straight from Danny Dover and me!

LinkJuice App 2/2

LinkJuice App 1/2

The LinkJuice folks have promised an Android version is on its way soon, and since that’s my phone of choice, I can’t wait!

If you’ve got an app, software piece or website that’s powered by Linkscape, please do drop us a line so we can include it. I’ve been excited to see folks using it for research - like Sean’s recent YOUmoz post on PageRank correlations - as well as in many less public research works.

Oh, and if you somehow missed the announcement, go check out the new Beginner’s Guide to SEO! It’s totally free and Danny’s done a great job with it.


Introducing the Beginner’s Guide to SEO v2.0 »

Posted by Danny Dover

Update: I am happy to report that the PDF printer difficulties have been fixed and the guide is now "printier" than ever ;-p You can download the PDF here.


Today, I am proud to announce the new and improved Beginner’s Guide to SEO. This free tutorial covers everything you need to know to get started improving your search engine rankings in the major search engines. Put simply, this is the resource I would have kicked a fool in order to get my hands on when I was first diving into the wild world of SEO.

Roger mozBot
Start Learning SEO Today!

 

The New Beginner’s Guide to SEO


 

Free


The Beginner’s Guide to SEO won’t cost you a dime. It is free to read, download and otherwise devour. (Be careful about paper cuts! E-paper cuts are the worst kind of paper cuts.)

 

Comprehensive


This guide is the result of hundreds of hours of research and includes chapters on all of the following topics:

  1. How Search Engines Operate
  2. How People Interact With Search Engines
  3. Why Search Engine Marketing is Necessary
  4. The Basics of Search Engine Friendly Design & Development
  5. Keyword Research
  6. How Usability, Experience, & Content Affect Rankings
  7. Growing Popularity and Links
  8. Search Engine’s Tools for Webmasters Intro
  9. Myths & Misconceptions About Search Engines
  10. Measuring and Tracking Success

 

Awesome


Moz scientists have measured spikes in the awesomeness level of the Beginner’s Guide to SEO that have been equivalent to:

  • Getting lick attacked by seven puppies
  • The feeling of putting on new socks
  • Being in a hot tub when it is snowing
  • Finding a Popsicle in the freezer
  • The chocolate at the bottom of Drumstick ice cream cones
  • A unicorn flying over a pack of exploding ninjas with laser dinosaurs with beer for eyes
  • Finding 10 dollars in your pocket
  • Bowties on scientists
  • David Bowie

I look forward to hearing your feedback about the new guide in the comments. Your input in these comments will serve as the official feedback form for this guide. I will use your recommendations to help improve and update the guide over time.

Start Learning SEO Today!


The Death and Rebirth of Editorial Citation on the Web »

Posted by randfish

I’ve been having a similar conversation with a number of folks from the world of search, and it’s interesting enough to deserve some transparency and discussion. It centers on the web’s link graph and how it powers the rankings of relevant results in the major search engines. If we follow this brief timeline, you’ll see what I’m getting at:

  • 1993 - 2000: The beginning of the web is marked by an influx of researchers, academics, hobbyists and enthusiasts. Nearly every link created has an editorial, reference purpose behind it. A link is one page telling its viewers that another page has useful, interesting or worthwhile information about a specific topic.
  • 2001 - 2005: As the web commercializes at an accelerated pace and PageRank becomes a familiar concept, links drift further away from editorial votes and more towards self-interested endorsements, often with financial motivations.
  • 2006 - 2010: The web’s link graph swings further away from editorial references towards ever-more commercial interests. Meanwhile, the social web rises with the popularity of sites like StumbleUpon, Digg, Reddit, Facebook, Twitter & LinkedIn. These communities often contain a much higher percentage of editorial citations, particularly those that contain smaller communities inside them (LinkedIn groups, pockets of Twitter users and Facebook friends).

During chats with some folks from Bing, Google & the SEO world, it became clear that nearly everyone is aware of this ecosystem and thinking more about how to leverage it to make search better. Bing & Google obviously made back-to-back deals to get the Twitter firehose late last year. Google’s been trying hard to get Facebook data without success (and Bing may have it, thanks to their investment in Facebook in 2007). Both engines could certainly extract citation data from other web communities that publish publicly (Delicious, Reddit, Digg, LinkedIn, StumbleUpon, StackOverflow and, as of today, Quora) and extrapolate reference material.

The problem for the engines is that links on websites have a high probability (probably not 50%, but maybe as high as 20%) of existing specifically to influence their rankings. While some of those influence-targeted links certainly do point to great content that’s relevant and high quality, the engines would prefer to return to a web of "pure" recommendations. The social web might offer more of that type of web environment. Sure, we all tweet/share/post links to our own websites, but those are easy for engines to detect and treat as "internal" references. The "external" endorsements, however, are often much more genuine than what exists on the open web’s link graph.

If you’re in the field of SEO, I think this means social media marketing is a no-brainer. And if people aren’t recommending and endorsing your site editorially in their Twitter feeds, Facebook updates, LinkedIn groups, answers on Q+A sites, and when socially bookmarking, tagging and voting, I’d be thinking hard about how to change that.

p.s. I think the social graph overall is still a very small portion of the engines’ ranking algorithms, but I think Bing & Google are both racing towards innovation on this front as fast as they can. SEOs should, IMO, follow suit.


Amazon Web Services: Clouded by Duplicate Content »

Posted by Stephen Tallamy

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

At the end of last year the website I work on, LocateTV, moved into the cloud with Amazon Web Services (AWS) to take advantage of increased flexibility and reduced running costs. A while after we switched, I found that Googlebot was crawling the site almost twice as much as it used to. Looking into it some more, I found that Google had been crawling the site from a subdomain of amazonaws.com.

The problem is, when you start up a server on AWS it automatically gets a public DNS entry which looks a bit like ec2-123-456-789-012.compute-1.amazonaws.com. This means that the server will be available through this domain as well as through the main domain that you have pointed at the same IP address. For us, the problem was doubled, as we have two web servers for our main domain, and hence the whole of the site was being crawled through two different amazonaws.com subdomains as well as www.locatetv.com.

Now there were no external links to these AWS subdomains but, being a domain registrar, Google was notified of the new DNS entries and went ahead and indexed loads of pages. All this was creating extra load on our servers and a huge duplicate content problem (which I cleaned up, after quite a bit of trouble - more below).

A pretty big mess.

I thought I’d do some analysis into how many other sites were being affected by this problem. A quick search on Google for site:compute-1.amazonaws.com and site:compute.amazonaws.com reveals almost half a million web pages indexed (the site: command’s counts are often unreliable, but they give some idea of the scale of the issue):

site:compute-1.amazonaws.com

My guess is that most of these pages are duplicate content with the site owners having separate DNS entries for their site. Certainly this is the case for the first few sites I checked:

  • http://ec2-67-202-8-9.compute-1.amazonaws.com is the same as http://www.broadjam.com
  • http://ec2-174-129-207-154.compute-1.amazonaws.com is the same as http://www.elephantdrive.com
  • http://ec2-174-129-253-143.compute-1.amazonaws.com is the same as http://boxofficemojo.com
  • http://ec2-174-129-197-200.compute-1.amazonaws.com is the same as http://www.promotofan.com
  • http://ec2-184-73-226-122.compute-1.amazonaws.com is the same as http://www.adbase.com

For Box Office Mojo, Google is reporting 76,500 pages indexed for the amazonaws.com address. That’s a lot of duplicate content in the index. A quick search for something specific like "Fastest Movies to Hit $500 Million at the Box Office" shows duplicates from both domains (plus a secure subdomain and the IP address of one of their servers - oops!):

Fastest Movies to Hit $500 Million at the Box Office

Whilst I imagine Google would be doing a reasonable job of filtering out the duplicates when it comes to most keywords, it’s still pretty bad to have all this duplicate content in the index and all that wasted crawl time.

This is pretty dumb of Google (and other search engines). It’s pretty easy to work out that both the real domain and the AWS subdomain resolve to the same IP address and that the pages are the same. They could be saving themselves a whole lot of time crawling URLs that only exist because of a duplicate DNS entry.
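The check described above (same IP, same pages) can be sketched roughly as follows. This is a hypothetical illustration, not how any search engine actually works: `resolve` could be `socket.gethostbyname` in real use, and `fetch_home` is an assumed helper returning a host’s homepage bytes; both are passed in so the grouping logic can be tried without network access. It also assumes duplicate hostnames resolve to exactly the same single IP, which a site spread over several servers (like the two-server setup above) may not satisfy.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def find_duplicate_hosts(
    hosts: List[str],
    resolve: Callable[[str], str],
    fetch_home: Callable[[str], bytes],
) -> List[List[str]]:
    """Group hostnames that resolve to the same IP *and* serve identical homepages.

    resolve: hostname -> IP string (e.g. socket.gethostbyname in real use).
    fetch_home: hostname -> raw homepage bytes (hypothetical helper).
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for host in hosts:
        ip = resolve(host)
        # Hash the page body so identical content maps to the same key.
        digest = hashlib.sha256(fetch_home(host)).hexdigest()
        groups[(ip, digest)].append(host)
    # Only groups containing more than one hostname are potential duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Matching on IP alone would be too weak a signal (shared hosting puts many unrelated sites on one IP), which is why the sketch also compares page content.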

Fixing the source of the problem.

As good SEOs we know that we should do whatever we can to make sure that there is only one domain name resolving to a site. There is no way, at the moment, to stop AWS from adding the public DNS entries and so a way to solve this is to make sure that if the web server is accessed using the AWS subdomain then redirect to the main domain. Here is an example using Apache mod_rewrite of how to do this:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^ec2-123-456-789-012\.compute-1\.amazonaws\.com$ [NC]
RewriteRule ^ http://www.mydomain.com%{REQUEST_URI} [R=301,L]

This can be put either in the httpd.conf file or the .htaccess file and basically says that if the requested host is ec2-123-456-789-012.compute-1.amazonaws.com then 301 redirect all URLs to the equivalent URL on www.mydomain.com.
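For sites that aren’t served by Apache, the same host check can live in the application instead. A minimal sketch, assuming a Python WSGI app (the post doesn’t say what LocateTV runs on); `CANONICAL_HOST` and `DUPLICATE_HOSTS` reuse the placeholder hostnames from the rewrite rule, and `canonical_redirect` is a hypothetical helper name.

```python
from typing import Callable, Iterable
from urllib.parse import quote

# Placeholders taken from the post's example; substitute your own hosts.
CANONICAL_HOST = "www.mydomain.com"
DUPLICATE_HOSTS = {"ec2-123-456-789-012.compute-1.amazonaws.com"}


def canonical_redirect(app: Callable) -> Callable:
    """WSGI middleware: 301 any request for a duplicate host to the canonical host."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Host header without any :port suffix, lower-cased for comparison.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in DUPLICATE_HOSTS:
            # PEP 3333 passes PATH_INFO as latin-1-decoded bytes; re-encode before quoting.
            path = quote((environ.get("PATH_INFO", "") or "/").encode("latin-1"))
            query = environ.get("QUERY_STRING", "")
            # Keep path and query string so every URL maps to its canonical twin.
            location = f"http://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapper
```

Requests on the canonical host fall straight through to the wrapped app, so real users on www.mydomain.com are unaffected.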

This fix quickly stopped Googlebot from crawling our amazonaws.com subdomain addresses, which took considerable load off our servers, but by the time I’d spotted the problem there were thousands of pages indexed. As these pages were probably not doing any harm I thought I’d just let Google find all the 301 redirects and remove the pages from the index. So I waited, and waited, and waited. After a month the number of pages indexed (according to the site: command) was exactly the same. No pages had dropped out of the index.

Cleaning it up.

To help Google along, I decided to submit a removal request using Webmaster Tools. I temporarily removed the 301 redirects to allow Google to see my site verification file (with the redirect in place, it was being redirected to the verification file on my main domain) and then put the 301 redirect back in. I submitted a full site removal request, but it was rejected because the domain was not being blocked by robots.txt. Again, this is pretty dumb in my opinion, because the whole of the subdomain was being redirected to the correct domain.

As I was a bit annoyed that the removal request would not work the way I wanted it to, I thought I’d leave Google another month to see if it found the 301 redirects. After at least another month, no pages had dropped out of the index. This backs up my suspicion that Google does a pretty poor job of finding 301 redirects for stuff that isn’t in the web’s link graph. I have found this before, when I have changed URLs, updated all internal links to point at the new URLs and redirected the old URLs. Google doesn’t seem to go back through its index and re-crawl pages that it hasn’t found in its standard web crawl to see if they have been removed or redirected (or if it does, it does it very, very slowly).

Having had no luck with the 301 approach, I decided to switch to using a robots.txt file to block Google. The issue here is that, clearly, I didn’t want to edit my main robots.txt to block bots, as that would stop crawling of my main domain. Instead, I created a file called robots-block.txt that contained the usual blocking instructions:

User-agent: *
Disallow: /

I then replaced the redirect entries in my .htaccess file with something like this:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^ec2-123-456-789-012\.compute-1\.amazonaws\.com$ [NC]
RewriteRule ^/?robots\.txt$ /robots-block.txt [L]

This basically says that if the requested host is ec2-123-456-789-012.compute-1.amazonaws.com and the requested path is robots.txt, then serve the robots-block.txt file instead. This means I effectively have a different robots.txt file served from this subdomain. Having done this, I went back to Webmaster Tools and submitted the site removal request, and this time it was accepted. "Hey presto", my duplicate content was gone! For good measure, I replaced the robots.txt mod_rewrite rule with the original redirect commands to make sure any real users are redirected properly.
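Before submitting a removal request, it can be worth checking that each host really serves the robots.txt you intend. A minimal sketch, assuming any amazonaws.com host should get the block-all file from the post and the main domain a permissive one (`CANONICAL_ROBOTS` is a hypothetical stand-in for your real robots.txt). It uses the standard library’s `urllib.robotparser` to confirm what each file allows Googlebot to fetch.

```python
from urllib.robotparser import RobotFileParser

# robots-block.txt from the post: block every crawler from everything.
BLOCK_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"
# Hypothetical stand-in for the main domain's real robots.txt (allows everything).
CANONICAL_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_for_host(host: str) -> str:
    """Pick the robots.txt body to serve based on the requested Host header."""
    hostname = host.split(":")[0].lower()
    if hostname == "amazonaws.com" or hostname.endswith(".amazonaws.com"):
        return BLOCK_ALL_ROBOTS
    return CANONICAL_ROBOTS


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Ask the stdlib robots.txt parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

In the post the host check lives in mod_rewrite; this sketch only mirrors that decision so it can be tested offline.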

Reduce, reuse, recycle.

This was all a bit of a fiddle to sort out and I doubt many webmasters hosting on AWS will have even realised that this is an issue. This is not purely limited to AWS, as a number of other hosting providers also create alternative DNS entries. It is worth finding out what DNS entries are configured for the web server(s) serving a site (this isn’t always that easy but you can use your access logs/analytics to get an idea) and then making sure that redirects are in place to the canonical domain. If you need to remove any indexed pages then hopefully you can do something similar to the solution I proposed above.
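One way to use access logs for this is to count requests per Host header and look for anything other than your canonical domain. The standard Apache "combined" format doesn’t record the Host header, so this sketch assumes a custom log format whose first field is the requested host (for example, a LogFormat that begins with %{Host}i); `non_canonical_hosts` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def non_canonical_hosts(log_lines: Iterable[str], canonical_host: str) -> Counter:
    """Count requests whose Host header differs from the canonical host.

    Assumes a custom access-log format whose first field is the requested
    Host header, e.g. an Apache LogFormat that starts with %{Host}i.
    """
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # skip blank lines
        host = fields[0].split(":")[0].lower()  # drop any :port suffix
        if host != canonical_host.lower():
            counts[host] += 1
    return counts
```

Any hostname that shows up with a meaningful count (especially on Googlebot requests) is a candidate for the redirect treatment described above.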

There are some things that Google could do to help solve this problem:

  • Be a bit more intelligent in detecting duplicate domain entries for the same IP address.
  • Put some alerts into Webmaster Tools so webmasters know there is a potential issue.
  • Get better at re-crawling pages in the index that are not found in the standard crawl, to detect redirects.
  • Add support for site removal when a site-wide redirect is in place.

In the meantime, hopefully I’ve given some actionable advice if this is a problem for you.


Choosing the Right Keyphrases - Especially for the Smaller Sites! »

Posted by Sam Crocker

Hey there folks! Today’s post is a hands-on walkthrough of some of the decision making used when choosing the keyphrases to target. Producing a list of the most important terms in an industry is nice, but actually choosing the right keyphrases is essential. The post was largely created in response to a question submitted by Kien in the comments of my last post.

What to Expect

This post should provide you with real-life examples of the keyword decision-making process and help you make sense of the output from the revamped Keyword Difficulty Tool. If you’re already a hardcore keyword research (and keyword difficulty) guru, this is more of a refresher, but it should provide valuable insights to journeymen and perhaps a bit more transparency into how to choose the right keyphrases for your site (plus a bit of a tour for those of you who haven’t used the Keyword Difficulty Tool for a while).

 

While you may get more value from this on smaller sites, and those that are newly launched, large sites with specific keyword targets may benefit, too.

Two Different Camps

There are really two separate and distinct camps on keyphrase research implementation: those who always go for the highest volume search terms that are moderately relevant to a page, and those who also give consideration to the competitive landscape for a certain term. Regardless of which camp you fall into, this tool can be immensely useful. If you find yourself in the first camp and always go after the highest volume term no matter what, the Keyword Difficulty Tool could and should still be used to track your linkbuilding efforts vs. the competition and to get some understanding of if/when you might be able to rank for the term in question. There’s no point in setting goals (even if they are lofty) without having some idea of how to reach them.

Image: Ebaums World

However, if you are in the camp that likes quick results and has a bit more time to give to the keyphrase research process, I strongly recommend going the extra mile to identify realistic targets for the short term and keeping those lofty ambitions in the back of your mind, to be addressed as the site grows, gains new links and hopefully achieves higher authority and trust metrics.

But what if my site is brand new?

As a general rule, newer sites should aim mostly for less competitive to moderately competitive keyphrases. The one exception to this rule would be sites with exact match domain names.

If you can afford to look at these metrics in this light you can scale up the competitiveness for your terms as you improve the quality and competitiveness of your website.

Click Through Rates

Image from: Rand’s Post on Multiple vs. Singular Keywords

Thinking about it logically (with the above graphic in mind): if you could rank first for a term with 30,000 global monthly searches or somewhere on the fourth or fifth page for a similar term with 300,000 global monthly searches which would you choose? The optimist might choose the second option, but the truly intelligent will pick the first for now and aim for the second option down the road. It doesn’t take superior math skills to figure out that a small fraction of 1.2% of clicks from the 300,000 searches will not amount to anywhere near 42.1% of 30,000 searches.
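The arithmetic behind that choice is simple enough to spell out. A short sketch using only the figures quoted in the paragraph: 42.1% as the click-through rate for the top spot, and 1.2% treated as a generous ceiling for the page-four-or-five ranking (the post says you would get only a small fraction of it). `estimated_clicks` is an illustrative helper, not a function from any tool.

```python
def estimated_clicks(monthly_searches: int, ctr_percent: float) -> float:
    """Expected monthly clicks = searches x click-through rate for the position."""
    return monthly_searches * ctr_percent / 100


# Figures from the post: 42.1% CTR for ranking first on the 30,000-search term,
# and 1.2% used as an upper bound for a deep ranking on the 300,000-search term.
top_spot = estimated_clicks(30_000, 42.1)
deep_ranking_ceiling = estimated_clicks(300_000, 1.2)

print(f"Rank #1 on the smaller term: ~{top_spot:,.0f} clicks/month")
print(f"Page 4-5 on the bigger term: < {deep_ranking_ceiling:,.0f} clicks/month")
```

Even granting the deep ranking the full 1.2%, it tops out at 3,600 clicks a month against roughly 12,630 for the top spot on the smaller term, more than three times as many.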

I’m not saying it’s always better to play it safe, I’m just saying that realistic goals will help you achieve more in the short term, and the opportunity to re-evaluate these goals will help you in the long term.

Does this keyword require a new page? How many terms can I target on a page?

This depends on a couple of factors. It depends, again, on the competitiveness of the terms but also largely depends on the strength of the overall site or page on which the keyword is being targeted.

As covered in Rand’s post in March on this issue, a highly competitive term deserves "single page targeting". This is true in most instances and particularly good advice for smaller, newer sites. The way this works is somewhat backwards: experience suggests that stronger sites can target multiple terms on a page and can also rank for a larger number of long-tail keyphrases. This may seem a bit unfair, but it is what anecdotal evidence has shown me.

The long and short of it? If your site is big, unless the keyphrase is highly competitive it can probably be targeted on a page targeting other similar terms as well. However, if the phrase is extremely competitive, it deserves its own page.

In the case of smaller, newer sites, there are forces working against the best approach. On the one hand, smaller sites will have fewer pages indexed and will not have a great deal of authority to rely on and spread throughout the site. On the other hand, this also means they cannot drop the term "breast augmentation" on the same inner page as "breast enhancement" and expect to rank for both terms.

How do I know if my Site is Strong Enough to Rank for that Term?

The Difficulty Tool pulls in some nice metrics from the rankings that allow you to see a fair bit of information about the other sites ranking for a particular search term or phrase. It won’t always be a simple case of "my page/domain is stronger than theirs, thus I will rank". There will always be other factors: does a competing domain name exactly match the keyphrase? Are the other sites targeting multiple terms on the page or just the one? How many inbound links does the other page have? How relevant is this specific keyphrase to your term?

I think you get the point: it won’t always be a simple fix, but let’s look at an example to try to get a clearer idea of how to work through this.

Let’s look at an example from a recent client project I did for a plastic surgeon (honest). In doing research for terms around plastic surgery for a brand new website I tried to get my head around some of the inner nuances of some of the terminology and procedures to try to better understand search behaviour. As you can see from the above research the broad match search volume for breast augmentation is considerably larger than the others [insert corny joke here], however the local search volume is quite comparable for the top two terms and they are about equally competitive.

This particular area of research can be quite complicated because you also have to look at the intent of the searcher and weigh that against the product offered. The term "boob job", whilst funny, is probably not likely to lead to serious searchers who are considering elective surgery, so that, coupled with the lower search volume, means we can probably drop that one for now (though it might be worth bearing in mind for future link bait).

As an aside: after doing a bit more research into the types of pages that rank for the two terms and looking into search behaviour, it actually seems that people searching for "breast implants" are also not likely to be the highest converting traffic, and there are certain social stigmas associated with the various terms. So, in this case we can probably rule out breast implants (as the main target of the top-level page) because it is not likely to lead to highly converting traffic. This does not, however, mean that we won’t want to target the term at all, just that it may move down the priority list a touch.

So, the very first step has helped us eliminate two of the terms for the time being, so we’re making some progress! The next step is to compare the terms that seem as though they might both be realistic targets for the page and are also relatively similar in terms of competitiveness. Although the scores assigned by the Keyword Difficulty Tool can be very helpful when comparing a term that is ranked as a "10" versus one that is ranked as a "95", these "difficulty scores" alone do not provide enough information when comparing two similarly competitive terms. Thankfully, the tool gives us a lot more data to work with.

As you can see above, although "breast augmentation" seems to be a slightly "less competitive" term based on the difficulty metric, there is a clear outlier within this chart (which we can go ahead and guess is going to be Wikipedia without even having the rest of the data) that looks like it will be extremely difficult to outrank, even if the top spot seems slightly weaker than for "breast enhancement."
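Spotting that kind of outlier can be made a little more systematic. A sketch, assuming you have copied one authority-style score per ranking result out of the tool (the tool’s own scoring is not reproduced here). It uses the modified z-score, based on the median and the median absolute deviation (MAD), with the commonly used 3.5 cutoff; this is a standard statistical technique swapped in for illustration, not something the Keyword Difficulty Tool itself does.

```python
from statistics import median
from typing import List


def strong_outliers(scores: List[float], cutoff: float = 3.5) -> List[int]:
    """Return 0-based positions of results scoring far *above* the rest of the SERP.

    Uses the modified z-score 0.6745 * (x - median) / MAD. Only high outliers
    are flagged, since those are the results that are hard to outrank.
    """
    mid = median(scores)
    mad = median(abs(s - mid) for s in scores)
    if mad == 0:
        # At least half the scores are equal, so the MAD can't scale deviations.
        return []
    return [i for i, s in enumerate(scores) if 0.6745 * (s - mid) / mad > cutoff]
```

Using the median rather than the mean keeps a single very strong page (like a Wikipedia result) from inflating the baseline it is being compared against.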

 

Similarly, the overall landscape for "breast enhancement" actually seems a bit more realistic as a target for a new site. Thus, in this case (based solely on the likelihood of ranking) we would actually choose to target "enhancement" rather than "augmentation", and try to work our way up to the more difficult term by building links to the site as a whole, and to this page specifically, before shifting our approach on term targeting. But before we make this a final decision, let’s have a slightly closer look at what the competition really looks like.

 

As you can see, we were right in assuming that the 5th spot for the "augmentation" comparison has been taken by Wikipedia, though not with a page directly targeted towards "breast augmentation" (which is probably why it’s riding as low as it is in the rankings).

Meanwhile, setting aside the Wikipedia page, it looks like the top spot for "augmentation" is actually held by a rather weak site that happens to have great anchor text in its domain name. This is a perfect indicator of how much benefit an exact match domain can bring, but unfortunately this sort of tactic won’t help much when you’re trying to land the client who wants a "tummy tuck" instead.

 

So, what do we do?

In an ideal world, the client would be a great big site with loads of authority and without much sense. They would have been targeting "boob job" and have 302’ed all of their old links, so we could make some quick changes and win. In this case, we go after all of the terms, do a bit of linkbuilding and we’ll probably turn out just fine.

Meanwhile, in the real world, we would recommend going for "augmentation" as our targeted term for a couple of reasons.

Labeling/Usability Fail

First, it is probably the "best" of the keyphrases in question. It targets the right kind of customer/searcher (we know this based on existing data and background insights on behaviour) and it has the highest search volume at the broader level.

Second, this keyphrase actually makes the most sense for the page we set out to build. As a top-level page, it gives us the opportunity to (over time) target some of these other terms on the page (with the exception, maybe, of "boob job"). Augmentation is the most generic term and will allow us to discuss "implants, reduction, enhancement, etc."

Third, after having looked more closely at the sites/pages currently ranking for these terms it actually seems like it will be easier to rank for this particular keyphrase (please note that Wikipedia is not even directly targeting this keyphrase in this case).

 

"But what about all the other keyphrases? I don’t want to waste them!"

This is where the post comes full circle. If you’re building a small/new site, the most sensible option (in the short term) is to create a page that is optimised for as many of these keyphrases as can be justified and for which you have research. As we mentioned earlier, you can’t just go after every single keyphrase in the industry on individual pages from the get-go, because they won’t all be indexed.

Just try to use some common sense: create the augmentation page high up within the information architecture, construct pages for "reduction", "implants" and "enhancements", and forget about "boob job" for now. This term may get some traffic, but if it doesn’t fit with the theme/tone of the site, save it for the linkbait and build strong links to these inner pages now.

This technique creates much more work, but with a brand new site this is to be expected. Try to structure things so that you can get rid of some of the smaller pages targeting extremely similar terms without impacting usability. This is essential and will make your life much easier in the future.

 

"What if my client is a massive site with great links?"
We should all be so lucky. This is obviously a different ball game. But if you are fortunate enough to have a Domain Authority considerably higher than your competitors’ for the keyphrase(s) in question, you aren’t going to struggle too much, and you only need one page to rank for a number of terms.

Image via: 3 Meeses

If this is your starting point, I would advise creating one hub/landing page for all "augmentation" related terms. If your site is strong enough the Wikipedia example quite clearly illustrates that some of these other pages may be superfluous.

There’s no need to jam all these keyphrases into the title tag either. If there are enough inbound links and the site is trusted enough, you can probably just go for the highest search volume terms, so long as the term is related to the service offered (never forget usability!). If you are in this position, kick back, relax and just wait for the little guys to catch up!

Sam is based in London as a lead SEO at Distilled. He hopes you’ve enjoyed this post and is looking forward to your comments, questions and concerns!


Whiteboard Friday - What’s Working for You? with Richard Baxter »

Posted by Scott Willoughby

The avalanche-like flow of special guest Whiteboard Fridays continues this week with another installment featuring our beloved London SEO expert, Richard Baxter (anchor text, y’all). Last week Richard helped us all learn how to get our fresh content indexed lickety-split, and this week he’s back to help us learn how to identify which areas of our sites are working hardest for us.

Whether you have multiple types of content on your site (maybe a blog, tools, articles, etc.), or you have limited content types across different topics (blog posts about cats, kittens, evil cats, ninja kittens, evil ninja kitten cats, etc.), wouldn’t it be nice to know which content types or topics bring you the most and best traffic?  Never fear, Richard’s here to explain his handy-dandy system to do just that!  By the end of this video you’ll know exactly which stats to pull from your analytics to create a so-shiny-it’s-practically-chromed spreadsheet that will let you peer deep into the inky black heart of your site and know the stars, the slackers, and the shiftless hobos among your content.

Wow! It’s like the future is now! And, since thinking of the future always makes me think of ‘Flash’, and thinking of ‘Flash’ reminds me that those of you without Adobe Flash can’t watch the video, I’ll try to summarize Richard’s bard-like musings on content segmentation and performance analysis.

In order to track and analyze the performance of your individual content, you’ll want to segment out your analytics data by content type. This is really, really easy to do if you have good, clean site structure (which you have, right? RIGHT?!). You can just pull Richard’s data points (below) for the different sections or subfolders of your site. If you were lazy and thought the best way to organize your site was to throw all of the pages into a virtual bucket, dump them out, name them by throwing your keyboard at a stump, and call it a day, you’ll have to get a little more involved with how you filter your segments. No matter what though, you might consider segments like all blog posts (perhaps a ‘CONTAINS /blog’ filter), all tools, all content written by Belverd Needles, III (/authors/belverd), etc. 
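The "CONTAINS" style of segment filter boils down to a substring test on each URL path. A minimal sketch, assuming a plain list of URL paths (say, from a crawl) rather than any particular analytics product; the segment names and paths are hypothetical, borrowed from the examples above.

```python
from typing import Dict, Iterable, List

# Hypothetical segment filters: segment name -> text the URL path must CONTAIN.
SEGMENTS: Dict[str, str] = {
    "blog": "/blog",
    "tools": "/tools",
    "belverd": "/authors/belverd",
    "marmaduke": "/authors/marmaduke",
}


def segment_paths(paths: Iterable[str], segments: Dict[str, str] = SEGMENTS) -> Dict[str, List[str]]:
    """Assign each URL path to every segment whose CONTAINS filter matches it.

    A path can land in several segments (e.g. /blog/authors/belverd/...),
    just as overlapping analytics segments can.
    """
    result: Dict[str, List[str]] = {name: [] for name in segments}
    for path in paths:
        for name, needle in segments.items():
            if needle in path:
                result[name].append(path)
    return result
```

A substring filter is forgiving but loose: "/blog" would also match a hypothetical "/blogroll" page, which is one more reason clean site structure makes this job easier.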

Once you have your segment filters in place, you just need to pull the data that Richard suggests and you’ll be able to see exactly how Belverd’s content compares to that of his bloggitty arch-nemesis, Marmaduke Huffsworth, Esq. (/authors/marmaduke). What data you say? This data:

1. Number of Pages per Segment: Richard advocates crawling your site using something like Link Sleuth to get this number; you’ll use it for all sorts of fun calculations. Yes, calculations can be fun. If you don’t believe me, just ask these racially diverse, embroidered youths.

Math is Fun, so say these thread children

2. Number of Keywords Sending Traffic: You can pull this from your analytics. Don’t worry so much about the words themselves here; you just want to know how many different keyword terms are delivering one or more visits to each segment.

3. Number of Pages Getting Entries from Search Engines: How many pages within the segment received one or more visits from a search engine? (Pick an engine, any engine, or all of them, whatever matters to you… so Google, basically.)

4. Total Visits from Search Engines: Like it says on the tin, this is just the total number of visits to the segment from search traffic.

5. Percentage of Total Visits that Performed a Conversion Action: This will require that you have some conversion actions set up in your analytics, but it’s a key data point if you want to figure out your strongest content.
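Once a segment’s rows are exported, the five data points reduce to a few counts and sums. A sketch assuming a hypothetical export with one row per landing page and keyword (`SearchVisitRow` is an invented structure, not a real analytics API). The page count comes from your crawl, and the conversion percentage here is computed over the search visits in the export, a simplification of item 5. The last ratio is just one example of the "fun calculations", not necessarily one Richard uses.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SearchVisitRow:
    """Hypothetical export row: search visits for one landing page + keyword pair."""
    landing_page: str
    keyword: str
    visits: int
    converting_visits: int


def segment_report(total_pages: int, rows: Iterable[SearchVisitRow]) -> dict:
    """Compute the five data points (plus one example ratio) for one segment."""
    rows = [r for r in rows if r.visits > 0]  # only rows that actually sent traffic
    entry_pages = {r.landing_page for r in rows}
    visits = sum(r.visits for r in rows)
    conversions = sum(r.converting_visits for r in rows)
    return {
        "pages": total_pages,                                   # 1. from a site crawl
        "keywords_sending_traffic": len({r.keyword for r in rows}),  # 2.
        "pages_with_search_entries": len(entry_pages),          # 3.
        "search_visits": visits,                                # 4.
        "conversion_rate_pct": 100 * conversions / visits if visits else 0.0,  # 5.
        # Example derived ratio: share of the segment's pages that get search entries.
        "pct_pages_getting_entries": 100 * len(entry_pages) / total_pages if total_pages else 0.0,
    }
```

Running the same report for each segment (Belverd’s pages vs. Marmaduke’s, blog vs. tools) puts the numbers side by side for comparison.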

So what can all of this stuff tell you? LOTS! By tracking these numbers, you’ll be able to quickly identify which content is working hardest for you. You’ll be able to know whether Marmaduke or Belverd is better at drawing high-converting traffic. You’ll know which subjects and content types are most deserving of your precious time and the investment of your hard-bilked pennies. You’ll know who put the bop in the bop shoo bop, who moved your cheese, and why birds suddenly appear every time I’m near (it’s because my pockets are full of birdseed). You’ll be 12.7-29.4% awesomier than you were before, and you’ll smell delightful ALL THE TIME!

Now aren’t you glad Richard stopped by and shared his magic secrets with you? Thanks, Richard!

p.s. Richard has posted more about getting things indexed quickly w/ PubSubHubbub and more on his blog - well worth a read.


8 Reasons In-House SEOs Hire SEO Consultants »

Posted by Lindsay

When I was an in-house SEO I hired outside SEO consultants. Now as the outside SEO consultant I often work with in-house SEOs. In the comments of my most recent post, an interesting question came up, "…why would a company who has an in-house SEO expert hire an external company?"

Here are 8 excellent reasons why talented in-house SEOs often bring in outside help.

1. Specialized Expertise

Not too long ago, SEO was a niche marketing specialization. I remember when even Internet Marketing was considered a highly niche specialization. In fact, my college marketing instructor tried to talk me out of Internet Marketing because it was too niche and I ran the risk of limiting my prospects down the line.

Times have sure changed. As the search engines have matured and the SEO industry has evolved along with them, it is becoming increasingly difficult to be on top of every SEO-related factor. Even something as specific as SEO is segmenting into specializations. Experts have emerged in social media promotion, local SEO, mobile SEO, copywriting for SEO, link-building, and so on.


Duane Forrester
"I hired the external consultants simply because they had more experience in the area I needed support in. Everyone needs to learn new things, so you’re rarely an expert in everything at once. Hiring the external consultant gets around a lot of hurdles and ramps up your program much quicker. Their deeper domain expertise allowed me to focus in areas I was strong in, while our entire SEO effort moved forward at the desired pace. Why reinvent the wheel when someone else already has an established, productive program that can benefit you?"
Duane Forrester is an in-house SEO with Microsoft, running their program for MSN. He is also the author of How to Make Money With Your Blog and Turn Clicks into Customers. In his spare time, he writes for Search Engine Land.

I like what Duane said about the hiring of external consultants ramping up your program quicker. By knowing and doing what you do best and outsourcing other tasks, you can super-charge your site’s SEO and get closer to your potential traffic level.

If I worked for a national business made up of thousands of brick-and-mortar locations (think Burger King), I’d definitely look at retaining the services of someone like David Mihm to ensure I had all the right pieces in place. I doubt that many people reading this post are as well versed in the intricacies of Local SEO as David.

How about mobile? You have the choice to either delve into the details yourself or do as other talented in-house SEOs have done and hire someone like Cindy Krum who wrote the book on Mobile Marketing. Literally.

Cindy Krum
"Mobile SEO is a niche within a niche, and it is pretty specialized. Top in-house SEOs have brought me in to help with mobile SEO, simply because they don’t have time to learn the niche. There is a lot to know, and it is easy to make mistakes. Mobile is still a small part of most in-house SEOs’ traffic, so they want to know that things are set up correctly, but they don’t have enough bandwidth to devote to learning the niche or even shepherding the project."
Cindy Krum is the CEO and Founder of Rank-Mobile, LLC, and author of Mobile Marketing: Finding Your Customers No Matter Where They Are. She also hosts a weekly radio show called Mobile Presence and acts as an SEOmoz Associate, answering Q&A about mobile SEO.

Why bumble around on your own in such specialized niches when you can focus on the pieces you know best and outsource the rest to a more qualified expert? You don’t need to be everything SEO all the time. Give yourself a break!

2. Too Much to Do. Too Little Time.

Effective SEO is a lot of work. Managing the internal politics can be a full-time job unto itself! Perhaps you are confident that you have the strategy nailed down, but you just can’t get your projects through the pipeline fast enough. To keep things moving while you consider the next big project, it can help to hire an outside consultant.

"I outsource as necessary for specific tasks, not for general consulting or strategy. Specific examples include content creation for new pages on a site, link building, and social promotion of blog content. This has generally worked out well as I’m able to shape efforts and budget across all aspects of Internet marketing while having a specific challenge or need addressed by the consulting company."
John Santangelo is an Internet marketing professional based in Jacksonville, FL and currently works in-house as the Search Marketing Manager for a staffing firm.

Once you’ve established what needs to be done, hiring an SEO consultant can help you push through a task list and get closer to your goals.

3. Fresh Perspective

Working on the same website for years on end can get mighty boring. You can only come up with so many interesting articles related to nylons, and if you have to rewrite the homepage title tag one more time you’re going to scream. With boredom comes creative stagnation. Bringing in the right SEO consultant can help get the creative juices flowing again. Fresh eyes bring fresh ideas to help your business grow.

At SEOmoz we used to provide whirlwind audits in our boardroom. The client would bring along their best and brightest SEOs, marketing folks, and development staff. We’d go through their site and point out areas for improvement. One particular client comes to mind: a well-known brand, an important website, talented SEO staff… and they’d blocked an important directory in their robots.txt. Sometimes when you are too close to a problem you can miss little details like a line in your robots.txt or an important redirect.
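For a quick self-check of the kind of robots.txt slip described above, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the host, and the list of "important" paths are all hypothetical; the idea is simply to confirm that none of the URLs you care about are disallowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "/products/" line is the kind of one-line
# mistake that can quietly hide an important directory from crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /products/
"""

def blocked_paths(robots_txt, paths, user_agent="Googlebot",
                  host="https://www.example.com"):
    """Return the paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, host + p)]

# Paths you expect search engines to crawl (hypothetical).
important = ["/", "/products/widgets.html", "/blog/"]
print(blocked_paths(ROBOTS_TXT, important))  # ['/products/widgets.html']
```

In practice you would feed it your live robots.txt and a list of your key landing pages after every deploy.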

4. Educational Purposes

At SEOmoz we often sold an educational component along with our site audits. We’d go in with slide decks and share some of our knowledge with anywhere from one to dozens of in-house staff. This empowers the in-house team to move forward on their own, knowing a little more. Training can be formal or informal. Topher describes his outsourced project as a learning experience.

"As the in-house [SEO] at CNN.com I have used an agency (Bruce Clay) and have brought in an outside consultant. I think a good SEO has to know what they don’t know, and I do not know mobile SEO well at all. I went and asked around for a mobile SEO expert and Cindy Krum’s name came up all over the place, so I brought her in and she was great. I am still not an expert on Mobile SEO, but I for sure know a heck of a lot more now than I did before because of her."
Topher Kohan is the SEO Coordinator for CNN. He joined CNN, a division of Turner Broadcasting and a Time Warner company, in early 2008 after two years at the Centers for Disease Control and Prevention.

5. Validation

SEO enhancements can be expensive to implement and sometimes take months or even years to complete. Drawing on high-level experience across many web properties, an outside consultant can help you prioritize your enhancements and validate your project plan to make sure you get the most from the development investment.

"Outside SEO consultants typically have very broad experiences with a variety of websites and industries. Our role is to come alongside the in-house team and help them manage the process of inserting SEO into the overall marketing and web production schedules and tackle the different hurdles associated with that. The in-house SEOs are our biggest allies to help us navigate the internal roadblocks and in return we are their biggest allies for getting their projects implemented."
Todd Friesen is the Vice President of Search for Position Technologies Inc. and has been working in SEO and online marketing since 1999 with many high profile clients such as Nike and the NCAA.

At SEOmoz we enjoyed working with strong in-house SEO individuals or teams for our consulting gigs. I suspect that this is true for most SEO consultants that specialize more on strategy and less on implementation.

6. Collaboration

Many in-house SEOs work alone, and it can be refreshing and rewarding to expand beyond the one-man show. Marty describes how he and his employer benefit from expanding his team from time to time to meet a need.

"It really benefits me to be able to divvy up the responsibilities for things like site architecture, internal linking, etc. to an outside firm/person I trust while I focus on other important tasks like content migrations and cleanup with our internal web team. I find it very useful to spread the workload in order to be able to launch a redeveloped site sooner rather than later and in most instances it is also more cost effective in the time savings."
Marty Martin is an SEM/SEO with a broad range of experience working for colleges and universities, regional and state tourism, government and business. He is employed currently as an in-house SEO for Leisure Publishing Co., Inc. in Virginia.

7. Overcome Internal Politics

Of course you know your stuff when it comes to SEO. That is how you got your in-house SEO job, right? Then why do you spend so much of your time selling the value of your projects and negotiating for resources? One challenge that a lot of in-house SEOs face is finding the time to do actual SEO work. External consultants can help pave the way to get homegrown ideas implemented.

"Sometimes in-house SEO departments need help convincing another department that their ideas are solid. We do a lot of consulting that helps the different departments learn how to play together throughout the development life cycle."
Jessica Bowman is an SEO Expert, international speaker, member of the SEMPO Board of Directors and works with companies to figure out what they need to build a successful in-house SEO program.

8. Breadth of Knowledge

As an in-house SEO for a growing business, the challenges you face for the first time have more often than not been considered and successfully addressed by another SEO somewhere out there in cyberspace.

"A number of our clients have in-house SEO teams and we love working alongside them. There’s quite a range of reasons why we’d be brought in. One of the most common reasons is because we have specific experience across a range of sites or in solving a specific tough problem."
Will Critchlow is the Director of Distilled, an SEO and internet marketing firm in London and Seattle.

Let’s say you’ve inadvertently landed yourself a Google penalty. How do you diagnose the problem, get it fixed, and request forgiveness with a successful outcome? A consultant who has helped other websites work their way out of a penalty situation can be invaluable.

There are plenty of less dramatic examples. How do you implement a WordPress-powered blog as a subfolder of a .NET site? How do you handle millions of constantly expiring pages (as is common with job boards and classified ad sites)? How do you write a compelling link bait piece?

Action Items

The next time you get push back when proposing to hire an SEO consultant, choose from the reasons outlined in this post to support your case.

  1. We need specialized expertise.
  2. We have too much to do. We’ll get this project moving faster if I can get some help.
  3. We can learn a lot from an outside expert.
  4. We want to double check our strategies before we get going.
  5. We would benefit from collaboration with other SEOs.
  6. A consultant can help us work through the concerns of marketing/IT/executives.
  7. We need the help of someone who has done (insert complicated initiative) before.

In-house SEOs hire outside help for all kinds of work: strategy, implementation, retainers, special projects and more. Are you an in-house SEO who has worked with external SEO experts? I’d love to hear about your experience.

Happy optimizing!


Bing vs. Google: Prominence of Ranking Elements »

Posted by randfish

This past week during the SMX Advanced conference in Seattle, I presented some correlation data alongside Janet Driscoll-Miller, Sasi Parthasarathy of Bing & Matt Cutts of Google. Matt in particular was quite vocal in expressing a desire to see additional data points from our research, primarily around the prominence/visibility of particular elements in the results. This post is intended to help make that available.

2 Tweets from Matt Cutts

I must say that I don’t agree with Matt on the importance of the raw visibility/counts over the ranking correlations. My feeling is that SEOs in these spaces are more interested in answering the question - "what features predict a result will rank higher vs. lower on page 1?" - rather than the more straightforward - "does this feature appear more frequently on page 1 at Google or Bing?" However, I certainly agree that both are relevant and interesting.

If you’re trying to wrap your head around how to understand this prominence/visibility data vs. our earlier data on the correlation with rankings, here’s how we’d best describe it:

  • Correlation w/ rankings data helps to answer the question, "when this feature appears in results on the first page of Google/Bing, which engine ranks it higher and by what amount?" Those correlation numbers were derived by looking at the likelihood that a result would rank above another when it contained the target attribute.
  • Visibility/prominence of an element helps to answer the question, "is this element more likely to appear on the first page of Google’s/Bing’s results?" This simply looks at the number of times we saw a result (or multiple results) ranking on page 1 containing the target attribute.

We’re looking at the latter one in this post, but before we dive in, there are a few critical items to understand:

  • This isn’t correlation data and there’s no standard error or deviation numbers here. It’s simply how many times we saw the element in the results we gathered, divided by the total number of results (SERPs or URLs depending on the chart) to get a percentage. 
  • This data is from page 1 of the results for 11,351 queries, gathered from Google’s AdWords categories. This means the terms and phrases vary somewhat in search volume (from under 100 searches per month to tens or hundreds of thousands) but generally have a commercial focus and intent. They generally don’t include brand names, long tail phrases or vanity name searches. Overall, we picked them because they’re precisely the kinds of queries most SEOs care about when they’re doing competitive SEO for their companies and clients. We also ignored the second result in a SERP from the same domain to avoid effects of indented results (which was important for our earlier statistics, but not for those in this post).
  • The results were collected the week of May 31st and thus include post-"Mayday" update SERPs and likely results from after the "Caffeine" launch as well (though Google did not announce exactly when that rollout occurred, and it may not have much bearing, as Caffeine is supposedly an infrastructure change rather than an algorithmic one).
  • Each feature contains two pie charts, one showing the percentage of results that contained at least 1 URL with this feature and another showing the percentage of total URLs in all results (102,296 for Google and 109,966 for Bing - note that some SERPs will fluctuate the quantity of standard web results they show on page 1). These are labeled as "(feature) in SERPs" and "(feature) in URLs," respectively.
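The two percentages described in the bullets above can be sketched in a few lines. This is a minimal Python illustration under our own assumptions (each SERP represented as a list of its page-1 result URLs, and a caller-supplied `has_feature` test), not the scripts actually used for this study; the sample SERPs are hypothetical and much shorter than real ones.

```python
from urllib.parse import urlparse

def prominence(serps, has_feature):
    """Return (% of SERPs with at least one matching URL, % of all URLs matching).

    serps: a list of SERPs, each a list of page-1 result URLs.
    has_feature: a function that says whether one URL has the attribute.
    """
    total_urls = sum(len(serp) for serp in serps)
    serps_with = sum(1 for serp in serps if any(has_feature(u) for u in serp))
    urls_with = sum(1 for serp in serps for u in serp if has_feature(u))
    return 100.0 * serps_with / len(serps), 100.0 * urls_with / total_urls

def has_tld(tld):
    """Feature test: the result's hostname ends with the given TLD."""
    return lambda url: (urlparse(url).hostname or "").endswith("." + tld)

# Two tiny hypothetical SERPs (real ones have about ten URLs each).
serps = [
    ["http://www.dogcollars.com/", "http://pets.example.org/collars", "http://shop.example.com/"],
    ["http://vet.example.edu/", "http://example.net/dogs"],
]
print(prominence(serps, has_tld("edu")))  # (50.0, 20.0)
```

The first number corresponds to the "(feature) in SERPs" charts and the second to the "(feature) in URLs" charts.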

In gathering this data, we did not optimize to share it in this fashion. In fact, Ben & I both feel that if we wanted to do it this way, we should gather the first 3-5 pages of results, not just the 1st page. That way, one could compare the counts on page 1 with the counts on page 2. However, since we’ve got the data and Matt, Sasi and several other folks expressed interest, we’re sharing anyway. Hopefully in the future we can do more on this front.

Let’s dive in!


Exact Match Domains

These are domains that precisely matched the keywords in the query - e.g. for the query "dog collars" only a domain that matched *.dogcollars.* would be included.
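One way to express the `*.dogcollars.*` rule in code is sketched below in Python. It is our own simplification, not the study's matcher: it assumes a one-label TLD such as .com, .net or .org (no .co.uk handling) and compares the label just before the TLD with the query's words joined together. The hyphenated variant used in the hyphenated-match section later in this post is included for comparison.

```python
from urllib.parse import urlparse

def host_labels(url):
    return (urlparse(url).hostname or "").lower().split(".")

def domain_label(url):
    """Label just before the TLD: 'dogcollars' for www.dogcollars.com.
    Assumes a one-label TLD such as .com/.net/.org (no .co.uk handling)."""
    labels = host_labels(url)
    return labels[-2] if len(labels) >= 2 else labels[0]

def is_exact_match(url, query, tld=None):
    """*.dogcollars.* for "dog collars"; optionally require a specific TLD."""
    if domain_label(url) != "".join(query.lower().split()):
        return False
    return tld is None or host_labels(url)[-1] == tld

def is_exact_hyphenated_match(url, query):
    """Same idea with hyphens between the words: *.dog-collars.*"""
    return domain_label(url) == "-".join(query.lower().split())

print(is_exact_match("http://www.dogcollars.com/leather/", "dog collars"))   # True
print(is_exact_match("http://dogcollars.net/", "dog collars", tld="com"))   # False
print(is_exact_match("http://www.mydogcollars.com/", "dog collars"))        # False
print(is_exact_hyphenated_match("http://dog-collars.org/", "dog collars"))  # True
```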

Exact Match Domains in SERPs 

Exact Match Domains in URLs

You can see that Bing has slightly more exact match domains appearing in at least one result of the SERPs we collected and in the overall count of results (all the URLs from all the SERPs).

Exact Match .com Domains

Similar to exact match domains, exact match .com domains had to contain the exact query in the domain name and have a .com TLD extension.

Exact Match .com Domains in the SERPs

Exact Match .com Domains in URLs

Again, Bing showed a slight preference for displaying results from these sites in the SERPs and URLs we observed.

Exact Match .net Domains

As above, but replace ".com" with ".net."

Exact Match .net Domains in the SERPs

Exact Match .nets in URLs

The two engines are much closer in the total number of .net exact match URLs, but Bing shows a preference in the SERP count.

Exact Match .org Domains

In the .org TLDs, we start to see a bit of what we observed in the ranking correlation data:

Exact Match .orgs in the SERPs

Exact Match .orgs in URLs

This is the first exact match domain TLD where Google actually had more SERPs containing a result of this type. Bing, however, had a very tiny amount more URLs with this feature.

Exact Hyphenated Match Domains

One of Matt Cutts’ complaints centered around how Google vs. Bing handled exact hyphenated match domains. When we observed them in ranking correlations, it appeared that, when Google listed them, they would rank them higher than Bing did when they appeared on that first page of results. However…

Exact Hyphenated Match Domains in the SERPs

Exact Hyphenated Match Domains in URLs

As I called out in the presentation and the prior post, Bing has quite a few more SERPs where exact hyphenated match domains appear, and somewhat more URLs, too. This is another data point that should make us all think carefully about the fallacy of presuming correlation = causation. Bing might have a preference for exact hyphenated match domains, but the ranking correlations suggest to me there’s more going on here: maybe something to do with anchor text, or where those types of sites tend to get links, or something else we haven’t considered?

It’s critical to keep in mind that we’re just looking at individual factors here - not trying to explain why they exist or correlate (at least, not in the data).

Results that Include All Keywords in the Domain Name

Here we looked for domains that contained the keyword query in the domain, even if the match wasn’t exact. For example, mydogcollar.com would now match for the phrase "dog collar."
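The looser "all keywords somewhere in the name" test can be sketched the same way. In this minimal Python version (our assumption, not the study's code) every query word only has to appear as a substring of the domain label, so "mydogcollar" matches "dog collar"; applying the same test to the labels in front of the domain gives the subdomain variant covered in the next section. Plain substring matching is deliberately loose: "cat" would also match "catalog".

```python
from urllib.parse import urlparse

def split_host(url):
    """(subdomain, domain label), assuming a one-label TLD.
    'http://dog.collars.example.com/' -> ('dog.collars', 'example')"""
    labels = (urlparse(url).hostname or "").lower().split(".")
    if len(labels) < 2:
        return "", labels[0]
    return ".".join(labels[:-2]), labels[-2]

def all_keywords_in(text, query):
    """Every query word appears somewhere in text (plain substring test)."""
    return all(word in text for word in query.lower().split())

def keywords_in_domain(url, query):
    return all_keywords_in(split_host(url)[1], query)

def keywords_in_subdomain(url, query):
    return all_keywords_in(split_host(url)[0], query)

print(keywords_in_domain("http://www.mydogcollar.com/", "dog collar"))        # True
print(keywords_in_subdomain("http://www.mydogcollar.com/", "dog collar"))     # False
print(keywords_in_subdomain("http://dogcollars.example.com/", "dog collar"))  # True
```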

All Keywords in the Domain Name in the SERPs

All Keywords in the Domain Name in URLs

Again, it’s Bing that shows a higher number of these types of domains in their results.

Results that Include All Keywords in the Subdomain Name

We’ve previously shown some data suggesting that subdomains might have some ranking influence, but not as much as root domains (this was done using our rank modeling / machine learning process). Here’s some raw data on the number of times we observed keyword matching subdomains:

Contains all Keywords in the Subdomain in SERPs

Contains all Keywords in the Subdomain in URLs

Perhaps not surprisingly, Bing again is showing more of these results in their SERPs and individual URLs.

.com Domains

For this feature and all the TLDs below, we’re just looking at any URL that has the domain extension.

.com Domains in the SERPs

.com Domains in URLs

It looks like Bing has very slightly more .coms in their results vs. Google.

.org Domains

Let’s see what happens for .org domains, recalling Google’s apparent preference for them in the ranking correlations.

.org Domains in the SERPs

.org Domains in URLs

Oddly, Bing again seems to have more .org pages in the SERPs and URLs.

.net Domains

URLs with .net probably won’t surprise you much:

.net Domains in the SERPs

.net Domains in URLs

Yet again, Bing is showing a small number more than their Googly competitors.

.edu Domains

Recall how, in the correlation data, the numbers were small(ish) but negatively correlated? Let’s see what the number of results shows: 

.edu Domains in the SERPs

.edu Domains in URLs

True to the stereotype, Google is slightly ahead on number of .edu domains in the SERPs & URLs.

.gov Domains

Given the previous charts, this one likely won’t surprise you:

.gov Domains in the SERPs

.gov Domains in URLs

Google has more .edus and more .govs, too.

Keywords in the Title Element

Not surprisingly, nearly every set of SERPs had at least one result where the title tag contained the keywords:

Keywords in Titles in the SERPs

Keywords in Titles in URLs

Bing shows up with more results that contain title tag to keyword matching. One thing worth mentioning is that we didn’t observe the titles the engines chose to show, but rather the page titles from the results themselves. Hence, if a result was showing a DMOZ title or a brand title (which Google will sometimes insert), we ignored those and just saw the title element on the page itself.
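Because the title was read from each ranking page rather than from what the engine displayed, the check only needs the page's HTML. A minimal Python sketch using the standard library's html.parser follows; it also gathers H1 text and image alt attributes, which the H1 and alt attribute checks further down would need. The sample page and the keyword test (every query word as a case-insensitive substring) are our assumptions, not the study's exact matching rules.

```python
from html.parser import HTMLParser

class OnPageElements(HTMLParser):
    """Collects the page's own <title>, <h1> text and <img> alt attributes."""

    def __init__(self):
        super().__init__()
        self.title, self.h1s, self.alts = "", [], []
        self._in = None  # "title" or "h1" while we are inside one

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._in = tag
            if tag == "h1":
                self.h1s.append("")
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == self._in:
            self._in = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <em> in an <h1>) is still collected.
        if self._in == "title":
            self.title += data
        elif self._in == "h1":
            self.h1s[-1] += data

def keywords_in(text, query):
    text = text.lower()
    return all(word in text for word in query.lower().split())

page = """<html><head><title>Leather Dog Collars | Acme</title></head>
<body><h1>Dog <em>collars</em></h1><img src="a.jpg" alt="red dog collar"></body></html>"""
found = OnPageElements()
found.feed(page)
print(keywords_in(found.title, "dog collars"))                 # True
print(any(keywords_in(h, "dog collars") for h in found.h1s))   # True
print(any(keywords_in(a, "dog collars") for a in found.alts))  # False ("collar", not "collars")
```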

Keywords in the URL

This one actually surprised me, if only because there were even fewer results with keywords in the URL than in the title! 

Keywords in the URL in the SERPs

Keywords in the URL in URLs

Bing again has more results with keyword-matching URLs, though remember that some of that is probably from keyword matching domains, too.
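To separate the two effects mentioned above, one option is to record a "keywords anywhere in the URL" flag and a "keywords in the path or query only" flag side by side. This Python sketch is our own illustration of that split (same substring rule as the earlier sketches), not how the study counted URL matches.

```python
from urllib.parse import urlparse, unquote

def url_keyword_hits(url, query):
    """Do all query words appear in the URL at all, and in the path/query alone?"""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    rest = unquote(parts.path + "?" + parts.query).lower()
    words = query.lower().split()
    return {
        "anywhere": all(w in host + " " + rest for w in words),
        "path_only": all(w in rest for w in words),
    }

print(url_keyword_hits("http://www.dogcollars.com/leather/", "dog collars"))
# {'anywhere': True, 'path_only': False}
print(url_keyword_hits("http://www.example.com/dog-collars/", "dog collars"))
# {'anywhere': True, 'path_only': True}
```

A URL that is `True` for "anywhere" but `False` for "path_only" owes its match entirely to the domain name.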

Keywords in the H1

The ranking correlations suggested that the H1 tag isn’t much of a differentiator, yet lots of people still swear by them:

Keywords in the H1 in the SERPs

Keywords in the H1 in URLs

The results bear out that keywords in the H1 are much less frequent than keywords in URLs or titles among pages ranking on page 1. Bing seems to show more of them than Google, though.

Keywords in the Alt Attribute

Alt attributes looked interesting last fall when we collected ranking information and once again proved worth a look in the correlation data from SMX Advanced. Let’s see what the raw counts show:

Keywords in the Alt Attribute in the SERPs

Keywords in the Alt Attribute in URLs

Bing is showing slightly more of these, but if the positive correlation means something, these numbers certainly suggest there’s lots of opportunity left for good alt attribute practices.

Homepages

Who lists homepages vs. deep pages in the results more?

Homepages in the SERPs

Homepages in URLs

My word! It’s Google by a good margin. Bing’s preference for internal pages actually surprises me a bit, though perhaps that’s an old stereotype I need to abolish.
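Deciding what counts as a "homepage" is itself a judgment call. The Python sketch below uses one plausible rule, which is our assumption rather than the definition used for these charts: a bare domain, "/", or a common default document name, with no query string.

```python
from urllib.parse import urlparse

# Assumption: a bare domain, "/", or a common default document counts as a
# homepage; anything with a deeper path or a query string is a deep page.
DEFAULT_DOCS = {"", "/", "/index.html", "/index.htm", "/index.php",
                "/default.asp", "/default.aspx"}

def is_homepage(url):
    parts = urlparse(url)
    return parts.path.lower() in DEFAULT_DOCS and not parts.query

print(is_homepage("http://www.example.com"))               # True
print(is_homepage("http://www.example.com/index.php"))     # True
print(is_homepage("http://www.example.com/dog-collars/"))  # False
print(is_homepage("http://www.example.com/?ref=twitter"))  # False
```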

And with that, we’re done!


One important point to notice is that I’ve not included data on link results, as these would be hard to interpret and likely non-useful. Every page of results had pages with links to them and nearly every individual ranking URL also had links (a good sign for Linkscape’s index, but not super valuable as a data point). There were a few other data pieces like this that wouldn’t make sense here (keyword prominence in the body tag, word tokens in the body tag, domain name length, etc) and have thus been excluded.

I’ve done less analysis on these results in general, as I think the data is a bit less ideal for the purpose, but it’s still interesting and hopefully, illustrative of general prominence. I look forward to seeing your interpretations and discussion!

p.s. If you email Ben at SEOmoz dot org, he will send you a TSV containing, for each query, the metrics we used in these posts for each result. You can also find raw results in a public Google spreadsheet doc here. Feel free to play around and let us know if you see anything else cool and interesting.


URL Rewrite Smack-Down: .htaccess vs. 404 Handler »

Posted by MichaelC

First, a quick refresher:  URL prettying and 301 redirection can both be done in .htaccess files, or in your 404 handler.  If you’re not completely up to speed on how URL rewrites and 301s work in general, this post will definitely help. And if you didn’t read last week’s post on RewriteRule’s split personality, it’s probably helpful background material for understanding today’s post.

 

Googlebot dreaming of yummy keywords in URLs  

 

"URL prettying" is the process of showing readable, keyword-rich URLs to the end user (and Googlebot) while actually using uglier, often parameterized URLs behind the scenes to generate the content for the page. 

Here, you do NOT do a 301 redirection.

(Unclear on redirection, 301s vs. 302s, etc.?  There’s help waiting for you here in the SEOmoz Knowledge Center.)

 

301s are done when you really have moved the page, and you really do want Googlebot to know where the new page is.

You’re admitting to Googlebot that the page no longer exists in the old location.

You’re also asking Googlebot to give the new page credit for all the link juice the old page had earned in the past.

For example, you may have migrated your website to a new content management system, and all of the pages have somewhat different URLs than they had before the move.

  301s pass link juice, and admit it's not where Googlebot thought it was.

If you’re trigger-happy, you might leap to the conclusion that RewriteRule is the weapon of choice for both URL prettying and 301 redirects.  Certainly you CAN use RewriteRule for these tasks, and certainly the regex syntax is a powerful way to accomplish some pretty complex URL transformations. And really, if you’re going to use RewriteRule, you should probably be using it in your httpd.conf file instead.

The Apache docs have a great summary of when not to use .htaccess.

 

Fear Not the 404 Handler

404 Handlers can do some pretty heavy lifting  

First, all y’all who tremble at the thought of creating your very own custom 404 handler, take a Valium.  It’s not that challenging.  If you’ve gotten RewriteRule working and lived to tell the tale, you’re not going to have any difficulty making a custom 404 error handler.

It’s just a web page that displays some sort of "not found" message, but it gives you an opportunity to have a look at the page that was requested, and if you can "save it", you redirect the user to the page they’re looking for with just a line or two of code. 

If not, the 404 HTTP status gets returned, along with however you’d like the page to look when you tell them you couldn’t find what they were looking for.
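The "save it or send a real 404" decision above is not tied to ASP or PHP. For readers on a Python stack, here is a rough analog written as a plain WSGI app that you would route only unmatched requests to. The `MOVED` table and the old `.asp` path are hypothetical, and how you wire the app in as the fallback depends on your server or framework.

```python
# Hypothetical map of old URLs we can "save"; everything else is a real 404.
MOVED = {
    "/purple-gadgets.asp": "http://www.acmewidgets.com/purple-gadgets.php",
}

NOT_FOUND_PAGE = b"<h1>Sorry, we couldn't find that page.</h1>"

def not_found_app(environ, start_response):
    """WSGI app meant to receive only requests nothing else could serve."""
    target = MOVED.get(environ.get("PATH_INFO", ""))
    if target is not None:
        # Saved: a permanent redirect tells Googlebot where the page went.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Not saved: keep the real 404 status, but show a friendly page.
    start_response("404 Not Found", [("Content-Type", "text/html"),
                                     ("Content-Length", str(len(NOT_FOUND_PAGE)))])
    return [NOT_FOUND_PAGE]
```

To poke at it locally, `wsgiref.simple_server.make_server("", 8000, not_found_app).serve_forever()` will serve it on port 8000.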

By the way, having your own 404 handler gives you the opportunity to entertain your user, instead of just making them feel sorry for themselves. Check out this post from Smashing Magazine on creative 404 pages.

Having a good sense of humor could inspire love & loyalty from a customer who otherwise might just be miffed at the 404.

Here’s an example of a 404 handler in ASP. Important note: don’t use Response.Redirect, because it does a 302, not a 301!

For PHP, you need to add a line to your .htaccess pointing to wherever you’ve put your 404 handler:

  • ErrorDocument 404 /my-fabulous-404-handler.php

Then, in that PHP file, you can get the URL that wasn’t found via:

  • $request = $_SERVER['REDIRECT_URL'];

Then, use any PHP logic you’d like to analyze the URL and figure out where to send the user.
If you can successfully redirect it, set:

  •     header("HTTP/1.1 301 Moved Permanently");
  •     header ("Location: http://www.acmewidgets.com/purple-gadgets.php");

And here’s where it gets a bit hairy in PHP. There’s no real way to transfer control to another webpage behind the scenes without telling the browser or Googlebot via 301 that you’re handing it off to the other page. But you can call require() on the fly to pull in the code from the target page. Just make sure to set the HTTP code to 200 first:

  •     header('HTTP/1.1 200 OK');

And you’ve got to be careful throughout your site to use include_once() instead of include() to make sure you don’t pull a common file in twice.  Another option is to use curl to grab the content of the target page as if it were on a remote server, then regurgitate the HTML back in-stream by echoing what you get back.  A bit hazardous if you’re trying to drop cookies, though…

And, if you really need to send a 404:

  •     header('HTTP/1.0 404 Not Found');

Very Important: be careful to make sure you’re returning the right HTTP code from your 404 handler. If you’ve found a good content page you’d like to show, return a 200. If you found a good match and want Googlebot to know about that page name instead of what was requested, do a 301. If you really don’t have a good match, be sure you send a 404. And be sure to test the actual response codes received. I’m a huge fan of the HttpFox Firefox plug-in.
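If you would rather script the check than click through a browser plug-in, here is a small standard-library Python sketch. It relies on the fact that `http.client` does not follow redirects, so a 301 is reported as a 301 (with its Location header) rather than as the page it points to. The URLs in the usage comment are hypothetical.

```python
import http.client
from urllib.parse import urlsplit

def response_code(url, timeout=10):
    """GET url once and return (status, Location header or None).
    http.client does not follow redirects, so a 301 shows up as 301."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Usage (hypothetical URLs on your own site):
# for url in ["http://www.example.com/old-page.asp", "http://www.example.com/nope"]:
#     print(url, response_code(url))
```

Running it over a list of old URLs after each change is a cheap way to catch a handler that quietly returns 200 or 302 where you meant 301 or 404.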

Ease of Debugging

This is where the 404 handler really wins my affection.  Because it’s just another web page, you can output partial results of your string manipulation to see what’s going on. 

Don’t actually code the redirection until you’re sure you’ve got everything else working.  Instead, just spit out the URL that came in, the URL you’re trying to fabricate and redirect to, and any intermediate strings that help you figure it all out.
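Here is a minimal sketch of that "show your work first" approach, written in Python for illustration. The rewrite rules (lowercase, drop `.asp`, underscores to hyphens) are hypothetical; the point is that the same function either prints its intermediate strings or performs the redirect, depending on a debug switch.

```python
def plan_redirect(requested):
    """Build the new URL step by step, keeping every intermediate string.
    The rules here (lowercase, drop .asp, underscores to hyphens) are hypothetical."""
    trace = [f"requested:  {requested}"]
    path = requested.strip("/").lower()
    trace.append(f"normalized: {path}")
    if path.endswith(".asp"):
        path = path[: -len(".asp")]
    slug = path.replace("_", "-")
    trace.append(f"slug:       {slug}")
    target = f"/{slug}/"
    trace.append(f"target:     {target}")
    return target, trace

def handle(requested, debug=True):
    """In debug mode show the work as plain text; otherwise really redirect."""
    target, trace = plan_redirect(requested)
    if debug:
        return "200 OK", {"Content-Type": "text/plain"}, "\n".join(trace)
    return "301 Moved Permanently", {"Location": target}, ""

print(handle("/Purple_Gadgets.asp")[2])
```

Once the trace looks right for a good sample of real 404s, flip `debug` to `False`.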

With RewriteRule, debugging pretty much consists of writing your regex, putting in the flags, then seeing if it worked.

Is the URL coming in in mixed case?  The slashes…forward?  Reverse?  Did I need to escape that character…or is it not That Special?

  Out, damned bugs!

 

You’re flying blind. It works, or it doesn’t work. 

If you’re struggling with RewriteRule regular expressions, Rubular has a nice regex editor/tester.
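Alongside an online tester, you can keep a small table of test cases for a rule and run it before deploying. Below is a minimal Python sketch with the standard `re` module. The pattern and the name-based target script are hypothetical. Python's `re` and Apache's regex engine agree on simple patterns like this one but differ on some advanced features, so a passing run is a first check, not a guarantee. Paths are written without a leading slash, which is how a RewriteRule in a per-directory .htaccess file sees them.

```python
import re

# Hypothetical rule under test (the pattern/substitution pair of a RewriteRule).
PATTERN = re.compile(r"^hotels/([^/]+)/([^/]+)/([^/]+)/?$")
TARGET = r"/hotel-by-name.php?dest=\1&island=\2&hotel=\3"

# Requested path -> expected result (None means the rule must not match).
CASES = {
    "hotels/Hawaii/Maui/Grand-Wailea/": "/hotel-by-name.php?dest=Hawaii&island=Maui&hotel=Grand-Wailea",
    "hotels/Hawaii/Maui/Grand-Wailea": "/hotel-by-name.php?dest=Hawaii&island=Maui&hotel=Grand-Wailea",
    "hotels/Hawaii/Maui/": None,               # one segment short
    "Hotels/Hawaii/Maui/Grand-Wailea/": None,  # case-sensitive, as without [NC]
}

def rewrite(path):
    match = PATTERN.match(path)
    return match.expand(TARGET) if match else None

for path, expected in CASES.items():
    result = rewrite(path)
    print("ok  " if result == expected else "FAIL", path, "->", result)
```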

Programming Flexibility

Your 404 handler gives you the most programming flexibility  

With RewriteRule, you’ve got to get all the work done in the single line of regex.

And while regex is elegant, powerful, and should be worshipped by all, sometimes you’ll want to do more complex URL rewriting logic than just clever substitution.

In your 404 handler, you can call functions to do things like convert numeric parameters in your source URL to words and vice versa.

 

Access to Your Database

If you’re working with a big, database-driven site, you may want to look up elements in your database to convert from parameters to words.

 

And since the 404 handler is just another webpage, you can do anything with your database that you’d do in any other webpage.

  404 handlers let you interact with your database to decode an URL, or form parts of your target URL

 For example, I had a travel website where destinations, islands, and hotels all were identified in the database by numeric IDs. The raw page that displayed content for a hotel also needed to show the country and island that the hotel was on.

The raw URL for a specific hotel page might have been something like:

/hotel.asp?dest=41&island=3&hotel=572

Whereas the "pretty URL" for this hotel might have been something like:

/hotels/Hawaii/Maui/Grand-Wailea/

When the "pretty URL" above was requested by the client, my 404 handler would break the URL down into sections: 

  1. looking up the 2nd section in the destinations table (Hawaii = 41)
  2. looking up the 3rd section in the island table (Maui = 3)
  3. looking up the 4th section in the hotel table (Grand Wailea = 572)

Then, I’d call the ASP function Server.Transfer to transfer execution to /hotel.asp?dest=41&island=3&hotel=572 to generate the content.

Now, keep in mind that you’ll probably want to generate the links to your pretty URLs from the database identifiers, rather than hard-code them. For instance, if you have a page that lists all of the hotels on Maui, you’ll get all of the hotel IDs from the database for hotels where the destination = 41 and island = 3, and want to write out the links like /hotels/Hawaii/Maui/Grand-Wailea/. The functions you write to do this are going to be very, very similar to the ones you need to decode these URLs in your 404 handler.
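Here is a rough Python sketch of that encode/decode pair, using an in-memory SQLite database as a stand-in for the travel site's tables. The table and column names, the second hotel, and the "spaces become hyphens" slug rule are all assumptions for illustration; the IDs 41, 3 and 572 come from the example above. The post describes three separate lookups. The join below does them in one query, and as a side effect it rejects combinations that don't belong together, such as a hotel on the wrong island.

```python
import sqlite3

def make_demo_db():
    """In-memory stand-in for the site's tables, using the IDs from the example."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE destinations (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE islands (id INTEGER PRIMARY KEY, dest_id INTEGER, name TEXT);
        CREATE TABLE hotels (id INTEGER PRIMARY KEY, island_id INTEGER, name TEXT);
        INSERT INTO destinations VALUES (41, 'Hawaii');
        INSERT INTO islands VALUES (3, 41, 'Maui');
        INSERT INTO hotels VALUES (572, 3, 'Grand Wailea'), (573, 3, 'Example Beach Resort');
    """)
    return db

def to_slug(name):
    return "-".join(name.split())      # "Grand Wailea" -> "Grand-Wailea"

def from_slug(slug):
    return slug.replace("-", " ")      # assumes names never contain real hyphens

def hotel_links(db, dest_id, island_id):
    """IDs -> pretty URLs, for writing links (e.g. every hotel on Maui)."""
    rows = db.execute(
        """SELECT d.name, i.name, h.name FROM hotels h
           JOIN islands i ON i.id = h.island_id
           JOIN destinations d ON d.id = i.dest_id
           WHERE d.id = ? AND i.id = ? ORDER BY h.name""",
        (dest_id, island_id)).fetchall()
    return ["/hotels/%s/%s/%s/" % tuple(to_slug(n) for n in row) for row in rows]

def decode_pretty_url(db, path):
    """Pretty URL -> parameterized URL (what the 404 handler needs), or None."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 4 or parts[0] != "hotels":
        return None
    dest, island, hotel = (from_slug(p) for p in parts[1:])
    row = db.execute(
        """SELECT d.id, i.id, h.id FROM destinations d
           JOIN islands i ON i.dest_id = d.id
           JOIN hotels h ON h.island_id = i.id
           WHERE d.name = ? AND i.name = ? AND h.name = ?""",
        (dest, island, hotel)).fetchone()
    return None if row is None else "/hotel.asp?dest=%d&island=%d&hotel=%d" % row

db = make_demo_db()
print(hotel_links(db, 41, 3))
print(decode_pretty_url(db, "/hotels/Hawaii/Maui/Grand-Wailea/"))
```

Keeping `to_slug` and `from_slug` next to each other makes it harder for the link-writing side and the 404-handler side to drift apart.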

Last but not least: you can keep track of 404s that surprise you (i.e., real 404s) by having the page either email you or log the 404’ed URLs to a table in your database.
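A minimal sketch of the logging option, again in Python with SQLite standing in for your database. The table name and columns are hypothetical. Sorting by hit count turns the log into a to-do list of URLs that might deserve their own 301 rule.

```python
import sqlite3
from datetime import datetime, timezone

def log_real_404(db, path, referrer=None):
    """Record a URL the 404 handler could not save (hypothetical table name)."""
    db.execute("""CREATE TABLE IF NOT EXISTS not_found_log
                  (path TEXT, referrer TEXT, seen_at TEXT)""")
    db.execute("INSERT INTO not_found_log VALUES (?, ?, ?)",
               (path, referrer, datetime.now(timezone.utc).isoformat()))
    db.commit()

def top_404s(db, limit=10):
    """Most frequently missed URLs first: good candidates for new 301 rules."""
    return db.execute("""SELECT path, COUNT(*) AS hits FROM not_found_log
                         GROUP BY path ORDER BY hits DESC, path LIMIT ?""",
                      (limit,)).fetchall()

db = sqlite3.connect(":memory:")
for p in ["/old-page.asp", "/typo-page", "/old-page.asp"]:
    log_real_404(db, p)
print(top_404s(db))  # [('/old-page.asp', 2), ('/typo-page', 1)]
```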

Performance

Google cares about page load speed today  

For most people, the performance hit of doing the work in .htaccess is not going to be significant.

But if you’re doing URL prettying for a massive site, or have renamed an enormous list of pages on your site, there are a few things you might want to be aware of, especially with Google now using page load speed as one of its ranking factors.

All requests get evaluated in .htaccess, whether the URLs need manipulation/redirection or not.

That includes your CSS files, your images, etc.

By moving your rewriting/redirecting to your 404 handler, you avoid having your URL pattern-matching code check against every single file requested from your webserver. Only URLs that can’t be found as-is will hit the 404 handler.

Having said that, note that you can pattern-match in .htaccess for pages you do NOT want manipulated, and use the L flag to stop processing early in .htaccess for URLs that don’t need special treatment.

Even if you expect nearly every page requested to need URL de-prettying (conversion to parameterized page), don’t forget about the image files, JavaScript files, CSS, etc. The 404 handler approach will avoid having the URLs for those page components checked against your conversion patterns every single time they’re fetched.

A Special Case

OK, maybe this case isn’t all that special; it’s pretty common, in fact. Let’s say we’ve moved to a structure of new pretty URLs from old parameterized URLs.

Not only do we have to be able to go from pretty URL –> parameterized URL to generate the page content for the user, we also want to redirect link juice from any old parameterized URL links to the new pretty URLs.

In the actual parameterized web page (e.g. hotel.asp in the above example), we want to do a 301 redirect to the pretty URL. We’ll take each of the numeric parameters, look up the destination, island, and hotel name, and fabricate our pretty URL, and 301 to that. There, link juice all saved…

But we’ve got to be careful not to get into an infinite loop, converting back and forth and back and forth:

 

When this happens, Firefox tells you, in effect, that you’ve done something so dumb it’s not even going to bother trying to get the page. It says so politely, though: "Firefox has detected that the server is redirecting the request for [URL] in a way that will never complete."

By the way, it’s entirely possible to cause this same problem to happen through RewriteRule statements–I know this from personal experience :-(

It’s actually not that tough to solve this.  In ASP, when the 404 handler passes control to the hotel.asp page, the query string now starts with "404;http".  So in hotel.asp, we see if the query string starts with 404, and if it does, we just continue displaying the page. If it doesn’t start with 404;http then we 301 to the pretty URL.
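The same guard translates to other stacks. Here is a rough Python analog, with two assumptions: a hypothetical `internal` flag stands in for the "404;http" query-string check, and tiny in-memory tables stand in for the database. The parameterized page 301s to the pretty URL unless the 404 handler reached it internally, so a request never bounces back and forth.

```python
from urllib.parse import parse_qs

# Tiny stand-ins for the database tables (IDs from the example above).
NAMES = {"dest": {41: "Hawaii"}, "island": {3: "Maui"}, "hotel": {572: "Grand-Wailea"}}
IDS = {("Hawaii", "Maui", "Grand-Wailea"): (41, 3, 572)}

def hotel_page(query_string, internal=False):
    """The parameterized page (hotel.asp in the post).

    internal=True plays the role of the "404;http" check: the 404 handler
    handed control over behind the scenes, so we render. Otherwise someone
    requested the old URL directly, so we 301 to the pretty URL.
    """
    if internal:
        return "200 OK", "hotel page for " + query_string
    q = parse_qs(query_string)
    names = [NAMES[key][int(q[key][0])] for key in ("dest", "island", "hotel")]
    return "301 Moved Permanently", "/hotels/%s/%s/%s/" % tuple(names)

def not_found_handler(path):
    """Pretty URL -> internal hand-off to the parameterized page (no redirect)."""
    parts = [p for p in path.split("/") if p]
    key = tuple(parts[1:]) if parts[:1] == ["hotels"] else None
    if key not in IDS:
        return "404 Not Found", ""
    dest, island, hotel = IDS[key]
    return hotel_page("dest=%d&island=%d&hotel=%d" % (dest, island, hotel), internal=True)

# Old link: one 301 to the pretty URL, which then renders with a 200. No loop.
status, location = hotel_page("dest=41&island=3&hotel=572")
print(status, location)                # 301 Moved Permanently /hotels/Hawaii/Maui/Grand-Wailea/
print(not_found_handler(location)[0])  # 200 OK
```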


 


Online Marketing News - 2009 - Creative Commons 3.0