Discussing LDA and SEO - Whiteboard Friday

Posted by Danny Dover

 In this week’s Whiteboard Friday, Rand Fishkin and Ben Hendrickson discuss LDA (Latent Dirichlet Allocation) and SEO (Search Engine Optimization). There has been a lot of discussion about the relationship between these two topics lately, and this video answers many of the questions people in the community have been asking. It is comprehensive (25 minutes) and uses many easy-to-understand diagrams and examples to discuss what impact LDA may have on the SEO industry. We look forward to reading your comments below.


Video Transcription

Rand: Howdy, SEOmoz fans. Welcome to another edition of Whiteboard Friday.
Today, I am joined by Ben Hendrickson. Ben?

Ben: Hello. We’ve met before.

Rand: Have we really?

Ben: I think so.

Rand: So, Ben is our senior scientist here at SEOmoz. He does a lot of our
research work and has been working on some interesting projects.
Lately, we posted about one of those projects and asked for some
feedback and got some great responses. A lot of people are very
passionate, very excited. And some people are a little confused. So,
we wanted to dive deeper with this LDA stuff.

What’s LDA? Latent Dirichlet Allocation. We wanted to talk about topic
modeling in general. There was some feedback, right, and I am sure
you saw some of it too, that was like, "I’m not quite sure. You’re
saying on-page maybe is more important because of this LDA stuff,
and I always thought on-page just meant keyword density or stuffing
your keywords."

Ben: Yeah. Clearly the words used matter. For any given SERP, a huge number of
pages aren’t going to rank for it because they have nothing to do
with it; they never use the word at all. Right? I mean,
Google.com ranks for very few things, and it has a ton of links. So, of
course, the words on the page matter.

Rand: But we’ve always, as an SEO, even when you’ve done your previous
research, it was sort of like, boy, it sure does look like links are
a whole lot more important than . . .

Ben: Using the keyword in the title tag. Right. Yeah. So this was
something that actually was very surprising for us, which is why we
showed it. What we saw was that using other, sort of related
words to the query in a very specific way seemed to help a lot.
Right?

Rand: And we were kind of weirded out by that.

Ben: Yeah.

Rand: Or we were at least surprised by that. So, that is why we are sharing
it. So, let’s go back in time a little bit and talk about this whole
. . . for people who are kind of going, "I don’t understand what you
mean when you say it’s more sophisticated than keyword density, or
it’s more sophisticated than a normal keyword metric or keyword
usage." Keyword density is just like the percent of times that the
word is used out of all the words in a document.
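As a rough illustration of the keyword density Rand describes, here is a minimal Python sketch. It assumes a single-word keyword and a naive lowercase regex tokenizer (both assumptions, not any tool's actual method), and it shows why the metric is easy to game: repeating the keyword pushes the number toward 100 percent.

```python
import re


def keyword_density(document: str, keyword: str) -> float:
    """Percent of the words in `document` that equal `keyword` (case-insensitive).

    Assumes a single-word keyword and a naive tokenizer (letters, digits, apostrophes).
    """
    words = re.findall(r"[a-z0-9']+", document.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


page = "Star Trek fans love Star Trek because Star Trek is fun"
stuffed = "trek trek trek trek"

print(round(keyword_density(page, "trek"), 2))  # 27.27 (3 of 11 words)
print(keyword_density(stuffed, "trek"))          # 100.0: stuffing "wins"
```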

Ben: Yeah.

Rand: Super simple to game. Kind of useless for IR is my understanding.

Ben: Well, I mean, it gets you a lot of the way. I mean, at least you have
that word in the document you return to people. But, like your blog
post earlier in the week showed, there are a lot of basic situations
where you can’t tell what is the better content just by doing this.

Rand: Right. And so, IR folks in the ’60s came up with this TF-IDF thing,
which is essentially like looking at how frequently the terms being
used appear in the corpus as a whole. So, if you
are like a library, they look at all the books in the library. Or if
you are a card catalogue, they’ll look at all that. And now that
there are search engines, they look at all of the documents on the
Web.

Ben: Yeah, right. So, the big intuition here is that when people are searching
for multiple words, the word that is rarely ever used is the one
that actually matters the most. So, if you are searching for the
SEOmoz building, a document that includes "building" and "SEOmoz" is
probably very relevant. A document that contains only "the building" or
"the SEOmoz" is a lot less relevant. So, the basic story there is
that you are biased against caring about words that are very common.

Rand: Right. So I like your Lady Gaga example where you’re like, well,
documents that have Gaga on them are probably way more relevant than
those that just have lady on them, even though lady and Gaga are
both four letter words in the phrase.

Ben: Yeah, exactly.
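The TF-IDF intuition from this exchange can be sketched in a few lines of Python. This uses one common textbook weighting, tf × log(N / df); real systems use many smoothed variants, and the four-document corpus is made up for illustration. The rarer "gaga" earns a much higher weight than the common "lady".

```python
import math


def idf(term, corpus):
    """Inverse document frequency: log(N / df), where df counts documents containing term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    """Term frequency in `doc` (share of its words) times the term's IDF in `corpus`."""
    tf = doc.count(term) / len(doc)
    return tf * idf(term, corpus)


def score(query, doc, corpus):
    """Score a document for a multi-word query by summing per-term TF-IDF weights."""
    return sum(tf_idf(term, doc, corpus) for term in query.split())


# A made-up corpus of tokenized documents.
corpus = [
    "lady gaga tour dates".split(),
    "the lady of the lake".split(),
    "a lady and a tramp".split(),
    "cheap flights to paris".split(),
]

print(idf("lady", corpus), idf("gaga", corpus))  # common "lady" weighs far less than rare "gaga"
best = max(corpus, key=lambda doc: score("lady gaga", doc, corpus))
print(" ".join(best))  # lady gaga tour dates
```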

Rand: All right, cool. So we evolved to this TF-IDF stuff. And then there
is this like co-occurrence thing, which we talked about on the
SEOmoz blog a long time ago. Co-occurrence is kind of interesting
where we look at, and let me make sure I am getting this right. It
is essentially that, oh well, oftentimes when I see, for example,
Distilled Consulting and building and SEOmoz and building, I find
those frequently together because it turns out that we share offices
with Distilled and we do lots of work together and those kinds of
things. So, maybe a document that has both Distilled and building
and SEOmoz might be more relevant than just the one that just says
SEOmoz.

Ben: Exactly. Right. So, if you are trying to basically figure out if it’s
just an offhand reference to it or if it’s something that is
actually discussed a whole lot, right, the fact that it is using a whole
lot of other words that also occur with the keyword would be a good
indication of that.
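A minimal sketch of the co-occurrence idea, assuming pre-tokenized documents and treating "co-occurs" as "appears in at least `min_count` of the same documents" (an illustrative threshold, not a known search-engine setting). A page that uses the keyword alongside its frequent companions gets a higher support score than an offhand mention.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(corpus):
    """For each unordered pair of terms, count the documents that contain both."""
    pairs = Counter()
    for doc in corpus:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_terms(keyword, pairs, min_count=2):
    """Terms that share at least `min_count` documents with `keyword`."""
    related = set()
    for (a, b), n in pairs.items():
        if n >= min_count and keyword in (a, b):
            related.add(b if a == keyword else a)
    return related


def support_score(doc, related):
    """How many distinct co-occurring terms a page uses alongside the keyword."""
    return len(set(doc) & related)


# A made-up, pre-tokenized corpus.
corpus = [
    ["seomoz", "building", "distilled", "seattle"],
    ["seomoz", "distilled", "building", "office"],
    ["seomoz", "tools", "distilled"],
    ["building", "permits", "city"],
]

related = related_terms("seomoz", cooccurrence_counts(corpus))
print(sorted(related))                                               # ['building', 'distilled']
print(support_score(["seomoz", "distilled", "building"], related))  # 2: a substantive mention
print(support_score(["seomoz", "weather"], related))                  # 0: an offhand mention
```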

Rand: But then topic modeling, I think that even I get a little bit
confused when I think about topic modeling versus co-occurrence,
because it seems like topic modeling is maybe very similar to this.

Ben: Well, this is great because you drew a Venn diagram that shows the
difference really well.

Rand: Right. Super smart of me.

Ben: It’s like you kind of knew. So you can imagine that you could have a
whole bunch of words that would have a very high co-occurrence with
Star Trek. Right? You could have documents that talk about gravity,
space, planet, and tachyon. But it still might not be about Star
Trek, even though you’ve got four words that co-occur a lot with
Star Trek. It could be about astronomy. Those are all real things that
exist in the real world, or at least people think they might exist
in the real world in the context of tachyons. But if you have
something that is talking about tachyons and gravity and William
Shatner, that’s probably Star Trek. Right?

And so, it’s not just the number of words you have that co-occur.
You are actually trying to figure out are these words being used in
the context where they are talking about Star Trek, or are these
words being used in the context of talking about astronomy. The way
we can do this is that, in general, fewer topics is better. So,
it’s possible that we have something that is talking about astronomy
and TV and it happened to use gravity and tachyon and William
Shatner in the context of something else he did. But it’s more
likely to just have . . .
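Ben's point that fewer topics make the better explanation can be illustrated with a deliberately simplified model: score the words against each candidate topic on its own and keep the single topic that explains them best. This is a mixture-of-unigrams style simplification, not LDA itself, and the two topic distributions are hypothetical numbers chosen by hand, not output from any real model.

```python
import math

# Hypothetical, hand-picked topic-word probabilities (each topic sums to 1).
TOPICS = {
    "star_trek": {"tachyon": 0.2, "gravity": 0.1, "space": 0.2, "planet": 0.2, "shatner": 0.3},
    "astronomy": {"tachyon": 0.05, "gravity": 0.3, "space": 0.3, "planet": 0.349, "shatner": 0.001},
}
UNSEEN = 1e-6  # assumed probability for a word a topic never emits


def log_likelihood(words, topic):
    """Log-probability that a single topic generated every word in `words`."""
    dist = TOPICS[topic]
    return sum(math.log(dist.get(w, UNSEEN)) for w in words)


def best_single_topic(words):
    """The one topic that, on its own, best explains the words."""
    return max(TOPICS, key=lambda topic: log_likelihood(words, topic))


print(best_single_topic(["gravity", "space", "planet", "tachyon"]))  # astronomy
print(best_single_topic(["tachyon", "gravity", "shatner"]))          # star_trek
```

A single telling word such as "shatner" flips the verdict, which is the disambiguation effect Ben describes.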

Rand: So normally, we might say like, "Okay, I can imagine Google using
this to try and do a couple of things." Right?

Ben: Right.

Rand: For weird queries, where maybe the word Star Trek wasn’t used but
they think it might be about that and they think that’s what the
person wanted, maybe they would do it. But for ordinary rankings, it
seems like using these words when I’m talking about astronomy or
using these words when I’m talking about Star Trek isn’t going to
help me any more than not using them. But then we did this topic
modeling work and we tried to analyze that. Right? So we used a
process called LDA, which maybe we can talk about in a sec. But we
used this process to basically build a model that has all these
different topics.

Ben: Right.

Rand: And essentially, the topics, as I understand them, aren’t actually
keywords. They’re just like a mathematical representation of a
subject matter. Like you were saying there’s probably a cartoon
topic, but it’s not like the word occurred necessarily.

Ben: Yeah, right. So, it has actual words in it. Right?

Rand: Yeah.

Ben: You can look at a given topic and you can see all of the words in it
and see how much each word is in it. But no human went by and said
we should make a topic about this to show what words may be put
together. So, if you look at papers, people pretty much refer to
topics by whatever the most common word in it is, which in the case
of cartoon might be cartoon.
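For readers who want to see where unlabeled topics come from, here is a compact collapsed Gibbs sampler for LDA in plain Python. It is a textbook sketch, not SEOmoz's implementation (which the video does not describe); the toy corpus and the `alpha`, `beta`, and `iterations` values are assumptions. As Ben says, no one names the topics: afterwards you inspect each topic's highest-count words.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.1, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over `docs` (a list of token lists).

    Returns (topic_word, doc_topic): per-topic word counts and per-document topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, k, n=3):
    """The n highest-count words in topic k: how people usually 'name' a topic."""
    ranked = sorted(topic_word[k].items(), key=lambda item: (-item[1], item[0]))
    return [w for w, count in ranked if count > 0][:n]


# A made-up toy corpus.
DOCS = [
    "optimus prime megatron autobots transformers".split(),
    "megatron decepticons optimus prime movie".split(),
    "pizza cheese oven dough sauce".split(),
    "cheese pizza sauce delivery oven".split(),
]

topic_word, doc_topic = lda_gibbs(DOCS, num_topics=2)
for k in range(2):
    print(k, top_words(topic_word, k))
```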

Rand: Like I remember one of the early ones we were looking at was
Transformers.

Ben: Yeah, right.

Rand: It was like, oh, well, Optimus Prime and Megatron and Sydney, the
woman who’s in all of the movies now. She came up a lot. Megan
Fox was in there.

Ben: Is she related to Vanessa Fox?

Rand: I don’t think so.

Ben: Okay.

Rand: In fact, I strongly suspect no.

Ben: Okay.

Rand: I’d guess it’s a screen name. But so, in any case, you get these
topics. You have these words in them. And then when we say, "Well,
how much does this matter? Like how much does it matter if I am
writing a page about Star Trek and I have lots of links pointing to
me, but I’m not ranking as well as I think I should. Could it be
that maybe I have not included keywords that would tell Google that
I am actually about the topic Star Trek or about related topics?"
Yes. And so, we don’t know how important that is. And that’s why we
did something about correlation to try and figure this out.

Ben: Yeah, right. Because, obviously, we don’t work at Google.

Rand: We just have to look at the outcome.

Ben: We have to look at the search results and then decide if this seems
like what they are doing. Yeah. So we try to see.

Rand: All right. So, let’s talk about that correlation process. So Ben,
we’re talking about this correlation thing and a part of me is kind
of going like, as a classic SEO, like non-statistics, math major,
this kind of thing, I kind of go, "Isn’t the best way to test
whether this works to have like two random documents on the Web,
and I’ll try putting your LDA stuff to work and see if it raises up
one of them or doesn’t raise up the other?" And I can do tests that
way. Like, what’s this correlation? Why do I need that? Is that a
better way to do it?

Ben: I mean, they are just different. We’ve tried doing controlled tests
where we put the keyword in the title tag on one and not the other and
we see which one ranks. But it’s very hard to do enough of those to
reach statistical significance. It’s pretty easy to set up ten websites
where one is doing stuff one way and the other is doing stuff the
other way. But you end up doing like four one way and six the other,
or three one way and seven the other.

Frequently, a lot of these effects aren’t that big. Google says there are
hundreds of things that influence SERPs. So even if you try to
control for as many variables as you can to try and make it the same
between these two, there is just a lot of noise in terms of what
actually ranks higher. So it takes a very large amount of work to
make enough samples to say something with statistical confidence.
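To see why a handful of controlled tests rarely reaches significance, treat each test pair as a coin flip under the null hypothesis that the change does nothing, and apply an exact two-sided sign test. This is a standard textbook test chosen for illustration, not a method SEOmoz describes, and it assumes the pairs are independent. Seven wins out of ten, the "three one way and seven the other" case, is still quite consistent with chance.

```python
from math import comb


def sign_test_p_value(wins: int, trials: int) -> float:
    """Exact two-sided sign test against a 50/50 null hypothesis.

    `wins` is how many of `trials` test pairs favored the change.
    """
    k = max(wins, trials - wins)
    # Probability of a result at least this lopsided in one direction...
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    # ...doubled for the other direction, capped at 1.
    return min(1.0, 2 * tail)


print(sign_test_p_value(7, 10))    # 0.34375: not significant
print(sign_test_p_value(70, 100))  # far below 0.05 with ten times the tests
```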

Rand: And you never know when you might have some weird factor that is
influencing all of them in some weird way.

Ben: Yeah. There is another problem: you are probably looking at
really tiny pages and little tiny domains, because you are not setting
up a huge number of large-scale domains to try this out. Right?

Rand: Right.

Ben: So you are going to get an answer. The question is: Is this answer
going to scale up to real pages people care about from my small
pages that have ten links to them? So, it is a very interesting
process, and I actually would be very fascinated to hear that people get
good results from it. But, we have tried it and the results have all
kind of been . . .

Rand: Middling at best.

Ben: Middling, yeah.

Rand: There are no good conclusions from anything. So instead, we use this
correlation process. Right?

Ben: Right.

Rand: If I understand your process right, you basically run across not a
dozen or a hundred, but hundreds or thousands, in some cases, of
different search results looking for elements that will predict that
something ranks higher or lower.

Ben: Yeah.

Rand: And so I saw that Danny Sullivan left some great comments in our blog
post about LDA. He said, for example, "Well, you guys said that
correlation with keywords in the title is very low. I don’t believe
that at all because, when I look at search results, all the search
results I see almost always have the keyword in the title tag. So,
what are you measuring here that I’m not seeing?"

Ben: Right. The difference is measuring how often a keyword is in the search
results versus measuring what is correlated with making it appear
higher in the search results.

Rand: So if all of these included the keyword Star Trek in the title
element, then what’s the ranking correlation of the title element
with the keyword?

Ben: It would be zero. Right?

Rand: Because they are all the same. What’s the possibility that something
will be a blue link appearing on Google?

Ben: That’s an interesting thing. We computed some data a while ago using
the correlations where we were comparing Bing and Google. It
actually was interesting to see Google tends to have a lot of stuff
with this element. Bing had fewer things with that element. It
actually tells you how the search engine is different. It’s
interesting just looking at raw prominence when you are trying to
compare two search engines. But it’s not very interesting when you
are trying to compare two features because . . .

Rand: Or when you’re trying to figure out what will help you rank well.

Ben: Exactly.
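The distinction between raw prominence and correlation can be sketched as follows. Computing Spearman's coefficient per SERP and averaging with equal weight matches the "unweighted mean Spearman's rank correlation coefficient" named in Ben's update at the end of this post, but the code is an illustrative reconstruction, not SEOmoz's code, and the data are made up. Returning 0.0 when a feature is identical for every result is an assumption: mathematically the coefficient is undefined there, and zero mirrors Ben's answer.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # constant input: undefined, treated as 0 here (assumption)
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def mean_serp_correlation(serps):
    """Unweighted mean of per-SERP Spearman correlations; each SERP lists features from position 1 down."""
    per_serp = []
    for features in serps:
        goodness = [-pos for pos in range(1, len(features) + 1)]  # higher = ranks better
        per_serp.append(spearman(features, goodness))
    return mean(per_serp)


def raw_prominence(serps):
    """Share of all listed results where the feature is present (truthy)."""
    flags = [bool(value) for serp in serps for value in serp]
    return sum(flags) / len(flags)


# Hypothetical data: two SERPs of ten results each.
title_has_keyword = [[1] * 10, [1] * 10]
linking_domains = [[90, 75, 60, 44, 40, 31, 20, 12, 5, 2],
                   [50, 52, 30, 28, 10, 15, 9, 4, 3, 1]]

print(raw_prominence(title_has_keyword))         # 1.0: present everywhere
print(mean_serp_correlation(title_has_keyword))  # 0.0: says nothing about order
print(mean_serp_correlation(linking_domains))    # about 0.99
```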

Rand: Okay. So, got you. So what Danny Sullivan is talking about with this
"I see the keyword in the title tag like 70 percent of the time or
more," that’s this raw prominence thing.

Ben: Right.

Rand: That’s like how many times does it appear in there? But correlation
of a specific feature with ranking higher is essentially looking at
all of these and then saying like, hmm, you know, on an aggregated
basis across hundreds or thousands of search results . . . I think
the study you did for the Google/Bing thing was like 11,000
different search results. Right?

Ben: It took a long time making searches, writing it down on paper.

Rand: Yeah. I bet it did. You’re totally incredible for having done it
manually. So, you look at all of those and then you would say, "Oh,
well this particular element on average like, having the keyword
exactly match the domain name, the top-level domain, like it does
here, boy that sure looks like it is correlated with ranking much
higher." I think having the keyword in the domain name was one of
the highest correlated single features that we saw.

Ben: Yeah, right.

Rand: And the same thing goes for number of linking root domains, like
diversity of different link sources that you got. Like if tons and
tons of different websites link to Amazon, that seems to
predict or correlates well with it doing pretty darn well.

Ben: Right.

Rand: And if I recall, I think correlations for title tags and keyword-
based stuff, with the exception of the domain name, was in the like
0 to 0.1 range. Maybe 0.15, something like that.

Ben: Yeah. In fact, some of them were actually a little bit negative.

Rand: Why would it be negative?

Ben: Because it is quite plausible that if it’s in the title, someone put
it there because they would like to rank higher than they actually
do and (_________) a lot of other things and it’s just not a very
good page.

Rand: So you’re saying, because of keyword stuffing SEOs, there could be a
negative correlation or other conflicts.

Ben: Yeah. Exactly.

Rand: So this on-page stuff, pretty small correlation. Right? So then, we
looked at things like links. A lot of those were in the 0.2 to 0.3
range, with 1 being a perfect correlation. So there was like linking root
domains. That was pretty decent, like 0.24 or 0.23 or
something like that. Things like page authority, which is a metric
we calculate, was really quite nicely high. It was like 0.35 almost,
0.34, something like that.

Ben: I can’t confirm or deny these numbers. I don’t remember them off the
top of my head.

Rand: All right. But there are different ranges. Right?

Ben: Yeah.

Rand: So, when we looked at linking stuff, it was almost always better than
on-page stuff.

Ben: Yeah, right. Links seem to be, if you had to develop a Google search
algorithm to sort the things and you had to make it as close to Google
as you could, just looking at links seemed to get you most of the
way in terms of anything that we did.

Rand: So then when we saw this LDA thing at 0.32 something, that seems
whacky. That seems crazy high for an on-page factor, because we
never looked at anything that was about the features of the words or
how you use them, with the exception maybe of the keyword in the
domain name, that was this high in correlation. So that sort of
struck us as being very odd, and this is one of the reasons that we
wrote about it and were excited about it. But let me just throw this
out there. Correlation is not causation. Right? It could be that
maybe domain name is really the thing that is being ranked. But
maybe it’s other features. Right? Correlation doesn’t necessarily
mean that that is what is causing it.

Ben: Right. And almost certainly our LDA model is not causing it, because
Google doesn’t use our LDA model. They’re not asking for numbers.
Right? And almost certainly Google is not going to do LDA like we
have done it. They have not used our corpus. We have a model that is
correlated with Google’s results, and it is certainly not causing
Google’s results. But the thing is that it is a very high
correlation. So, they are doing something that is somehow producing
results that are correlated with an LDA model. It is hard to imagine
really what that would be, unless it was some sort of topic modeling
or something like looking at the words used on the page.

Rand: So, there’s two things that come out of this. One is that, to my
mind, when I see something that high and assuming all the numbers
look right (I think some people gave your numbers a hard time, but
at least the criticism we have received so far hasn’t convinced us
that we have done something wrong).

Ben: Yeah. I spend most of the day running code. But it is quite plausible
that I did something wrong. I’m sure I have. But the specific
complaints people have come up with so far aren’t very credible.
But, you know, in the future, it will certainly happen someday.

Rand: I’m sure we are all excited for that day, Ben. Assuming that these
numbers are quite high, doesn’t it sort of say like maybe we’ve been
wrong about this on-page stuff not mattering all that much? Maybe we
should do more on that front, like more investigation, test out the
results, try putting our keywords on the pages in certain ways.

Ben: Well, Google always says to spend time writing good content. Right?
And that’s a little bit hard to apply, but you can interpret that as
meaning that good content makes it clear what your topic is by using words
that are going to eliminate any topic from being (________) except
for the one that you are trying to rank for. So, I don’t know if
it’s that revolutionary. It seems like people have worried a lot
about their content in the past and a lot of people say to do so.

Rand: But so people in the past, they talked about things like, oh, we
should use like the Google Wonder Wheel. And we should use related
searches and put those words on our pages. We should use things like
synonyms that we get from the service. Well, how is the LDA stuff
different? Or is it? Like if I just do these things, am I going to
do great over here?

Ben: Well, I mean they are not going to be bad. But if you can imagine
that when you put a whole bunch of synonyms for tachyon, it’s not
going to actually help clarify if you’re about astronomy or Star
Trek. Right? Or say that you’re trying to discuss
bark collars and you want to clarify that you are talking about
dogs as opposed to the stuff that wraps trees. You are not going to
want to put a whole bunch of synonyms for collars or barking. Yeah,
but that’s sort of weird and unnatural. You much more want to put
other related words to make it clear that we are talking about some
sort of bark preventive system.

Rand: So, let’s talk really briefly about the tool today. It doesn’t do
exactly this. Right? Instead, it gives us a score.

Ben: Yeah.

Rand: All right. Let’s look at that.

Ben: Okay.

Rand: Now, calling this LDA score a tool might be an overstatement. It’s a Labs project. You
can look and see it. It works. You can put stuff in. But we have a
lot of really beautiful tools here at SEOmoz, and this is not one of
them. So, it’s not the prettiest thing in the world. But it does
leverage the topic modeling work, and it uses a specific process,
LDA, which we think is sort of better than some other ones, but
probably not as good as the sophisticated stuff Google does.

Ben: Almost certainly.

Rand: I enter a query up here. Something I want to rank for. I put in some
words here, and it will give me a percent telling me how topically
relevant it thinks this content here is to the word here. And it
will do the same thing like if I enter a URL down here, it will
populate this box with the content from that page.

Ben: Right.

Rand: So this gives me sort of a rough sense. I can play around and see
whether SEOmoz’s LDA scores seem to predict anything about how well
I rank. So, I could look at the top ten results and
be like, "Wow, I’m winning on links. I think I’m doing a good job of
keyword usage. But boy, all these other people have much higher LDA
scores than I do. Maybe I should try increasing that." Is that sort
of a suggested application here?

Ben: That would seem very reasonable to me. Like it is kind of new. No one
has a huge amount of experience with it. So far, it seems like
people have said that getting a higher score has helped
them rank, but that’s very anecdotal. There’s a very plausible
reason why you would think that that would work. But, we’re kind of
on the bleeding edge here.

Rand: We’re not trying to say that like you definitely enter something in
here, you should use this and boost up the rankings of all of your
pages. It will work perfectly or anything like that.

Ben: Yeah, exactly. But it seems very plausible that basically getting a
higher score helps you rank higher. And the tool let’s you see
clearly what this kind of topic modeling is going to be able to
figure out. It sort of shows you the kind of connections that Google
certainly will be able to make in figuring out that pizza is related
to food but donkey is not related to food. So you can sort of
explore and see how this stuff works.

Rand: Cool. One weird thing that people have noted and the last point is
that this fluctuates a lot. Oftentimes, when I run it, it will
fluctuate one to five percent change. Like I’ll hit go on the same
URL, the same content, the same keyword, and it will change one
percent to five percent. Sometimes it seems like it can go to maybe
seven, eight, or nine percent. A couple of people have reported
(we haven’t been able to see them) rare instances where it is more
than ten percent fluctuation. So, explain to me what is going on
there. What is the sampling that the tool does?

Ben: Right. So there’s a very large possible number of ways that you could
explain the document with topics. It could be about Star Trek. Or it
could be about astronomy and TV shows. There are lots of different
ways that you could explain the different word usages in there. So
we can’t actually just try all of them and weight them by the
probability because that would take years to answer anybody. So
instead, we sample them based upon their likelihood and then we
average that. So, if you wanted to figure out are most people going
to vote Democrat or Republican this year, you might sample 100
people and you’re going to conclude that 40 percent are going to
vote Democratic this year.

Rand: But then if you sample a different 100 people . . .

Ben: It will be a little bit different. Generally, you won’t come back and
say 70 percent are going to vote Democratic this year. It’s in
theory possible, but it doesn’t happen that frequently.

Rand: Got you. So you can essentially use this number. If I really
wanted to get more precise, I could run it a bunch
of times, and I would be getting a bunch of different samples and I
would average those out.

Ben: Yeah. In the back end, we’re doing it a bunch of times for you and
averaging them. So averaging it yourself on the front end as you go
isn’t terrible.
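Ben's polling analogy can be made concrete with the standard error of a sampled proportion, sqrt(p(1 - p) / n), plus a small simulation. This models the poll, not the tool's topic sampler (whose internals aren't described here), and the 40 percent figure and sample size come from the analogy. A 70 percent reading would sit about six standard errors from the truth, which is why it almost never happens; averaging several runs shrinks the spread further, much like rerunning the tool and averaging.

```python
import math
import random


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)


def poll(p: float, n: int, rng: random.Random) -> float:
    """Ask n simulated voters; each says 'Democratic' with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n


def averaged_poll(p: float, n: int, runs: int, rng: random.Random) -> float:
    """Average several independent polls, like rerunning the tool and averaging."""
    return sum(poll(p, n, rng) for _ in range(runs)) / runs


se = standard_error(0.4, 100)
print(round(se, 3))                # 0.049: roughly +/- 5 points per poll
print(round((0.7 - 0.4) / se, 1))  # 6.1 standard errors away

rng = random.Random(42)
print(poll(0.4, 100, rng), averaged_poll(0.4, 100, runs=25, rng=rng))
```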

Rand: It’s just a big use of our bandwidth.

Ben: Oh, yeah. It really helps our numbers of hits to our website.

Rand: Oh, yeah. I’m sure that’s all correlated with rankings too.

Ben: I know like unique visitors. What’s that?

Rand: All right. Well, Ben, we’re excited about this tool. We really
appreciate you doing this research work. It’s exciting and
interesting. I think we’ll know more in the future, in the months to
come, whether this is really great and applicable for SEO or whether it
turns out that maybe it’s some other things causing this weird
correlation.

Ben: Absolutely.

Rand: Well, thanks very much for obviously building this and joining us.
And thanks to all of you for watching Whiteboard Friday. We’ll see
you again next week.

Ben: This was a long one.

Rand: Very impressed that you watched it. We do appreciate it.

Video transcription by SpeechPad.com

[UPDATE by Ben (Sept 10th, 12:50pm PST): In the video I stated that "specific complaints people have come up with so far aren't very credible." This was directed at the claims, not the people who raised them, and I wish I had used the word "accurate" instead of "credible." My apologies to anyone who was offended. Credible people can say things I disagree with. Indeed, the back and forth over their concerns about the unweighted mean Spearman's rank correlation coefficient has provided useful context for explaining exactly why we consider it a better statistic to use than commonly suggested alternatives.

Also, I noticed that Russ Jones did work to reproduce some of our findings.  He used a different dataset and different methodology, emphasized good qualifications to keep in mind, and broke out competitive vs non-competitive which we didn't do.]

[ERRATA by Ben (Sept 16th, 2:00pm PST): The blog post above reports the correlation measurement as 0.32. It should have been 0.17.]


Priceless CRO Advice for $224

Posted by Dr. Pete

The past few years have seen an explosion of usability and Conversion Rate Optimization (CRO) tools hit the market. There have been many good roundup posts about these tools, but I want to focus today on a more in-depth approach to putting just 3 of these tools to work: (1) Five Second Test, (2) Crazy Egg, and (3) UserTesting.com. Total cost to do one round of testing: $224.

(1) Five Second Test ($20)

The premise behind Five Second Test is incredibly simple – show a visitor your site for 5 seconds and see what they remember (or, alternatively, where they click). This is a great starting point for gathering some initial observations about your visitors.

How It Works
Setup is easy – just submit a screenshot of your web page or prototype (great for design comparisons) and the replies start coming in. You can view them individually or grouped by concepts. Five Second Test is actually free, but the $20/month package means you’ll get a larger response rate. It’s worth the extra cash, IMO. You can also earn credits ("karma") by taking other people’s tests – it’s kind of fun and can be informative.

What to Test
Think about the kind of things you want your visitors to know in 5 seconds. The big questions: Who, What, Why. Here are a few uses I recommend:

  • Do visitors recognize your brand?
  • Do people get what you do?
  • Is your tagline descriptive and effective?
  • Is your page too visually noisy?
  • Is Concept B better than Concept A?
  • Can people find your call to action?

If people are remembering things like "blue", "blonde girl", and "ugly site", you know you’ve got some work to do (those aren’t far from real examples of what I’ve seen).

(2) Crazy Egg ($9)

Heat-mapping tools like Crazy Egg take user activity and translate it into visual maps, helping you to easily visualize how people interact with your site. Crazy Egg was founded by SEO wunderkind Neil Patel, and is an amazing bargain at $9/month. If you can’t be bothered to spend $9 on improving your website, feel free to stop reading this post. I’m serious – go buy a Venti Iced Mocha and a cookie instead of spending money on your business.

How It Works
This one’s a little bit trickier – you’ll have to install a JavaScript snippet similar to Google Analytics and other tools. Then, Crazy Egg starts tracking clicks on your specified page (try to stick to one page, as jumping pages can produce odd results).

What to Test
Crazy Egg not only allows you to create visual heat maps, but also has a "confetti" mode that lets you visualize clicks by segments, such as referring sources and new vs. returning visitors. Here are a few questions a heat-mapping tool can help you answer:

  • Are people clicking where you want them to click?
  • Is your navigation effective?
  • Do you have too many choices?
  • Do search visitors behave differently?
  • Is your call to action getting clicks?

Although some heat-mapping tools can get bogged down in the visuals, I think that Crazy Egg has a very simple, elegant reporting approach that can give you solid insights quickly. Once you’ve gathered some initial impressions from Five Second Test and Crazy Egg, it’s time to do some real user testing…

(3) UserTesting.com ($195)

It used to be that user testing required a lab, expensive equipment, and a difficult recruiting process. Now, you can use remote testing services like UserTesting.com to get quick, inexpensive user feedback. While I won’t say it compares apples-to-apples to laboratory testing, I often find that the insights from even a handful of remote testing subjects can be incredibly useful.

How It Works
Setup is pretty straightforward, but doing it right can take a little bit of time. Technically, you just need to submit your URL and a few instructions to visitors. You pay $39 per visitor and receive both written feedback and an online video of the user walking through your site (with voice-over). Although this is a topic of some debate in the usability community, 5 users is a good number for uncovering core insights and getting solid bang for your buck.

What to Test
Take some time setting up your questions. Traditional usability tests are task-oriented – you tell someone to try to complete a task in a fairly open-ended fashion and watch them go to work. Be specific about the task and ask follow-up questions, like "Would you trust this site enough to make a purchase?" (I generally ask 3-4 follow-ups). A few questions this kind of qualitative testing can help you answer:

  • Can people complete the task?
  • How long does task completion take?
  • Do users experience common stumbling blocks?
  • What are visitors thinking out loud about?
  • Does your search/navigation work as expected?
  • Are you missing features people might be looking for?
  • Do visitors get frustrated using your site?

Qualitative testing can be a great precursor to quantitative (A/B and multivariate) testing. Don’t throw design changes at the wall and see what sticks – put user testing to work to uncover hidden issues on your site. We all need a fresh pair (or 5 pairs) of eyes from time to time.

Here’s to $224 Well Spent

I’m an entrepreneur and a Bohemian – I understand that parting with money isn’t easy. The insights you’ll gain from just over $200, though, will, in my experience, easily yield 10X or even 100X back in online sales improvement. Solid qualitative data collection will also prevent you from making costly mistakes and will better inform how you look at your analytics and quantitative testing. There are plenty of good tools out there – choose a couple of them, and really put the effort into understanding how they work. You’ll be well rewarded.

Update: We just published a YOUmoz post about Crazy Egg that should be an interesting read for anyone who enjoyed this article. David gives some nice examples and a case study of how heat-mapping got one of his clients an 87% conversion boost.

An Interview on SEOBook »

Posted by randfish

Just a short post tonight.

First off, I’m honored to be interviewed by Aaron Wall. We’ve had our differences and maintain some divergent opinions on a few topics, but we both have an insane passion for helping SEO professionals get better at their jobs, and we both work hard to grow the credibility of SEO as a whole.

SEOBook Interview

Second - we’ve got a lot of reasons to be thankful. SEOmoz was recently named the 334th fastest-growing company in the US by Inc Magazine. I was named to Seattle’s 40 Under 40 List (I’m guessing it’s a typo), and we’ve recently passed 6,000 PRO subscribers (actually, we’re up over 6,300 as of today).

SEOmoz's Jen Lopez as Wonder Woman

As amazing as all that is, nearly everyone at SEOmoz is thinking not about these milestones, but about one of our own - Jen Lopez - who noted on her Twitter feed that she’s out battling cancer. We are all with you, Jen - every last one of us, with all our hearts. And we agree: #fuckcancer

Latent Dirichlet Allocation (LDA) and Google’s Rankings are Remarkably Well Correlated »

Posted by randfish

Last week at our annual mozinar, Ben Hendrickson gave a talk on a unique methodology for improving SEO. The reception was overwhelming - I’ve never previously been part of a professional event where thunderous applause broke out not once but multiple times in the midst of a speaker’s remarks.

Ben Hendrickson of SEOmoz speaking last fall at the Distilled/SEOmoz PRO Training London
(he’ll be returning this year)

I doubt I can recreate the energy and excitement of that packed, 320-person room, but my goal in this post is to help explain the concepts of topic modeling and vector space models as they relate to information retrieval, along with the work we’ve done on LDA (Latent Dirichlet Allocation). I’ll also try to explain how all of this relates to, and might be applied in, the practice of SEO.

A Request: Curiously, even before we publicly released this post and our research, several folks in the search community made negative remarks and criticisms suggesting that LDA (or topic modeling in general) is definitively not used by the search engines. We think there’s a lot of evidence to suggest the engines do use these techniques, but we’d be excited to see contradicting evidence presented. If you have such work, please do publish it!

The Search Rankings Pie Chart

Many of us are likely familiar with the ranking factors survey SEOmoz conducts every two years (we’ll have another one next year, and I expect some exciting/interesting differences). Of course, we know that this aggregation of opinion is likely missing many factors and may over- or under-emphasize the ones it does show.

Here’s an illustration I created recently for a presentation to show the major categories in the overall results:

Illustration of Ranking Factors Survey Data

This suggests that many SEOs don’t ascribe much weight to on-page optimization

I myself have often felt, from all the metrics, tests and observations of Google’s ranking results, that the importance of on-page factors like keyword usage or TF*IDF (explained below) is fairly small. Certainly, I’ve not observed many results, even in low-competition spaces, where one can simply add a few more repetitions of the keyword, maybe toss in a few synonyms or "related searches," and improve rankings. This experience, which many SEOs I’ve talked to share, has led me to believe that linking signals make up an overwhelming majority of how the engines order results.

But, I love to be wrong.

Some of the work we’ve been doing around topic modeling, specifically using a process called LDA (Latent Dirichlet Allocation), has shown some surprisingly strong results. This has made me (and I think a lot of the folks who attended Ben’s talk last Tuesday) question whether it was simply a naive application of the concept of "relevancy" or "keyword usage" that gave us this biased perspective.

Why Search Engines Need Topic Modeling

Some queries are very simple - a search for "wikipedia" is non-ambiguous, straightforward and can be effectively returned by even a very basic web search engine. Other searches aren’t nearly as simple. Let’s look at how engines might order two results: usually a simple problem, but one that can get fairly complex depending on the situation.

Query for Batman

Query for Chief Wiggum

Query for Superman

Query for Pianist

For complex queries, or when ordering large numbers of results with lots of content-related signals, search engines need ways to determine the intent of a particular page. A page that mentions the keyword 4 or 5 times in prominent places, or even mentions similar phrases/synonyms, isn’t necessarily relevant to the searcher’s query.

Historically, lots of SEOs have put effort into this process, so what we’re doing here isn’t revolutionary, and topic models, LDA included, have been around for a long time. However, no one in the field, to our knowledge, has made a topic modeling system public or compared its output with Google rankings (to help see how potentially influential these signals might be). The work Ben presented, and the really exciting bit (IMO), is in those numbers.

Term Vector Spaces & Topic Modeling

Term vector spaces, topic modeling and cosine similarity sound like tough concepts, and when Ben first mentioned them on stage, a lot of the attendees (myself included) felt a bit lost. However, Ben (along with Will Critchlow, whose Cambridge mathematics degree came in handy) helped explain them to me, and I’ll do my best to replicate that here:

Simplistic Term Vector Model

In this imaginary example, every word in the English language is related to either "cat" or "dog," the only topics available. To measure whether a word is more related to "dog" or to "cat," we use a vector space model that expresses those relationships mathematically. The illustration above does a reasonable job of showing our simplistic world. Words like "bigfoot" sit perfectly in the middle, no closer to "cat" than to "dog." But words like "canine" and "feline" are clearly closer to one than the other, and the angle in the vector model illustrates this (and gives us a number).

BTW - in an LDA vector space model, topics wouldn’t have exact label associations like "dog" and "cat" but would instead be things like "the vector around the topic of dogs."
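
To make the angle idea concrete, here is a minimal Python sketch of that two-topic world. The (cat, dog) coordinates for each word are invented for illustration, since the diagram’s actual values aren’t given; the code computes each word’s angle to the "cat" and "dog" axes, where a smaller angle means a closer relationship.

```python
import math

# Hypothetical (cat, dog) coordinates for a few words in the two-topic toy world.
WORDS = {
    "feline": (0.9, 0.1),
    "canine": (0.1, 0.9),
    "bigfoot": (0.5, 0.5),
}

CAT_AXIS = (1.0, 0.0)
DOG_AXIS = (0.0, 1.0)


def cosine(a, b):
    """Cosine of the angle between two vectors: dot product over the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angle_degrees(a, b):
    """Angle between two vectors in degrees (cosine clamped to [-1, 1] against rounding)."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine(a, b)))))


for word, vec in WORDS.items():
    print(f"{word:8s} angle to cat: {angle_degrees(vec, CAT_AXIS):5.1f}  "
          f"angle to dog: {angle_degrees(vec, DOG_AXIS):5.1f}")
```

With these made-up coordinates, "bigfoot" lands at 45 degrees from both axes, while "feline" sits much closer to "cat" than to "dog."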

Unfortunately, I can’t really visualize beyond this step, as it relies on taking the simple model above and scaling it to thousands or millions of topics, each of which would have its own dimension (and anyone who’s tried knows that drawing more than 3 dimensions in a blog post is pretty hard). Using this construct, the model can compute the similarity between any word or group of words and the topics it has created. You can learn more about this from Stanford University’s posting of Introduction to Information Retrieval, which has a specific section on Vector Space Models.
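
The same cosine formula works in any number of dimensions. The sketch below assumes, hypothetically, that a query and two pages have already been reduced to probability distributions over just five topics (a real model would infer these over far more topics), then reports each page’s cosine similarity to the query as a percentage. The topic weights are made up, and this is not SEOmoz’s implementation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical topic mixtures over five topics (each sums to 1.0).
query_topics = [0.70, 0.20, 0.05, 0.05, 0.00]
on_topic_page = [0.60, 0.25, 0.05, 0.05, 0.05]
off_topic_page = [0.00, 0.05, 0.10, 0.15, 0.70]

for name, page in [("on-topic", on_topic_page), ("off-topic", off_topic_page)]:
    print(f"{name}: {100 * cosine_similarity(query_topics, page):.0f}%")
```

Because topic weights are never negative, the cosine always falls between 0 and 1, which is what makes a 0-100% scale a natural way to report it.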

Correlation of our LDA Results w/ Google.com Rankings

Over the last 10 months, Ben (with help from other SEOmoz team members) has put together a topic modeling system based on a relatively simple implementation of LDA. While it’s certainly challenging to do this work, we doubt we’re the first SEO-focused organization to do so, though possibly the first to make it publicly available.

When we first started this research, we didn’t know what role LDA/topic modeling might play in search engines. So on completion, we were pretty excited (maybe even ecstatic) to see the following results:

 

Correlation Between Google.com Rankings and Various Single Metrics
Spearman Correlation of LDA, Linking IPs and TF*IDF

(the vertical blue bars indicate standard error in the diagram, which is relatively low thanks to the large sample set)

Using the same process we used for our release of Google vs. Bing correlation/ranking data at SMX Advanced (we posted much more detail on the process here), we’ve calculated the Spearman correlations for a set of metrics familiar to most SEOs alongside the LDA results, including:

  • TF*IDF - the classic term-weighting formula, TF*IDF measures keyword usage more accurately than a primitive metric like keyword density. In this case, we simply took the TF*IDF score of the page content that appeared in Google’s rankings.
  • Followed IPs - this is our highest correlated single link-based metric, and shows the number of unique IP addresses hosting a website that contains a followed link to the URL. As we’ve shown in the past with metrics like Page Authority (which uses machine learning to build more complex ranking models), we can do even better, but in this context it’s valuable to think about and compare raw link numbers.
  • LDA Cosine - this is the score produced from the new LDA labs tool. It measures the cosine similarity of topics between a given page or content block and the topics produced by the query.
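
For readers who haven’t met TF*IDF, here is a minimal sketch of one common variant: raw term count times log(N / document frequency). The three-document corpus is hypothetical, and many weighting and smoothing variants exist; the post doesn’t say which one SEOmoz used.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF*IDF of `term` in one document: raw count times log(N / df).

    `corpus` is a list of token lists; `doc_tokens` should be one of them.
    Returns 0.0 when the term appears in no document.
    """
    tf = Counter(doc_tokens)[term]
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical three-document corpus, already lowercased and tokenized.
corpus = [
    "seo tools for link analysis and seo audits".split(),
    "the best tools and tips for the garden".split(),
    "rank tracking and link building tips".split(),
]

for term in ["seo", "tools", "and"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

Here "seo" scores highest for the first document because it appears twice and nowhere else, "tools" scores lower because another document also uses it, and "and" scores zero because every document contains it.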

The correlation of LDA scores with rankings is uncanny. Certainly, it’s not a perfect correlation, but that shouldn’t be expected given the supposed complexity of Google’s ranking algorithm and the many factors therein. But seeing LDA scores show this dramatic result made us seriously question whether there was causation at work here (and we hope to do additional research via our ranking models to attempt to show that impact). Perhaps good links are more likely to point to pages that are more "relevant" according to a topic model, or perhaps some other aspect of Google’s algorithm that we don’t yet understand naturally biases toward these pages.

However, given that many SEO best practices (e.g. keywords in title tags, static URLs and ) have dramatically lower correlations and the same difficulties proving causation, we suspect a lot of SEO professionals will be deeply interested in trying this approach.
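
For anyone who wants to try this kind of analysis, here is a minimal sketch of a Spearman rank correlation between a metric and ranking position for each SERP, averaged across queries and reported with a standard error. It assumes no tied values (ties need average ranks) and uses made-up scores. Flipping the sign so that "higher metric near the top" comes out positive, and averaging per-query correlations, are our assumptions about the methodology, not a description of SEOmoz’s code.

```python
import math
import statistics


def spearman_no_ties(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def serp_correlation(metric_by_position):
    """Correlate a metric with rank; index 0 holds the metric for result #1.

    Position 1 is best, so correlating against -position makes
    "higher metric near the top" a positive number.
    """
    positions = [-(i + 1) for i in range(len(metric_by_position))]
    return spearman_no_ties(metric_by_position, positions)


# Hypothetical LDA scores for the top 5 results of three queries.
serps = [
    [0.62, 0.55, 0.58, 0.40, 0.31],
    [0.48, 0.51, 0.30, 0.35, 0.22],
    [0.70, 0.44, 0.49, 0.41, 0.39],
]

rhos = [serp_correlation(s) for s in serps]
mean_rho = statistics.mean(rhos)
std_err = statistics.stdev(rhos) / math.sqrt(len(rhos))
print(f"mean Spearman: {mean_rho:.2f} +/- {std_err:.2f} (standard error)")
```

With many more queries, the standard error shrinks roughly with the square root of the number of SERPs, which is why a large sample set keeps the error bars small.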

The LDA Labs Tool Now Available; Some Recommendations for Testing & Use

We’ve just recently made the LDA Labs tool available. You can use this to input a word, phrase, chunk of text or an entire page’s content (via the URL input box) along with a desired query (the keyword term/phrase you want to rank for) and the tool will give back a score that represents the cosine similarity in a percentage form (100% = perfect, 0% = no relationship).

LDA Topics Tool

When you use the tool, be aware of a few issues:

  • Scores Change Slightly with Each Run
    This is because, like a pollster interviewing 100 voters in a city to get a sense of the local electorate, we check a sample of the topics a content+query combo could fit with (checking every possibility would take an exceptionally long time). You can, therefore, expect the percentage output to fluctuate by 1-5% each time you check a page/content block against a query.
  • Scores are for English Only
    Unfortunately, because our topics are built from a corpus of English language documents, we can’t currently provide scores for non-English queries.
  • LDA isn’t the Whole Picture
    Remember that while the average correlation is in the 0.33 range, we shouldn’t expect scores for any given set of search results to go in precisely descending order (a correlation of 1.0 would suggest that behavior).
  • The Tool Currently Runs Against Google.com in the US only
    You should be able to see the same results the tool extracts by using a personalization-agnostic search string like http://www.google.com/xhtml?q=my+search&pws=0
  • Using Synonyms, "Related Searches" or Wonder Wheel Suggestions May Not Help
    Term vector models are more sophisticated representations of "concepts" and "topics." Many SEOs have long recommended using synonyms or adding "related searches" as keywords on their pages, and others have suggested the importance of "topically relevant content," but there haven’t been great ways to measure these or show their correlation with rankings. The scores you see from the tool are based on a much less naive interpretation of the connections between words than these classic approaches.
  • Scores are Relative (20% might not be bad)
    Don’t presume that getting a 15% or a 20% is always a terrible result. If the folks ranking in the top 10 all have LDA scores in the 10-20% range, you’re likely doing a reasonable job. Some queries simply won’t produce results that fit remarkably well with given topics (which could be a weakness of our model or a weirdness about the query itself).
  • Our Topic Models Don’t Currently Use Phrases
    Right now, the topics we construct are around single word concepts. We imagine that the search engines have probably gone above and beyond this into topic modeling that leverages multi-word phrases, too, and we hope to get there someday ourselves.
  • Keyword Spamming Might Improve Your LDA Score, But Probably Not Your Rankings
    Like anything else in the SEO world, manipulatively applying the process is probably a terrible idea. Even if this tool worked perfectly to measure keyword relevance and topic modeling in Google, it would be unwise to simply stuff 50 words over and over on your page to get the highest LDA score you could. Quality content that real people actually want to find should be the goal of SEO, and Google’s almost certainly sophisticated enough to tell the difference between junk content that matches topic models and real content that real users will like (even if the tool’s scoring can’t do that).
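
The first caveat above, scores drifting between runs, is ordinary sampling noise. The toy simulation below is not the tool’s inference procedure (the post doesn’t describe it in detail); it simply estimates a hypothetical "true" score of 40% from a limited number of random draws, five times over, to show why sample-based estimates wobble by a few points from run to run. Both constants are made-up assumptions.

```python
import random

TRUE_SCORE = 0.40      # hypothetical underlying value being estimated
SAMPLES_PER_RUN = 500  # hypothetical sample budget per run


def one_run(rng: random.Random) -> float:
    """Estimate TRUE_SCORE as the fraction of random draws that 'hit'."""
    hits = sum(1 for _ in range(SAMPLES_PER_RUN) if rng.random() < TRUE_SCORE)
    return hits / SAMPLES_PER_RUN


rng = random.Random()  # unseeded, so each execution differs, like re-running the tool
estimates = [one_run(rng) for _ in range(5)]
print(", ".join(f"{100 * e:.1f}%" for e in estimates))
```

With 500 draws per run, the typical spread is around 2 percentage points; quadrupling the samples would roughly halve it, at the cost of four times the computation.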

If you’re trying to do serious SEO analysis and improvement, my suggested methodology is to build a chart something like this:

SERPs analysis of "SEO" in Google.com w/ Linkscape Metrics + LDA

Right now, you can use Keyword Difficulty’s export function and then add in some of these metrics manually (though in the future, we’re working towards building this type of analysis right into the web app beta).

Once you’ve got a chart like this, you can get a better sense of what’s propping up your competitors’ rankings - anchor text, domain authority, or maybe something related to topic modeling relevancy (which the LDA tool could help with).
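
Once the chart is assembled, a few lines of code can flag where your page trails the pack. The sketch below assumes you’ve copied the chart’s rows into simple records; the field names (`domain_authority`, `linking_ips`, `lda`) and all values are hypothetical placeholders, not the Keyword Difficulty export format. Comparing against the competitors’ median, rather than an absolute target, follows the advice above that scores are relative.

```python
import statistics

# Hypothetical rows for the top results of one query (position 1 first).
competitors = [
    {"url": "a.example", "domain_authority": 88, "linking_ips": 4200, "lda": 0.61},
    {"url": "b.example", "domain_authority": 75, "linking_ips": 1900, "lda": 0.48},
    {"url": "c.example", "domain_authority": 69, "linking_ips": 2500, "lda": 0.55},
    {"url": "d.example", "domain_authority": 72, "linking_ips": 800, "lda": 0.37},
]
my_page = {"url": "mine.example", "domain_authority": 80, "linking_ips": 950, "lda": 0.29}


def gaps_vs_median(mine, rows, metrics):
    """Return {metric: (my_value, competitor_median)} for metrics where I'm below the median."""
    gaps = {}
    for metric in metrics:
        median = statistics.median(row[metric] for row in rows)
        if mine[metric] < median:
            gaps[metric] = (mine[metric], median)
    return gaps


for metric, (mine, median) in gaps_vs_median(
        my_page, competitors, ["domain_authority", "linking_ips", "lda"]).items():
    print(f"{metric}: yours {mine:g} vs. competitor median {median:g}")
```

In this made-up case, the page holds its own on domain authority but trails on linking IPs and LDA score, which would point toward links and on-topic content as the places to dig further.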

Undoubtedly, Google’s More Sophisticated than This

While the correlations are high, and the excitement around the tool both inside SEOmoz and from a lot of our members and community is equally high, this is not us "reversing the algorithm." We may have built a great tool for improving the relevancy of your pages and helping to judge whether topic modeling is another component in the rankings, but it remains to be seen if we can simply improve scores on pages and see them rise in the results.

What’s exciting to us isn’t that we’ve found a secret formula (LDA has been written about for years and vector space models have been around for decades), but that we’re making a potentially valuable addition to the parts of SEO we’ve traditionally had little measurement around.

BTW - Thanks to Michael Cottam, who pointed us to research by a number of Googlers on pLDA. For those interested, there are also hundreds of papers from Google and Microsoft (Bing) researchers on LDA-related topics. Reading through some of these, you can see that the major search engines have almost certainly built more advanced models to handle this problem. Our correlation data and testing of the tool’s usefulness will show whether a naive implementation can still provide value for optimizing pages.

For those who’d like to investigate more, we’ve made all of our raw data available here (in XLS format, though you’ll need a more sophisticated model to do LDA). If you have interest in digging into this, feel free to email Ben at SEOmoz dot org.

How Do I Explain this to the Boss/Client?

The simplest method I’ve found is to use an analogy like:

If we want to rank well for "the rolling stones" it’s probably a really good idea to use words like "Mick Jagger," "Keith Richards," and "tour dates." It’s also probably not super smart to use words like "rubies," "emeralds," "gemstones," or the phrase "gathers no moss," as these might confuse search engines (and visitors) as to the topic we’re covering.

This tool tries to give a best guess number about how well we’re doing on this front vs. other people on the web (or sample blocks of words or content we might want to try). Hopefully, it can help us figure out when we’ve done something like writing about the Stones but forgetting to mention Keith Richards.

As always, we’re looking forward to your feedback and results. We’ve already had some folks write in to us saying they used the tool to optimize the contents of some pages and seen dramatic rankings boosts. As we know, that might not mean anything about the tool itself or the process, but it certainly has us hoping for great things.

p.s. The next step, obviously, is to produce a tool that can make recommendations on words to add or remove to help improve this score. That’s certainly something we’re looking into.

p.p.s. We’re leaving the Labs LDA tool free for anyone to use for a while, as we’d love to hear what the community thinks of the process and want to get as broad input as possible. Future iterations may be PRO-only.

Two Quick, Simple Social Media Tips »

Posted by RobOusbey

Today, I want to share two pieces of advice that are particularly useful to certain types of business - and will be exceptionally quick to implement. I’ve also created a free download that might help some people implement one of these ideas even more quickly.

About two years ago, I made a recommendation to a client in the UK, and I’ve just seen it used by a hotel in the USA. If your business offers public computers with internet access - such as those in hotel lobbies, libraries, etc - this is for you:

Tip 1: Put up a sign, next to your public computers, with a call to action; typically this could be something like ‘Find us on Facebook’ or ‘Follow us on Twitter’.

Here’s such a poster in use at the Ledgestone Hotel in Yakima.

Sadly, it doesn’t look like the Ledgestone is doing much with their Twitter account; this probably disappoints people who go to their page, and so they don’t end up with as many followers as they could do. Remember - getting people to your Twitter page (or Facebook, or whatever else you’re asking them to do) is only the first stage - there has to be something there for them when they arrive.

The second tip is more for people who offer wi-fi - this could be all manner of hotels, conference venues, airports, aeroplanes, train stations, coffee shops, etc. For places that offer free wi-fi, this can work even better:

Tip 2: You control the first page visitors see after logging on to your wi-fi. Don’t waste this with a dull message; make the page interesting, and put some calls to action on there.

People have probably logged on to do something, but many will welcome a distraction, particularly if you keep the request brief. Create a nicely styled but simple page, and add a couple of messages to it. Some examples could include:

  • Follow us on Twitter / Like us on Facebook: you could incentivize this, for example: if you’re a coffee shop, then offer a free latte to new followers
  • Sign up to our email newsletter: this will only take them a second if you make sure the form is right there on the page, and again this can be incentivized
  • Don’t forget to check in on foursquare: ideal for almost any location, and this is as good a time as any to remind them to check in
  • If you’re enjoying your stay, please review us: particularly useful for hotels, where online reviews can increase visibility; I’ll go into a little more detail about this below.

There can be some issues with sites noticing that a lot of people from the same IP are visiting, particularly when it comes to review services. Local search expert David Mihm advised me that he’s heard Yelp in particular does try to filter out multiple reviews from the same IP, and that TripAdvisor’s fraud rules include clauses that might get you into trouble (for example, offering people incentives to write reviews is not permitted).

I’d recommend two steps to get around this type of issue:

  1. Try to appeal for reviews only from people who already have accounts on those sites (e.g.: "If you’re a Yelp member, please review us here…" or "If you have a Google account, please leave a review here…").
  2. Make this ‘post-wifi-login’ page available on the public internet; review sites should be able to recognize that lots of people are being referred to your page from the same URL - if it’s public then they’ll be able to visit that page, and should figure out what is going on.

I’ve built a quick free template for you to download as a starting point. You can view the file, or download it, by clicking this link: free wifi login CTA page.

(That was created based on a template from LayoutGala; I’m not going to add any licence to it - use it however you want. At the very least, you should change the images in it to local files.)

Honestly, it doesn’t take long to print off a couple of small posters (or even to publish a nice wifi login page), so I hope to see social-media CTAs cropping up all over the place soon. :)

LDA - Is On-Page Optimization the SEO Secret? »

Posted by Dana Lookadoo

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

How do I recap the SEOmoz PRO Seminar session on Uncovering a Hidden Technique for SEO? The title is so attractive that it produces Pavlovian symptoms as we salivate at the thought of uncovering a hidden SEO treasure. Ben Hendrickson of SEOmoz presented a model which appears to show how Google may assign relevance to keyword terms based on context - topical relevance.

Is Latent Dirichlet Allocation (LDA) that hidden jackpot?

1st - LDA is neither new nor something SEOmoz invented. The Information Retrieval model has been around for 7 or 8 years, and IR geeks have talked about it before. There are a number of resources, as well as some naysaying, about LDA and Google’s possible use of it.

2nd - What is new is SEOmoz’s LDA Topics Tool, which produces a relevancy score based on a query (search term). It lets you play with words that may increase a page’s relevancy in the eyes of Google. It shows words that help Google determine how relevant the page is to a user’s search query.

Game Changer?

Kyle Stone tweeted that the LDA tool is a game changer, and many retweeted.

SEOmoz LDA tool = game changer

Is SEOmoz’s LDA tool a game changer? That’s yet to be seen. My goal is to report Ben’s research as presented at the Mozinar and how a layman (myself) interprets it. Rand is going to do a follow-up post to explain more.

Why all the hype?

The SEO Challenge

SEOs face the continual challenge of figuring out Google’s hidden ranking algorithms. How do we rank higher? Which signals are the most important? We know search engines are "learning models" that attempt to understand the "context" of words. Google has said for years that webmasters should concentrate most on providing good, relevant (contextual) content.

There are ways to rank higher. Is it as easy as 1, 2, 3?

  1. Create quality copy with keyword(s) on the page along with associated anchor text links.
  2. Get good links.
  3. What Ben talked about in this session.

LDA - Topic Modeling & Analysis

Latent Dirichlet Allocation, in layman’s terms, translates to "topic modeling." In search geek terms, LDA is the following formula:

LDA Formula

(Did you digest that? Don’t worry; Mozzers groaned and laughed at the same time. PLUS: Scientist Hendrickson delivered this session after lunch!)

LDA Simplified - Here is Ben’s way of explaining topic modeling:

LDA Formula Simplified

(Okay, I was once proud that I got an A in Logic and Combinatorics - discrete math/set theory. However, that computer science class now feels like basic math compared to this formula.)

It made more sense when Rand Fishkin joined Ben on stage and when Todd Friesen moderated and deciphered during Q&A. (Manuela Sanches of Brazil was sitting next to me and said that Ben’s "presentation needed subtitles!")

The objective of LDA, from my deciphering of Greek, is to understand how Google is using semantic contextual analysis combined with other signals, to define topics/concepts. It’s how Google analyzes the words on a page to determine the "set" to which a word belongs - how relevant a search query is to pages in its database.

For example: How does Google assign relevance to the word "orange" on a page? It uses the page’s context to determine whether "orange" belongs to the fruit set or to the color set.
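
Here’s a toy illustration of that idea in LDA terms: the probability that a given occurrence of "orange" came from each topic is proportional to how much the page is about that topic, times how likely that topic is to produce the word. All probabilities below are invented for illustration; a real model learns them from a corpus, and nothing here describes how Google actually works.

```python
# Hypothetical per-topic word probabilities, P(word | topic).
WORD_GIVEN_TOPIC = {
    "fruit": {"orange": 0.05, "juice": 0.04, "peel": 0.03, "paint": 0.001},
    "color": {"orange": 0.06, "juice": 0.001, "peel": 0.001, "paint": 0.05},
}


def topic_posterior(word, doc_topic_mix):
    """P(topic | word, doc), proportional to P(topic | doc) * P(word | topic), normalized."""
    unnormalized = {
        topic: weight * WORD_GIVEN_TOPIC[topic].get(word, 0.0)
        for topic, weight in doc_topic_mix.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError(f"{word!r} has zero probability under every topic")
    return {topic: value / total for topic, value in unnormalized.items()}


# Same word, two hypothetical pages with different topic mixtures.
recipe_page = {"fruit": 0.9, "color": 0.1}
decorating_page = {"fruit": 0.1, "color": 0.9}

print(topic_posterior("orange", recipe_page))
print(topic_posterior("orange", decorating_page))
```

In real LDA, the page’s topic mixture is itself inferred from all of its words, so neighbors like "juice" or "paint" are what push the mixture, and therefore the reading of "orange," one way or the other.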

LDA Defined:

"Latent Dirichlet Allocation (Blei et al, 2003) is a powerful learning algorithm for automatically and jointly clustering words into "topics" and documents into mixtures of topics. It has been successfully applied to model change in scientific fields over time (Griffiths and Steyver, 2004; Hall, et al. 2008).

A topic model is, roughly, a hierarchical Bayesian model that associates with each document a probability distribution over "topics", which are in turn distributions over words."
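
The formula image from Ben’s slides isn’t reproduced here, so we can’t say exactly which form he showed. For reference, here is the standard LDA generative model in its commonly used smoothed form (a Dirichlet prior on each topic’s word distribution, as in Griffiths and Steyvers; Blei et al.’s original paper treats the topic-word probabilities as fixed parameters). It matches the definition quoted above: each document d has a distribution over topics, θ_d, and each topic k is a distribution over words, φ_k. Here K is the number of topics, D the number of documents, N_d the number of words in document d, α and β are Dirichlet hyperparameters, z_{d,n} is the topic assigned to the n-th word, and w_{d,n} is the observed word.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) \\[4pt]
p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \phi)
  &= p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}})
\end{align*}
```

Fitting the model means inferring θ and φ from observed words alone; the "cosine similarity" scores discussed in these posts compare topic mixtures of the θ kind between a page and a query.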

Bayesian - ah, a term I recognize!! Bayesian spam filtering is a method used to detect spam. It draws on a database and learns the meaning of words. It’s "trained" by us when we mark an email as spam. It looks at incoming emails and calculates the probability that the content of an email is contextually spammy.
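
For comparison, here is a minimal naive Bayes spam scorer of the kind that paragraph describes: it counts words in emails we’ve marked as spam or not (the "training"), then combines per-word probabilities for a new message. The training messages are made up, it uses add-one (Laplace) smoothing, and real filters are considerably more elaborate. Unlike LDA, it learns two fixed labels rather than discovering topics on its own.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, is_spam). Returns word counts and message counts per class."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_spam in messages:
        word_counts[is_spam].update(text.lower().split())
        doc_counts[is_spam] += 1
    return word_counts, doc_counts


def spam_probability(text, word_counts, doc_counts):
    """P(spam | words) under naive Bayes with add-one smoothing, computed in log space."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    total_docs = doc_counts[True] + doc_counts[False]
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a normalized probability for the spam class.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp_scores[True] / (exp_scores[True] + exp_scores[False])


# Hypothetical training data: (message, marked_as_spam)
training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes for tuesday", False),
    ("lunch on tuesday with the team", False),
]
wc, dc = train(training)
print(round(spam_probability("claim your free money", wc, dc), 3))
print(round(spam_probability("team meeting tuesday", wc, dc), 3))
```

The first message scores as very likely spam and the second as very unlikely, purely from which words appeared in which labeled examples.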

I found a PowerPoint presentation about Bayesian Inference Techniques by Microsoft Research from 2004 that presents the possibility of using LDA. Go to slide 54 and read:

"Can we build a general-purpose inference engine which automates these procedures?"

Microsoft has been looking at LDA models. Do search engines use it as one of their primary methods?

Ben sampled over 8 million documents with approx. 1,000 queries. He believes Google is using LDA topic modeling to determine (learn) what words mean through their association with, and relevance to, other words on the page. (Other factors are included.) Ben called the results a "co-occurrence explanation" that uses a "cosine similarity."

SEO Takeaway:

  • Results that are higher in Google SERPs, in general, have more topical content.
  • Search engines do APPEAR to apply semantic analysis when indexing a page and determining the intent of the words on the page.

Rand tweeted an explanation (in 140 x 4) as follows:

Rand's tweets explaining LDA

Dana’s LDA Catwalk Metaphor for Topic Modeling:

Imagine the words on your page as walking down the fashion runway in Paris. Your keyword phrase is "dressed" in semantic accessories, words that correlate to and dress up your topic. Associated words bring meaning to and highlight the fashion model’s outfit. Adjectives, modifiers and synonyms are like jewelry, hats, and shoes. The combination can transform your base layers (your target terms) from casual or conservative business attire into a sexy night-on-the-town ensemble.

Combinations and permutations of words on a page "dress" your skinny or curvy fashion model. Relevant words provide Google with an image of what she is wearing and the catwalk upon which she struts. LDA refers back to what Google already knows about these "accessories" (words) and their previous association with the topic terms related to fashion.

Enter Topical Ambiguity - I just broke the "rules" for context with the catwalk metaphor by referring to modeling in two contexts on this page:

  • I used "modeling" terms that relate to the "fashion industry" set.
  • The catwalk metaphor is irrelevant content that is off-topic for discussing "LDA topic modeling."

Google Algorithm Exposed?

Ben clearly said that LDA is an ATTEMPT to explain the SERPs. His scenario, a quote from his presentation slides, follows:

One of us needs to implement it so we can:

1) See how it applies to pages
2) See if it helps explain SERPs
One-two-three-not-it.

LDA is not LSI.

There were some tweets claiming SEOmoz was bringing back LSI or snake oil. Ben clarified that LDA is not LSI, which deals more with keyword density. He explained that he is NOT talking about loading keywords on a page but about the relevance of the topics within the page. He said that:

"LSI doesn’t have the same bias toward simple explanations. LSI breaks down as you try to scale up the number of topics."

The LDA tool deals with context, semantic relevancy, not density - in addition to some other random factors. Example:

If SEOmoz has a page all about "SEO" and "tools," and there is another word on the page that can be explained by a word more closely related to the SEO topic, then that related word would be used. Meaning, "seo tools" doesn’t have to be repeated over and over, and the related word would be interpreted by Google as relevant.

Ben, who appears to have the brain of a search engine, noted that it "appears" LDA is what Google is heading for in the near future. He said (paraphrased):

If they are not doing it, they seem to be doing something that has the same output. They are probably already using it.

Rand deciphered:

It’s a super weird coincidence if Google is not using it.

Are On-Page Signals Stronger than Links?

Are we heading toward more emphasis on on-page topic modeling? I’m not an IR geek, but I do plan to spend more energy understanding how search engines retrieve information. We are dealing with a semantic Web. LDA may indicate that good old on-page optimization sends stronger signals than links.

SEOmoz’s LDA tool attempts to show how relevant content is to a chosen keyword. It computes how relevant a page is to a query.
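SEOmoz doesn't publish the exact formula behind its score. As a minimal sketch, assuming the query and the page have each already been mapped to an LDA topic distribution over the same K topics, one common way to compare two such distributions is cosine similarity. The function name and the 4-topic vectors below are hypothetical placeholders, not output from the tool.

```python
import math


def cosine_relevance(query_topics: list[float], page_topics: list[float]) -> float:
    """Cosine similarity between two topic distributions over the same K topics.

    For non-negative vectors the result lies in [0, 1]; 1 means the same topic mix.
    This is an assumed stand-in, not SEOmoz's actual scoring formula.
    """
    if len(query_topics) != len(page_topics):
        raise ValueError("topic vectors must have the same length")
    dot = sum(q * p for q, p in zip(query_topics, page_topics))
    norm_q = math.sqrt(sum(q * q for q in query_topics))
    norm_p = math.sqrt(sum(p * p for p in page_topics))
    if norm_q == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_q * norm_p)


# Hypothetical topic mixes (placeholders, not real tool output).
query = [0.70, 0.20, 0.05, 0.05]    # mostly one "SEO software" topic
page_a = [0.60, 0.25, 0.10, 0.05]   # a page with a similar topic mix
page_b = [0.20, 0.10, 0.40, 0.30]   # a page about mostly other topics

print(f"page A: {cosine_relevance(query, page_a):.0%}")
print(f"page B: {cosine_relevance(query, page_b):.0%}")
```

A page whose topic mix leans the same way as the query scores higher, even if it never repeats the exact query words.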

The following compares how relevant SEOmoz’s Tools page and Aaron Wall’s SEO Book Tools page are to the query "seo tools."

seo tools relevance for SEOmoz & SEO Book

The score at the top is an indicator of how relevant the content on that page is according to LDA.

  • Aaron’s content is 72%* relevant for the query "seo tools."
  • SEOmoz’s tools page is 40%* relevant.

*NOTE: (I inserted the logos.) You can run the same pages and get different results. The results are similar in that SEO Book always scored as more topically relevant, but the percentage varies. Is this the random Monte Carlo algorithm at work? Ben?
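The post doesn't say how the tool samples, but LDA inference is commonly done with Markov chain Monte Carlo methods such as collapsed Gibbs sampling, which make random draws; that is one plausible reason scores vary between runs. The toy sampler below is a minimal sketch under stated assumptions (a tiny hypothetical corpus, two topics, arbitrary alpha, beta and iteration count), not SEOmoz's implementation.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=None):
    """Toy collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    n_dt = [[0] * num_topics for _ in docs]                      # tokens per (doc, topic)
    n_tw = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # tokens per (topic, word)
    n_t = [0] * num_topics                                       # tokens per topic
    z = []                                                       # topic of each token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(num_topics)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                t = z[d][i]
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Unnormalised full conditional p(z_i = k | all other assignments).
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    # Smoothed topic proportions (theta) per document; each row sums to 1.
    return [
        [(n_dt[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy corpus: two "SEO" pages and one "fashion" page.
docs = [
    "seo tools rank links seo keyword tools".split(),
    "fashion runway model dress shoes runway".split(),
    "seo keyword links rank tools".split(),
]

# Same page, same model, different random seeds.
for seed in (1, 2, 3):
    theta = gibbs_lda(docs, num_topics=2, seed=seed)
    print(f"seed {seed}: page 1 topic mix = {[round(p, 2) for p in theta[0]]}")
```

With a fixed seed the result is reproducible. Across seeds the topic mix can shift, and the topic numbering can even swap, so averaging several runs is one way to steady such a score.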

Mozinar Question:

"How do we execute this for SEO?"

Ben’s Answer:

"I don’t actually do SEO. I write code."

That’s up to us, the SEOs, to play and test in our Google playground.

Use the tool to decide if you can win with LDA to optimize your on-page signals.

  1. Use the LDA Topics Tool to return words that could be used on a page for a query.
  2. Then determine who is ranking for that term.
  3. Simply write content that is highly on-topic based on the findings you observe.
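For step 3, a quick way to see which suggested terms a draft already uses is a simple coverage check. This is a minimal sketch: the `term_coverage` helper and the suggested-terms list are hypothetical stand-ins for whatever the LDA Topics Tool returns, and matching is naive case-insensitive whole-word matching.

```python
import re


def term_coverage(page_text: str, suggested_terms: list[str]) -> dict[str, int]:
    """Count how often each suggested term appears in the page (case-insensitive, whole words)."""
    text = page_text.lower()
    counts = {}
    for term in suggested_terms:
        # \b anchors keep "seo" from matching inside longer words.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts


# Hypothetical inputs: a draft and a list of related terms (not real tool output).
draft = "Our SEO tools check rankings, backlinks and on-page keyword usage."
suggested = ["seo", "rankings", "backlinks", "keyword research", "search engine"]

coverage = term_coverage(draft, suggested)
missing = [term for term, n in coverage.items() if n == 0]
print(coverage)
print("Not yet on the page:", missing)
```

A count like this only shows coverage; it says nothing about where the spam threshold mentioned in this post lies.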

If you are not performing that well in the SERPs, think about classic on-page optimization. In the example above, rather than putting another instance of "seo tools" on the page, LDA shows there are better ways to tell Google that you are about that topic. The tool provides a way to measure that.

IMPORTANT: There is a threshold at which too many related words will appear as too spammy. LDA is not something to be used to game Google.

Test the LDA Tool out for yourself, and draw your own conclusions.

***
DISCLAIMER: I’m not claiming this methodology has uncovered hidden SEO treasures. Time, testing and playing around with a new SEOmoz tool while observing the SERPs will reveal the answer. In the meantime, I’m going to dress up my pages and accessorize them with relevant terms that make them dazzle so they look good climbing the Google catwalk.


Four Creative Link Building Tactics - Whiteboard Friday »

Posted by Aaron Wheeler

 In this week’s Whiteboard Friday Rand Fishkin clues you in on four link building tactics that you likely haven’t heard about. Given the importance of link building to SEO, this video should prove to be worth its (virtual) weight in gold. (I mean that in the best possible way ;-p)


 

Video Transcription

 


Hey, SEOmoz fans!  Welcome to another edition of Whiteboard Friday.  Today we’re talking about link building and specifically four tactics that are relatively creative, not talked about a ton in the SEO sphere, that can help you get some direct links to virtually any kind of site.

Let’s start with number one up here, giving testimonials.  I know this sounds a little odd.  You’re thinking to yourself, "Wait, I’m a marketer.  I should be trying to get testimonials about my product, my service, my company."  But in fact, give and you shall receive.

So in this case, if you are a site owner with a business and you say nice things about something you use (products you like, free web apps, tools on the web, blogs, resources, whatever it might be, or specific products or companies), you email them and say, "Hey, I just wanted to let you know, I really like your service.  I enjoy using it.  If you’d like to use this as a testimonial, feel free."  You can say some nice words and then sign it, "My name is Rand Fishkin and I am the CEO of SEOmoz."  When they publish that on their GoodProduct.com website, you can see it gets embedded right into their site and links back over to your site.

So, it is a great way to build up a repertoire of contacts, build good relations, and do something nice for the people who are doing something nice for you.  I would definitely not do this disingenuously.  Make sure that you are actually recommending things that you would recommend to a real friend.  It will come back and bite you otherwise.  But if you do this, you can get those great links too.

The second one, design galleries.  This is an odd case because you do have to jump through some hoops.  If you can contract some of those exceptional, high-quality CSS and web design folks to build a really great-looking site (something that looks nothing like this horrific drawing; I don’t even know why I put so many boxes and lines, I am sure there was a reason), you can get featured on sites like CSS REMIX or Drawer or CSS Gallery.  If you do a search for CSS galleries, in fact, you will find literally hundreds of places in the first few hundred results where you can get a live link pointing back from those pages just by submitting your site and having a site that looks great.

Now, what I would recommend is that before you go through the design process, you visit a lot of these places and get inspired.  See what makes it.  See what is hot right now.  Those designs have the added benefit of often being very good for users.  Using CSS properly means that your pages load faster and you are keeping code and design separate.  It can often increase your rate of attracting links as well.  Linking and quality of design have a direct relationship.  As the quality of design rises, so too does the likelihood that people of all kinds, not just design galleries, will link to your site.  They’ll find you more credible.  They’ll want to show you off.  They’ll want to share.  This is a great investment both for the direct links you can get and for the future.

Number three.  This is sort of an interesting one.  Thanks to sites out there like HARO, which is Help a Reporter Out, and a few others, I think PR Newswire runs one as well, you can be a press source simply by combing through databases or lists of people who say, "Hey, I am a reporter in need of a story about a business that keeps dogs in their office and what the impact of having dogs around is.  Can we interview you, show off your business?"  Those stories when they get written about, they might appear in sources as big as "The New York Times" or as small as your local newspaper, but they appear online as well.  When they do, that link will point back to your site giving you a link from a nice press resource, which is a great place to get a link.

Number four, the last one here, turning raw numbers into a data story.  I like this a lot because the idea here is that people produce a lot of interesting data about virtually every industry, but they don’t always do great things with that data.  They’ll produce interesting numbers or numbers that seem boring on their surface but can be used in interesting ways.  It is up to you to be creative about, hmm, okay, comScore published this, Nielsen published that, Forrester published this data research.  If I combine some of those numbers or if I think about how they play out, I can come up with a great story and maybe some cool graphics too about what that means.  I can take some of the data over time and build a story about what’s happening.  I can show that data next to something like Google Trends data or Search Insights data or data from a second or third source.  When I combine those, I have great link and media bait.  The nice thing about producing this is it is not just sort of classic link bait where, "Oh, that’s interesting, I want to share that." But it is interesting because when you are the reference resource for the data, everyone else who writes about the story or who wants to share it has to link back to you.

A good example of this: check out www.seomoz.org/dp/free-charts and you’ll see a bunch of places where we have taken data from great folks like Eightfold Logic (which used to be Enquisite), comScore, Hitwise, Nielsen and Forrester, and we’ve combined them in unique and interesting ways to view that data.  We didn’t even do much with it, just showed sort of, "Hey, they said that 30% of searches come from Europe and 40% come from Asia, etc., so we’re going to build a pie chart of that that looks great and people can embed that."  Now when they do, they link back to SEOmoz and have the source in there.  We’ll always say what the original source is too.  But by hosting this stuff and creating it, you get all these great links.

All right everyone, I hope we have helped out your link building efforts here today.  I look forward to the discussion in the comments.  We will see you again next week for another edition of Whiteboard Friday.  Take care.

Video transcription by SpeechPad.com
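The pie-chart example in the transcript comes down to simple arithmetic: turn the published percentages into slices and account for whatever the sources left unassigned. A minimal sketch using only the two figures quoted (30% Europe, 40% Asia); the `pie_slices` helper is hypothetical, and lumping the remainder into an "Other" slice is an assumption, since the transcript's "etc." doesn't name the rest.

```python
def pie_slices(shares: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map percentage shares to (percent, degrees) pie slices, adding an 'Other' remainder."""
    total = sum(shares.values())
    if total > 100:
        raise ValueError("shares add up to more than 100%")
    slices = dict(shares)
    if total < 100:
        # Assumption: whatever the sources didn't break out goes into one slice.
        slices["Other"] = 100 - total
    # Each percentage point is 3.6 degrees of a 360-degree pie.
    return {label: (pct, pct * 3.6) for label, pct in slices.items()}


for label, (pct, degrees) in pie_slices({"Europe": 30, "Asia": 40}).items():
    print(f"{label}: {pct}% -> {degrees:.0f} degrees")
```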


If you have any other advice that you think is worth sharing, please post it in the comments! This post is very much a work in progress.


A New Day, A New SEOmoz »

Posted by randfish

It’s been a wild few weeks at the mozplex. Today wrapped up the amazing mozinar with our half-day tools training just in time to launch the new version of SEOmoz. Should we slow down this crazy pace? Nah.

If you’re feeling a sense of deja vu, don’t worry; it’s perfectly normal. We’re the same old moz, but with a new look, faster loading pages and a surprising amount of new functionality. Let’s walk through it together, shall we?

Big Improvements to PRO Membership

It’s a good day to be PRO; we’ve just released:

• A brand new PRO Dashboard that’s designed to be the center of everything you can do with your membership, including access to your web app campaigns, tools and tool reports, webinars, Q+A, discount store, etc. If it’s part of PRO, you’ll find it in the Dashboard.

• The web app has made some big improvements and we’re now announcing a full public beta - campaigns should be faster, more accurate and dramatically less buggy. There’s also some cool new functionality I’ll cover below.

• The dramatically upgraded SEO Tools page, which will likely show off plenty of tools you may not have seen/heard about until now.

Slide decks from our PRO Tools Training are now downloadable. We had a highly interactive, terrifically valuable day sharing tips, tricks and applications for the data and resources and wanted to give you a small taste of that experience by making those slides available.

If you’ve been curious about what’s in PRO membership, there’s a new PRO Tour section that gives you a more complete look at the features and functionality. Also - the last chance to get PRO at $79/month and be locked into the rate before it rises to $99 is now - after Friday, the price change goes into effect.

Zoinks! A New SEOmoz Website

Rub your eyes a bit and have a look around. We’ve done a considerable amount of work to make pages load faster, let the design highlight the content in a cleaner fashion and add a few fun bits, too. Big changes include:

• A new home to Learn SEO. I’ve recorded an "Intro to SEO" video and we’ve made all of our learning-focused content available through that page (nearly all of it is entirely FREE!)

• A renewed focus on YOUmoz and the Blog (both of which are featured more prominently on the homepage). We’ve re-designed all of these to help make them more useful and usable, as well as focusing on the content itself with a less-intrusive design. As always, we’ve kept a strong focus on comments and participation and we’re planning to do even more with it in the future.

• More accessibility to our SEO tools, including a free sneak peek at our LDA Labs tool (more about that in my next post)

There’s lots more coming soon (a new about section, upgrades to the marketplace, more free information in the Learn SEO section, etc.) so keep an eye out.

The Web App is Now in Public Beta

Our private beta launch to PRO members had more than 2,000 folks create thousands of campaigns. While the feedback has been phenomenal (your very kind tweets really helped keep our engineers pushing through sleepless nights and crates of pizza), we know there were a lot of bugs and missing functionality in the early release. Starting today, the app is far more stable, speedy and powerful. Crawls should come back consistently, rankings should be more consistent and accurate, and issues/recommendations are rocking.

Web App Public Beta

We’ve also added a brand new feature - one of our most requested - exportable PDF reports for rankings (with crawl diagnostics and on-page reports coming very soon). As Adam Feldstein, our head of Product, discussed today in his roadmap presentation at the tools training, next on the list is additional crawl issues, Google Analytics integration and exciting new functionality for competitive comparisons in the link analysis tab.

As always, we welcome feedback - your messages have been instrumental in helping us improve, and while we’re feeling good about this wider launch, the web app is likely staying in beta for another few months as we add features and continue to tweak, bug fix and get better.

Still Ironing Out Some Kinks

There are a few known issues with the new site that should be cleaned up in the next 12-24 hours. These include a bit of CSS oddness on the Beginner’s Guide and the Keyword Difficulty tool (though both still function), the thumbs highlighting being a bit softer than intended (for thumbs up/down you’ve already left), some headline/text font sizes and spacing, etc. Sadly, we’ve also temporarily broken the long beloved functionality of highlighting "new" comments in a post - that should be back soon.

I also noted that we had some issues with Domain Authority in our last push of the Linkscape update. Amazingly, thanks to the hard work of our engineering team, we’re expecting to have new scores up in the next few days (rather than taking a full 2 weeks). We still need to run some tests, but we’re hoping to fix many of the odd outlier issues.

We Love Your Feedback

If you see anything you love, hate or think might be an error, we’d love to hear from you. Every page on the site now has a "Feedback" button on the far left-hand side and we read those obsessively! Of course, you can also leave us comments on this post.

Thanks so much for joining in the adventure that is SEOmoz. In the weeks and months to come, well…. let’s just say you ain’t seen nothing yet :-)


Day 1 at the SEOmoz Training Raceway »

Posted by Dana Lookadoo

This post was originally in YOUmoz, and was promoted to the main blog because it provides great value and interest to our community. The author’s views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

I’m going to speed through the 2nd half of the 1st day at the SEOmoz Pro Training Race Track. Recall that 9 speakers raced through topics covering clicks to conversions. The following are highlights of the end of the race for Day 1.

Presentation Off

Insights distilled also included the business side of pitching SEO. Will Critchlow and Rand Fishkin dueled it out for their "Presentation Off" to determine who could give the best advice for “How to Pitch SEO.” This marked the first time they “faced off” in battle on US Soil. Will held the winning title to date. Bottom line, both of them presented valuable insights about pitching and when not to pitch (or bother).  

Takeaways from Will Critchlow, The Champion:

  1. Don’t sell to people who have to be convinced of SEO. It’s best to sell to those who know about SEO, those who know they need it. Then you never have to pitch SEO again. Will explained why you don’t sell SEO in the pitch:
    • You pitch SEO before that.
    • Selling the client on SEO is a separate conversation, if necessary at all.
  2. Will has been asked to help model the business impacts of SEO changes; that is a different story.
    • He showed the Mozzers how to look at the prospective client’s industry and give them some unique data.
    • He shared an Excel file to help you (us) control a lot of assumptions.

SEO Traffic Model

Download Distilled’s SEO Traffic Model spreadsheet. http://dis.tl/dk6N59 <nice!> 

Takeaways from Rand Fishkin, The Challenger:

Rand focused on the emotional side and on winning minds as an in-house SEO:

  1. Get engineers & developers on your side. Explain how SEO will benefit their projects by helping them boost speed, grow browse rate (pages/visit), improve accessibility, minimize errors and increase usability.
  2. In pitching SEO, you can then go one step further to help them sell their project(s) with SEO. From there, help sell other projects for marketing, design, sales, etc.

Rand showed graphs and slides on how to show value based on ROI, that is, the value of their traffic:

Traffic Valuation Formula for pitching SEO

<If you’re taking notes, you can see how this would fit into a spreadsheet…>
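The formula itself was only shown on a slide. A common way to put a dollar value on organic traffic (an assumption here, not necessarily Rand's exact formula) is to price the visits at what the same clicks would cost in paid search, or by the revenue they convert into. A spreadsheet-style sketch; the function name and all inputs are hypothetical.

```python
def traffic_value(monthly_visits: float, cost_per_click: float,
                  conversion_rate: float, value_per_conversion: float) -> dict[str, float]:
    """Two assumed ways to value organic visits: paid-search equivalent and conversion revenue."""
    return {
        # What buying the same visits through PPC would cost.
        "ppc_equivalent": monthly_visits * cost_per_click,
        # Revenue the visits generate if they convert at the given rate.
        "conversion_revenue": monthly_visits * conversion_rate * value_per_conversion,
    }


# Hypothetical inputs for one keyword group.
print(traffic_value(monthly_visits=5000, cost_per_click=1.50,
                    conversion_rate=0.02, value_per_conversion=80.0))
```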

Rand then explained search growth over time: search is growing, period! If they are not adding 20% to their SEO budget, then they are falling behind.

“Every day, there are more than a billion searches for information on Google. These people have specific intents. If you’re not adding 20% to your SEO budget this year, you’re falling behind the average."
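The reasoning behind the 20% figure can be made explicit. Assuming search demand grows about 20% a year (the rate implied by the quote, used here as an assumption), a flat SEO budget buys a shrinking share of that demand every year. A minimal sketch with a hypothetical helper:

```python
def budget_per_demand(years: int, demand_growth: float = 0.20,
                      budget_growth: float = 0.0) -> list[float]:
    """Budget relative to search demand for each year, normalised to 1.0 in year 0."""
    return [((1 + budget_growth) / (1 + demand_growth)) ** y for y in range(years + 1)]


flat = budget_per_demand(3)
matched = budget_per_demand(3, budget_growth=0.20)
print("flat budget:   ", [round(r, 2) for r in flat])
print("+20% per year: ", [round(r, 2) for r in matched])
```

Only a budget growing at the same rate as demand holds its ratio at 1.0; anything less falls further behind each year.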

Show prospective clients which competitors are winning for their keywords:

  1. Show competitors in SERPs.
  2. Match it with keyword demand.
  3. Show how they are doing, side-by-side.

Competitors Winning for Keywords

And the winner of the Presentation Off is … Rand Fishkin, who edged over the finish line just in front of Will Critchlow.

OK, let’s catch the replay highlights of the rest of the search marketing race.

Joanna Lord drove the fastest car, “The End of Analysis Paralysis.”

She explained it’s time to get serious with metrics and conversions:

1.     What is your website trying to do?

2.     If one metric could identify that you are succeeding or failing, what would it be? How would you know you are gaining or losing ground?

3.     What is the biggest threat to your success?

You should only have 3 or 4 metrics, no more than 5. (Focus)

Joanna then sped around Google Analytics advanced filter fun, including:

  • Social Network Filters – combine
  • Google Image Search - Low hanging fruit if you SEO out of images
  • Cascading Filters – see LunaMetrics.com for tips on customizing advanced filters – something that’s NOT in Google Analytics documentation.

Joanna was stopped in her tracks when she polled the Mozzers to find out how many were using Multiple Custom Variables - 2 hands raised.

MCV is the ability for us to tag visitors for any number of interactions on our site. It goes beyond the single user-defined variable set with _setVar(), replacing that call with _setCustomVar().

Multiple Custom Variables let us tag visitors across any number of sessions, enabling “first touch” attribution rather than Google Analytics’ default “last touch.”

Multiple Custom Variables in Google Analytics

Resource: How to do First Touch Tracking in Google Analytics
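To illustrate the difference, here is a minimal sketch in Python (not Google Analytics code) of how the two models credit the same conversion. The visit path and the `attribute` helper are hypothetical, and letting "direct" visits not take last-touch credit from an earlier source is a simplification of how last-touch reporting is often described.

```python
def attribute(visits: list[str], model: str) -> str:
    """Return the traffic source credited with a conversion.

    visits: sources in chronological order, ending with the converting visit.
    Simplification: 'direct' visits never take last-touch credit from an earlier source.
    """
    if not visits:
        raise ValueError("need at least one visit")
    if model == "first_touch":
        # A visitor-scoped tag set only on the first visit would hold this value.
        return visits[0]
    if model == "last_touch":
        non_direct = [s for s in visits if s != "direct"]
        return non_direct[-1] if non_direct else "direct"
    raise ValueError(f"unknown model: {model}")


# Hypothetical path of one visitor who eventually converted.
path = ["organic/google", "referral/twitter", "direct", "cpc/google"]
print("first touch:", attribute(path, "first_touch"))  # organic/google
print("last touch: ", attribute(path, "last_touch"))   # cpc/google
```

The organic visit that first brought this visitor in gets no credit under last touch, which is the gap first-touch tagging is meant to close.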

Joanna then screeched around the corner to present her Advanced Analytics Checklist:

  1. Filter the data so you are getting the data you want to manipulate
  2. Segment the data so you can see the right data in different ways
  3. Customize reports so you can compare valuable data sets, find intersections & relationships
  4. Take the resulting insights and dive deeper
  5. Use those deep dive insights and make them actionable for your company
  6. Show the action items (not the data) to your company
  7. Last but not least…do the analytics victory dance.

Whew… surely it was time to fuel up again after that session, but no… more typing at high speeds:

Marshall Simmonds - Site Architecture & Best Practices for Big Site SEO

Marshall Simmonds is a seasoned enterprise-level SEO who works with the NY Times and previously worked with About.com. Working on large sites requires triage and prioritization. (Race car drivers overlook a chip in the paint when the carburetor blows out.) An SEO at any level can apply the following triage tips to their own site to determine where best to spend their time:

High Priority Tactics:

  • Sitemaps
  • Education
  • 301s
  • Template SEO – fixing titles, captions, linking
  • Rel=canonical
  • Rewriting urls
  • How much will it make? What’s the cost/traffic potential?

Low Priority Tactics:

  • Page load time / site speed – most of the time they don’t care, but upper management does care. It’s only 1 of 200 signals.
  • URLs
  • Link Flow
  • Video SEO
  • Duplicate content
  • CMS Overhaul
  • W3C compliance

Focus on best practices for the long term. Marshall often recommends you don’t budget for an SEO project. Putting a dollar amount on it turns it into a project with an end point. SEO doesn’t have an end point.

Marshall proceeded to explain that the NY Times is a duplicate content factory and has some SEO challenges. As a news property, they dramatically see the importance of the following principle:

Optimize all assets!

Optimize all content assets

Ask: Are there any assets that you are not optimizing? If so, the competition is beating you.

Key takeaways for all of us in the SEO race:

  • rel=”canonical” is a band aid and solves the problem.
  • Google is not necessarily crawling organically for video, which puts focus on video XML sitemap.
  • Webmaster Tools reports a lot of errors.
  • Title is the most important element.
  • Analytics suck!!!!!!!!
    • Omniture – over reports search referrers
    • Webtrends – under reports search referrers (have to add images)
    • Google analytics doesn’t scale – in middle of search referrers.

Bottom line, add as many analytics packages as you can afford, optimize, track and prioritize.

Tom Critchlow - Keyword Research & Targeting

Tom Critchlow of Distilled explained that you need to group all keywords:

  • Head terms – main terms, everything you can put in a calendar and plan for
  • Mid-tail – hot trends, cyclical demand, triggered by QDF
  • Long-tail – 4+ words; an opportunity, since 20-25% of the queries Google sees today are ones it has never seen before.
  • QDF = Query Deserves Freshness
  • QDF is riddled with spam, returns 90% malicious links.
  • Tip: Publish Fast – Cite Fast!!
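A rough version of this grouping can be scripted. This is a minimal sketch: the 4+ word rule for long tail comes from the talk, but the `bucket_keyword` helper, its `spike_ratio` threshold for spotting trend-driven mid-tail demand, and the example volumes are hypothetical.

```python
def bucket_keyword(keyword: str, recent_volume: float, baseline_volume: float,
                   spike_ratio: float = 2.0) -> str:
    """Rough head / mid-tail / long-tail grouping for one keyword.

    Long-tail uses the talk's 4+ word rule. The mid-tail test (recent demand at
    least spike_ratio times the usual baseline) is a hypothetical heuristic.
    """
    if len(keyword.split()) >= 4:
        return "long-tail"
    if baseline_volume <= 0:
        # No history at all: treat any demand as new, trend-driven demand.
        return "mid-tail" if recent_volume > 0 else "head"
    if recent_volume / baseline_volume >= spike_ratio:
        return "mid-tail"
    return "head"


# Hypothetical keywords with made-up (recent, baseline) monthly search volumes.
examples = [
    ("seo tools", 1000, 1000),
    ("world cup tickets", 9000, 2000),
    ("best free seo tools for beginners", 40, 40),
]
for kw, recent, baseline in examples:
    print(f"{kw!r}: {bucket_keyword(kw, recent, baseline)}")
```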

 Keyword harvesting tools:

  • Google Search Suggest
  • Ninja tip: Geolocation – Google Search Suggest is geo-specific
  • Google Related Searches      
  • Mozenda + API = WIN
    • Mozenda (http://mozenda.com/) is an easy-to-use paid tool.
    • Input terms and get long-tail, niche key phrases that don’t show up in the AdWords tool.
  • Look at other data sources. Don’t restrict yourself to keyword tools, and use other data sources relative to your niche.
    • Look at how people tag stories on Delicious

The following is a shot of how to use Mozenda to review tags on Delicious.com. (You can look at Delicious tags without using Mozenda.)

Using Mozenda to research Delicious tags

Discount code that applies to full pro plan: seomoz20 (Valid till Sep 15th 2010.)

Build an SEO friendly CMS:

Below is a wireframe template for an ideal CMS that pulls data in:  

Tom's SEO-friendly CMS

Discussion raced through use of APIs for scraping content from the Web and incorporating on your pages to include additional keywords. The boxes on the right represent ideas for pulling in the following:

The Mozzers had lots of questions from the audience about this CMS concept, and Tom’s answer was:

"It’s not that hard!" <sigh>  

Tom then gave away an API and dataset cheat sheet for building quick and dirty tools: http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools

Lindsay Wassell got deep under the hood like no one else has done at a conference to show her approach to and outline of SEO audits, starting with her daily schedule. I especially liked that she sets a schedule to focus on one client per day and allows time at lunch to ponder her findings and approach.

Tip: Allow ponder time & 6 weeks or more to deliver an audit. Give it enough time.

The following SEO Audit Outline lays out a suggested framework:

SEO Audit Outline

She incorporates a Scorecard for rating issues with a 1-5 rating scale:

SEO Audit Scorecard

Some scores are site-wide and some scores are finding-specific.

She placed importance on showing visuals and also providing an actionable Executive Summary. SEOs realize that a 40-page audit is likely to sit on someone’s desk for weeks or months. Give them takeaways they can begin working on now.

Tim Ash – 7 Deadly Sins of Landing Page Optimization

The final race of the day focused on what happens after the click: conversions. Discussion included the importance of considering what you do with all that SEO & PPC traffic after it arrives at the site.

Tim Ash did a poll at the end of the race day to see how many Mozzers were doing Conversion Rate Optimization (CRO). Almost 1/2 of the room raised their hand.

Tim starts with insults – You are ignorant and blind. He then asked:

How many of you have talked to the end user in the last quarter? Well, only a few admitted to talking to website users …

Tim showed us how to avoid the following 7 Deadly Sins of Landing Page Design:

  1. Unclear call-to-action
  2. Too many choices
  3. Asking for too much info
  4. Too much text
  5. Not keeping your promises
  6. Visual distractions
  7. Lack of trust

We all left the SEOmoz Raceway convinced that our baby is ugly, and armed with tips to optimize and beautify our website babies.


Online Marketing News - 2009 - Creative Commons 3.0