Tree testing and card sorting Yelp’s support content

10 min read Kathryn Reeves

Knowledge bases are tricky things to get right. They need a solid information architecture that enables writers to add content easily and accurately, and users to find information easily and quickly.

A challenge? Yes! But impossible…? Of course not!

We recently ran a study on Yelp’s desktop website as part of a redesign exercise for an ebook, and our eye was caught by one of Yelp’s support sites. We thought it would be fun (yes, fun) to run a tree test and an open card sort on the site’s information architecture, and then use the results to piece together a new one.

This article shares the objectives, structures, and results of the two studies. It doesn’t end in a resolution; rather, it’s the kick-off point for what comes next: redesigning and retesting the information architecture. We’ll be sharing this process, and the outcomes, in a future post. For now, feel free to analyze the data and add your own design recommendations in the comments.

But first, let’s confront a common argument against investing in IA design.

Why should we care about information architecture when we have search?

In an article for Nielsen Norman Group, Raluca Budiu raises this question, and at first glance it seems to have an obvious answer. The more powerful and accurate search functions become, the less a strong information architecture matters, right?

Not quite. Budiu concludes that ‘when websites prioritize search over navigation, users must invest cognitive effort to create queries and deal with the weak site search.’

If users go to the search box because the navigation is unclear, they won’t actually know what is available to search, which leads to unclear results. Search also requires users ‘to recall information from their memory…[to] come up with a meaningful query’, which increases cognitive strain.

In contrast, when navigation labels are clear, users are able to ‘recognize rather than recall’, which is a much easier task that improves usability (which is why it’s included in Jakob Nielsen’s ten usability heuristics). Make it easy for people. Anticipate their questions and needs, and structure your information in a way that suits them.

We ran a tree test to assess the findability of content on Yelp’s support site

Tree testing shows us how users interact with an information architecture in the absence of a search function, design, and other navigation elements. Participants are presented with a task and are asked to click through the tree until they land on a page where they think they’ll find the answer. They then click a button that says ‘I’d find it here’, and move on to the next task. For this tree test, we wanted to:

  • test the clarity and effectiveness of the information architecture
  • reveal which parts of the support site’s language or navigation confused users.

We uploaded the entire site map of Yelp’s support site to form the tree. The tree is made up of the first-, second-, third-, and fourth-level titles of every page on the site, 295 pages in total:


We also wrote ten typical user tasks (such as finding out whether Yelp can remove bad reviews) that represented what semi-frequent users might want to learn from the support site. We then sourced American participants, because Yelp is very popular in America. In a pre-activity question, we asked them how often they access Yelp’s support material: 77% of participants had done so, and 45% had done so more than six times.

How Yelp’s support site scored overall

Tree testing measures two things: task success, which is the percentage of participants who clicked the correct destination; and task directness, which is the percentage of participants who went directly to a destination without backtracking, regardless of the correctness of their answer.
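
To make the two definitions concrete, here is a minimal Python sketch. It assumes each participant’s attempt is recorded as the destination they chose plus a flag saying whether they backtracked; the `TaskResult` record, the helper names, and the sample data are all hypothetical and are not Treejack’s export format.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One participant's attempt at one task (hypothetical record format)."""
    destination: str   # the page where they clicked 'I'd find it here'
    backtracked: bool  # True if they went back up the tree at any point


def success_rate(results, correct_destinations):
    """Percentage of participants who chose a correct destination."""
    hits = sum(r.destination in correct_destinations for r in results)
    return 100 * hits / len(results)


def directness_rate(results):
    """Percentage of participants who never backtracked, correct or not."""
    direct = sum(not r.backtracked for r in results)
    return 100 * direct / len(results)


# Hypothetical sample: four participants attempting one task
results = [
    TaskResult("Updating your business page", backtracked=False),
    TaskResult("Account information", backtracked=True),
    TaskResult("Business owner FAQ", backtracked=False),
    TaskResult("Updating your business page", backtracked=True),
]
correct = {"Updating your business page"}
print(success_rate(results, correct))  # 50.0
print(directness_rate(results))        # 50.0
```

The two measures are independent: the third participant above counts as direct but unsuccessful. That is how a single task can score 11% success and 22% directness at the same time.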

The average success and directness scores across all ten tasks are shown in the results overview:


Below, we examine three tasks in detail: first a high-scoring task, so you can see what success looks like when testing an IA with Treejack, and then two low-scoring tasks whose results might strongly influence our redesign.

Here’s what a high-scoring task looks like

One of the beautiful aspects of Treejack is how quickly you can see what needs working on. As Dave O’Brien says, ‘This is where tree testing really shines — separating the good parts of the tree from the bad, so we can spend our time and effort fixing the latter.’

In this task, we asked participants to figure out how to cancel a connection to someone after accidentally ‘friending’ the person. We can see below that 94% of participants found the correct destination, and 80% went directly to a destination (correct or not) without backtracking. We can also see that the overall score for this task is nine. A score of eight or above, according to O’Brien, means ‘we’ve earned ourselves a beer.’


Now, when we view the pietree, we can instantly see a positive pattern emerging. The green paths represent the correct direction, and the yellow represents the correct answer. As you’ll see below, the green path is thick and dominant, which tells us quickly that users completed the task successfully.

We can also see that even though some people went down the wrong path, most went back up the tree (indicated by blue) because they recognized they’d gone the wrong way. Within seconds, the pietree shows us how successful the task was: the large yellow circle represents the 94% of participants who found the correct answer.


Two key findings from the results of the two lowest-scoring tasks

Two of the lowest-scoring tasks gave us truly useful insights that could influence our redesign.

Participants went to multiple places to complete the same task

In Task 1, participants needed to find out what to do when they saw a bad review of their business. This task received an overall score of one and took participants an average of 59 seconds to complete, which is longer than people usually spend looking for information on a website.

In fact, only 11% landed on the correct destination, and only 22% went down a path without backtracking. In other words, 89% of participants went to the wrong information, and 78% of participants went the wrong way at least once. The chart shows this at a glance.


One quick look at the pietree tells the same story. Participants went all over the place looking for the correct destination, and the very thin green path shows us the small number that actually got there. Another key warning sign for us is the amount of red in the central ‘Support Centre’ landing page. This means that participants often got both their first clicks and subsequent clicks wrong as they moved back through the tree.


And when we hover over the centre of the pietree, we’re able to see in detail how often people visited the ‘Support Centre’ homepage. Even though only 35 people completed this task, the homepage was clicked 131 times, which shows how often participants felt the need to go all the way back and start again.
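
A pietree is essentially an aggregation of every participant’s click path. As a rough sketch (the path format, the helper name, and every label other than ‘Support Centre’ and ‘Yelp for Business Owners’ are hypothetical), counting how often each node appears across all paths shows how a homepage can collect far more visits than there are participants:

```python
from collections import Counter


def count_visits(paths):
    """Count every visit to every node across all participants' click paths.

    Each path is the ordered list of labels one participant clicked,
    including repeat visits after backtracking.
    """
    visits = Counter()
    for path in paths:
        visits.update(path)
    return visits


# Hypothetical click paths for three participants
paths = [
    ["Support Centre", "Yelp for Consumers", "Support Centre",
     "Yelp for Business Owners"],
    ["Support Centre", "Yelp for Business Owners"],
    ["Support Centre", "Yelp Deals", "Support Centre", "Yelp for Consumers",
     "Support Centre", "Yelp for Business Owners"],
]
visits = count_visits(paths)
print(visits["Support Centre"])               # 6
print(visits["Support Centre"] / len(paths))  # 2.0
```

With the study’s real numbers, 131 homepage visits from 35 participants works out to roughly 3.7 visits per person.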

The third-level headings are ambiguous and causing confusion

In another task, participants needed to locate their Business ID within their accounts. This task received an overall score of one: only 17% of participants selected the correct destination, and only 22% of participants clicked through a path without backtracking. The task took people an average of 41.2 seconds, which is again a very long time — especially when most participants didn’t get to the correct destination.


And then when we look at the pietree, we’re able to see something particularly interesting: there are two large circles in the centre instead of just one. This shows us that the second-level label ‘Yelp for Business Owners’ was the most popular first click from the ‘Support Centre’ homepage.

It’s also the correct path, and so the size of the circle would be a good thing…if it wasn’t for the huge amount of red! The red indicates that most people went the wrong way after this point.


When we hover the mouse over the ‘Yelp for Business Owners’ circle, we’re able to see the data in more detail. We can see that of the 125 times participants clicked on ‘Yelp for Business Owners’, 107 of these clicks took participants down the wrong path:


So to find their Business ID, participants clicked on the labels ‘Account information’, ‘Claiming your business page’, ‘Business owner features’, and ‘Business owner FAQ’ far more often than they clicked on the correct location, which was ‘Updating your business page’. This is compelling evidence that shows the ambiguity of labels presented to participants once they’ve clicked on ‘Yelp for Business Owners’, and will definitely be useful data for our redesign.
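
The hover breakdown can be approximated in a similar way: for a given node, tally where participants clicked next and how many of those onward clicks skipped the correct child. The sketch below assumes a single correct next label and uses hypothetical click paths built from the labels mentioned above; it is not how Treejack itself computes the figure.

```python
from collections import Counter


def next_clicks(paths, node):
    """Tally the label clicked immediately after each visit to `node`.

    Visits where the participant stopped at `node` (no onward click)
    are not counted.
    """
    following = Counter()
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            if current == node:
                following[nxt] += 1
    return following


def wrong_turn_share(paths, node, correct_next):
    """Fraction of onward clicks from `node` that did not go to `correct_next`."""
    following = next_clicks(paths, node)
    total = sum(following.values())
    if total == 0:
        return 0.0
    return (total - following[correct_next]) / total


# Hypothetical click paths
paths = [
    ["Support Centre", "Yelp for Business Owners", "Account information"],
    ["Support Centre", "Yelp for Business Owners", "Business owner FAQ",
     "Yelp for Business Owners", "Updating your business page"],
    ["Support Centre", "Yelp for Business Owners",
     "Claiming your business page"],
]
node = "Yelp for Business Owners"
print(next_clicks(paths, node).most_common())
print(wrong_turn_share(paths, node, "Updating your business page"))  # 0.75
```

For comparison, the study’s 107 wrong turns out of 125 clicks on ‘Yelp for Business Owners’ is about 86%.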

Creating and running an open card sort on Yelp’s support site

Tree tests are great for establishing how effective an information architecture is based on user tasks. Open card sorting serves a completely different purpose: it can’t be used to prove or benchmark an existing IA, but it’s great for getting ideas on how to organize and categorize the content you need to include.

So after running the tree test, we ran an open card sort on a selection of the support site content to find out how people would naturally group and label it. We selected 40 of the 295 pages, all aimed at people whose businesses get reviewed on Yelp, and in the instructions we asked participants to imagine they were business owners.

We narrowed the number of cards because expecting participants to sort 295 cards would almost certainly result in a huge abandonment rate, and because the data from a sort that big would likely be unwieldy and difficult to draw conclusions from. A possible next step (which we didn’t take) would be to run separate card sorts on the rest of the content to add to the insights.
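
One way to see why a 295-card sort would be unwieldy is to count the card pairs that pairwise analyses (such as a similarity matrix, which scores every pair of cards) have to cover. The number of distinct pairs is n(n - 1)/2, so it grows roughly with the square of the card count. A quick check in Python, where `card_pairs` is just an illustrative helper:

```python
from math import comb


def card_pairs(n_cards):
    """Number of distinct card pairs: n * (n - 1) / 2."""
    return comb(n_cards, 2)


print(card_pairs(40))   # 780
print(card_pairs(295))  # 43365
```

Cutting the deck from 295 to 40 cards reduces the number of pairs to analyze by a factor of more than 50.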


For this study, we wanted to:

  • find out if most participants agreed on any particular category or label
  • get ideas for what labels and hierarchy to use in the new information architecture.

57 people completed the card sort, and it took them an average of 8.3 minutes.

Standardizing the categories enabled us to see common themes

In OptimalSort, you can standardize the categories to make sure all the commonly named categories are counted together in the data. For example, we created a standardized category called ‘Yelp deals and vouchers’, based on the label suggestions you can see on the right. We’re then able to see that some of these cards were placed under that label up to 10 times.
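
Standardization essentially maps many participant-written labels onto one canonical category, then counts placements per card. A minimal sketch, assuming the raw results are available as (participant, label, card) rows; the label mapping, the card names, and the helper functions are hypothetical, not OptimalSort’s data model:

```python
from collections import Counter

# Hypothetical mapping from participant-written labels (lowercased)
# to one standardized category
STANDARD = {
    "deals": "Yelp deals and vouchers",
    "vouchers": "Yelp deals and vouchers",
    "yelp deals": "Yelp deals and vouchers",
    "offers & deals": "Yelp deals and vouchers",
}


def standardize(label):
    """Return the standardized category for a label, or the label unchanged."""
    return STANDARD.get(label.strip().lower(), label)


def placements(rows, category):
    """Count how many times each card was placed under a standardized category."""
    counts = Counter()
    for _participant, label, card in rows:
        if standardize(label) == category:
            counts[card] += 1
    return counts


# Hypothetical raw results: (participant, label they wrote, card)
rows = [
    ("p1", "Deals", "Buying a Yelp Deal"),
    ("p2", "Vouchers", "Buying a Yelp Deal"),
    ("p3", "Offers & Deals", "Redeeming a voucher"),
    ("p3", "Offers & Deals", "Buying a Yelp Deal"),
    ("p4", "Reviews", "Buying a Yelp Deal"),
]
print(placements(rows, "Yelp deals and vouchers"))
```

Without the mapping, the four differently worded labels would be counted as four separate categories, and the shared theme would be much harder to spot.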

When we look at the actual support site, we can see that Yelp groups a lot of this content together; users click twice to get to this view. There’s also some content here that might be better placed elsewhere, and the number of links might call for further hierarchy or structure.

Standardizing another set of categories shows us something different. 18 participants placed the following cards into categories that can mostly be standardized as ‘Business information’ or ‘Business FAQs’. These labels are reminiscent of the overused and totally ambiguous label ‘General information’, which information designers should avoid at all costs. A useful question for us to confront is why these particular cards didn’t lend themselves to more specific categories for these participants.

Almost 50% of participants made similar categories to each other

The PCA (participant-centric analysis) shows the three most representative IAs submitted by participants, each measured against all the other participants’ categories. Below is the one with the most agreement of the three: 21 out of 44 participants created an IA similar to this one.

This IA has four categories, which might be a good place to start when redesigning the IA of Yelp’s support site.

PCA results from an open card sort IA
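
The exact calculation behind OptimalSort’s PCA isn’t described here, so the following is only a rough approximation of the idea, not its actual algorithm. It represents each participant’s sort as the set of card pairs they grouped together, compares every sort with every other using Jaccard similarity, and ranks participants by how many others clear a similarity threshold. The 0.5 threshold, the helper names, and the sample sorts are all hypothetical:

```python
from itertools import combinations


def grouped_pairs(sort):
    """Set of card pairs a participant placed in the same category.

    `sort` maps each category label to the list of cards placed in it.
    """
    pairs = set()
    for cards in sort.values():
        pairs.update(frozenset(p) for p in combinations(cards, 2))
    return pairs


def similarity(a, b):
    """Jaccard similarity between two participants' grouped-pair sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_representative(sorts, threshold=0.5, top=3):
    """Rank participants by how many others produced a similar sort.

    `threshold` is an arbitrary, hypothetical cut-off for 'similar'.
    """
    pair_sets = {pid: grouped_pairs(s) for pid, s in sorts.items()}
    scores = {}
    for pid, pairs in pair_sets.items():
        scores[pid] = sum(
            similarity(pairs, other) >= threshold
            for oid, other in pair_sets.items()
            if oid != pid
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical sorts of four cards by four participants
sorts = {
    "p1": {"Deals": ["A", "B"], "Reviews": ["C", "D"]},
    "p2": {"Offers": ["A", "B"], "Bad reviews": ["C", "D"]},
    "p3": {"Deals": ["A", "B", "C"], "Other": ["D"]},
    "p4": {"Misc": ["A", "C"], "Stuff": ["B", "D"]},
}
print(most_representative(sorts))
```

Here p1 and p2 group the cards identically under different labels, so each has one similar peer; the label text is ignored and only the groupings are compared.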

This dendrogram uses the Actual Agreement method, which shows the percentage of participants who agree with each particular grouping.
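
Assuming the agreement figure for a cluster is the share of participants who placed every card in that cluster into a single category (our reading, not OptimalSort’s documented formula), it can be sketched like this; the card names, sorts, and `agreement` helper are hypothetical:

```python
def agreement(sorts, cards):
    """Percentage of participants who put all of `cards` in one category.

    `sorts` maps participant IDs to {category label: [cards]}.
    """
    cards = set(cards)
    agreeing = 0
    for sort in sorts.values():
        # A participant agrees if any one of their groups contains every card
        if any(cards <= set(group) for group in sort.values()):
            agreeing += 1
    return 100 * agreeing / len(sorts)


# Hypothetical sorts by four participants
sorts = {
    "p1": {"Deals": ["Buy", "Redeem", "Refund"], "Reviews": ["Remove"]},
    "p2": {"Yelp Deals": ["Buy", "Redeem", "Refund", "Remove"]},
    "p3": {"Deals": ["Buy", "Redeem"], "Help": ["Refund", "Remove"]},
    "p4": {"Offers": ["Buy", "Redeem", "Refund"], "Other": ["Remove"]},
}
print(agreement(sorts, ["Buy", "Redeem", "Refund"]))  # 75.0
```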

If we zoom in on the bottom left corner, we can see that 70% of participants agree that the three highlighted cards should be grouped together under a heading that contains the word ‘Deals’:

And if we cast our eyes a little higher, we can see that 54% of participants think that these three highlighted cards should be grouped together under a heading that contains the words ‘Bad Reviews’:


These are some of the patterns that jump out, and that we could take into account in our redesign. The way OptimalSort displays results makes it quick to get insights like these.

So, what comes next?

Yes, I’m leaving you hanging — I warned you at the start there’d be little in the way of resolution. The next steps involve using the data to influence the decisions we make in the redesign. And after we’ve created a new potential IA that we feel addresses all the problems the results have shown us, we can run another tree test with the same tasks, and compare the results.

If you’d like to look at the actual results and either suggest your own design recommendations or offer further (or different) interpretations of the data, email

I’ll give you access to the actual results, and potentially publish your insights.