Tree testing and card sorting Yelp’s support content

Knowledge bases are tricky things to get right. They need a solid information architecture that enables writers to add content easily and accurately, and users to find information quickly.
A challenge? Yes! But impossible…? Of course not!
We recently ran a study on Yelp's desktop website as part of a redesign exercise for an ebook, and one of Yelp's support sites caught our eye. We thought it would be fun (yes, fun) to run a tree test and an open card sort on the site's information architecture, and then use the results to piece together a new one.
This article shares the objectives, structures, and results of the two studies. It doesn't end in resolution; rather, it's the kick-off point for what comes next: redesigning and retesting the information architecture. We'll share this process, and the outcomes, in a future post. For now, feel free to analyze the data and add your own design recommendations in the comments.
But first, let's confront a common argument against investing in IA design.
Why should we care about information architecture when we have search?
In an article for Nielsen Norman Group, Raluca Budiu raises this question, and at first glance it seems to have an obvious answer. The more powerful and accurate search functions become, the less a strong information architecture matters, right?
Not quite. Budiu concludes that 'when websites prioritize search over navigation, users must invest cognitive effort to create queries and deal with the weak site search.'
If users go to the search box because the navigation is unclear, they won't actually know what is available to search, which leads to unclear results. Search also requires users 'to recall information from their memory… [to] come up with a meaningful query', thereby increasing cognitive strain.
In contrast, when navigation labels are clear, users are able to 'recognize rather than recall', which is a much easier task that improves usability (which is why it's included in Jakob Nielsen's ten usability heuristics). Make it easy for people. Anticipate their questions and needs, and structure your information in a way that suits them.
We ran a tree test to assess the findability of content on Yelp’s support site
Tree testing shows us how users interact with an information architecture in the absence of a search function, design, and other navigation elements. Participants are presented with a task and are asked to click through the tree until they land on a page where they think they'll find the answer. They then click a button that says 'I'd find it here', and move on to the next task. For this tree test, we wanted to:
- test the clarity and effectiveness of the information architecture
- reveal which parts of the support site's language or navigation confused users.
We uploaded the entire site map of Yelp's support site to form the tree. The tree is made up of the first-, second-, third-, and fourth-level titles of each page on the site, a total of 295 pages:
We also wrote ten typical user tasks (such as finding out if Yelp can remove bad reviews) that represented what semi-frequent users might want to learn from the support site. We then sourced American participants (because Yelp is very popular in America), and in a pre-activity question we asked them how often they access support material on Yelp. This told us that 77% of participants had done so before, and 45% had done so more than six times.
How Yelp’s support site scored overall
Tree testing measures two things: task success, which is the percentage of participants who clicked the correct destination; and task directness, which is the percentage of participants who went directly to a destination without backtracking, regardless of the correctness of their answer.
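To make those two metrics concrete, here's a minimal sketch in Python of how success and directness could be computed from raw tree-test paths. This is an illustration, not Treejack's actual implementation: the data structures and node names are hypothetical, and it treats any revisit to a node as backtracking.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    path: list[str]   # nodes the participant clicked through, in order
    nominated: str    # the node where they clicked "I'd find it here"


def task_scores(results: list[TaskResult], correct: set[str]) -> tuple[float, float]:
    """Return (success %, directness %) for a single task."""
    # Success: the nominated destination is one of the correct answers.
    success = sum(r.nominated in correct for r in results)
    # Directness: the participant never revisited a node, i.e. never backtracked.
    direct = sum(len(r.path) == len(set(r.path)) for r in results)
    n = len(results)
    return 100 * success / n, 100 * direct / n


# Hypothetical example: one direct, correct path and one indirect, incorrect one.
results = [
    TaskResult(
        path=["Support Centre", "Yelp for Business Owners", "Updating your business page"],
        nominated="Updating your business page",
    ),
    TaskResult(
        path=["Support Centre", "Account information", "Support Centre",
              "Yelp for Business Owners", "Business owner FAQ"],
        nominated="Business owner FAQ",
    ),
]
print(task_scores(results, correct={"Updating your business page"}))  # (50.0, 50.0)
```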
The average success and directness scores across all ten tasks are shown in the results overview:
Below, we examine three tasks in detail: we'll look at a high-scoring task so you can see what counts for success when testing an IA with Treejack, and then we'll look at two low-scoring tasks, the results of which might strongly influence our redesign.
Here’s what a high-scoring task looks like
One of the beautiful aspects of Treejack is how quickly you can see what needs working on. As Dave O'Brien says, 'This is where tree testing really shines – separating the good parts of the tree from the bad, so we can spend our time and effort fixing the latter.'
In this task, we asked participants to figure out how to cancel a connection to someone after accidentally 'friending' the person. We can see below that 94% of participants found the correct destination, and 80% went directly to a destination (correct or not) without backtracking. We can also see that the overall score for this task is nine. A score of eight or above, according to O'Brien, means 'we've earned ourselves a beer.'
Now, when we view the pietree, we can instantly see a positive pattern emerging. The green paths represent the correct direction, and the yellow represents the correct answer. As you'll see below, the green path is thick and dominant, which tells us quickly that users completed the task successfully.
We can also see that even though some people went down the wrong path, most went back up the tree (indicated by blue) because they recognized they'd gone the wrong way. The pietree enables us to see the success of the task within seconds. The yellow circle is large, representing the 94% of participants who found the correct answer.
Two key findings from the results of the two lowest-scoring tasks
Two of the lowest-scoring tasks gave us truly useful insights that could influence our redesign.
Participants went to multiple places to complete the same task
In Task 1, participants needed to find out what to do when they saw a bad review of their business. This task received an overall score of one, and took participants an average of 59 seconds to complete, longer than people usually spend when looking for information on a website.
In fact, only 11% landed on the correct destination, and only 22% went down a particular path without backtracking. So 89% of participants went to the wrong information, and 78% of participants went the wrong way at least once. Looking at this chart gives us the information instantly.
One quick look at the pietree tells the same story. Participants went all over the place looking for the correct destination, and the very thin green path shows us the small number that actually got there. Another key warning sign for us is the amount of red in the central 'Support Centre' landing page. This means that participants often got both their first clicks and subsequent clicks wrong as they moved back through the tree.
And when we hover over the centre of the pietree, we're able to see in detail how often people visited the 'Support Centre' homepage. Even though only 35 people completed this task, the homepage was clicked on 131 times, which shows how often participants felt the need to go all the way back and start again.
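As a rough illustration of where a figure like '131 visits from 35 participants' comes from, here's a small sketch that tallies how many times each node appears across all participant paths. The paths below are hypothetical, not the study's raw export.

```python
from collections import Counter


def node_visits(paths: list[list[str]]) -> Counter:
    """Count how many times each node was visited across all participant paths."""
    visits = Counter()
    for path in paths:
        visits.update(path)
    return visits


# Hypothetical paths: both participants return to the homepage at least once.
paths = [
    ["Support Centre", "Account information", "Support Centre", "Yelp for Business Owners"],
    ["Support Centre", "Business owner FAQ", "Support Centre", "Account information"],
]
print(node_visits(paths)["Support Centre"])  # 4 visits from only 2 participants
```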
The third-level headings are ambiguous and cause confusion
In another task, participants needed to locate their Business ID within their accounts. This task received an overall score of one: only 17% of participants selected the correct destination, and only 22% of participants clicked through a path without backtracking. The task took people an average of 41.2 seconds, which is again a very long time, especially when most participants didn't get to the correct destination.
And then when we look at the pietree, we're able to see something particularly interesting: there are two large circles in the centre instead of just one. This shows us that the second-level label, 'Yelp for Business Owners', was the most popular first click from the homepage 'Support Centre' label.
It's also the correct path, so the size of the circle would be a good thing… if it weren't for the huge amount of red! The red indicates that most people went the wrong way after this point.
When we hover the mouse over the 'Yelp for Business Owners' circle, we're able to see the data in more detail. We can see that of the 125 times participants clicked on 'Yelp for Business Owners', 107 of these clicks took participants down the wrong path:
So to find their Business ID, participants clicked on the labels 'Account information', 'Claiming your business page', 'Business owner features', and 'Business owner FAQ' far more often than they clicked on the correct location, which was 'Updating your business page'. This is compelling evidence of how ambiguous the labels are once participants have clicked on 'Yelp for Business Owners', and it will definitely be useful data for our redesign.
Creating and running an open card sort on Yelp's support site
Tree tests are great for establishing how effective an information architecture is when measured against user tasks. Open card sorting serves a totally different purpose: it can't be used to prove or benchmark an existing IA, but it's great for generating ideas about how to organize and categorize the content you need to include.
So after running the tree test, we ran an open card sort on a selection of the support site content to find out how people would naturally group and label it. We selected 40 of the 295 total pages, focusing on those aimed at people whose businesses get reviewed on Yelp, and in the instructions we told participants to consider themselves business owners.
We narrowed the number of cards because expecting participants to sort 295 cards would almost certainly result in a huge abandonment rate, and because the data we'd get from a sort that big would likely be unwieldy and difficult to draw conclusions from. A possible next step (which we didn't take) would be to run separate card sorts on the rest of the content to add to the insights.
For this study, we wanted to:
- find out if most participants agreed on any particular category or label
- get ideas for what labels and hierarchy to use in the new information architecture.
57 people completed the card sort, and it took them an average of 8.3 minutes.
Standardizing the categories enabled us to see common themes
In OptimalSort, you can standardize the categories to make sure all the commonly named categories are counted together in the data. For example, we created a standardized category called ‘Yelp deals and vouchers’, based on the label suggestions you can see on the right. We’re then able to see that some of these cards were placed under that label up to 10 times.
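Under the hood, standardization is essentially a mapping from participants' raw labels to agreed category names before counting placements. Here's a minimal sketch of that idea; the label mapping, card names, and data structures are assumptions for illustration, not OptimalSort's internals.

```python
from collections import defaultdict

# Hypothetical mapping from raw participant labels to standardized categories.
STANDARDIZE = {
    "deals": "Yelp deals and vouchers",
    "yelp deals": "Yelp deals and vouchers",
    "vouchers and deals": "Yelp deals and vouchers",
    "business info": "Business information",
    "business faq": "Business FAQs",
}


def placements(sorts: list[dict[str, list[str]]]) -> dict[str, dict[str, int]]:
    """Count card placements per standardized category.

    Each sort is one participant's result: {raw category label: [cards]}.
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sort in sorts:
        for raw_label, cards in sort.items():
            standard = STANDARDIZE.get(raw_label.strip().lower(), raw_label)
            for card in cards:
                counts[standard][card] += 1
    return counts


# Hypothetical sorts from two participants.
sorts = [
    {"Deals": ["Buying a Yelp Deal", "Refunds for Yelp Deals"]},
    {"Yelp deals": ["Buying a Yelp Deal"]},
]
print(dict(placements(sorts)["Yelp deals and vouchers"]))
# {'Buying a Yelp Deal': 2, 'Refunds for Yelp Deals': 1}
```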
When we look at the actual support site, we can see that Yelp categorizes a lot of this content together. Users click twice to get to this view. There's also some content here that might be better placed elsewhere, and the number of links might require further hierarchy or structure.
Standardizing another set of categories shows us something different. 18 participants placed the following cards into categories that can mostly be standardized as 'Business information' or 'Business FAQs'. These labels are reminiscent of the overused and totally ambiguous label 'General information', which information designers should avoid at all costs. A useful question for us to confront is why these cards didn't lend themselves to more specific categories for these participants.
Almost 50% of participants created similar categories to each other
The PCA (participant-centric analysis) shows the top three most representative IA submissions by participants, as tested against all the other participants' categorizations. Below is the PCA that had the most agreement of the three: 21 out of 44 participants created an IA similar to this one.
This IA has four categories. These four categories might be a good place to start when redesigning the IA of Yelp's support page.
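To give a sense of what 'most representative' can mean, here's a loose sketch of the participant-centric idea: score each participant's sort by how similar its card groupings are to everyone else's, then surface the highest scorers. The similarity measure (Jaccard over co-grouped card pairs) and the data are assumptions for illustration, not OptimalSort's actual PCA algorithm.

```python
from itertools import combinations


def card_pairs(sort: dict[str, list[str]]) -> set[frozenset[str]]:
    """All unordered pairs of cards that one participant grouped together."""
    pairs = set()
    for cards in sort.values():
        pairs.update(frozenset(p) for p in combinations(sorted(cards), 2))
    return pairs


def representativeness(sorts: list[dict[str, list[str]]]) -> list[float]:
    """Average Jaccard similarity of each participant's groupings to everyone else's."""
    pair_sets = [card_pairs(s) for s in sorts]
    scores = []
    for i, mine in enumerate(pair_sets):
        sims = []
        for j, theirs in enumerate(pair_sets):
            if i == j:
                continue
            union = mine | theirs
            sims.append(len(mine & theirs) / len(union) if union else 0.0)
        scores.append(sum(sims) / len(sims))
    return scores


# Hypothetical sorts: participants 0 and 1 agree; participant 2 groups differently.
sorts = [
    {"Deals": ["Buying a Yelp Deal", "Refunds for Yelp Deals"], "Reviews": ["Responding to reviews"]},
    {"Offers": ["Buying a Yelp Deal", "Refunds for Yelp Deals"], "Feedback": ["Responding to reviews"]},
    {"Money": ["Refunds for Yelp Deals", "Responding to reviews"], "Other": ["Buying a Yelp Deal"]},
]
print(representativeness(sorts))  # [0.5, 0.5, 0.0] – participants 0 and 1 score highest
```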
This dendrogram uses the Actual Agreement method to show that X% of participants agree with a particular grouping.
If we zoom in on the bottom left corner, we can see that 70% of participants agree that the three highlighted cards should be grouped together under a heading that contains the word 'Deals':
And if we cast our eyes a little higher, we can see that 54% of participants think that these three highlighted cards should be grouped together under a heading that contains the words 'Bad Reviews':
These are some of the patterns that jump out, and ones we could take into account in our redesign. The way the results are displayed makes it quick to get insights like these.
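For those curious how an agreement figure like '70% of participants grouped these cards together' can be derived, here's a simplified sketch of the underlying calculation for a single pair of cards. The real dendrogram clusters whole groups of cards rather than pairs, and the sorts below are hypothetical.

```python
def pairwise_agreement(sorts: list[dict[str, list[str]]], card_a: str, card_b: str) -> float:
    """Percentage of participants who placed card_a and card_b in the same category."""
    agree = 0
    for sort in sorts:
        # A participant agrees if any one of their categories contains both cards.
        if any(card_a in cards and card_b in cards for cards in sort.values()):
            agree += 1
    return 100 * agree / len(sorts)


# Hypothetical sorts: two of three participants group the two deal-related cards together.
sorts = [
    {"Deals": ["Buying a Yelp Deal", "Refunds for Yelp Deals"]},
    {"Offers": ["Buying a Yelp Deal", "Refunds for Yelp Deals"]},
    {"Purchases": ["Buying a Yelp Deal"], "Refunds": ["Refunds for Yelp Deals"]},
]
print(pairwise_agreement(sorts, "Buying a Yelp Deal", "Refunds for Yelp Deals"))  # ~66.7
```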
So, what comes next?
Yes, I'm leaving you hanging: I warned you at the start there'd be little in the way of resolution. The next steps involve using the data to influence the decisions we make in the redesign. And after we've created a new potential IA that we feel addresses all the problems the results have shown us, we can run another tree test with the same tasks and compare the results.
If you'd like to look at the actual results and either suggest your own design recommendations or offer further (or different) interpretations of the data, email kathryn@optimalworkshop.com.
I’ll give you access to the actual results, and potentially publish your insights.