Why card sorting loves tree testing

Andrew Mayfield

This article was first published on the Global User Research blog.
Card sorting is an effective technique for teasing out the important distinctions in our content inventory. Conducting card sorts is also a great way to gather insights about the nature of the content and your users’ mental models. I like to think of it as an opportunity to ‘load up your brain’ with the information you’ll need to design a well-informed IA. Sam Ng has called it ‘eye-balling’ the data. Card sorting produces much more than just a ballpark in which to throw around ideas. However, as you move toward a final candidate for your site structure, you’re entering territory that card sorting simply wasn’t designed for.


When designing an Information Architecture, we start with a collection of loosely related content and work tirelessly to create an information structure that ‘works’ for as many of our users as possible. What we need is a simple way to validate our ideas so we can use our concepts developed through card sorting and refine them based on research and testing. We need a way to find out if our IA is actually going to work.

What card sorting achieves

Structuring information in a way that makes sense to anybody is not easy, let alone designing for everybody - often thousands of users from different perspectives. Even in simple examples, differences in perception and the effects of personal experience will manifest as disagreements about the nature of content and the interpretation of labels.

Card sorting guides the process of determining ‘what should go together.’ Or as I like to say: ‘what should probably go together… maybe.’ Results from a card sort usually require substantial massaging to form an Information Architecture (IA) and that IA still needs to be proven to work.

Picking up where card sorting leaves off

Users process information differently when performing a seek task as opposed to a sort task. When in sort mode we are deeply evaluative, applying considerable effort to organize ideas in a coherent manner. In seek mode, we skim through content, readily discarding information we don't need and selecting quickly when we think we've found something – a pretty close approximation of our web browsing habits!

So we take our card sorting insights from our sort mode respondents, and test the resulting draft IA against some ferocious seek mode users.

Tree testing

We’ve established a simple incompatibility between generative IA techniques like card sorting and the end goal of findable content on your website. With this in mind, tree testing aims to get as close as possible to the actual experience of navigating a website while remaining ‘pure’ about testing the IA in isolation.

From Wikipedia:
“Tree testing is a usability technique for evaluating the findability of topics in a website. It is also known as reverse card sorting or card-based classification. Tree testing is done on a simplified text version of your site structure. This ensures that the structure is evaluated in isolation, nullifying the effects of navigational aids, visual design, and other factors.”

Participants are given a task and set about traversing the IA to find where they would complete it. Every step they take is recorded for your analytical pleasure. Did they find the right page? Did they take any wrong turns? How long did it take them? I want every detail!
This provides a wealth of information that we can use to pinpoint problem areas in the IA and identify what the problems are likely to be. Tree test analysis is still a human-intensive process, but the data is decidedly more conclusive and easier to interpret when compared to card sorting. The ability to deliver a conclusive test result is as valuable to the IA design process as it is to overcoming project politics.

For example:
“When asked to download a purchase order form, forty percent of participants incorrectly set out within the products and services section. Although some of those participants found the correct destination eventually, fifteen percent of the total participants never found the form.”
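To show how a finding like this falls out of the recorded steps, here is a minimal Python sketch. It is not how Treejack computes its results; the function name, the labels and the click paths are all made up for illustration, and the sample data is constructed so that it reproduces the percentages quoted above.

```python
def summarize_task(paths, correct_first_click, correct_destination):
    """Summarize recorded click paths for one tree-test task.

    Each path is the ordered list of labels one participant clicked.
    An empty path (a skipped task) counts as both a wrong first click
    and a failure to find the destination.
    """
    total = len(paths)
    if total == 0:
        raise ValueError("no paths recorded")
    wrong_first = sum(1 for p in paths if not p or p[0] != correct_first_click)
    never_found = sum(1 for p in paths if not p or p[-1] != correct_destination)
    return {
        "participants": total,
        "wrong_first_click_pct": 100 * wrong_first / total,
        "never_found_pct": 100 * never_found / total,
    }


# Hypothetical data: 20 participants, invented labels.
direct = [["Resources", "Forms", "Purchase order form"]] * 12
recovered = [["Products and services", "Resources", "Forms", "Purchase order form"]] * 5
lost = [["Products and services", "Pricing"]] * 3

summary = summarize_task(direct + recovered + lost, "Resources", "Purchase order form")
print(summary)
```

With this sample, 8 of 20 participants start in the wrong section (forty percent) and 3 of 20 never reach the form (fifteen percent).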

Unlike full usability testing, tree testing only deals with the IA. This streamlines IA development, as iterative refinement can be done rapidly and with minimal cost. By testing and refining findability early in the project, it is possible to avoid costly late changes that are likely to affect design, content management and copy writing teams. That is, if you are able to push late changes through at all.

Getting started with tree testing

This advice draws upon our experience with client projects and with helping Treejack users around the world to get the most from their tree studies.

  • One: Task authoring matters. A lot. Don’t ask your participants to “Find XYZ” twelve times in a row. You’ll see the boredom reflected in your results: a high skip rate and plenty of non-sequitur responses. Mix it up a little and create real-world scenarios. If necessary, ask your participants to “imagine” or “suppose” that they are coming at it from a certain perspective. Never use the same language in your task description as a label in your IA. As an example, if you ask participants to investigate a certain variety of your company’s provided services, any label with the word services in it will attract undue attention. Think of another way to phrase the task.
  • Two: Don’t bother testing your entire IA. Focus on the parts that matter and that you think are worth worrying about. If you write a task to test your “Contact Us” page, you’ve just wasted the precious attention of your participant, which could’ve been used to test something peculiar to your site. The world is very familiar with common navigation metaphors and it’s not worth your time to verify that hypothesis. This advice also goes for loading up your tree (the IA itself). Use discretion here, but in most cases you can probably leave out the really common ‘boilerplate’ navigation items.
  • Three: This isn’t a marathon. Ask each participant to complete ten to fifteen tasks. You might have thirty or more tasks in your overall survey, but keep the workload humane by displaying only a subset to each participant. We recommend collecting 40 or more responses to each task. With 30 tasks displayed at 10 per participant, each participant covers a third of the tasks, so you will need 120 participants to reach 40 responses per task.
  • Four: Ask questions! We’re always here to help. Email support@optimalworkshop.com
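The participant arithmetic in tip three generalizes to other survey sizes. The sketch below is only an illustration: the function name is hypothetical, and it assumes tasks are spread evenly across participants so that every task collects roughly the same number of responses.

```python
def participants_needed(total_tasks, tasks_per_participant, responses_per_task):
    """Smallest number of participants giving each task the target number
    of responses, assuming tasks are distributed evenly across participants."""
    if not 0 < tasks_per_participant <= total_tasks:
        raise ValueError("tasks_per_participant must be between 1 and total_tasks")
    total_responses = total_tasks * responses_per_task
    # Ceiling division using integers only, so partial participants round up.
    return -(-total_responses // tasks_per_participant)


# The example from tip three: 30 tasks, 10 shown to each participant, 40 responses per task.
print(participants_needed(30, 10, 40))  # 120
```

Showing more tasks to each participant lowers the number of people you need to recruit, at the cost of a heavier workload for each of them.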