Planning a tree test: Part I

Dave O'Brien

“Plans are nothing; planning is everything.” – Dwight D. Eisenhower

If we have a site structure to test and some tasks in mind, it’s tempting just to dive right in – set up the test, email a bunch of users, and watch the results come in. Easy, right?

However, we’ll get a lot more out of our testing if we take a step back and ask ourselves some basic questions, such as:

  • Why am I running this test? What am I specifically trying to find out?
  • What am I testing – the entire information architecture, or just the top levels of the tree? Or two completely different trees?
  • Who should I test – existing customers, or prospective ones too?

How we answer these questions can change how we run our test and analyze our results.

How many rounds of testing?

The time, effort, money, and participants it will take to develop our site tree depend partly on how many rounds of testing we intend to do. More rounds usually mean a better result (as we would expect), but there are also diminishing returns to consider.

We recommend a “full fat” process with 3 rounds of testing:

  • Round 1: Test the existing tree (baseline)
  • Round 2: Test 2-3 new tree candidates
  • Round 3: Revise/retest the best tree (often a hybrid)

Because of budget or time constraints, this is often cut down to two rounds:

  • Round 1: Test the existing tree (baseline) and 2-3 new trees
  • Round 2: Revise/retest the best tree (often a hybrid)

The first round of testing shows us where our tree is doing well (yay!) and where it needs more work. So we make some thoughtful revisions. Careful, though, because even if the problems we found seem to have obvious solutions, we still need to make sure our revisions actually work for users, and don’t cause further problems.

The good news is, it’s dead easy to run a second test, because it’s just a small revision of the first one. We already have the tasks and all the other bits worked out, so it’s just a matter of making a copy of the test (in whatever tool we’re using), pasting in our revised tree, and hooking up the correct answers. In an hour or two, we’re ready to pilot it again (to err is human, remember) and then send it off to a fresh batch of participants.

There are two possible outcomes here:

  • Our fixes are spot-on, the participants find the correct answers more frequently and easily, and our overall score climbs. We could have skipped this second test, but confirming that our changes worked is both good practice and a good feeling. It’s also something concrete to show the boss.
  • Some of our fixes didn’t work, or (given the tangled nature of IA work) they worked for the problems we saw in Round 1, but now they’ve caused more problems of their own. Bad news, for sure, but better that we uncover them now in the design phase (when it takes a few days to revise and retest) instead of further down the track when the IA has been signed off and changes become painful.
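To tell these two outcomes apart, it helps to compare results task by task rather than relying on the overall score alone. The following is a minimal Python sketch of that comparison, not a feature of any particular tool. The function name, the task names, and the success rates are all made up for illustration, and the comparison is raw: it does not check whether a difference is bigger than normal sampling noise.

```python
def compare_rounds(before, after):
    """Split tasks into improved, regressed, and unchanged by success rate.

    `before` and `after` map a task name to its success rate (0.0 to 1.0).
    Only tasks present in both rounds are compared.
    """
    result = {"improved": [], "regressed": [], "unchanged": []}
    for task in before:
        if task not in after:
            continue  # task was dropped or renamed between rounds
        if after[task] > before[task]:
            result["improved"].append(task)
        elif after[task] < before[task]:
            result["regressed"].append(task)
        else:
            result["unchanged"].append(task)
    return result


# Hypothetical success rates, invented purely to show the call.
round1 = {"Find returns policy": 0.40, "Find store hours": 0.90}
round2 = {"Find returns policy": 0.75, "Find store hours": 0.70}

comparison = compare_rounds(round1, round2)
print("Improved:", comparison["improved"])
print("Regressed:", comparison["regressed"])
```

Any task that lands in the "regressed" list is a candidate for the second outcome above: a fix that solved one problem and created another.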

Note that in the two-round version, Round 1 combines the “before” and “after” testing. This works because most of our clients already have a good idea of where the weaknesses are in their existing tree. If we don’t have that knowledge, we recommend the full 3-round approach described above; it can be combined with an open card sort to help generate ideas for the revised structure.

On some larger and more complex trees, additional revision rounds may be needed to confirm we have solved the major issues we uncover.

For planning, this means that we need to:

  • add the desired number of rounds to our project schedule
  • determine how we will get enough fresh participants for each round
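To put rough numbers on the second point, here is a minimal Python sketch that totals the fresh participants each plan would need. The figure of 50 participants per tree is a placeholder, not a recommendation from this article, and the sketch assumes that each participant sees only one tree and that nobody is reused between rounds. Substitute whatever our own study design calls for.

```python
# Trees tested in each round, taken from the two plans described above.
# Where the plan says "2-3 new trees", this sketch uses the upper bound of 3.
FULL_PLAN = [1, 3, 1]    # baseline, new candidates, revised best tree
SHORT_PLAN = [4, 1]      # baseline plus 3 new trees, revised best tree


def participants_needed(trees_per_round, per_tree=50):
    """Return the number of fresh participants needed in each round.

    Assumes (hypothetically) that each participant tests exactly one tree
    and that nobody takes part in more than one round.
    """
    return [trees * per_tree for trees in trees_per_round]


full = participants_needed(FULL_PLAN)
short = participants_needed(SHORT_PLAN)
print("Full plan:", full, "total", sum(full))
print("Short plan:", short, "total", sum(short))
```

Under these assumptions the two plans need the same total number of people; the difference is how many we have to recruit at once for the first round.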

Which trees will we test?

This usually comes down to two questions:

  • How many trees are we testing at a time?
  • Are we testing the whole tree, or just part of it?

How many trees?

If we’re testing an existing tree for problems, before starting our IA redesign, the answer here is simple – we’re testing just the one tree.

If we’re revising the IA for a site, and we haven’t done a baseline test yet, it’s a good idea to test the “before” and “after” versions. At minimum, this means testing two trees – the existing one (to get a baseline score) and our revised tree (to see what improved and what didn’t).

As mentioned above, though, we really should be testing more than one alternative, so we can be sure our eventual new tree is as effective as possible. Typically, we’ll test 2-3 proposed trees against each other (and against the existing baseline tree), then test a “best of” hybrid of the strongest candidates in a second round.

Which part of the tree?

If we’re testing a small or medium-sized tree (say, fewer than 500 items), we will normally test the whole tree – no major pruning required.

If our tree is larger (say, 500-1,000 items), we have two options:

  • Test the whole tree – Easy to prepare, but limits how many tasks we can ask each participant to do.
  • Test a “pruned” version of the tree – Takes some effort on our part, but lets us concentrate on the parts we’re really interested in.

Finally, if our tree is very large (more than 1,000 items), testing the whole tree may be feasible, but in most cases we recommend pruning the tree to keep the participants’ effort from becoming onerous.
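The size guidelines above can be sketched as a small Python helper. This is only an illustration of the rule of thumb: the nested-dictionary representation of the tree, the function names, and the sample tree are assumptions made for this example, and the thresholds are the approximate ones given above rather than hard limits.

```python
def count_items(tree):
    """Count every item in a tree given as nested dicts: {label: {child: {...}}}."""
    return sum(1 + count_items(children) for children in tree.values())


def suggest_scope(item_count):
    """Map a tree's size to the rule of thumb above (thresholds from the text)."""
    if item_count < 500:
        return "test the whole tree"
    if item_count <= 1000:
        return "test the whole tree (fewer tasks each) or a pruned version"
    return "prune the tree in most cases"


# A tiny, made-up tree purely for illustration.
sample_tree = {
    "Toys": {"Outdoor": {}, "Board games": {}},
    "Gift cards": {},
}
size = count_items(sample_tree)
print(size, "items:", suggest_scope(size))
```

Counting the items first is worthwhile, because trees that feel "medium-sized" when browsing the site can turn out to be much larger once every level is included.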

Who will we test?

Who indeed? Early on, we need to determine:

  • Which group (or groups) of users to target (or to specifically exclude)
  • How we will invite them to participate
  • What incentive (if any) we’ll offer them for helping us out

Which user groups?

Most websites serve several different types of users. For example, a toy-store site may get a large number of visits from both children (the toy users) and from adults (the toy buyers). If we’re designing the tree for this site, we’ll naturally have to create a structure that works for both types of users.

This also means that we should test our tree with both types of users. The adults may be easy to recruit, but how will we get children to participate?

There may also be certain users who we don’t want to participate. If we’re designing a website only for overseas users, we don’t want domestic users cluttering our results (and wasting their own time).

For more on user groups, read this tree testing wiki article.

How to recruit participants?

Anyone who has done user research knows that recruiting always takes a bit longer than expected, so we need to start planning this early.

The two classic ways to recruit for online studies are:

  • Email – using lists of existing and/or prospective customers
  • Web ads – invitations posted on our website and/or other related sites

Other methods include commercial research panels and crowd-sourcing sites like Amazon Mechanical Turk.

If we are targeting more than one user group, or if participants are hard to come by, we may need to try a variety of methods.

Will we offer an incentive?

In most cases, yes. We offer incentives in the vast majority of our studies. Even a modest incentive makes it much easier to get good numbers in a short time, which is a godsend for iterative testing.

We have conducted a few studies where we didn’t offer an incentive, but those are special cases.

Because we’re only asking for 5-10 minutes of a person’s time, it’s usually not worth it to reward each participant. Instead, we offer them a chance to win something they value (for example, “5-minute survey – win $300 in groceries!”).

If we’re working for a government agency or a large organization, we may need to decide on an incentive early on, because of the lead time needed to get it approved by management.

Stay tuned…

In Part II, we look at how to handle problems during the testing phase, as well as figuring out who in our team is responsible for what.

In the meantime, if you’re looking for more information on how to run a good tree test, check out Tree Testing For Websites, a free comprehensive guide to the tree-test method.

If you have questions about using Treejack specifically, contact the fine folks at Optimal Workshop.

If you have questions or thoughts about tree testing in general, you can give me a holler at dave.obrien@outlook.co.nz