The information architecture of libraries: Library of Congress Classification
Originally designed in the early 20th century to organize the collection held by the Library of Congress in the US, the LCC system has 21 classes (categories), each containing several layers of detailed subclasses. That depth allows for finer classification granularity, which is why it’s better suited to larger collections. It also has more call numbers overall than the DDC system, meaning more works can be organized in it.
From an information architecture (IA) size perspective, this means the LCC system is 21 categories wide and up to 7 levels deep in some parts. It’s one of the largest IAs I’ve ever come across and one of the oldest! So, pull up a chair and I will tell you the story of what happened when the mega IA that is the LCC system met Treejack.
IA: Library of Congress Classification (LCC) system
Research scope: The LCC system
Not in scope: The physical library environment
For the second part of this study, I decided to run a tree test using Optimal Workshop’s Treejack. Treejack is our online tool for tree testing (also known as reverse card sorting), a technique that assesses the findability of content. I wanted to test the whole LCC IA but there was no way it was going to lend itself to a card sort: we’re talking thousands of cards here (4,849 to be exact)! If you’re planning to card sort a mega IA, it’s best to break it into smaller chunks and research specific sections at a time (a viable card sort has approximately 30 to 60 cards). So, with a card sort out of the question, I decided to run a tree test. A tree test works like a card sort but in reverse. Instead of asking participants to organize the cards into groups that make sense to them (and give those groups names in the case of an open card sort), you ask them to find a specific piece of content within the hierarchical tree-like structure of the IA. In an IA design process, a tree test logically follows the creation of an IA that was built based on card sort insights, and I thought running a tree test on the LCC system would nicely complement the DDC system’s card sort.
Preparing for the study
Building the tree
Before heading into Treejack, it’s a good idea to build the tree of your IA out in a spreadsheet, especially when you’re dealing with a really big one like I was! When building and testing a large IA, it’s really easy to get lost and make silly mistakes like accidentally putting labels at the wrong level of the structure but you can avoid this by using a spreadsheet. It’s a lot easier to read, it provides a clear sense of hierarchy and you can highlight your level one headings in a contrasting colour like I did in the image below to help you navigate your IA and stay on track.
From left to right, each column of your spreadsheet represents a level in your IA and you stagger the labels based on their hierarchical relationships within the structure like in the example from this study below:
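If you’d rather let a script check the staggering for you, the spreadsheet layout described above can be sketched in Python. This is a minimal sketch and not something Treejack provides: `build_tree` and `max_depth` are hypothetical helpers, the code assumes one label per row with the column position marking the level, and the sample rows borrow a few labels from this study purely for illustration.

```python
def build_tree(rows):
    """Turn staggered spreadsheet rows into a nested tree.

    Assumption: each row holds exactly one label, and the column it
    sits in (0 = leftmost) marks its level below 'Home'.
    """
    root = {"label": "Home", "children": []}
    path = [root]  # path[d] is the latest node seen at depth d
    for row in rows:
        filled = [(i, cell.strip()) for i, cell in enumerate(row) if cell.strip()]
        if not filled:
            continue  # ignore blank spreadsheet rows
        column, label = filled[0]
        depth = column + 1  # 'Home' sits at depth 0
        if depth > len(path):
            raise ValueError(f"{label!r} skips a level")
        node = {"label": label, "children": []}
        del path[depth:]  # step back up to this label's parent
        path[-1]["children"].append(node)
        path.append(node)
    return root


def max_depth(node):
    """Count levels, with 'Home' as level 1."""
    return 1 + max((max_depth(child) for child in node["children"]), default=0)


# Illustrative rows only, not the real LCC structure.
rows = [
    ["Science", "", ""],
    ["", "Science (General)", ""],
    ["", "Natural History", ""],
    ["Medicine", "", ""],
]
tree = build_tree(rows)
print([child["label"] for child in tree["children"]])
print(max_depth(tree))
```

A label that jumps more than one column to the right of its parent raises an error, which is exactly the kind of wrong-level slip that is easy to miss by eye in a very long spreadsheet.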
Building the LCC system IA tree for testing was no mean feat. While the Library of Congress’ website certainly made the structural copying and pasting side of things easy for me, at 4,861 spreadsheet rows deep, it was also very easy to get lost. Going in, I knew it was a big IA but I grossly underestimated the complexity of it. The handy PDFs I found on the Library of Congress’ website outlining the entire LCC system IA were more than 30 pages long for some of the classes (categories) and I spent a lot of time holding a ruler up to my screen to see which level in the structure I was up to. The PDFs also showed the call number ranges for each label within the IA and I needed to remove these as I was concerned they might be distracting or confusing for my participants. To do this as quickly as possible, I copied and pasted the labels into Notepad to remove the formatting, deleted the call numbers and then copied and pasted what was left into my spreadsheet. Sometimes I was lucky and would find a huge patch of labels all at the same level which could be moved in bulk, and other times it was quite fiddly with a lot of back and forth across the levels. Not for the faint hearted but completely achievable with some creative thinking. I started color coding the Level 1 rows in the spreadsheet so I could see where I was and I found that really helped me stay on top of it (see below)!
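I did the call number clean-up by hand, but the same step could be sketched with a regular expression. This is a hedged sketch only: it assumes each pasted line starts with a call number or range followed by the label, the pattern is my guess at that shape and has not been checked against every line of the PDFs, and `strip_call_number` and `has_stray_call_number` are hypothetical helper names.

```python
import re

# Assumed shape of a call number at the start of a line: one to three
# capital letters, digits with an optional decimal part, and an
# optional "-end" for a range, e.g. "QH1-278.5".
CALL_NUMBER = re.compile(r"^\s*[A-Z]{1,3}\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?\s+")


def strip_call_number(line):
    """Remove a leading call number (or range) and tidy whitespace."""
    return CALL_NUMBER.sub("", line).strip()


def has_stray_call_number(label):
    """Flag labels that still seem to contain a call number."""
    return re.search(r"\b[A-Z]{1,3}\d+", label) is not None


# Sample lines in the assumed format, for illustration.
lines = ["Q1-390 Science (General)", "QH1-278.5 Natural history (General)"]
labels = [strip_call_number(line) for line in lines]
print(labels)
```

Running `has_stray_call_number` over the finished labels is a cheap way to catch the odd call number that slipped through before the tree goes in front of participants.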
Choosing the task based questions
Tree testing requires participants to work through a series of tasks to locate specific content and the maximum reasonable number of tasks per study is 10, otherwise it just takes too long and people give up! To me 10 seemed kind of perfect in this case! I randomly chose one title from each DDC system class and put them all into a spreadsheet to give me some space to design the wording of the tree test tasks.
From looking at the IA, it was clear to me that participants would need some context in order to have a fair shot at locating each book. Due to the level of granularity in the subclasses and the labels themselves, I couldn’t just tell my participants ‘go find this book’ and actually expect them to find it based on the title alone. In the real world when you go to a library, there are signs, call numbers, electronic catalogue systems ready for you to look up anything from a subject keyword to the actual title and, of course, librarians. It’s neither fair nor useful to test a system so reliant on other infrastructure without providing sufficient context to participants.
I researched each of the ten books (some of them I have actually read) to determine their key themes and messages. I looked into why a person might seek these titles out. Are they doing a school project? If so, which class is it for? Which level of education? What is the goal of this fictitious project/report/essay? Why this book and not others?
I also needed to identify the correct location for each of these books within the LCC system and looked there for inspiration for contextual cues to include in my tasks. For example, “Gray’s Anatomy” by Henry Gray lives under ‘Science’ rather than ‘Medicine’ and it would be misleading for me to create a scenario around a medical student looking for a textbook. You certainly don’t want to lead your participants directly to the right answer but it’s also unethical to deliberately lead them away — if you want usable data, don’t tank the results from the beginning.
With all this in mind, here are the 10 book finding based tasks I designed for this Treejack study on the LCC system:
- You’re doing some research into the role of librarians in the digital age and have heard that “This book is overdue!: how librarians and cybrarians can save us all” by Marilyn Johnson is a must-read. Where would you expect to find this book?
- You work for a government department and lead a team of 15 people. A team planning day is coming up and you’re researching ways to help your team think more creatively. A friend recommended you try “Steal Like an Artist: 10 Things Nobody Told You About Being Creative” by Austin Kleon. Where would you expect to find this book?
- You’re writing an essay on the human experience side of religion and want to read “The Varieties of Religious Experience: A Study in Human Nature” by William James. Where would you expect to find this book in the library?
- A favorite TV show of yours recently explored the political science concept of Machiavellianism and you’re curious. A little Googling has led you to “The Prince” by Niccolo Machiavelli and you’d like to borrow it from your university’s library. Where would you expect to find this book?
- You’re a high school student and your English Literature teacher has recommended you read “Eats, Shoots and Leaves” by Lynne Truss to help build your grammatical skills. Where would you expect to find this book?
- You’re visiting the library with a friend who is working on a science project on evolution and they need your help to locate “On the Origin of Species” by Charles Darwin. Where would you expect to find this book?
- Your teenage nephew is working on a science project for school on the human digestive system. His teacher has provided a recommended reading list with “Gray’s Anatomy” by Henry Gray right at the top, and he needs your help finding it. Where would you go to look for this book?
- You’re studying industrial design at university and one of your elective classes is on urban design. Your lecturer has recommended you read “The Death and Life of Great American Cities” by Jane Jacobs to learn more about the social role of sidewalks in a city. Where would you go to find this book?
- You’re doing some research into women in literature in the early twentieth century and your aunt, who is a university lecturer, has suggested you read “A Room of One’s Own” by Virginia Woolf. Where would you go to find this book?
- You’re a university student and for your American Literature class, you need to read “I Know Why the Caged Bird Sings” by Maya Angelou for an essay based assignment about how the author’s love of literature helped her overcome and stand up to racism and trauma. The teacher ran out of copies, so you have to get your own from the library. Where would you expect to find it?
With the Optimal Workshop suite of tools, you can add post-study questions that appear at the end of the study after participants have completed the activity. They’re no substitute for talking to users in person, but I sometimes include them to gain additional context. I’d made the mistake of not asking enough questions before and this study gave me a chance to learn from that and try again. Once again, I asked my participants the multi-choice (radio button) question “When was the last time you visited a library?” but this time I followed it with “If it’s been more than 6 months since your last library visit, can you tell me more about why that is?” and “What are your thoughts on the value of physical libraries in our digitally driven 2017 world and why?”. I threw that last one in out of curiosity and also because my research into one of the book tasks (Task 1: “This book is overdue!: how librarians and cybrarians can save us all” by Marilyn Johnson) had inspired me. The two new questions were set as multi-line text responses to give participants as much or as little space as they needed to share their thoughts.
Building the study
Pulling it all together in Treejack was a fairly easy task because I had done most of the heavy lifting during my prep work. The LCC system’s mega IA was an easy copy and paste into the tool where I double checked my work to make sure there were no stray branches or call numbers that I had forgotten to delete. Sure enough there were one or two in need of a quick edit but it’s an easy fix that can be done within Treejack.
Next, I set the correct answers to my tasks, which was easy in this case because there was only one possible answer for each task, and then I set up my post-study questionnaire. Lastly, I made some slight tweaks to the default messaging provided by Treejack. There’s nothing wrong with it; it just referenced the goal of the study as ‘improving a website’, which in this case wasn’t true and could potentially cause some confusion among participants. I didn’t tell them exactly what I was doing but I did say this:
“Welcome to this Treejack study, and thank you for agreeing to participate!
The activity shouldn’t take longer than 10 to 15 minutes to complete.
Find out how on the next page…”
For this tree test study on the LCC system, I used the Optimal Workshop recruitment service, which can be accessed via the Recruitment tab in Treejack. I find the service to be a quick, easy and reliable way to gain completed participant responses from my user group in a matter of hours. I’m able to specify age, gender, location and more, and it allows me to reach users that I wouldn’t be able to recruit myself. The recruitment brief for this study was 50 participants with an equal mix of genders, all residing in the United States.
My Treejack tree test study on the LCC system had 63 participants in total: 50 completed the study and 13 abandoned it. Abandonments are recorded when participants close the tab or window of the study without submitting it and can happen for a number of reasons. Sometimes people will open it and then close it down to come back to it later! High abandonment rates (upwards of approximately 25%) can indicate that the study had too many tasks, or that the IA was too big or had too many ambiguous labels, and people may have just given up. This study had a 21% abandonment rate but given the size of this IA, it’s still within reasonable limits and is actually a little bit lower than I expected so I’m pretty happy!
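For anyone who wants to check the arithmetic, the abandonment rate is simply the abandoned responses divided by everyone who started. Here is a minimal sketch using the numbers from this study; `abandonment_rate` is a hypothetical helper of my own and the 25% line is only the rough guide mentioned above.

```python
def abandonment_rate(completed, abandoned):
    """Share of participants who started the study but never submitted it."""
    started = completed + abandoned
    if started == 0:
        raise ValueError("no participants started the study")
    return abandoned / started


rate = abandonment_rate(completed=50, abandoned=13)
print(f"{rate:.0%}")  # 21%
print(rate > 0.25)    # False: under the rough 25% warning level
```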
Before jumping into results analysis for a tree test, I like to take a quick look at the Tasks results visualization in the Overview tab (see below). For this study, it’s a little scary looking with all that red, but that green spike up the middle and the large amounts of light red (indicating indirect failure, when participants got it wrong and followed the long and winding road to failure) have piqued my curiosity and have me feeling the buzz of discovery joy, so let’s take a look!
Task 6 Results
Let’s start with something positive and look at that green streak of delight up the middle (Task 6). Looking at the results overview diagram (see above) for Task 6 in this study, 64% of participants were able to correctly locate “On the Origin of Species” by Charles Darwin. 34% of participants nominated the incorrect location and 1 person (2%) skipped the task. It’s the highest overall scoring task in this study but an overall score of 4 isn’t generally something to be proud of so we’re going to take a closer look there. The overall score is an aggregate score designed to give you a general impression of what happened, with success being the highest weighted contributing factor; you can learn more about how overall scores are calculated in Optimal Workshop’s Knowledge Base. The median time taken to complete this task was 21.15 seconds; considering the size of the IA, that isn’t too bad. If we take a look at the pietree for this task (see below), there is some evidence of scattered clicking around on the right hand side of the visualization, but after ‘Home’ (the starting point), ‘Science’ was the most popular branch of the tree clicked on in this task, which is great because it’s correct!
If I zoom in and hover — so you can actually see it (below) — ‘Science’ was clicked on 68 times and only 3 of those clicks resulted in pathways where participants turned around and went back to ‘Home’.
Because we’re talking about a library classification system and not a website IA, ‘Home’ for this study was used as the neutral starting point for my tree. In the case of a website IA, the number of times ‘Home’ is clicked can help build a richer picture of what went on during the study. A high number — say 3 times the number of participants or more — can sometimes indicate that people were lost in the structure and clicked on ‘Home’ a lot as a way of starting over. Even though this IA doesn’t really have a ‘Home’, clicks recorded on it are still valid — think of it as walking away from the shelves and starting again.
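That rule of thumb is easy to turn into a quick check. The sketch below is only an illustration of the heuristic as I have described it: `home_clicks_look_high` is a hypothetical helper, the factor of 3 is a rough guide and not a hard limit, and the click counts passed in are made up to show both outcomes.

```python
def home_clicks_look_high(home_clicks, participants, factor=3):
    """Rule of thumb: 'Home' clicks at or above factor x participants."""
    return home_clicks >= factor * participants


# 50 participants completed this study, so the rough threshold is 150.
# The click counts below are invented for illustration.
print(home_clicks_look_high(home_clicks=150, participants=50))
print(home_clicks_look_high(home_clicks=90, participants=50))
```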
From ‘Science’, 34 clicks landed on ‘Biology’, 17 on ‘Natural History’ and 9 on ‘Science (General)’. Many of my participants were in the right area, just not at the right shelf. Studies have shown that if users correctly land that first click, they are more likely to successfully complete their task, and if I look at the Treejack results data under the First Click tab, I can see that 74% of my participants visited ‘Science’ first. They started their journey on the right foot but didn’t quite make it to the end. I’m wondering if there is more here to explore.
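Percentages can hide how small the numbers are, so it sometimes helps to turn them back into head counts. This is a small sketch under one assumption: that the Task 6 percentages are all out of the 50 completed responses (which fits the 1 person reported as 2%). `to_count` is a hypothetical helper.

```python
def to_count(percent, participants):
    """Convert a reported whole-number percentage back to a head count."""
    return round(percent * participants / 100)


PARTICIPANTS = 50  # assumption: percentages are out of the 50 completed responses
task_6 = {"success": 64, "failure": 34, "skipped": 2, "first click on Science": 74}
counts = {name: to_count(pct, PARTICIPANTS) for name, pct in task_6.items()}
print(counts)
```

Seen as counts, the gap between the first click figure and the success figure is only a handful of people, which is a useful reminder not to over-read a single task.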
Task 7 produced some very interesting results. 32% of participants were able to successfully locate Henry Gray’s well known human anatomy textbook “Gray’s Anatomy” while 66% failed and 2% (1 participant) skipped the task. That’s interesting enough on its own, but the real fun lies in the fact that the most popular first click location for this task was ‘Medicine’ (52%) when the book actually lives under ‘Science’. I can see how that might be plausible: human anatomy is studied and understood in depth by medical professionals, so ‘Medicine’, right?! When I wrote this task, I knew the correct answer and I was very careful not to say the ‘m’ word because I was concerned it might lead my participants. Given that 32% were able to find it, I don’t think a lack of context was an issue but I would need to talk to users in the library environment to completely rule it out. I’d need to be able to ask them why ‘Medicine’ and not something else. Again, I’m not saying the results are useless; they’re a starting point for deeper exploration into the library itself.
Other findings from this study:
- While ‘Political Science’ was visited 84 times during Task 4, only 4% of participants (2 people) were able to correctly locate “The Prince” by Niccolo Machiavelli. A further 37 participants nominated answers in the correct section but just went to the wrong shelf.
- No participant in this study was able to locate “Steal Like an Artist: 10 Things Nobody Told You About Being Creative” by Austin Kleon and there was very little agreement around where they expected to find it.
- No participant was able to find “The Varieties of Religious Experience: A Study in Human Nature” by William James. All participants did identify the correct LCC system class (level 2 in the IA) ‘Philosophy, Psychology, Religion’ and 36 people correctly chose ‘Religion’ as the LCC subclass (level 3 in the IA).
- Just 1 participant was able to identify the correct LCC system location for “This book is overdue!: how librarians and cybrarians can save us all” by Marilyn Johnson and responses were dispersed across 11 different LCC system classes — remember there are 21 of them in total!
- 36% of participants were able to find Lynne Truss’ “Eats, Shoots and Leaves” and among those that did not find it, there were very low levels of agreement recorded which was made clear by the large, scattered pietree.
- Only 1 participant was able to find “The Death and Life of Great American Cities” by Jane Jacobs.
- Lastly, no one was able to find Maya Angelou’s “I Know Why the Caged Bird Sings”. There were 28 participants who reached the correct LCC system sub-class of ‘American Literature’ and one person reached the correct location and turned around and went back.
If you want to take a peek into the results of this Treejack study for yourself, I’ve made them publicly available via a handy shareable link generated under the ‘Sharing’ tab in the tool. The Optimal Workshop suite provides a wide range of sharing permissions for its powerful results analysis visualizations, which is especially useful for communicating your results to stakeholders. The shared results can also serve as a demo if you’re trying to get stakeholders on board to support your user research activities!
Post-study question results
The largest group of participants were the ones that hadn’t visited a library in more than a year and that number came in at 44%. Answering the question of “If it’s been more than 6 months since your last library visit, can you tell me more about why that is?” was not compulsory but I still managed to get a pretty good turn out with 34 participants responding. The responses were mixed and provided some insight into how they use the library and how they view its service offering:
- 9 participants said they didn’t need the library because they have internet access at home/work/on their phones.
- 5 participants stated a preference for eBooks and 2 were actually accessing eBooks via their local library, like Participant 31: “I’ve been reading more e-books and I can borrow those from the library’s website. I had been going weekly until I got my Kindle.”
- 4 participants didn’t have access to a library or weren’t really a fan of their local library, like Participant 51: “The library I prefer, which is larger than my local one, is much more enjoyable to visit and I’m able to get the books I wanted immediately instead of having them transferred after several days or weeks to my closest library. I didn’t like to wait for the books anymore and didn’t find it beneficial to drive 15 to 20 minutes to the library I preferred anymore.”
- Lastly, Participant 45 said something I can certainly relate to: “I think the library is an intimidating place. It overwhelms me. I would likely need to ask for help finding a book but I hate asking for help. And I worry they won’t have what I want anyway.”
The last question that I asked my participants in this study on the LCC system was: “What are your thoughts on the value of physical libraries in our digitally driven 2017 world and why?” As mentioned earlier, I asked this question because I was inspired after researching the book that appeared in one of the testing tasks for this study (Task 1). As researchers, we’re always telling people “I know what I think but I want to know what you think”. In this case, I think that libraries will continue to evolve and adapt to suit the world we live in. I don’t see the library as a building full of a catalogue of books; I see it as a place to access information that was organized in a specific way — it’s open, flexible and fluid. It could be anything it wants to be! I didn’t know what to expect but I was delighted to find that 39 of my participants felt the same! Here are some of my favorite responses:
“I still think libraries are relevant and necessary. Sometimes I still have to go to the library and look through the books for information, what I have access to online is sometimes not complete. Libraries are great for kids to discover new books and offer an affordable way to read books for many children and adults. Libraries have special programs and are a good place for the community to gather.”
“I am the library’s biggest fan. I love my library and take my kids there all the time. I encourage them to check out books, to explore topics they might have thought they were not interested in. Libraries are a wonderful part of our community.”
“I think they’re valuable since they give people a repository to visit to find information, without needing to have access to the Internet. I think it democratizes knowledge somewhat.”
Next steps for a study like this one
The thing that stood out to me the most in this study was the fact that for more than half of the tasks in this study, many participants went directly to the correct LCC system class (and in some cases the correct subclass too) but failed to find the right final location of the book. They were in the right area of the library but couldn’t find the right shelf. As mentioned before, the LCC system is an IA in its own right but it’s also part of a complex environment with other infrastructure and I can’t help but wonder about the role those other parts play in a contextual library environment.
They were in the right area but what would they do when they realized they had the wrong shelf? Ask a librarian? Check the signs? Or would they even realize they were in the wrong place and assume the book had been checked out? I honestly don’t know and while this study provides an excellent starting point, more research is needed to uncover the rest of the story.
Overall the numbers on this one were pretty dismal — the failure rate was high, there were people going to ‘Medicine’ when they should have been going to ‘Science’ and there were instances of people reaching the correct answer and then turning back. This is an IA that potentially relies heavily on its supporting infrastructure but also requires its users to have some understanding of what they’re looking for. Some of these books were filed in the depths of the IA and ultimately under labels that meant you needed to know the publication year or the nationality of the author!
Next steps from here would be to head to an actual library and run an in person tree test, with a twist. You could conduct the research one on one and present participants with card sized pieces of paper that have the same type of scenarios used in this study printed on them. Then, you would ask your participants to work through the scenario in the library environment while you shadow them, the goal being to find the actual book or at the very least the shelf (in case it’s checked out!). Ask your participants questions along the way, observe their actions and get them to talk you through their thinking. Keep the scenarios to a maximum of 4 to 5 and try to test with at least 5 people. Finish each scenario with a quick discussion with your participant to ask any questions that haven’t yet been answered.
5 key IA lessons we can all take away from this study
The LCC system is over 100 years old and, like the DDC system, has endured as one of the most commonly used library classification systems in the world. The results of this one tree test aren’t going to change it but that’s not to say there isn’t anything to be learned from this study beyond satisfying my curiosity. Here are some key IA lessons we can all learn from this one and apply to our work:
- IAs don’t exist in isolation and while they are the structural backbone of the system, they need to be researched in conjunction with their complementary parts.
- Just like with the DDC system, remote user research studies definitely add value but should be run in conjunction with face-to-face moderated user research to gain context, ask questions and understand user needs, wants, behaviors and goals.
- Mega IAs are not necessarily the enemy but labels need to be thoroughly tested with users to ensure they meet both taxonomic and ontological expectations.
- Non-digital IAs can be researched using the same tools and methods you would to research digital ones by slightly adapting your approach and ensuring a balance between remote and face-to-face studies.
- Sometimes a tree test leaves you with more questions than answers but this isn’t a bad thing because it tells you where to look next.