The information architecture of libraries: Library of Congress Classification

Originally designed in the early 20th century to organize the collection of the Library of Congress in the US, the LCC system has 21 classes (categories), each containing several layers of detailed subclasses. This allows for finer classification granularity, which is why it's better suited to larger collections. It also has more call numbers overall than the DDC, meaning more works can be organized in it.
From an information architecture (IA) size perspective, this means the LCC system is 21 categories wide and up to 7 levels deep in some parts. It's one of the largest IAs I've ever come across and one of the oldest! So, pull up a chair and I will tell you the story of what happened when the mega IA that is the LCC system met Treejack.
Getting started
IA: Library of Congress Classification (LCC) system
Research scope: The LCC system
Not in scope: The physical library environment
Tools: Treejack
For the second part of this study, I decided to run a tree test using Optimal Workshop's Treejack. Treejack is our online tree testing tool (tree testing is also known as reverse card sorting) that assesses the findability of content. I wanted to test the whole LCC IA but there was no way it was going to lend itself to a card sort: we're talking thousands of cards here (4,849 to be exact)! If you're planning to card sort a mega IA, it's best to break it into smaller chunks and research specific sections at a time (a viable card sort has roughly 30 to 60 cards). So, with a card sort out of the question, a tree test it was. A tree test works like a card sort but in reverse. Instead of asking participants to organize cards into groups that make sense to them (and to name those groups, in the case of an open card sort), you ask them to find a specific piece of content within the hierarchical tree-like structure of the IA. In an IA design process, a tree test logically follows the creation of an IA built on card sort insights, and I thought running a tree test on the LCC system would nicely complement the DDC system's card sort.
Preparing for the study
Building the tree
Before heading into Treejack, it's a good idea to build the tree of your IA out in a spreadsheet, especially when you're dealing with a really big one like I was! When building and testing a large IA, it's really easy to get lost and make silly mistakes like accidentally putting labels at the wrong level of the structure, but you can avoid this by using a spreadsheet. It's a lot easier to read, it provides a clear sense of hierarchy and you can highlight your level one headings in a contrasting colour, like I did in the image below, to help you navigate your IA and stay on track.
From left to right, each column of your spreadsheet represents a level in your IA and you stagger the labels based on their hierarchical relationships within the structure like in the example from this study below:

Building the LCC system IA tree for testing was no mean feat. While the Library of Congress's website certainly made the structural copying and pasting side of things easy for me, at 4,861 spreadsheet rows deep, it was also very easy to get lost. Going in, I knew it was a big IA but I grossly underestimated its complexity. The handy PDFs I found on the Library of Congress's website outlining the entire LCC system IA were more than 30 pages long for some of the classes (categories) and I spent a lot of time holding a ruler up to my screen to see which level in the structure I was up to. The PDFs also showed the call number ranges for each label within the IA and I needed to remove these as I was concerned they might be distracting or confusing for my participants. To do this as quickly as possible, I copied and pasted the labels into Notepad to remove the formatting, deleted the call numbers and then copied and pasted what was left into my spreadsheet. Sometimes I was lucky and would find a huge patch of labels all at the same level which could be moved in bulk, and other times it was quite fiddly with a lot of back and forth across the levels. Not for the faint hearted but completely achievable with some creative thinking. I started color coding the Level 1 rows in the spreadsheet so I could see where I was and I found that really helped me stay on top of it! (see below).
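If you'd rather not do the call number clean-up by hand, the same idea can be sketched in a few lines of Python. This is only a rough illustration, not what I did for this study: it assumes each label starts with a call number range such as "QH301-705.5", that every spreadsheet row has exactly one filled cell, and that the column position of that cell is its level. The sample rows and the function names (strip_call_number, rows_to_paths) are made up for the example.

```python
import re

# Assumed shape of a leading call number range, e.g. "QH301-705.5 ".
# Illustrative pattern only; real outlines may contain other variations.
CALL_NUMBER = re.compile(r"^[A-Z]{1,3}\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?\s+")


def strip_call_number(label):
    """Remove a leading call number range from a label, if there is one."""
    return CALL_NUMBER.sub("", label.strip())


def rows_to_paths(rows):
    """Turn staggered spreadsheet rows into full paths, one per label."""
    paths = []
    ancestors = []  # labels from level 1 down to the current row's parent
    for row in rows:
        filled = [(i, cell) for i, cell in enumerate(row) if cell.strip()]
        if not filled:
            continue  # skip blank rows
        if len(filled) != 1:
            raise ValueError(f"expected one label per row, got: {row}")
        level, cell = filled[0]
        if level > len(ancestors):
            raise ValueError(f"label skips a level: {cell!r}")
        del ancestors[level:]  # climb back up to this label's parent
        ancestors.append(strip_call_number(cell))
        paths.append(tuple(ancestors))
    return paths


# Hypothetical sample rows (not copied from the study spreadsheet).
rows = [
    ["Science", "", ""],
    ["", "Q1-390 Science (General)", ""],
    ["", "QH1-278.5 Natural history (General)", ""],
    ["", "QH301-705.5 Biology (General)", ""],
    ["", "", "QH359-425 Evolution"],
]

paths = rows_to_paths(rows)
width = sum(1 for p in paths if len(p) == 1)  # number of level 1 labels
depth = max(len(p) for p in paths)            # deepest level reached
print(width, depth)
```

A check like the "skips a level" error is the scripted version of the ruler against the screen: it catches labels that ended up in the wrong column before they reach your tree test.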

Choosing the task based questions
Tree testing requires participants to work through a series of tasks to locate specific content and the maximum reasonable number of tasks per study is 10, otherwise it just takes too long and people give up! To me 10 seemed kind of perfect in this case! I randomly chose one title from each DDC system class and put them all into a spreadsheet to give me some space to design the wording of the tree test tasks.
From looking at the IA, it was clear to me that participants would need some context in order to have a fair shot at locating each book. Due to the level of granularity in the subclasses and the labels themselves, I couldn't just tell my participants "go find this book" and actually expect them to find it based on the title alone. In the real world when you go to a library, there are signs, call numbers, electronic catalogue systems ready for you to look up anything from a subject keyword to the actual title and, of course, librarians. It's not fair, or useful, to test a system so reliant on other infrastructure without providing sufficient context to participants.
I researched each of the ten books (some of them I have actually read) to determine their key themes and messages. I looked into why a person might seek these titles out. Are they doing a school project? If so, which class is it for? Which level of education? What is the goal of this fictitious project/report/essay? Why this book and not others?
I also needed to identify the correct location for each of these books within the LCC system and looked there for inspiration for contextual cues to include in my tasks. For example, "Gray's Anatomy" by Henry Gray lives under "Science" rather than "Medicine" and it would be misleading for me to create a scenario around a medical student looking for a textbook. You certainly don't want to lead your participants directly to the right answer but it's also unethical to deliberately lead them away. If you want usable data, don't tank the results from the beginning.
With all this in mind, here are the 10 book-finding tasks I designed for this Treejack study on the LCC system:
- You're doing some research into the role of librarians in the digital age and have heard that "This book is overdue!: how librarians and cybrarians can save us all" by Marilyn Johnson is a must-read. Where would you expect to find this book?
- You work for a government department and lead a team of 15 people. A team planning day is coming up and you're researching ways to help your team think more creatively. A friend recommended you try "Steal Like an Artist: 10 Things Nobody Told You About Being Creative" by Austin Kleon. Where would you expect to find this book?
- You're writing an essay into the human experience side of religion and want to read The Varieties of Religious Experience: A Study in Human Nature by William James. Where would you expect to find this book in the library?
- A favorite TV show of yours recently explored the political science concept of Machiavellianism and you're curious. A little Googling has led you to The Prince by Niccolo Machiavelli and you'd like to borrow it from your university's library. Where would you expect to find this book?
- You're a high school student and your English Literature teacher has recommended you read "Eats, Shoots and Leaves" by Lynne Truss to help build your grammatical skills. Where would you expect to find this book?
- You're visiting the library with a friend who is working on a science project on evolution and they need your help to locate "On the Origin of Species" by Charles Darwin. Where would you expect to find this book?
- Your teenage nephew is working on a science project for school on the human digestive system. His teacher has provided a recommended reading list and "Gray's Anatomy" by Henry Gray is right at the top of the list and he needs your help finding it. Where would you go to look for this book?
- You're studying industrial design at university and one of your elective classes is on urban design. Your lecturer has recommended you read "The Death and Life of Great American Cities" by Jane Jacobs to learn more about the social role of sidewalks in a city. Where would you go to find this book?
- You're doing some research into women in literature in the early twentieth century and your aunt, who is a university lecturer, has suggested you read "A Room of One's Own" by Virginia Woolf. Where would you go to find this book?
- You're a university student and for your American Literature class, you need to read "I Know Why the Caged Bird Sings" by Maya Angelou for an essay based assignment about how the author's love of literature helped her overcome and stand up to racism and trauma. The teacher ran out of copies, so you have to get your own from the library. Where would you expect to find it?
Post-study questions
With the Optimal Workshop suite of tools, you can add post-study questions that appear at the end of the study after participants have completed the activity. It's no substitute for talking to users in person, but I sometimes include them to gain additional context. I'd made the mistake of not asking enough questions before and this study gave me a chance to learn from that and try again. Once again, I asked my participants the multichoice (radio button) question "When was the last time you visited a library?" but this time followed it with "If it's been more than 6 months since your last library visit, can you tell me more about why that is?" and "What are your thoughts on the value of physical libraries in our digitally driven 2017 world and why?". I threw that last one in out of curiosity and also because my research into one of the book tasks (Task 1: "This book is overdue!: how librarians and cybrarians can save us all" by Marilyn Johnson) had inspired me. The two new questions were set as multi-line text responses to give participants as much or as little space as they needed to share their thoughts.
Building the study
Pulling it all together in Treejack was a fairly easy task because I had done most of the heavy lifting during my prep work. The LCC system's mega IA was an easy copy and paste into the tool, where I double checked my work to make sure there were no stray branches or call numbers that I had forgotten to delete. Sure enough there were one or two in need of a quick edit, but it's an easy fix that can be done within Treejack.
Next, I set the correct answers to my tasks, which was easy in this case because there was only one possible answer for each task, and I set up my post-study questionnaire. Lastly, I made some slight tweaks to the default messaging provided by Treejack. There's nothing wrong with it; it just referenced the goal of the study as "improving a website", which in this case wasn't true and could potentially cause some confusion among participants. I didn't tell them exactly what I was doing but I did say this:
“Welcome to this Treejack study, and thank you for agreeing to participate!
The activity shouldn’t take longer than 10 to 15 minutes to complete.
Find out how on the next page…”
Participant recruitment
For this tree test study on the LCC system, I used the Optimal Workshop recruitment service, which can be accessed via the Recruitment tab in Treejack. I find the service to be a quick, easy and reliable way to gain completed participant responses from my user group in a matter of hours. I'm able to specify age, gender, location and more, and it allows me to reach users that I wouldn't be able to recruit myself. The recruitment brief for this study was 50 participants with an equal mix of genders, all residing in the United States.
Results
Overview
My Treejack tree test study on the LCC system had 63 participants in total: 50 completed the study and 13 abandoned it. Abandonments are recorded when participants close the tab or window of the study without submitting it and can happen for a number of reasons. Sometimes people will open it and then close it down to come back to it later! High abandonment rates (approximately upwards of 25%) can indicate that the study may have had too many tasks, or that the IA was too big or had too many ambiguous labels and people may have just given up. This study had a 21% abandonment rate (13 of the 63 participants who started), which, given the size of this IA, is still within reasonable limits and is actually a little bit lower than I expected, so I'm pretty happy!
Before jumping into results analysis for a tree test, I like to take a quick look at the Tasks results visualization in the Overview tab (see below). For this study, it's a little scary looking with all that red, but that green spike up the middle and the large amounts of light red (indicating indirect failure, when participants got it wrong and followed the long and winding road to failure) piqued my curiosity and have me feeling the buzz of discovery joy, so let's take a look!
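For anyone who wants to check the abandonment arithmetic, here is a minimal Python sketch. The 25% threshold is the rule of thumb mentioned above rather than a hard limit, and the function name abandonment_rate is just something I made up for the example.

```python
def abandonment_rate(completed, abandoned):
    """Share of participants who started the study but did not submit it."""
    started = completed + abandoned
    if started == 0:
        raise ValueError("no participants started the study")
    return abandoned / started


# Figures from this study: 50 completed, 13 abandoned.
rate = abandonment_rate(completed=50, abandoned=13)

# Rule-of-thumb threshold from the text; treat it as a prompt to investigate.
HIGH_ABANDONMENT = 0.25

print(f"{rate:.1%}")            # 20.6%
print(rate > HIGH_ABANDONMENT)  # False
```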

Task 6 Results

Let's start with something positive and look at that green streak of delight up the middle (Task 6). Looking at the results overview diagram (see above) for Task 6, 64% of participants were able to correctly locate "On the Origin of Species" by Charles Darwin, 34% nominated an incorrect location and 1 person (2%) skipped the task. It's the highest overall scoring task in this study, but an overall score of 4 isn't generally something to be proud of, so we're going to take a closer look. The overall score is an aggregate designed to give you a general impression of what happened, with success being the most heavily weighted factor; you can learn more about how overall scores are calculated in Optimal Workshop's Knowledge Base. The median time taken to complete this task was 21.15 seconds, which isn't too bad considering the size of the IA. If we take a look at the pietree for this task (see below), there is some evidence of scattered clicking around on the right hand side of the visualization, but "Science" was the second most clicked branch, behind only "Home" (the starting point of the tree), which is great because "Science" is correct!

If I zoom in and hover so you can actually see it (below), "Science" was clicked on 68 times and only 3 of those clicks resulted in pathways where participants turned around and went back to "Home".

Because we're talking about a library classification system and not a website IA, "Home" for this study was used as the neutral starting point for my tree. In the case of a website IA, the number of times "Home" is clicked can help build a richer picture of what went on during the study. A high number (say 3 times the number of participants or more) can sometimes indicate that people were lost in the structure and clicked on "Home" a lot as a way of starting over. Even though this IA doesn't really have a "Home", clicks recorded on it are still valid: think of it as walking away from the shelves and starting again.
From "Science", 34 clicks landed on "Biology", 17 on "Natural History" and 9 on "Science (General)". Many of my participants were in the right area, just not at the right shelf. Studies have shown that if users correctly land that first click, they are more likely to successfully complete their task, and if I look at the Treejack results data under the First Click tab, I can see that 74% of my participants visited "Science" first. They started their journey on the right foot but didn't quite make it to the end. I'm wondering if there is more here to explore.
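The "Home" rule of thumb is simple enough to write down as a quick check. This is a sketch only: the factor of 3 comes from the paragraph above, the function name looks_lost is invented, and the click counts in the example are hypothetical, not figures from this study.

```python
def looks_lost(home_clicks, participants, factor=3.0):
    """Flag a task where "Home" was clicked at least `factor` times per participant."""
    if participants <= 0:
        raise ValueError("participants must be positive")
    return home_clicks >= factor * participants


# Hypothetical click counts for a study with 50 participants.
print(looks_lost(home_clicks=160, participants=50))  # True
print(looks_lost(home_clicks=90, participants=50))   # False
```

A flag like this is only a prompt to go and look at the pietree for that task; it doesn't tell you why people kept starting over.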
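A first-click share like the 74% above could be worked out from participant paths along these lines. This is a minimal sketch: the path format (a list of labels clicked after "Home") is a hypothetical stand-in and not Treejack's export format, and the four sample paths are invented.

```python
from collections import Counter


def first_click_shares(paths):
    """Share of participants whose first click landed on each label."""
    counts = Counter(path[0] for path in paths if path)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented sample: one list of clicked labels per participant.
sample_paths = [
    ["Science", "Biology"],
    ["Science", "Natural History"],
    ["Medicine"],
    ["Science", "Science (General)"],
]

shares = first_click_shares(sample_paths)
print(shares["Science"])   # 0.75
print(shares["Medicine"])  # 0.25
```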
Task 7 Results

Task 7 produced some very interesting results. 32% of participants were able to successfully locate Henry Gray's well known human anatomy textbook "Gray's Anatomy" while 66% failed and 2% (1 participant) skipped the task. While that's interesting in itself, the real fun lies in the fact that the most popular first click location for this task was "Medicine" (52%) but the book actually lives under "Science". I can see how that might be plausible: human anatomy is studied and understood in depth by medical professionals, so therefore "Medicine", right?! When I wrote this task, I knew the correct answer and I was very careful not to say the "m" word because I was concerned it might lead my participants. Given that 32% were able to find it, I don't think a lack of context was an issue, but I would need to talk to users in the library environment to completely rule it out. I'd need to be able to ask them why "Medicine" and not something else. Again, I'm not saying the results are useless; they're a starting point for deeper exploration into the library itself.
Other findings from this study:
- While "Political Science" was visited 84 times during Task 4, only 4% of participants (2 people) were able to correctly locate "The Prince" by Niccolo Machiavelli. A further 37 participants nominated answers in the correct section but just went to the wrong shelf.
- No participant in this study was able to locate "Steal Like an Artist: 10 Things Nobody Told You About Being Creative" by Austin Kleon and there was very little agreement around where they expected to find it.
- No participant was able to find "The Varieties of Religious Experience: A Study in Human Nature" by William James. All participants did identify the correct LCC system class (level 2 in the IA), "Philosophy, Psychology, Religion", and 36 people correctly chose "Religion" as the LCC subclass (level 3 in the IA).
- Just 1 participant was able to identify the correct LCC system location for "This book is overdue!: how librarians and cybrarians can save us all" by Marilyn Johnson, and responses were dispersed across 11 different LCC system classes. Remember, there are 21 of them in total!
- 36% of participants were able to find Lynne Truss's "Eats, Shoots and Leaves" and among those that did not find it, there were very low levels of agreement, which was made clear by the large, scattered pietree.
- Only 1 participant was able to find "The Death and Life of Great American Cities" by Jane Jacobs.
- Lastly, no one was able to find Maya Angelou's "I Know Why the Caged Bird Sings". There were 28 participants who reached the correct LCC system subclass of "American Literature", and one person reached the correct location and then turned around and went back.
If you want to take a peek into the results of this Treejack study for yourself, I've made them publicly available via a handy shareable link generated under the "Sharing" tab in the tool. The Optimal Workshop suite provides a wide range of sharing permissions for its powerful results analysis visualizations, which is especially useful for communicating your results to stakeholders. A shared link can also serve as a demo if you're trying to get them on board to support your user research activities!
Post-study question results
The largest group of participants, at 44%, were those who hadn't visited a library in more than a year. Answering the question "If it's been more than 6 months since your last library visit, can you tell me more about why that is?" was not compulsory but I still managed to get a pretty good turnout, with 34 participants responding. The responses were mixed and provided some insight into how they use the library and how they view its service offering:
- 9 participants said they didn't need the library because they have internet access at home/work/on their phones.
- 5 participants stated a preference for eBooks and 2 were actually accessing eBooks via their local library like Participant 31: I’ve been reading more e-books and I can borrow those from the library’s website. I had been going weekly until I got my Kindle.
- 4 participants didn't have access to a library or weren't really a fan of their local library like Participant 51: The library I prefer, which is larger than my local one, is much more enjoyable to visit and I'm able to get the books I wanted immediately instead of having them transferred after several days or weeks to my closest library. I didn't like to wait for the books anymore and didn't find it beneficial to drive 15 to 20 minutes to the library I preferred anymore.
- Lastly, Participant 45 said something I can certainly relate to: I think the library is an intimidating place. It overwhelms me. I would likely need to ask for help finding a book but I hate asking for help. And I worry they won’t have what I want anyway.
The last question that I asked my participants in this study on the LCC system was: "What are your thoughts on the value of physical libraries in our digitally driven 2017 world and why?" As mentioned earlier, I asked this question because I was inspired after researching the book that appeared in one of the testing tasks for this study (Task 1). As researchers, we're always telling people "I know what I think but I want to know what you think". In this case, I think that libraries will continue to evolve and adapt to suit the world we live in. I don't see the library as a building full of a catalogue of books; I see it as a place to access information that was organized in a specific way. It's open, flexible and fluid. It could be anything it wants to be! I didn't know what to expect but I was delighted to find that 39 of my participants felt the same! Here are some of my favorite responses:
“I still think libraries are relevant and necessary. Sometimes I still have to go to the library and look through the books for information, what I have access to online is sometimes not complete. Libraries are great for kids to discover new books and offer an affordable way to read books for many children and adults. Libraries have special programs and are a good place for the community to gather.”
“I am the library’s biggest fan. I love my library and take my kids there all the time. I encourage them to check out books, to explore topics they might have thought they were not interested in. Libraries are a wonderful part of our community.”
“I think they’re valuable since they give people a repository to visit to find information, without needing to have access to the Internet. I think it democratizes knowledge somewhat.”
Next steps for a study like this one
The thing that stood out to me the most in this study was that for more than half of the tasks, many participants went directly to the correct LCC system class (and in some cases the correct subclass too) but failed to find the right final location of the book. They were in the right area of the library but couldn't find the right shelf. As mentioned before, the LCC system is an IA in its own right but it's also part of a complex environment with other infrastructure, and I can't help but wonder about the role those other parts play in a contextual library environment.
They were in the right area but what would they do when they realized they had the wrong shelf? Ask a librarian? Check the signs? Or would they not even realize they were in the wrong place and assume the book had been checked out? I honestly don't know, and while this study provides an excellent starting point, more research is needed to uncover the rest of the story.
Overall the numbers on this one were pretty dismal: the failure rate was high, there were people going to "Medicine" when they should have been going to "Science" and there were instances of people reaching the correct answer and then turning back. This is an IA that potentially relies heavily on its supporting infrastructure but also requires its users to have some understanding of what they're looking for. Some of these books were filed in the depths of the IA and ultimately under labels that meant you needed to know the publication year or the nationality of the author!
Next steps from here would be to head to an actual library and run an in-person tree test, with a twist. You could conduct the research one on one and present participants with card sized pieces of paper that have the same type of scenarios used in this study printed on them. Then, you would ask your participants to work through the scenario in the library environment while you shadow them, the goal being to find the actual book or at the very least the shelf (in case it's checked out!). Ask your participants questions along the way, observe their actions and get them to talk you through their thinking. Keep the scenarios to a maximum of 4 to 5 and try to test with at least 5 people. Finish each scenario with a quick discussion with your participant to ask any questions that haven't yet been answered.
5 key IA lessons we can all take away from this study
The LCC system is over 100 years old and, like the DDC system, has endured as one of the most commonly used library classification systems in the world. The results of this one tree test aren't going to change it but that's not to say there isn't anything to be learned from this study beyond satisfying my curiosity. Here are some key IA lessons we can all learn from this one and apply to our work:
- IAs don't exist in isolation and while they are the structural backbone of the system, they need to be researched in conjunction with their complementary parts.
- Just like with the DDC system, remote user research studies definitely add value but should be run in conjunction with face-to-face moderated user research to gain context, ask questions and understand user needs, wants, behaviors and goals.
- Mega IAs are not necessarily the enemy but labels need to be thoroughly tested with users to ensure they meet both taxonomic and ontological expectations.
- Non-digital IAs can be researched using the same tools and methods you would to research digital ones by slightly adapting your approach and ensuring a balance between remote and face-to-face studies.
- Sometimes a tree test leaves you with more questions than answers but this isn't a bad thing because it tells you where to look next.