Meeting corpus users’ needs

17. January 2025

Author: Yukio Tono (Tokyo University of Foreign Studies)
Published: 17 January 2025

As language technology evolves in the digital age, the right tools can make all the difference in how we interact with and analyze language. But what makes a tool truly effective? The answer lies in understanding users’ needs and designing tools that meet them.

This blog post explores how different users approach concordance tools—applications that help us examine language patterns in context. By understanding these diverse needs, we can create better tools that serve their intended purposes more efficiently.

Researchers

Let me tell you about one of the most important groups using corpus tools—researchers! These dedicated scholars dive into corpora to answer fascinating questions about language. If you’re a researcher just starting your corpus journey, you might be wondering “Which corpus would be perfect for my research?” or “How can I best use corpus tools to find exactly what I’m looking for?”

The good news is that most researchers are eager to explore the wonderful world of corpus tools. They’re ready to master essential features like concordance searches (finding specific words in context), collocation analysis (discovering which words frequently appear together), and creating word lists or examining keywords. These tools are like a researcher’s Swiss Army knife—each one serves a unique and valuable purpose!
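
To make these three features concrete, here is a minimal Python sketch of a key-word-in-context (KWIC) concordance, window-based collocate counts, and a frequency word list. It is not how any particular concordancer works: the lowercase tokenizer, the window sizes, and the sample sentence are all assumptions chosen for illustration, and the windows deliberately ignore sentence boundaries to keep the code short.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, width=4):
    """Return (left, node, right) key-word-in-context lines for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, span=3):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

text = "Climate change has a strong impact on farming. The impact of drought on crops is severe."
tokens = tokenize(text)

# Concordance lines, with the node word aligned in the middle.
for left, node, right in kwic(tokens, "impact"):
    print(f"{left:>30} [{node}] {right}")

print(collocates(tokens, "impact").most_common(3))  # collocates of "impact"
print(Counter(tokens).most_common(3))               # a simple frequency word list
```

Real tools add statistical association measures (such as MI or log-likelihood) on top of raw co-occurrence counts like these, so that very frequent function words do not dominate the collocate list.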

Sometimes, researchers need to dig even deeper. They might want to know more about the texts they’re studying—who wrote them? When? Under what circumstances? This kind of information helps them control variables that might affect their results. Think of it as making sure you’re comparing apples to apples in your research.
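
One way to picture this is to attach metadata to every text, restrict a search to the matching subcorpus, and normalize frequencies per million tokens so that subcorpora of different sizes can be compared fairly. The sketch below assumes a hypothetical `Text` record with `author`, `year`, and `genre` fields; real corpora describe their texts with their own header schemes.

```python
from dataclasses import dataclass

@dataclass
class Text:
    # Hypothetical metadata fields for illustration only.
    author: str
    year: int
    genre: str
    tokens: list

def subcorpus(texts, **criteria):
    """Keep only texts whose metadata match every key=value criterion."""
    return [t for t in texts if all(getattr(t, k) == v for k, v in criteria.items())]

def per_million(texts, word):
    """Frequency of `word` per million tokens across the given texts."""
    total = sum(len(t.tokens) for t in texts)
    hits = sum(t.tokens.count(word) for t in texts)
    return hits / total * 1_000_000 if total else 0.0

corpus = [
    Text("A", 1995, "news", "the impact of the policy was small".split()),
    Text("B", 2020, "news", "the impact was large and the effect lasting".split()),
    Text("C", 2020, "academic", "we measure the effect of training".split()),
]

# Compare "impact" in 2020 news only, rather than across the whole mixed corpus.
news_2020 = subcorpus(corpus, genre="news", year=2020)
print(per_million(news_2020, "impact"))
```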

At the end of the day, success for researchers comes down to two main things: getting comfortable with these powerful tools through training and having an interface that makes it easy to access all this valuable information, both within and beyond the corpus itself.

Teachers

Let us talk about language teachers—they are another important group who use corpus tools, but in a different way from researchers. Teachers are usually looking for practical answers to help with their day-to-day teaching. They might ask things like “Where can I find some good examples to show this grammar pattern?” or “Should I teach these new words to my students?” or even “Which way of saying this is more common?”

Regular corpus tools do not always make it easy to answer these questions directly. Why? Well, finding the right examples for students is not just about getting any sentence from the corpus; teachers need to think about whether their students will understand the vocabulary and whether they have already learned the grammar patterns used in those examples. It is quite a puzzle!

That is why we need special tools made just for teachers. Two helpful ones are the CEFR Vocabulary Level Analyzer (CVLA3.0) and the English Level Checker. These tools can help teachers figure out how difficult a text is and what grammar patterns it contains—much more useful for teaching than a regular corpus tool!
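
The core idea behind vocabulary-level profiling can be sketched as follows: look up each word of a text in a list that assigns words to CEFR levels, then report what share of the text falls at each level. This is only a rough illustration and not how CVLA3.0 or the English Level Checker actually work; the tiny `LEVELS` dictionary is a made-up placeholder, and its level assignments are not taken from any official list.

```python
from collections import Counter

# Made-up word-to-level assignments for illustration only; real tools rely on
# much larger published lists and on additional text indicators.
LEVELS = {"we": "A1", "need": "A1", "to": "A1", "the": "A1",
          "consider": "B1", "environmental": "B2", "impact": "B1"}

def level_profile(tokens, levels=LEVELS):
    """Share of tokens at each CEFR level; words missing from the list count as 'unlisted'."""
    if not tokens:
        return {}
    counts = Counter(levels.get(t.lower(), "unlisted") for t in tokens)
    return {level: n / len(tokens) for level, n in counts.items()}

print(level_profile("we need to consider the environmental impact".split()))
```

A teacher could read a profile like this as a first signal: a text with a large share of words above the class’s level is likely to need glossing or simplification.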

In this sense, corpus tools need to be redesigned to meet the specific needs of language teachers and to support the analysis of pedagogic corpora such as textbook corpora, classroom corpora, or learner corpora. The interface, in turn, needs to grow out of the specific needs of teachers, task designers, and materials developers.

Learners

Now, let us explore how different types of learners can use corpus tools effectively! Let me walk you through some practical examples:

Advanced learners, like PhD students writing their dissertations, often use corpus tools independently. They might use n-gram search for specific academic phrases like “the results suggest that” or “previous studies have shown” to improve their academic writing. These learners can handle complex concordance searches and understand sophisticated vocabulary in context.
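
An n-gram search of this kind boils down to splitting text into contiguous word sequences and matching or counting them. The minimal sketch below assumes whitespace tokenization and a handful of invented sentences; a real concordancer would also handle punctuation, lemmas, and much larger corpora.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def find_phrase(sentences, phrase):
    """Return sentences containing `phrase` as a contiguous token sequence."""
    target = tuple(phrase.lower().split())
    return [s for s in sentences if target in ngrams(s.lower().split(), len(target))]

sentences = [
    "The results suggest that motivation matters",
    "Previous studies have shown similar effects",
    "These results suggest that further work is needed",
]

print(find_phrase(sentences, "the results suggest that"))

# Counting trigrams shows which phrases recur across the sentences.
freq = Counter(g for s in sentences for g in ngrams(s.lower().split(), 3))
print(freq.most_common(1))
```

Counting all n-grams, rather than only searching for a known phrase, is what lets learners discover recurring academic formulas such as “results suggest that” that they did not think to look for.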

For intermediate learners (with a vocabulary of 3,000–5,000 words), imagine a student looking up the word “impact.” They might not know every word in the concordance lines, but they can understand enough of the context to learn how “impact” is used differently from “effect” or “influence.” They can see patterns like “have an impact on” or “the impact of X on Y” and start using these in their own writing.
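
Patterns like these can be searched with a simple frame that leaves a slot for X. The sketch below uses Python regular expressions as a stand-in for the wildcard or query-language searches that concordancers typically offer; the pattern names, the 1–3 word limit on the X slot, and the sample text are assumptions for illustration.

```python
import re

# "the impact/effect/influence of X on", with 1-3 words allowed in the X slot.
FRAME = re.compile(r"\bthe (impact|effect|influence) of ((?:\w+ ){1,3})on\b", re.IGNORECASE)

# "have/has/had a(n) (adjective) impact on", with one optional word before "impact".
HAVE_IMPACT = re.compile(r"\bha(?:ve|s|d) an? (?:\w+ )?impact on\b", re.IGNORECASE)

def frame_hits(text):
    """Return (noun, X) pairs for every match of the frame."""
    return [(m.group(1).lower(), m.group(2).strip()) for m in FRAME.finditer(text)]

text = ("Researchers studied the impact of social media on teenagers. "
        "The effect of sleep on memory is well known. "
        "We have an impact on the result.")

print(frame_hits(text))
print(HAVE_IMPACT.findall(text))
```

Putting “impact,” “effect,” and “influence” in the same frame lets a learner compare the three words side by side instead of running separate searches.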

Lower-intermediate learners (CEFR A2) need more support. Here is how we help them. When they look up a word like “consider” in a concordancer, they see the English sentence on top and its translation below. For example, “We need to consider the environmental impact” would appear together with its translation, making it easier to understand. They can also click on unfamiliar words like “environmental” to see instant dictionary definitions in a pop-up.
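
Behind such a display sits a parallel corpus of aligned sentence pairs plus a dictionary lookup. The sketch below is a minimal illustration only: the source does not say which translation language or dictionary is used, so the Japanese translations, the `PAIRS` list, and the `GLOSSARY` entries are all hypothetical placeholders.

```python
# Hypothetical aligned pairs (English, Japanese translation) and a tiny glossary.
PAIRS = [
    ("We need to consider the environmental impact.", "環境への影響を考慮する必要がある。"),
    ("Please consider my offer.", "私の提案を検討してください。"),
    ("I like tea.", "私はお茶が好きです。"),
]
GLOSSARY = {"environmental": "環境の", "consider": "考慮する"}

def parallel_concordance(pairs, word):
    """Return (English, translation) pairs whose English side contains `word`."""
    w = word.lower()
    return [(en, tr) for en, tr in pairs
            if w in (tok.strip(".,!?").lower() for tok in en.split())]

def lookup(word):
    """Stand-in for a pop-up dictionary: return a gloss, or None if the word is unknown."""
    return GLOSSARY.get(word.lower())

# English line on top, translation indented underneath.
for en, tr in parallel_concordance(PAIRS, "consider"):
    print(en)
    print("   " + tr)

print(lookup("Environmental"))
```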

For beginners (CEFR A1), we take a different approach. Instead of overwhelming them with authentic texts, we create specially designed materials. For instance, if they are learning about daily routines, they might see simple sentences like “I wake up at 7:00” or “She goes to school by bus,” collected from an example database built specifically for this purpose. In my CJEC project, we ensure that the surrounding words in these examples stay within the learners’ 1,000-word vocabulary level. We also check that grammar patterns match their learning stage, for example by using the simple present tense before introducing more complex tenses.
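
The two checks described here (vocabulary within the learner’s word list, grammar already taught) can be pictured as a simple filter over a tagged example database. This sketch does not reproduce how the CJEC project implements them: the `EXAMPLES` records, the grammar tags such as `present_simple`, and the short `KNOWN_WORDS` set (standing in for a full 1,000-word list) are hypothetical.

```python
# Hypothetical graded example database: each sentence is tagged with the grammar it uses.
EXAMPLES = [
    {"text": "I wake up at 7:00", "grammar": {"present_simple"}},
    {"text": "She goes to school by bus", "grammar": {"present_simple"}},
    {"text": "She has gone to school already", "grammar": {"present_perfect"}},
    {"text": "He commutes by underground", "grammar": {"present_simple"}},
]

# Stand-ins for the learner's 1,000-word vocabulary list and grammar taught so far.
KNOWN_WORDS = {"i", "wake", "up", "at", "she", "goes", "go", "to", "school", "by",
               "bus", "he", "has", "gone", "already"}
KNOWN_GRAMMAR = {"present_simple"}

def suitable(example, known_words=KNOWN_WORDS, known_grammar=KNOWN_GRAMMAR):
    """True if every word is on the learner's list and every grammar point has been taught."""
    # Skip tokens that start with a digit, such as times like "7:00".
    words = {w.lower() for w in example["text"].split() if not w[0].isdigit()}
    return words <= known_words and example["grammar"] <= known_grammar

print([e["text"] for e in EXAMPLES if suitable(e)])
```

The third example fails only the grammar check (present perfect not yet taught) and the fourth fails only the vocabulary check (“commutes”), which is why both conditions are needed.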

This way, each group of learners can benefit from corpus tools in ways that match their language abilities and learning needs. It is all about providing the right level of support at the right time!

What RC21 is for

Reading concordances offers unique insights into language use across different contexts and user groups, as we’ve explored with researchers, teachers, and learners. Each group brings distinct needs and skills: researchers seek detailed analysis capabilities, teachers need pedagogically appropriate tools, and learners require support matched to their proficiency levels. RC21 aims to evolve concordancing to better serve these diverse users, making corpus tools more accessible and effective for everyone. Through this project, we are working to transform how people interact with and learn from language data.