Expanding Concordance Lines to Aid Interpretability


5. December 2024


Author: Mathew Gillings
Published: 05 December 2024

Back in June 2024, I had the pleasure of joining and speaking at the RC21 workshop at the ICAME conference in Vigo, Spain. The workshop was designed to stimulate discussion on concordance analysis, specifically on the hermeneutics of concordancing, how it is used by different fields, and what the future may hold for the technique in an age where specialist concordancers are more advanced than ever.

Mathew Gillings presenting his talk. Photo: Stephanie Evert.

Expanding Concordance Lines to Aid Interpretability

For my own part in proceedings, I wanted to return to one of the most fundamental pieces of advice in corpus linguistics and put it to the test to see whether it really works in practice: does ‘expanding the concordance line’ actually aid interpretability?
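To make ‘expanding the concordance line’ concrete, here is a minimal Python sketch of a keyword-in-context (KWIC) display in which expansion simply means widening the window of co-text around the node word. It is an illustration under simplifying assumptions, not how any particular concordancer works: the function names `tokenize` and `concordance` are hypothetical, tokens are runs of word characters and apostrophes (punctuation is discarded), the window is counted in tokens rather than characters, and the example sentence is invented.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens; punctuation is discarded (a simplifying assumption)."""
    return re.findall(r"[\w']+", text)


def concordance(text: str, node: str, width: int = 5) -> list[str]:
    """Return one KWIC line per case-insensitive match of `node`,
    with up to `width` tokens of co-text on either side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented example: "It" is a referring expression whose antecedent lies outside a narrow window.
text = "The committee rejected the proposal. It said the costs were too high."
print(concordance(text, "it", width=1))  # ['proposal [It] said']
print(concordance(text, "it", width=4))  # ['committee rejected the proposal [It] said the costs were']
```

With a one-token window, ‘It’ is opaque (the kind of referring-expression problem listed as Issue 3 in the table); widening the window to four tokens brings its antecedent, ‘committee’, into view.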

The impetus for the talk came from earlier work with my colleague Gerlinde Mautner, now published in the International Journal of Corpus Linguistics. In that paper [1], we proceeded in two stages: (1) we looked through 800 concordance lines and identified 8 key issues hindering interpretability; then (2) we quantified those 8 issues in a smaller sample of 200 concordance lines. Lines were considered ‘uninterpretable’ if we were unable to answer any one of our three research questions simply by viewing the line, without expanding it. Almost a quarter of the lines we sampled (45/200, or 22.5%) suffered from one of these eight issues. Those issues, and the number of times they appeared, are detailed below.

The obvious next step, then, was to return to those same 45 uninterpretable lines and ask: to improve interpretability, is it enough to expand the concordance line, or do I need to revisit the full text? The answer to that question is also given in the table below:

No.	Issue Description	Uninterpretable Concordance Lines	Expanding is Sufficient	Must Revisit Full Text
1	Noise in the corpus: non-words, tags, and random strings of words	0 (only identified in stage 1)	N/A	N/A
2	Non-standard syntax: coding sheets, questionnaires, mathematical formulae	1	0	1
3	Referring expressions pointing to a lexical item outside the available co-text	7	7	0
4	Technical terms/jargon	20	17	3
5	Acronyms and initialisms	5	1	4
6	Unspecific co-text: impossible to deduce meaning	4	1	3
7	Unclear quotation source attribution	2	2	0
8	Lines unrelated to the research questions	6	N/A	N/A

Here we can see that, in most cases, expanding the concordance line was sufficient to improve interpretability. This was especially true when the issue was a referring expression pointing to an item outside the immediate co-text (expanding the line showed me what ‘it’ or ‘they’ referred to); when there was an unclear technical term or piece of jargon (the wider context let me work out its meaning); or when the source of a quotation was unclear (expanding the line showed me who said what).

Other issues, on the other hand, could only be resolved by revisiting the full text in its entirety. This was the case for non-standard syntax (e.g., when the concordance line was filled with mathematical formulae); for acronyms and initialisms (which were often defined only the first time they were used in a text, rather than every time); and for co-text so unspecific that I could not make sense of it (typically very dense writing, such as deep theoretical discussion, where more information was needed from elsewhere in the text).

The long and short of it is that expanding the concordance line was sufficient to aid interpretability in 28 cases, whereas I had to return to the full text in 11 (the remaining 6 lines were unrelated to the research questions). Compared with our earlier finding, this is certainly less intimidating than being told that almost a quarter of all lines are uninterpretable!
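The headline figures can be recomputed from the table above. This short Python sketch just transcribes the published counts (with N/A cells treated as missing) and checks that they add up; the dictionary keys are abbreviations of the issue descriptions, and the final percentage, which takes the 39 lines related to the research questions as its base, is simple arithmetic on the table rather than a figure reported in the original study.

```python
# Counts transcribed from the table: (uninterpretable, expanding sufficient, must revisit full text).
# None stands for the cells marked N/A.
ISSUES = {
    "Noise in the corpus": (0, None, None),
    "Non-standard syntax": (1, 0, 1),
    "Referring expressions": (7, 7, 0),
    "Technical terms/jargon": (20, 17, 3),
    "Acronyms and initialisms": (5, 1, 4),
    "Unspecific co-text": (4, 1, 3),
    "Unclear quotation source attribution": (2, 2, 0),
    "Lines unrelated to research questions": (6, None, None),
}
SAMPLE_SIZE = 200

uninterpretable = sum(u for u, _, _ in ISSUES.values())
expanding = sum(e for _, e, _ in ISSUES.values() if e is not None)
full_text = sum(f for _, _, f in ISSUES.values() if f is not None)
unrelated = ISSUES["Lines unrelated to research questions"][0]

# Every uninterpretable line is either resolved (one way or the other) or set aside as unrelated.
assert expanding + full_text + unrelated == uninterpretable

print(f"Uninterpretable: {uninterpretable}/{SAMPLE_SIZE} ({uninterpretable / SAMPLE_SIZE:.1%})")
print(f"Expanding sufficient: {expanding}; full text needed: {full_text}; unrelated: {unrelated}")
print(f"Share of related lines resolved by expanding: {expanding / (expanding + full_text):.1%}")
# Uninterpretable: 45/200 (22.5%)
# Expanding sufficient: 28; full text needed: 11; unrelated: 6
# Share of related lines resolved by expanding: 71.8%
```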

Expanding Concordance Lines to Facilitate (Im)politeness Judgements

I was also able to share findings from another short experiment, this time aimed at those working in corpus pragmatics. This was part of ongoing work with Jonathan Culpeper and Isolde van Dorst on conventionalised (im)politeness [2], and this time I looked at whether expanding the concordance line alone (without viewing the whole text) changed whether a particular impoliteness formula was categorised as “impolite” or not. Expanding the line allowed me to make a judgement on 5 instances where I had originally been unsure, and in 7 instances, viewing more context led me to reverse my earlier decision altogether. This was further evidence, if any were needed, of the value of viewing slightly more co-text to aid interpretation.

Future Tools Development

Given the workshop’s focus on the future of concordancing, I closed my talk with some brief ideas that might help alleviate these issues in future. For example, displaying material that appears in quotation marks in a different colour, with the option to toggle this on and off, may help with Issue 7 (see the Quotation Tool in [3]). Being able to see neighbouring words and sentences more quickly and easily (perhaps by hovering over the node word) may also speed up the interpretative process and save time otherwise spent moving from one screen to another. The usefulness of these suggestions will depend on the concordancer being used, the corpus under analysis, and the focus of the research, but given the ubiquity of concordance analysis and its inclusion in different research designs, greater functionality (and customisability) within concordancers can surely only be a good thing. As we move towards more sophisticated linguistic analyses, we must always ensure that we contextualise our findings, and for the corpus linguist, the concordance line is a very good lens through which to do so.
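The two tool ideas above (flagging quoted material and showing neighbouring sentences) can be prototyped in a few lines. The Python sketch below is purely illustrative and makes strong assumptions: sentences are split naively at ., ! or ? (so abbreviations such as ‘e.g.’ will cause false splits), quotation status is decided by tracking curly “…” or straight "…" marks from the start of the text (nested or unbalanced quotes are not handled), and the names `sentence_spans`, `in_quotation` and `inspect_hits` are hypothetical. For each hit of the node word it reports whether the hit sits inside a quotation (information that could drive colour highlighting for Issue 7) and returns the sentence containing the hit together with its neighbours (the kind of view a hover-over might show).

```python
import re

# Naive sentence pattern (an assumption): a run of non-terminal characters ending in
# ., ! or ? (plus an optional closing quote), or running to the end of the text.
SENTENCE = re.compile(r'[^.!?]+(?:[.!?]+["”]?|$)')


def sentence_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of each non-empty sentence."""
    return [m.span() for m in SENTENCE.finditer(text) if m.group().strip()]


def in_quotation(text: str, index: int) -> bool:
    """True if position `index` lies inside quotation marks.
    Curly “ and ” open and close; a straight " toggles. Nesting is not handled."""
    inside = False
    for ch in text[:index]:
        if ch == "“":
            inside = True
        elif ch == "”":
            inside = False
        elif ch == '"':
            inside = not inside
    return inside


def inspect_hits(text: str, node: str, neighbours: int = 1) -> list[dict]:
    """For each case-insensitive whole-word hit of `node`, report whether it is quoted
    and return its sentence plus up to `neighbours` sentences on either side."""
    spans = sentence_spans(text)
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        # Word characters always fall inside some sentence span, so this lookup succeeds.
        idx = next(i for i, (s, e) in enumerate(spans) if s <= m.start() < e)
        lo, hi = max(0, idx - neighbours), min(len(spans), idx + neighbours + 1)
        context = " ".join(text[s:e].strip() for s, e in spans[lo:hi])
        hits.append({
            "node": m.group(),
            "quoted": in_quotation(text, m.start()),
            "context": context,
        })
    return hits


# Invented example text.
sample = ("The minister spoke first. She said, “The policy is working.” "
          "Critics disagree. They say the policy has failed.")
for hit in inspect_hits(sample, "policy"):
    print(hit["quoted"], "|", hit["context"])
# True | The minister spoke first. She said, “The policy is working.” Critics disagree.
# False | Critics disagree. They say the policy has failed.
```

Working with character offsets over the whole text, rather than sentence by sentence, means a quotation that spans several sentences is still tracked correctly, at the cost of one stray quotation mark early in a text affecting every later hit.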

References

[1] Gillings, M. and Mautner, G. (2024). Concordancing for CADS: Practical challenges and theoretical implications. International Journal of Corpus Linguistics, 29(1): 34-58.
[2] van Dorst, I., Gillings, M., and Culpeper, J. (2024). Sociopragmatic variation in Britain: A corpus-based study of politeness. Journal of Pragmatics, 227: 37-56.
[3] Bednarek, M., Schweinberger, M., and Lee, K. K. H. (2024). Corpus-based discourse analysis: From meta-reflection to accountability. Corpus Linguistics and Linguistic Theory, 20(3): 539-566.