Nathan Dykes, M. A.
Research areas:
- Corpus Linguistics
- Computational Linguistics
- Discourse Analysis
- Argument Mining
- Legal Tech
Since 10/2024
Research Assistant
FAU Erlangen-Nürnberg, Department of Digital Humanities and Social Studies (DHSS)
Since 05/2018
Research Assistant & PhD candidate
FAU Erlangen-Nürnberg, Chair of Computational Corpus Linguistics (CCL)
02/2022 to 09/2023
Research Assistant
FAU Erlangen-Nürnberg, Chair of English Linguistics
2016-2020
Lecturer for Swedish
FAU Erlangen-Nürnberg, Language Centre
Journal Articles
- Adrian, A., Dykes, N., Evert, S., Heinrich, P., & Keuchen, M. (2023). AUTOMATISCHE ANONYMISIERUNG VON GERICHTSURTEILEN – EINE VISION SCHEINT REALISIERBAR. Jusletter IT, March, 211-220. https://doi.org/10.38023/14A32D75-E299-40D4-9523-3AF8BD445F95
- Adrian, A., Dykes, N., Evert, S., Heinrich, P., & Keuchen, M. (2022). Entwicklung und Evaluation automatischer Verfahren zur Anonymisierung von Gerichtsentscheidungen. LegalTech, 4, 233-238.
- Peters, J., Dykes, N., Heckel, M., Ostgathe, C., & Habermann, M. (2022). Präsentation von Palliativstationen und SAPV-Teams im Internet - eine korpusbasierte Metaanalyse von Webseiten. Zeitschrift für Palliativmedizin, 23, 46-53. https://doi.org/10.1055/a-1689-7524
- Dykes, N., Evert, S., Göttlinger, M., Heinrich, P., & Schröder, L. (2021). Argument parsing via corpus queries. it - Information Technology, 63, 31-44. https://doi.org/10.1515/itit-2020-0051
- Dykes, N., Evert, S., Göttlinger, M., Heinrich, P., & Schröder, L. (2020). Reconstructing Arguments from Noisy Text. Datenbank-Spektrum, 20, 123-129. https://doi.org/10.1007/s13222-020-00342-y
- Peters, J., Dykes, N., Ostgathe, C., Habermann, M., & Heckel, M. (2020). Kompetenzdarstellung, Patientennähe und Argumentationsstrategien von Internetangeboten deutscher Hospize, Palliativstationen und SAPV-Teams-eine korpusbasierte Meta-Analyse. Zeitschrift für Palliativmedizin, 21(5), e34.
- Dykes, N., & Peters, J. (2020). Reconstructing argumentation patterns in German newspaper articles on multidrug-resistant pathogens: a multi-measure keyword approach. Journal of Corpora and Discourse Studies, 3, 51-74. https://doi.org/10.18573/jcads.35
- Peters, J., Dykes, N., Heckel, M., Ostgathe, C., & Habermann, M. (2019). A Linguistic Model of Communication Types in Palliative Medicine: Effects of Multidrug-Resistant Organisms (MDRO) Colonization or Infection and Isolation Measures in End of Life on Family Caregivers’ Knowledge, Attitude and Practices. Journal of Palliative Medicine, 22(8). https://doi.org/10.1089/jpm.2019.0027
- Peters, J., Dykes, N., Habermann, M., Ostgathe, C., & Heckel, M. (2019). Metaphors for multidrug-resistant bacteria in German newspaper articles, 1995-2015. A computer-assisted qualitative study. Metaphor and the Social World, 9(2), 221-241.
Book Contributions
- Adrian, A., Dykes, N., Evert, S., Heinrich, P., & Keuchen, M. (2023). Automatische Anonymisierung von Gerichtsurteilen – Eine Vision scheint realisierbar. In Erich Schweighofer / Jakob Zanol / Stefan Eder (Hrg.), Rechtsinformatik als Methodenwissenschaft des Rechts – Tagungsband des 26. Internationalen Rechtsinformatik Symposions IRIS 2023. (S. 211 - 220). Editions Weblaw.
- Peters, J., & Dykes, N. (2022). Die Palliativmedizinische Fachkultur in Geschichte und Gegenwart – sprachwissenschaftliche Perspektiven. In Ilg, Yvonne, Schnedermann, Theresa, Iakushevich, Marina (Eds.), Linguistik und Medizin. (pp. 194-214). Berlin, New York: De Gruyter.
- Adrian, A., Dykes, N., Evert, S., Heinrich, P., Keuchen, M., & Proisl, T. (2022). Manuelle und automatische Anonymisierung von Urteilen. In Adrian, Axel/Kohlhase, Michael/Evert, Stephanie/Zwickel, Martin (Hrg.), Digitalisierung von Zivilprozess und Rechtsdurchsetzung. (S. 173-197).
- Dykes, N., Heinrich, P., & Evert, S. (2022). Retrieving Twitter argumentation with corpus queries and discourse analysis. In Susanne Flach, Martin Hilpert (Eds.), Broadening the Spectrum of Corpus Linguistics: New approaches to variability and change. (pp. 229-256). John Benjamins Publishing Company.
- Keuchen, M., Adrian, A., Evert, S., Heinrich, P., & Dykes, N. (2021). Anonymisierung von Gerichtsurteilen – Eine wesentliche Voraussetzung für E-Justice –. In Schweighofer E, Eder S, Hanke P, Kummer F, Saarenpää A (Hrg.), Cybergovernance - Tagungsband des 24. Internationalen Rechtsinformatik Symposions IRIS 2021. (S. 137 - 149). Editions Weblaw.
Conference Contributions
- Heinrich, P., Blombach, A., Doan Dang, B., Zilio, L., Havenstein, L., Dykes, N.,... Schäfer, F. (2024). Automatic Identification of COVID-19-Related Conspiracy Narratives in German Telegram Channels and Chats. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 1932-1943). Turin, IT.
- Dykes, N., Evert, S., Heinrich, P., Humml, M., & Schröder, L. (2024). Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models. In Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C (Eds.), Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024 (pp. 52-57). Torino, Italy: ELRA and ICCL.
- Heinrich, P., Blombach, A., Doan Dang, B., Zilio, L., Havenstein, L., Dykes, N.,... Schäfer, F. (2024). Automatic Identification of COVID-19-related Narratives in German Telegram Channels and Chats. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue (Eds.), LREC-COLING 2024 - Main Conference Proceedings (pp. 1932-1943). Torino, IT: European Language Resources Association (ELRA).
- Dykes, N., Evert, S., Heinrich, P., Humml, M., & Schröder, L. (2024). Finding Argument Fragments on Social Media with Corpus Queries and LLMs. In Philipp Cimiano, Anette Frank, Michael Kohlhase, Benno Stein (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 163-181). Bielefeld, DEU: Springer Science and Business Media Deutschland GmbH.
- Dykes, N., Wilson, A., & Uhrig, P. (2023). A Pipeline for the Creation of Multimodal Corpora from YouTube Videos. In Piush Aggarwal, Özge Alaçam, Carina Silberer, Sina Zarrieß, Torsten Zesch (Eds.), Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing (LIMO 2023) (pp. 1-5). Ingolstadt, DE: Association for Computational Linguistics.
- Uhrig, P., Payne, E., Pavlova, I., Burenko, I., Dykes, N., Baltazani, M.,... Wilson, A. (2023). Studying Time Conceptualisation via Speech, Prosody, and Hand Gesture: Interweaving Manual and Computational Methods of Analysis. In Wim Pouw, James Trujillo, Hans Rutger Bosker, Linda Drijvers, Marieke Hoetjes, Judith Holler, Sarka Kadava, Lieke Van Maastricht, Ezgi Mamus, Asli Ozyurek (Eds.), Gesture and Speech in Interaction. Nijmegen, NL.
- Blombach, A., Dykes, N., Heinrich, P., Kabashi, B., & Proisl, T. (2020). A Corpus of German Reddit Exchanges (GeRedE). In Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis (Eds.), LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 6310-6316). Marseille, FR: European Language Resources Association (ELRA).
- Dykes, N., Heinrich, P., & Blombach, A. (2020, February). Independent argumentation schemes? Transferring argument queries from Brexit to environment tweets. Paper presentation at ICAME41, Heidelberg, DE.
- Blombach, A., Dykes, N., Evert, S., Heinrich, P., Kabashi, B., & Proisl, T. (2020). A new German Reddit corpus. In Proceedings of the 15th Conference on Natural Language Processing, KONVENS 2019 (pp. 278-279). Erlangen-Nürnberg, DE: German Society for Computational Linguistics and Language Technology.
- Proisl, T., Dykes, N., Heinrich, P., Kabashi, B., Blombach, A., & Evert, S. (2020). EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus. In Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis (Eds.), LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 6142-6148). Marseille, FR: European Language Resources Association (ELRA).
- Dykes, N., Heinrich, P., & Evert, S. (2019, June). Arguing Brexit on Twitter. A corpus linguistic study. Paper presentation at European Conference on Argumentation 2019, Groningen, NL.
- Dykes, N., Heinrich, P., & Evert, S. (2019, June). Reconstructing Twitter arguments with corpus linguistics. Paper presentation at ICAME40: Language in Time, Time in Language, Neuchâtel, CH.
- Proisl, T., Uhrig, P., Heinrich, P., Blombach, A., Mammarella, S., Dykes, N., & Kabashi, B. (2019). The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri Without Even Knowing the Alphabet. In Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) (pp. 73-79). Trento, IT: Association for Computational Linguistics.
- Evert, S., Dykes, N., & Peters, J. (2018). A quantitative evaluation of keyword measures for corpus-based discourse analysis.
- Peters, J., & Dykes, N. (2018). From keywords to discourse - towards a keyword operationalisation model in discourse linguistics. In Corpora and Discourse International Conference. Lancaster.
- Reconstructing Arguments from Newsworthy Debates
(Third Party Funds Single)
Term: 1 January 2021 - 31 December 2023
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: https://www.linguistik.phil.fau.de/projects/rant/
Large portions of ongoing political debates are available in machine-readable form nowadays, ranging from the formal public sphere of parliamentary proceedings to the semi-public sphere of social media. This offers new opportunities for gaining a comprehensive overview of the arguments exchanged, using automated techniques to analyse text sources. The goal of the RANT/RAND project series within the priority programme RATIO (Robust Argumentation Machines) is to contribute to the automated extraction of arguments and argument structures from machine-readable texts via an approach that combines logical and corpus-linguistic methods and favours precision over recall, on the assumption that the sheer volume of available data will allow us to pinpoint prevalent arguments even under moderate recall. Specifically, we identify logical patterns corresponding to individual argument schemes taken from standard classifications, such as argument from expert opinion; essentially, these logical patterns are formulae with placeholders in dedicated modal logics. To each logical pattern we associate several linguistic patterns corresponding to different realisations of the formula in natural language; these patterns are developed and refined through corpus-linguistic studies and formalised in terms of corpus queries. Our approach thus integrates the development of automated argument extraction methods with work towards a better understanding of the linguistic aspects of everyday political argumentation. Research in the ongoing first project phase is focused on designing and evaluating patterns and queries for individual arguments, with a large corpus of English Twitter messages used as a running case study.
In the second project phase, we plan to test the robustness of our approach by branching out into additional text types, in particular longer coherent texts such as newspaper articles and parliamentary debates, as well as by moving to German texts, which present additional challenges for the design of linguistic patterns (i.a. due to long-distance dependencies and limited availability of high-quality NLP tools). Crucially, we will also introduce similarity-based methods to enable complex reasoning on extracted arguments, representing the fillers in extracted formulae by specially tailored neural phrase embeddings. Moreover, we will extend the overall approach to allow for the high-precision extraction of argument structure, including explicit and implicit references to other arguments. We will combine these efforts with more specific investigations into the logical structure of arguments on how to achieve certain goals and into the interconnection between argumentation and interpersonal relationships, e.g. in ad-hominem arguments.
- Reconstructing Arguments from Noisy Text (DFG Priority Programme 1999: RATIO)
(Third Party Funds Single)
Term: 1 January 2018 - 31 December 2020
Funding source: Deutsche Forschungsgemeinschaft (DFG)
Social media are of increasing importance in current public discourse. In RANT, we aim to contribute methods and formalisms for the extraction, representation, and processing of arguments from noisy text found in discussions on social media, using a large corpus of pre-referendum Twitter messages on Brexit as a running case study. We will conduct a corpus-linguistic study to identify recurring linguistic argumentation patterns and design corresponding corpus queries to extract arguments from the corpus, following a high-precision/low-recall approach. In fact, we expect to be able to associate argumentation patterns directly with logical patterns in a dedicated formalism and accordingly parse individual arguments directly as logical formulas. The logical formalism for argument representation will feature a broad range of modalities capturing real-life modes of expression such as uncertainty, agency, preference, sentiment, vagueness, and defaults. We will cast this formalism as a family of instance logics in the generic logical framework of coalgebraic logic, which provides uniform semantic, deductive and algorithmic methods for modalities beyond the standard relational setup; in particular, reasoning support for the logics in question will be based on further development of an existing generic coalgebraic reasoner. The argument representation formalism will be complemented by a flexible framework for the representation of relationships between arguments. These will include standard relations such as Dung's attack relation or a support relation but also relations extracted from metadata such as citation, hashtags, or direct address (via mention of user names), as well as relationships that are inferred from the logical content of individual arguments.
The latter may take on a non-relational nature, involving, e.g., fuzzy truth values, preference orderings, or probabilities, and will thus fruitfully be modelled in the uniform framework of coalgebra that has already appeared above as the semantic foundation of coalgebraic logic. We will develop suitable generalizations of Dung's extension semantics for argumentation frameworks, thus capturing notions such as “coherent point of view” or “pervasive opinion”; in combination with corresponding algorithmic methods, these will allow for the automated extraction of large-scale argumentative positions from the corpus.
Organisation of a congress / conference
- Text Mining and Generation (TMG)
19 September 2022, URL: https://recap.uni-trier.de/2022-tmg-workshop/