Thennal D K

I’m Thennal (any/all), a CS undergrad at the Indian Institute of Information Technology Kottayam, conducting natural language processing research since 2018. I’m interested in what makes language/speech models tick, and how to make them tick better. In particular:

How do large pretrained models form their internal representations, and how does each component update them?
There are a lot of pretrained and finetuned models available publicly. Can we use them to make better models?
The field has a significant evaluation/benchmarking problem, particularly when it comes to non-English languages. How can we make it better?

I also like running, fungi, anything produced by Supergiant Games, and Japanese music. Go watch Etsuko Yakushimaru’s I’m Humanity, and then read about it.

news

Jan 22, 2025	Our paper on ASR evaluation metrics was accepted to NAACL Findings 2025!
Oct 18, 2024	Two new preprints, related to my internship with the University of Hamburg and my collaboration with Jesin James from the University of Auckland.
Feb 20, 2024	Paper accepted at LREC-COLING 2024! Excited to go there in May and present our work, Fisher Mask Nodes for Language Model Merging.
Feb 17, 2024	Got the DAAD WISE scholarship for an internship with the University of Hamburg!

latest posts

Oct 24, 2024	The lost art of checking your sources
Dec 29, 2022	Whisper's evaluated metrics are kind of wrong for a bunch of languages
Jun 23, 2021	Mnemonics are useless

selected publications

Large Language Models Are Overparameterized Text Encoders

Thennal D K, Tim Fischer, and Chris Biemann

In Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025), May 2025

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that by pruning the last p% of layers of an LLM before supervised training for only 1000 steps, we can achieve a proportional reduction in memory and inference time. We evaluate four different state-of-the-art LLMs on text embedding tasks and find that our method can prune up to 30% of layers with negligible impact on performance and up to 80% with only a modest drop. With only three lines of code, our method is easily implemented in any pipeline for transforming LLMs to text encoders. We also propose L3Prune, a novel layer-pruning strategy based on the model’s initial loss that provides two optimal pruning configurations: a large variant with negligible performance loss and a small variant for resource-constrained settings. On average, the large variant prunes 21% of the parameters with a performance drop, and the small variant only suffers from a decrease while pruning 74% of the model. We consider these results strong evidence that LLMs are overparameterized for text embedding tasks, and can be easily pruned.
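The layer-dropping step mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's code: the function name `prune_last_layers`, the round-down rule, and the list of strings standing in for transformer blocks are all assumptions made for illustration.

```python
import math


def prune_last_layers(layers, p):
    """Return a copy of `layers` with the last p percent removed.

    Assumptions (not from the paper): the number of dropped layers is
    rounded down, and at least one layer is always kept.
    """
    if not 0 <= p < 100:
        raise ValueError("p must be in the range [0, 100)")
    n_drop = math.floor(len(layers) * p / 100)
    n_keep = max(1, len(layers) - n_drop)
    return list(layers[:n_keep])


# Toy stand-in for a 32-block decoder stack.
layers = [f"block_{i}" for i in range(32)]
pruned = prune_last_layers(layers, 25)
print(len(layers), "->", len(pruned))
```

In a real model the same slicing would be applied to the stack of transformer blocks before finetuning, and the configured layer count would have to be updated to match.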

Fisher Mask Nodes for Language Model Merging

Thennal D K, Ganesh Nathan, and Suchithra M S

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a regular and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup between 57.4x and 321.7x across models. Our results prove the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.
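For readers unfamiliar with the baseline, here is a minimal sketch of generic parameter-wise Fisher-weighted averaging, where each merged parameter is the average of the models' values weighted by their Fisher estimates. It is not the mask-node method from the paper. The function name, the flat-list parameter layout, and the fallback to a plain mean when all weights are zero are assumptions, and how the Fisher values are estimated is out of scope here.

```python
def fisher_weighted_average(params, fishers, eps=1e-12):
    """Merge models parameter by parameter.

    params:  one flat list of parameter values per model.
    fishers: matching lists of non-negative Fisher estimates.
    Each merged value is sum(F_i * theta_i) / sum(F_i) over the models.
    Assumption: if the weights for a parameter sum to (nearly) zero,
    fall back to the unweighted mean.
    """
    merged = []
    for values, weights in zip(zip(*params), zip(*fishers)):
        total = sum(weights)
        if total <= eps:
            merged.append(sum(values) / len(values))
        else:
            merged.append(sum(w * v for w, v in zip(weights, values)) / total)
    return merged


# Two toy "models" with two parameters each (made-up numbers).
theta = [[1.0, 2.0], [3.0, 6.0]]
fisher = [[3.0, 0.0], [1.0, 0.0]]
merged = fisher_weighted_average(theta, fisher)
print(merged)
```

The first parameter is pulled toward the model with the larger Fisher value, while the second falls back to the plain mean because neither model assigns it any weight.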

Advocating Character Error Rate for Multilingual ASR Evaluation

Thennal D K, Jesin James, Deepa Padmini Gopinath, and 1 more author

In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025

Automatic speech recognition (ASR) systems have traditionally been evaluated using English datasets, with the word error rate (WER) serving as the predominant metric. WER’s simplicity and ease of interpretation have contributed to its widespread adoption, particularly for English. However, as ASR systems expand to multilingual contexts, WER fails in various ways, particularly with morphologically complex languages or those without clear word boundaries. Our work documents the limitations of WER as an evaluation metric and advocates for the character error rate (CER) as the primary metric in multilingual ASR evaluation. We show that CER avoids many of the challenges WER faces and exhibits greater consistency across writing systems. We support our proposition by conducting human evaluations of ASR transcriptions in three languages—Malayalam, English, and Arabic—which exhibit distinct morphological characteristics. We show that CER correlates more closely with human judgments than WER, even for English. To facilitate further research, we release our human evaluation dataset for future benchmarking of ASR metrics. Our findings suggest that CER should be prioritized, or at least supplemented, in multilingual ASR evaluations to account for the varying linguistic characteristics of different languages.
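
To make the two metrics concrete, here is a minimal sketch of WER and CER as Levenshtein distance divided by reference length, computed over whitespace-separated words and over characters respectively. It assumes no text normalization and counts spaces as characters; real evaluations differ in those conventions, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# A single missing word boundary: every word is wrong, one character is.
ref, hyp = "ice cream", "icecream"
print(f"WER={wer(ref, hyp):.2f}  CER={cer(ref, hyp):.2f}")
```

A single dropped space makes both reference words count as errors under WER, while CER registers only one missing character out of nine.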