cv | Thennal D K

Basics

Name	Thennal D K
Label	Undergraduate Student, NLP Researcher
Email	thennal10@gmail.com
Url	https://thennal10.github.io/

Education

2021.12 - 2025.05

*CGPA: 9.16*
Bachelor

Indian Institute of Information Technology Kottayam

Computer Science (B.Tech with Honours)

Work

2025.06 - Present
Independent Researcher

Language Technology Group, University of Hamburg

Conducting research on actantial models, label projection, and LLM creativity.
- Implemented pipelines to automatically apply the actantial narrative model using LLMs.
- Designed LabelPigeon, a novel label projection technique utilized for low resource cross-lingual transfer.
- Evaluated LabelPigeon on downstream tasks, obtaining score improvements of up to +39.9; the corresponding paper has been submitted to **ACL 2026**.
- Quantifying the creativity of LLMs via a novel narrative similarity evaluation scheme.
2024.05 - 2024.08
Research Intern

Language Technology Group, University of Hamburg

Conducted research on embedding models and large language model (LLM) representations.
- Investigated embedding models and LLM representations broadly, developing an optimal few-shot fine-tuning regime for topic modeling.
- Devised L3Prune, a novel pruning procedure for LLM-based embedding models that reduces model size by 21% with a negligible performance drop; the corresponding paper was published at **RepL4NLP 2025**.
2023.04 - 2024.04
Machine Learning Intern

Institute of Human Resource Development

Developed and deployed machine learning models for speech recognition, speaker identification, and face recognition.
- Worked with the Kerala Police Intelligence Department, leading a team of 15 to develop AI systems for policing.
- Trained and deployed a state-of-the-art automatic speech recognition (ASR) model for Malayalam, along with a speaker extraction and identification system.
- Designed and implemented a Malayalam news extraction system utilizing optical character recognition.
- Developed a scalable face recognition system optimized for fast inference.
2022.12 - 2023.03
Data Science Intern

Institute of Human Resource Development

Optimized web infrastructure and data-driven decision making.
- Rebuilt the web stack to streamline employee workflows, significantly improving productivity.
- Created visualizations and reports to communicate insights to stakeholders and government agencies.
- Optimized purchasing decisions based on statistical modeling.

Volunteer

2018.02 - Present
Communications Lead

Led outreach, website development, and advocacy for marginalized communities.
- Organized events, campaigns, and workshops for the queer community in Kerala, as well as advocacy workshops and awareness programs on trans rights across universities and government institutions.
- Developed and launched a public website, significantly expanding international outreach and visibility.
- Served as the main liaison with donors, institutions, and affiliates, securing over **$3M** in grants.
- Taught math and programming to marginalized children as part of the Life Skill Education Summer Program.

Projects

2024.08 - 2024.10
Advocating for Character Error Rate in Automatic Speech Recognition
- Documented the shortcomings of the commonly used word error rate metric for multilingual evaluation in collaboration with Dr. Jesin James.
- Collected human preference data in 3 languages and calculated metric correlations, providing experimental evidence in favour of Character Error Rate.
- Wrote a paper arguing our position, accepted at **NAACL 2025**.
2023.07 - 2023.10
Fisher Mask Nodes for Model Merging
- Devised a novel and compute-efficient model merging algorithm under Dr. Suchithra M S.
- Evaluated our method across various models in the BERT family, achieving a performance improvement of **+6.5%** and a speedup of between 57.4x and 321.7x.
- Documented research outcomes in an academic paper, published in **LREC-COLING 2024**.
2018.03 - 2022.11
ICFOSS Malayalam Speech Corpus
- Collaborated on IMaSC - The ICFOSS Malayalam Speech Corpus, a 50-hour text-to-speech dataset.
- Supervised data collection, speaker recording, and quality control.
- Trained and evaluated multiple models, achieving an average mean opinion score (MOS) of 4.50.
- Wrote a research paper compiling our results, accepted at **LREC 2026**.
2022.12 - 2022.12
Whisper Malayalam
- Trained ASR models on Malayalam speech data as part of the Huggingface Whisper Fine-Tune Community Sprint.
- Brought the medium-sized model to the top of the leaderboard, making it the state-of-the-art solution for Malayalam ASR; it has since reached **30k downloads** and counting.

Awards

2024

DAAD WISE Scholarship

German Academic Exchange Service

Publications

2026.03.01

Just Use XML: Revisiting Joint Translation and Label Projection

arXiv Preprint, Submitted to ACL 2026
2025.05.01

Large Language Models Are Overparameterized Text Encoders

Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025) (pp. 170–184)
2025.04.01

Advocating Character Error Rate for Multilingual ASR Evaluation

Findings of the Association for Computational Linguistics: NAACL 2025 (pp. 4926–4935)
2024.05.01

Fisher Mask Nodes for Language Model Merging

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 7349–7355)
2022.11.01

IMaSC -- ICFOSS Malayalam Speech Corpus

arXiv Preprint, Accepted at LREC 2026
2022.09.01

Performance Enhancement of Deep Neural Network Based Automatic Voice Disorder Detection System with Data Augmentation — A Case Study

Biomedical Engineering: Applications, Basis and Communications, Vol. 35
2019.11.01

Memory Based Speech Duration Model using Exemplar Theoretic Approach

Proceedings of the International Conference on Artificial Intelligence & Speech Technologies (AIST 2019) (pp. 108–112)

Skills

	Programming Languages
	Python
	HTML/CSS
	JavaScript
	SQL
	C/C++
	C#
	Java

	Frameworks
	PyTorch
	TensorFlow
	Scikit-learn
	Vue.js
	React
	Unity

	Miscellaneous
	Linux
	Shell (Bash/Zsh)
	LaTeX (Overleaf/R Markdown)
	Git
	Docker
	PostgreSQL
