cv
Basics
| Name | Thennal D K |
| Label | Undergraduate Student, NLP Researcher |
| Email | thennal10@gmail.com |
| Url | https://thennal10.github.io/ |
Education
-
2021.12 - 2025.05 *CGPA: 9.16*
Work
-
2025.06 - Present Independent Researcher
Language Technology Group, University of Hamburg
Conducting research on actantial models, label projection, and LLM creativity.
- Implemented pipelines to automatically apply the actantial narrative model using LLMs.
- Designed LabelPigeon, a novel label projection technique for low-resource cross-lingual transfer.
- Evaluated LabelPigeon on downstream tasks, achieving score improvements of up to +39.9; the corresponding paper has been submitted to **ACL 2026**.
- Quantifying the creativity of LLMs via a novel narrative similarity evaluation scheme.
-
2024.05 - 2024.08 Research Intern
Language Technology Group, University of Hamburg
Conducted research on embedding models and large language model (LLM) representations.
- Investigated embedding models and LLM representations, developing an optimal few-shot fine-tuning regime for topic modeling.
- Devised L3Prune, a novel pruning procedure for LLM-based embedding models that reduces model size by 21% with a negligible performance drop; the corresponding paper was published at **RepL4NLP 2025**.
-
2023.04 - 2024.04 Machine Learning Intern
Institute of Human Resource Development
Developed and deployed machine learning models for speech recognition, speaker identification, and face recognition.
- Worked with the Kerala Police Intelligence Department, leading a team of 15 to develop AI systems for policing.
- Trained and deployed a state-of-the-art automatic speech recognition (ASR) model for Malayalam, along with a speaker extraction and identification system.
- Designed and implemented a Malayalam news extraction system utilizing optical character recognition.
- Developed a scalable face recognition system optimized for fast inference.
-
2022.12 - 2023.03 Data Science Intern
Institute of Human Resource Development
Optimized web infrastructure and enabled data-driven decision making.
- Rebuilt the web stack to streamline employee workflows, significantly improving productivity.
- Created visualizations and reports to communicate insights to stakeholders and government agencies.
- Optimized purchasing decisions based on statistical modeling.
Volunteer
-
2018.02 - Present Communications Lead
Led outreach, website development, and advocacy for marginalized communities.
- Organized events, campaigns, and workshops for the queer community in Kerala, and ran advocacy workshops and trans rights awareness programs at universities and government institutions.
- Developed and launched a public website, significantly expanding international outreach and visibility.
- Served as the main liaison with donors, institutions, and affiliates, securing over **$3M** in grants.
- Taught math and programming to marginalized children as part of the Life Skill Education Summer Program.
Projects
- 2024.08 - 2024.10
Advocating for Character Error Rate in Automatic Speech Recognition
- In collaboration with Dr. Jesin James, documented the shortcomings of the commonly used Word Error Rate (WER) metric for multilingual evaluation.
- Collected human preference data in 3 languages and calculated metric correlations with these preferences, providing experimental evidence in favour of Character Error Rate (CER).
- Wrote a paper arguing our position, accepted to Findings of **NAACL 2025**.
- 2023.07 - 2023.10
Fisher Mask Nodes for Model Merging
- Devised a novel, compute-efficient model merging algorithm under the supervision of Dr. Suchithra M S.
- Evaluated the method across various BERT-family models, achieving a performance improvement of **+6.5%** and a speedup of 57.4x to 321.7x.
- Documented research outcomes in an academic paper, published in **LREC-COLING 2024**.
- 2018.03 - 2022.11
ICFOSS Malayalam Speech Corpus
- Collaborated on IMaSC (ICFOSS Malayalam Speech Corpus), a 50-hour text-to-speech dataset.
- Supervised data collection, speaker recording, and quality control.
- Trained and evaluated multiple models, achieving an average mean opinion score (MOS) of 4.50.
- Wrote a research paper compiling our results, accepted at **LREC 2026**.
- 2022.12
Whisper Malayalam
- Trained ASR models on Malayalam speech data as part of the Hugging Face Whisper Fine-Tuning Community Sprint.
- Brought the medium-sized model to the top of the leaderboard, making it the state-of-the-art solution for Malayalam ASR, with **30k+ downloads** to date.
Awards
- 2024
DAAD WISE Scholarship
German Academic Exchange Service
Publications
-
2026.03.01 Just Use XML: Revisiting Joint Translation and Label Projection
arXiv preprint, submitted to ACL 2026
-
2025.05.01 Large Language Models Are Overparameterized Text Encoders
Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025) (pp. 170–184)
-
2025.04.01 Advocating Character Error Rate for Multilingual ASR Evaluation
Findings of the Association for Computational Linguistics: NAACL 2025 (pp. 4926–4935)
-
2024.05.01 Fisher Mask Nodes for Language Model Merging
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 7349–7355)
-
2022.11.01 IMaSC – ICFOSS Malayalam Speech Corpus
arXiv preprint, accepted at LREC 2026
-
2022.09.01 Performance Enhancement of Deep Neural Network Based Automatic Voice Disorder Detection System with Data Augmentation — A Case Study
Biomedical Engineering: Applications, Basis and Communications, Vol. 35
-
2019.11.01 Memory Based Speech Duration Model using Exemplar Theoretic Approach
Proceedings of the International Conference on Artificial Intelligence & Speech Technologies (AIST 2019) (pp. 108–112)
Skills
| Programming Languages | Python, HTML/CSS, JavaScript, SQL, C/C++, C#, Java |
| Frameworks | PyTorch, TensorFlow, Scikit-learn, Vue.js, React, Unity |
| Miscellaneous | Linux, Shell (Bash/Zsh), LaTeX (Overleaf/R Markdown), Git, Docker, PostgreSQL |