cv

Basics

Name: Thennal D K
Label: Undergraduate Student, NLP Researcher
Email: thennal10@gmail.com
URL: https://thennal10.github.io/

Education

  • 2021.12 - 2025.05
    Bachelor, Computer Science (B.Tech with Honours)
    Indian Institute of Information Technology Kottayam
    *CGPA: 9.16*

Work

  • 2025.06 - Present
    Independent Researcher
    Language Technology Group, University of Hamburg
    Conducting research on actantial models, label projection, and LLM creativity.
    • Implemented pipelines to automatically apply the actantial narrative model using LLMs.
    • Designed LabelPigeon, a novel label projection technique for low-resource cross-lingual transfer.
    • Evaluated LabelPigeon, obtaining a score improvement of up to +39.9 on downstream tasks; the corresponding paper has been submitted to **ACL 2026**.
    • Quantifying the creativity of LLMs via a novel narrative similarity evaluation scheme.
  • 2024.05 - 2024.08
    Research Intern
    Language Technology Group, University of Hamburg
    Conducted research on embedding models and large language model (LLM) representations.
    • Investigated embedding models and LLM representations, developing an optimal few-shot fine-tuning regime for topic modeling.
    • Devised L3Prune, a novel pruning procedure for LLM-based embedding models that reduces model size by 21% with a negligible performance drop; the corresponding paper was published at **RepL4NLP 2025**.
  • 2023.04 - 2024.04
    Machine Learning Intern
    Institute of Human Resource Development
    Developed and deployed machine learning models for speech recognition, speaker identification, and face recognition.
    • Worked with the Kerala Police Intelligence Department, leading a team of 15 to develop AI systems for policing.
    • Trained and deployed a state-of-the-art automatic speech recognition (ASR) model for Malayalam, along with a speaker extraction and identification system.
    • Designed and implemented a Malayalam news extraction system utilizing optical character recognition.
    • Developed a scalable face recognition system optimized for fast inference.
  • 2022.12 - 2023.03
    Data Science Intern
    Institute of Human Resource Development
    Optimized web infrastructure and supported data-driven decision making.
    • Rebuilt web stack to streamline employee workflows, significantly improving productivity.
    • Created visualizations and reports to communicate insights to stakeholders and government agencies.
    • Optimized purchasing decisions based on statistical modeling.

Volunteer

  • 2018.02 - Present
    Communications Lead
    Led outreach, website development, and advocacy for marginalized communities.
    • Organized events, campaigns, and workshops for the queer community in Kerala, as well as advocacy workshops and awareness programs on trans rights across universities and government institutions.
    • Developed and launched a public website, significantly expanding international outreach and visibility.
    • Served as the main liaison with donors, institutions, and affiliates, securing over **$3M** in grants.
    • Taught math and programming to marginalized children as part of the Life Skill Education Summer Program.

Projects

  • 2024.08 - 2024.10
    Advocating for Character Error Rate in Automatic Speech Recognition
    • Documented the shortcomings of the commonly used word error rate metric for multilingual evaluation in collaboration with Dr. Jesin James.
    • Collected human preference data in 3 languages and calculated metric correlations, providing experimental evidence in favour of Character Error Rate.
    • Wrote a paper arguing our position, accepted at **NAACL 2025**.
  • 2023.07 - 2023.10
    Fisher Mask Nodes for Model Merging
    • Devised a novel and compute-efficient model merging algorithm under Dr. Suchithra M S.
    • Evaluated our method across various models in the BERT family, achieving a performance improvement of **+6.5%** and a speedup of 57.4x to 321.7x.
    • Documented research outcomes in an academic paper, published at **LREC-COLING 2024**.
  • 2018.03 - 2022.11
    ICFOSS Malayalam Speech Corpus
    • Collaborated on IMaSC - The ICFOSS Malayalam Speech Corpus, a 50-hour text-to-speech dataset.
    • Supervised data collection, speaker recording, and quality control.
    • Trained and evaluated multiple models, achieving an average mean opinion score (MOS) of 4.50.
    • Wrote a research paper compiling our results, accepted at **LREC 2026**.
  • 2022.12 - 2022.12
    Whisper Malayalam
    • Trained ASR models on Malayalam speech data as part of the Hugging Face Whisper Fine-Tune Community Sprint.
    • Brought the medium-sized model to the top of the leaderboard, making it the state-of-the-art solution for Malayalam ASR; the model has since passed **30k downloads**.

Awards

Publications

Skills

  • Programming Languages: Python, HTML/CSS, JavaScript, SQL, C/C++, C#, Java
  • Frameworks: PyTorch, TensorFlow, Scikit-learn, Vue.js, React, Unity
  • Miscellaneous: Linux, Shell (Bash/Zsh), LaTeX (Overleaf/R Markdown), Git, Docker, PostgreSQL