cv

Basics

Name Thennal D K
Label Undergraduate Student, NLP Researcher
Email thennal10@gmail.com
Url https://thennal10.github.io/

Education

  • 2021.12 - 2025.05

    *CGPA: 9.11*

    Bachelor
    Indian Institute of Information Technology Kottayam
    Computer Science (B.Tech with Honours)

Work

  • 2024.05 - 2024.08
    Research Intern
    Language Technology Group, University of Hamburg
    Conducted research on embedding models and large language model (LLM) representations.
    • Developed an optimal few-shot fine-tuning regime for topic modeling.
    • Devised a novel pruning procedure for LLM-based embedding models, reducing model size by 21% with negligible performance drop.
    • Wrote a research paper based on the findings, currently under review.
    • Internship funded by the DAAD WISE scholarship.
  • 2023.04 - 2024.04
    Machine Learning Intern
    Kerala Police Intelligence Department
    Developed and deployed machine learning models for speech recognition, speaker identification, and face recognition.
    • Trained and deployed a state-of-the-art automatic speech recognition (ASR) model for Malayalam.
    • Designed and implemented a Malayalam news extraction system using web scraping and OCR.
    • Developed a scalable face recognition system optimized for fast inference.
    • Led a 15-person team on the project, coordinating work across members.
  • 2022.12 - 2023.03
    Data Scientist Intern
    Production and Maintenance Division, Institute of Human Resource Development
    Improved web infrastructure and supported data-driven decision-making.
    • Rebuilt web stack to streamline employee workflows and replace outdated systems.
    • Created visualizations and reports to communicate insights to internal stakeholders and government agencies.
    • Collaborated with the procurement team to optimize purchasing decisions.
  • 2018.03 - 2022.11
    Research Intern
    International Centre for Free and Open Source Software (ICFOSS)
    Developed and managed the largest publicly available Malayalam speech-text corpus.
    • Created IMaSC - The ICFOSS Malayalam Speech Corpus, a 50-hour text-to-speech dataset.
    • Supervised data collection, speaker recording, and quality control.
    • Trained and evaluated multiple models, achieving an average mean opinion score (MOS) of 4.50.

Volunteer

  • 2018.02 - Present
    Communications Lead
    • Organized events, campaigns, and workshops for the queer community in Kerala.
    • Developed and launched a public website, expanding international outreach.
    • Served as the main liaison with donors, securing over $3M in grants.
    • Taught math and programming to marginalized children.

Projects

  • 2024.08 - Present
    Model Merging for Automatic Speech Recognition
    • Investigating, with Dr. Manu Madhavan, model merging techniques for ASR models fine-tuned on different languages.
    • Establishing a rigorous benchmark for model merging with clear metrics.
    • Developing novel model merging methods applicable to inter-lingual merging.
  • 2024.08 - 2024.10
    Advocating for Character Error Rate in Automatic Speech Recognition
    • Documented, with Dr. Jesin James, shortcomings of the commonly used Word Error Rate metric for multilingual evaluation.
    • Conducted multilingual surveys collecting human preferences among different ASR models.
    • Calculated metric correlations, providing experimental evidence in favor of Character Error Rate.
    • Paper accepted at NAACL 2025.
  • 2023.07 - 2023.10
    Fisher Mask Nodes for Model Merging
    • Developed a novel and compute-efficient model merging algorithm with Dr. Suchithra M S.
    • Evaluated performance on various BERT family models, achieving a 6.5% performance improvement.
    • Achieved speedups between 57.4x and 321.7x.
    • Published in LREC-COLING 2024.
  • 2018.05 - 2019.02
    Data Augmentation for Automatic Voice Disorder Detection
    • Evaluated, with Dr. Vrinda V Nair, data augmentation techniques for automatic voice disorder detection, focusing on leukoplakia.
    • Developed a custom data augmentation strategy that increased dataset size by 8x while preserving data diversity.
    • Achieved a 46.9% increase in accuracy.
    • Published in a peer-reviewed journal.

Awards

Publications

Skills

Programming Languages
Python
HTML/CSS
JavaScript
SQL
C/C++
C#
Java
Frameworks
PyTorch
TensorFlow
Scikit-learn
Vue.js
React
Unity
Miscellaneous
Linux
Shell (Bash/Zsh)
LaTeX
Git
Docker
PostgreSQL