I’m a speech researcher at Samsung Research, focusing on a broad range of areas within speech recognition and spoken keyword spotting. My work includes:

Custom wake-up word: Developing text-audio representation learning for spoken keyword detection with text-only enrollment, using tiny (150K-parameter) models.
Cross-modal inference-time intervention: Estimating internal language models learned by speech models to improve inference-time integration with external language models.
ASR for Samsung Bixby: Full-stack ASR engineering for Samsung Bixby, including large-scale training, model compression, long-form speech recognition, and cross-modal inference-time biasing.

In addition to my research in speech processing, I am generally interested in the interpretability of neural networks. I believe that understanding the inner mechanisms of current award-winning large models will help us gain controllability and design more computationally efficient models. On that note, I am interested in the following research topics.

Speech foundation models: What can we learn from the speech representations at each layer of speech foundation models? What is the best way to utilize the phonetic and synthetic information for downstream tasks?
Interpretability of large models: How are emergent abilities such as reasoning represented in neural networks? How can we extract the underlying circuits and make them more explicit?
Controllability and efficiency: How can we apply our understanding of neural models to make them more controllable and efficient?

Prior to Samsung Research, I received a B.S. in CSE from Seoul National University.

News!

(09/24) I will be presenting my paper on spoken keyword detection at INTERSPEECH 2024!