Brian Roark
Contact Info:
Google, Inc., 555 SW Morrison St., Ste. 500, Portland, OR 97204
roarkbr AT SYMBOL g m a i l DOT c o m
I am a computational linguist working on various topics in natural language processing. My research interests include:
language modeling for automatic speech recognition, text entry and other applications; weighted transducers and grammars; supervised and unsupervised learning of language models; text normalization and transliteration; pronunciation modeling; text entry, accessibility and augmentative and alternative communication (AAC); syntactic parsing of text and speech; statistical models of human language processing; spoken language processing for diagnosis of neurodevelopmental and neurodegenerative disorders.
Some recent-ish activities, links and/or resources:
- I am co-organizing (with Kyle Gorman, Emily Prud’hommeaux and Richard Sproat) the Second Workshop on Computation and Written Language (CAWL 2024), to be held in conjunction with LREC-COLING in Torino, Italy, May 21, 2024.
Last year, we organized the First ACL Workshop on Computation and Written Language (CAWL), which was held at ACL 2023 in Toronto, July 14, 2023.
- I was co-Editor-in-chief of the Transactions of the Association for Computational Linguistics (TACL) from 2018 to 2022. The journal's 2021 impact factor was the topic of an MIT Press blog post.
- Here's the site of the Dakshina dataset, an open-source collection of romanized and native script Wikipedia in 12 South Asian languages that I helped put together.
- Here's a 2021 Google Research blog post about some work my team was involved in, transliterating geo entity names into Brahmic scripts. And here's an earlier (2017) post about some related work I contributed to, providing transliteration keyboards in 20+ South Asian languages.