Brian Roark
Contact Info:
Google, Inc., 555 SW Morrison St., Ste. 500, Portland, OR 97204
email: roarkbr AT SYMBOL g m a i l DOT c o m
I am a computational linguist working on various topics in natural language processing. My research interests include:
transliteration and text normalization; language identification; language modeling for automatic speech recognition, text entry and other applications; weighted
transducers and grammars; supervised and unsupervised learning of language models; pronunciation modeling; text entry, accessibility and augmentative & alternative communication (AAC); syntactic parsing
of text and speech; statistical models of human language processing; spoken language processing for
diagnosis of neurodevelopmental and neurodegenerative disorders.
A few recent publications:
- Brian Roark, Richard Sproat and Su-Youn Yoon. 2026. Tools of the Scribe: How Writing Systems, Technology, and Human Factors Interact To Affect the Act of Writing. Springer Nature, Cham, Switzerland. (See link to review in Nature below.)
- Adrian Benton, Alexander Gutkin, Christo Kirov and Brian Roark. 2025. Improving Informally Romanized Language Identification. In Proceedings of EMNLP, pp. 2318–2336. preprint
- Christo Kirov, Cibu Johny, Anna Katanova, Alexander Gutkin and Brian Roark. 2024. Context-aware transliteration of Romanized South Asian languages. Computational Linguistics, 50(2): 475–534.
Other publications; Google Scholar profile; Google Research page; Semantic Scholar page; CV.
Some recent-ish activities, links and/or resources:
- Here's a nice review of Tools of the Scribe by Andrew Robinson, published in the journal Nature in January 2026, in which he labels it "a stimulating and original, if technical, book". Aw, shucks...
- I gave a talk on "Empirical methods in context-aware transliteration" at the Eugene Charniak Memorial Symposium at the CS Department of Brown University in November 2024.
- I co-organized (w/Kyle Gorman, Emily Prud’hommeaux and Richard Sproat) the Second Workshop on Computation and Written Language (CAWL 2024), held in conjunction with LREC-COLING in Torino, Italy, May 21, 2024.
The workshop was sponsored by the newly-formed ACL Special Interest Group on Writing Systems and Written Language (SIGWrit), which I helped establish in 2023.
The previous year, we organized the First ACL Workshop on Computation and Written Language (CAWL), which was held at ACL 2023 in Toronto, July 14, 2023.
- I was co-Editor-in-chief of the Transactions of the Association for Computational Linguistics (TACL) from 2018 to 2022. The journal's 2021 impact factor was the topic of an MIT Press blog post.
- Here's the site of the Dakshina dataset, an open-source collection of romanized and native script Wikipedia in 12 South Asian languages that I helped put together.
- Here's a 2021 Google Research blog post about some work my team was involved in, transliterating geo entity names into Brahmic scripts. And here's
an earlier (2017) post about some related work I contributed to, providing transliteration keyboards in 20+ South Asian languages.