Siddharth Dalmia
Carnegie Mellon University
H-index: 18
North America-United States
Top articles of Siddharth Dalmia
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | arXiv preprint arXiv:2403.05530 | Machel Reid Nikolay Savinov Denis Teplyashin Dmitry Lepikhin Timothy Lillicrap | 2024/3/8 |
LLM augmented LLMs: Expanding capabilities through composition | arXiv preprint arXiv:2401.02412 | Rachit Bansal Bidisha Samanta Siddharth Dalmia Nitish Gupta Shikhar Vashishth | 2024/1/4 |
Multimodal Modeling for Spoken Language Identification | | Shikhar Bharadwaj Min Ma Shikhar Vashishth Ankur Bapna Sriram Ganapathy | 2024/4/14 |
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems | arXiv preprint arXiv:2404.01616 | Frank Palma Gomez Ramon Sanabria Yun-hsuan Sung Daniel Cer Siddharth Dalmia | 2024/4/2 |
LegoNN: Building modular encoder-decoder models | IEEE/ACM Transactions on Audio, Speech, and Language Processing | Siddharth Dalmia Dmytro Okhonko Mike Lewis Sergey Edunov Shinji Watanabe | 2023/7/17 |
ESPnet-ST-v2: Multipurpose spoken language translation toolkit | arXiv preprint arXiv:2304.04596 | Brian Yan Jiatong Shi Yun Tang Hirofumi Inaguma Yifan Peng | 2023/4/10 |
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models | arXiv preprint arXiv:2210.15734 | Siddhant Arora Siddharth Dalmia Brian Yan Florian Metze Alan W Black | 2022/10/27 |
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization | ICASSP 2022 | Brian Yan Chunlei Zhang Meng Yu Shi-Xiong Zhang Siddharth Dalmia | 2022 |
CMU’s IWSLT 2022 Dialect Speech Translation System | | Brian Yan Patrick Fernandes Siddharth Dalmia Jiatong Shi Yifan Peng | 2022/5 |
CTC alignments improve autoregressive translation | arXiv preprint arXiv:2210.05200 | Brian Yan Siddharth Dalmia Yosuke Higuchi Graham Neubig Florian Metze | 2022/10/11 |
Exploiting Compositionality in Sequence Models | | Siddharth Dalmia | 2022 |
Two-pass low latency end-to-end spoken language understanding | arXiv preprint arXiv:2207.06670 | Siddhant Arora Siddharth Dalmia Xuankai Chang Brian Yan Alan Black | 2022/7/14 |
Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding | | Yifan Peng Siddharth Dalmia Ian Lane Shinji Watanabe | 2022/6/28 |
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding | | Yifan Peng Siddhant Arora Yosuke Higuchi Yushi Ueda Sujay Kumar | 2023/1/9 |
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech | SLT 2022 | Alexis Conneau Min Ma Simran Khanuja Yu Zhang Vera Axelrod | 2022/5/25 |
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding | arXiv preprint arXiv:2106.15065 | Siddhant Arora Alissa Ostapenko Vijay Viswanathan Siddharth Dalmia Florian Metze | 2021/6/29 |
Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation | | Jiatong Shi Jonathan D Amith Xuankai Chang Siddharth Dalmia Brian Yan | 2021/6 |
NoiseQA: Challenge set evaluation for user-centric question answering | arXiv preprint arXiv:2102.08345 | Abhilasha Ravichander Siddharth Dalmia Maria Ryskina Florian Metze Eduard Hovy | 2021/2/16 |
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | | Hirofumi Inaguma Siddharth Dalmia Brian Yan Shinji Watanabe | 2021/12/13 |
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks | arXiv preprint arXiv:2105.00573 | Siddharth Dalmia Brian Yan Vikas Raunak Florian Metze Shinji Watanabe | 2021/5/2 |