Siddharth Karamcheti

skaramcheti@cs.stanford.edu
Stanford, CA


I am a final-year PhD student in computer science at Stanford University, where I'm grateful to be co-advised by Dorsa Sadigh and Percy Liang. I am honored to be supported by the Open Philanthropy Project AI Fellowship.

Research: I work on robot learning and natural language processing. My research focuses on two axes:

Developing foundation models for robotic perception and control (building better "raw materials" for robotics).
Deploying adaptive robots for real-world collaboration (constructing and evaluating systems with real people).

Timeline: I am currently a robotics intern at the Toyota Research Institute working on large behavior models.

Earlier in my PhD, I spent time as a research intern at Hugging Face 🤗 working on multimodal pretraining and vision-language models with an incredible team of collaborators.

Before Stanford, I was a resident at Facebook AI Research in New York, where I was lucky to work with Rob Fergus, Douwe Kiela, Jason Weston, and Arthur Szlam on grounded language understanding. Here's a short Q&A I did about my residency. Prior to that, I did two summer internships at Bloomberg Research with wonderful mentors Gideon Mann and David Rosenberg.

I completed my undergraduate degrees in computer science and literary arts at Brown University. While there, I did research in NLP & human-robot interaction advised by Eugene Charniak and Stefanie Tellex.

News

[December 2024]: I am on the academic job market! Please reach out if you're interested in my research and think I would be a good fit for your department!
[November 2024]: Incredibly excited to be at CoRL 2024 presenting both Vocal Sandbox and OpenVLA; don't miss Jenn's talk on Vocal Sandbox during Oral Session 3! I'll also be at the afternoon panel at the LangRob workshop on Saturday – hope to see y'all in Munich!
[October 2024]: Thrilled to share "Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration" – we introduce a new framework for building robots that can progressively improve over time, learning new skills, visual concepts, and planning behaviors from their partners in real time. Check out our project page for videos, especially our two-hour-long demo deploying a Vocal Sandbox system for LEGO stop-motion animation!
[June 2024]: Introducing "OpenVLA: An Open-Source Vision-Language-Action Model" – a generalist policy for robotic manipulation, built directly using the VLMs/codebase we developed in Prismatic. Trained on almost 1M demonstrations from Open X-Embodiment (and DROID), not only do we get some great zero-shot performance, but you can efficiently fine-tune for new robots and tasks! Huge shout out to my co-leads Moo Jin Kim and Karl Pertsch; check out our models on Hugging Face, and our codebase.
[July 2023]: Introducing Voltron! Check out our paper on “Language-Driven Representation Learning for Robotics”; we show that we can use language supervision to learn general visual representations for multiple robotics tasks (e.g., grasp segmentation, intent inference, learning for control, amongst others). Our models and pretraining code are open-source, as well as our diverse evaluation suite.

Publications

2024

Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration
Jennifer Grannen*, Siddharth Karamcheti*, Suvir Mirchandani, Percy Liang, Dorsa Sadigh.
Conference on Robot Learning (CoRL), November 2024.
Oral Presentation
[pdf] [homepage]

OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim*, Karl Pertsch*, Siddharth Karamcheti*, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Ben Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn.
Conference on Robot Learning (CoRL), November 2024.
Outstanding Paper Award Finalist
[pdf] [homepage] [models] [code]

DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset
DROID Collaboration (Research Lead – Data Curation & Annotation, Lab Lead).
Robotics: Science and Systems (RSS), July 2024.
[pdf] [homepage] [dataset visualizer] [colab]

Prismatic VLMs: Investigating the Design Space of Vision-Language Models
Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh.
International Conference on Machine Learning (ICML), July 2024.
[pdf] [code & models] [code - evaluation]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration.
IEEE International Conference on Robotics and Automation (ICRA), May 2024.
Best Paper Award
[pdf] [homepage] [datasets] [code]

2023

Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang.
Robotics: Science and Systems (RSS), July 2023.
Best Paper Award Finalist
[pdf] [homepage] [code - models] [code - evaluation]

“No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy
Yuchen Cui*, Siddharth Karamcheti*, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh.
ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2023.
[pdf] [homepage] [code]

2022

Eliciting Compatible Demonstrations for Multi-Human Imitation Learning
Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh.
Conference on Robot Learning (CoRL), December 2022.
[pdf] [homepage]

What Makes Representation Learning from Videos Hard for Control?
Tony Z. Zhao, Siddharth Karamcheti, Thomas Kollar, Chelsea Finn, Percy Liang.
2nd Workshop on Scaling Robot Learning @ RSS 2022, June 2022.
(Workshop) Best Paper Award Finalist
[pdf]

Shared Autonomy for Robotic Manipulation with Language Corrections
Siddharth Karamcheti*, Raj Palleti*, Yuchen Cui, Percy Liang, Dorsa Sadigh.
Workshop on Learning with Natural Language Supervision (NL-Supervision) @ ACL 2022, May 2022.
[pdf]

2021

ELLA: Exploration through Learned Language Abstraction
Suvir Mirchandani, Siddharth Karamcheti, Dorsa Sadigh.
Conference on Neural Information Processing Systems (NeurIPS), December 2021.
[pdf] [talk] [slides] [code]

LILA: Language-Informed Latent Actions
Siddharth Karamcheti*, Megha Srivastava*, Percy Liang, Dorsa Sadigh.
Conference on Robot Learning (CoRL), November 2021.
[pdf] [homepage] [code] [poster]

On the Opportunities and Risks of Foundation Models
Center for Research on Foundation Models (CRFM) – 100+ authors, directed by Percy Liang.
- Robotics (§2.3): Siddharth Karamcheti (Lead), Annie Chen, Suvir Mirchandani, Suraj Nair, Krishnan Srinivasan, Kyle Hsu, Jeannette Bohg, Dorsa Sadigh, Chelsea Finn.
- Interaction (§2.5): Joon Sung Park, Chris Donahue, Mina Lee, Siddharth Karamcheti, Dorsa Sadigh, Michael Bernstein.
[pdf] [homepage] [workshop] [press]

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning.
Annual Meeting of the Association for Computational Linguistics (ACL-IJCNLP), August 2021.
Outstanding Paper Award
[pdf] [talk] [slides] [code] [other coverage]

Targeted Data Acquisition for Evolving Negotiation Agents
Minae Kwon, Siddharth Karamcheti, Mariano-Florentino Cuéllar, Dorsa Sadigh.
International Conference on Machine Learning (ICML), July 2021.
[pdf] [talk] [slides]

Learning Visually Guided Latent Actions for Assistive Teleoperation
Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh.
Learning for Dynamics and Control (L4DC), June 2021.
[pdf] [talk] [slides] [code] [poster]

2020

Learning Adaptive Language Interfaces through Decomposition
Siddharth Karamcheti, Dorsa Sadigh, Percy Liang.
Workshop for Interactive and Executable Semantic Parsing (IntEx-SemPar) @ EMNLP 2020, November 2020.
[pdf] [slides]

Generating Interactive Worlds with Text
Angela Fan*, Jack Urbanek*, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, and Jason Weston.
Association for the Advancement of Artificial Intelligence (AAAI), February 2020
[pdf] [dataset]

2019

Finding Generalizable Evidence by Learning to Convince Q&A Models
Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, and Kyunghyun Cho.
Empirical Methods in Natural Language Processing (EMNLP), November 2019
[pdf] [blog post] [code]

Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, and Jason Weston.
Empirical Methods in Natural Language Processing (EMNLP), November 2019
[pdf] [dataset]

Improving Grey-Box Fuzzing by Modeling Program Control Flow
Siddharth Karamcheti, Gideon Mann, and David Rosenberg
Workshop on Machine Learning for Software Engineering (ML4SE), June 2019
[pdf] [slides]

Grounding Natural Language Instructions to Semantic Goal Representations for Abstraction and Generalization
Dilip Arumugam*, Siddharth Karamcheti*, Nakul Gopalan, Edward C. Williams, Mina Rhee, Lawson L.S. Wong, and Stefanie Tellex
Autonomous Robots (AuRO), February 2019
[free-to-read pdf]

2018

Adaptive Grey-Box Fuzz Testing with Thompson Sampling
Siddharth Karamcheti, Gideon Mann, and David Rosenberg
11th ACM Workshop on Artificial Intelligence and Security (AISec), October 2018
Oral Presentation
[pdf] [slides]

2017

Modeling Latent Attention within Neural Networks
Christopher Grimm, Dilip Arumugam, Siddharth Karamcheti, David Abel, Lawson L.S. Wong, and Michael Littman
Preprint
[pdf]

A Tale of Two DRAGGNs: A Hybrid Approach for Interpreting Action-Oriented and Goal-Oriented Instructions
Siddharth Karamcheti, Edward C. Williams, Dilip Arumugam, Mina Rhee, Nakul Gopalan, Lawson L.S. Wong, and Stefanie Tellex
1st Workshop on Language Grounding for Robotics (RoboNLP) @ ACL, August 2017
Best Paper Award
[pdf]

Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities
Dilip Arumugam*, Siddharth Karamcheti*, Nakul Gopalan, Lawson L.S. Wong, and Stefanie Tellex
Robotics: Science and Systems (RSS), June 2017
[pdf]

Blog Posts

The Annotated S4 – Efficiently Modeling Long Sequences with Structured State Spaces
Sasha Rush and Siddharth Karamcheti – January, 2022.
[ICLR blog track] [code] [original paper]

Mistral – A Journey towards Reproducible Language Model Training
Siddharth Karamcheti* and Laurel Orr* – August, 2021.
Team: Jason Bolton, Tianyi Zhang, Karan Goel, Avanika Narayan, Rishi Bommasani, Deepak Narayanan
Advisors: Tatsunori Hashimoto, Dan Jurafsky, Christopher D. Manning, Christopher Potts, Christopher Ré, Percy Liang
[code] [checkpoints] [talk]