
Alexander Toshev

Research Scientist and Manager

toshev at apple / alex.t.toshev at gmail

Intro

I am a Research Scientist and Manager at Apple ML Research, leading research efforts in Multimodal Foundational Models and Embodied AI.

Highlights (in reverse chronological order):

2024: Co-Lead of Apple's Multimodal LLM work MM1.
2023: DFN is the best large-scale CLIP image encoder on the market.
2022: Co-designed and co-led the Google Robotics SayCan effort, initiating a new field of Foundational Models for Robotics Decision Making and receiving a best paper award at CoRL 2022.
2017-2021: Initiator and co-lead of the robot navigation effort (10+ FTEs) within Google Robotics, resulting in systems and published work across Object-driven Robot Navigation, Social Robot Navigation, Mobile Manipulation.
2015: Co-authored the Show and Tell paper, which initiated the new field of Vision-Language models; among the most cited papers ever from CVPR 2015.
2013-2014: Co-authored the DeepPose and Object Detection papers; the first deep neural network work for localization in images, among the most cited papers from CVPR 2014.

Academic Activities

Generative Models for Decision Making, May 2024.

Symposium on Social Navigation Benchmarking, Feb 2022.

CVPR'20, CVPR'21, CVPR'22, CVPR'23 Workshop on Embodied AI

CVPR'19 Workshop on Deep Learning for Semantic Visual Navigation

Area Chair, CVPR 2017, 2020, 2023, 2024; ECCV 2020, 2022, 2024; NeurIPS 2021, 2023; ICLR 2023, 2024.

Program committee, CVPR, ICCV, ECCV, NIPS

Recent Talks

Georgia Tech / Google Robotics Workshop, May 2021.

iGibson Sim2Real Challenge, Embodied AI Workshop, CVPR 2020.

Robot Learning Workshop, NSF & Lehigh University, 2019.

Publications

Multimodal Foundation Models

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, In Submission, March 2024.

Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind, Value Function Estimation using Conditional Diffusion Models for Control, In Submission, 2024.

Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin, Scalable Pre-training of Large Autoregressive Image Models, In Submission, 2024.

Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev, Large Language Models as Generalizable Policies for Embodied Tasks, ICLR 2024.

Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander Toshev, Vaishaal Shankar, Data Filtering Networks, ICLR 2024.

Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev, On Robustness in Multimodal Learning, ICML, 2023.

Anthony Francis, Claudia Perez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Haresh Karnan, Tsang-Wei Edward Lee, Luis Manso, Reuth Mirsky, Soren Pirk, Phani Teja Singamaneni, Peter Stone, Ada Taylor, Peter Trautman, Nathan Tsoi, Marynel Vazquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Roberto Martin-Martin, and Alexander Toshev, Benchmarking Robot Social Navigation across Academia and Industry, Symposium on HRI in Academia and Industry, March, 2023.

Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang, STAIR: Learning Sparse Text and Image Representation in Grounded Tokens, EMNLP 2023.

Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jon Shlens, Perceptual Grouping in Vision-Language Models, ICCV, 2023.

Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind, GAUDI: A Neural Architect for Immersive 3D Scene Generation, NeurIPS, 2022.

Robotics

M. Deitke, et al., Retrospectives on Embodied AI Workshop, 2022, Position Paper.

M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, Ch. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, D. Ho, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, E. Jang, R. Jauregui Ruano, K. Jeffrey, S. Jesmonth, N. J Joshi, R. Julian, D. Kalashnikov, Y. Kuang, K.-H. Lee, S. Levine, Y. Lu, L. Luu, C. Parada, P. Pastor, J. Quiambao, K. Rao, J. Rettinghouse, D. Reyes, P. Sermanet, N. Sievers, Cl. Tan, A. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, S. Xu, M. Yan, Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, CoRL, 2022, (oral), Special Innovation Award.

Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone, Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation, IROS, 2022.

Soeren Pirk, Edward Lee, Xuesu Xiao, Anthony Francis, Leila Takayama, Alexander Toshev, A Protocol for Evaluating Social Navigation Policies, ICRA Workshop on Social Robot Navigation: Advances and Evaluation, 2022.

Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter, Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning, ICLR, 2022.

Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev, Learning Object-conditioned Exploration using Distributed Soft Actor Critic, CoRL 2020.

Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans, Objectnav revisited: On evaluation of embodied agents navigating to objects, position paper, 2020.

Fei Xia, Chengshu Li, Or Litany, Roberto Martin-Martin, Alexander Toshev, Silvio Savarese, ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, 2020.

Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari, Modeling Long-horizon Tasks as Sequential Interaction Landscapes, CoRL 2020.

Fei Xia, William Chen, Chengshu Li, Priya Kasimbeg, Micael Tchampi, Alexander Toshev, Roberto Martin-Martin, Silvio Savarese, Interactive Gibson: A Benchmark in Navigation in Cluttered Environments, RA-Letters, 2020.

Kuan Fang, Alexander Toshev, Silvio Savarese, Li Fei-Fei, Scene Memory Transformer for Embodied Agents in Long Horizon Tasks, CVPR 2019.

Ayzaan Wahid, Alexander Toshev, Marek Fiser, Edward Lee, Long Range Neural Navigation Policies for the Real World, IROS 2019.

Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, James Davidson, Visual Representations for Semantic Target Driven Navigation, ICRA 2019.

Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine, Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control, CVPR 2018.

Language and Vision

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge, IEEE Transactions on PAMI, 2017.

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L Yuille, Kevin Murphy, Generation and Comprehension of Unambiguous Object Descriptions, CVPR 2016.

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and tell: A neural image caption generator, CVPR 2015 (oral, 3100+ citations).

Human Pose Estimation

A.J. Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo, Adversarial Generative Grammars for Human Activity Prediction, ECCV 2020.

George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy, Towards accurate multi-person pose estimation in the wild, CVPR 2017 (best on human pose estimation on COCO).

Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly, Chained Predictions Using Convolutional Neural Networks, ECCV 2016.

Alexander Toshev, Christian Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR 2014 (oral, 1300+ citations).

Benjamin Sapp, Alexander Toshev, Ben Taskar, Cascaded Models for Articulated Pose Estimation, ECCV 2010.

Object Detection

Etienne Pot, Alexander Toshev, Jana Kosecka, Self-supervisory Signals for Object Discovery and Detection, 2018.

Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov, Scalable Object Detection Using Deep Neural Networks, CVPR 2014 (700+ citations).

Christian Szegedy, Alexander Toshev, Dumitru Erhan, Deep Neural Networks for Object Detection, NIPS 2013 (800+ citations).

Misc

AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S Ryoo, Evolving Space-Time Neural Architectures for Videos, In Submission, 2019.

Yair Movshovitz-Attias, Alexander Toshev, Thomas K Leung, Sergey Ioffe, Saurabh Singh, No Fuss Distance Metric Learning via Proxies, ICCV 2017.

Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei, The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition, ECCV 2016.

Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe, Deep Convolutional Ranking for Multi-label Image Annotation, ICLR 2013.

Alexander Toshev, Philippos Mordohai, Ben Taskar, Detecting and Parsing Architecture at City Scale from Range Data, CVPR 2010.

Alexander Toshev, Ben Taskar, Kostas Daniilidis, Object Detection via Boundary Structure Segmentation, CVPR 2010.

Alexander Toshev, Ameesh Makadia, Kostas Daniilidis, Shape-based object recognition in videos using 3D synthetic object models, CVPR 2009.

Alexander Toshev, Jianbo Shi, Kostas Daniilidis, Image Matching via Saliency Region Correspondences, CVPR 2007 (oral).

Alexander Toshev, Submodular Function Minimization, University of Pennsylvania, 2010.

Patents

Controlling Agents using Scene Memory Data, Kuan Fang, Alexander Toshev, US Patent 11,842,277, 2023.

Update of Local Features Model Based on Corrections to Robot Actions, Krishna Shankar, Nicholas Hudson, Alexander Toshev, US Patent 11,640,517, 2023.

Distance Metric Learning Using Proxies, Yair Movshovitz-Attias, Thomas Leung, Sergey Ioffe, Saurabh Singh, Alexander Toshev, US Patent 10,387,749, 2019.

Generating natural language descriptions of images, Samy Bengio, Oriol Vinyals, Alexander Toshev, Dumitru Erhan, US Patent 9,858,524, 2018.

Automatic translation of digital graphic novels, Greg Don Hartrell, Debajit Ghosh, Matthew William Vaughan-Vail, John Michael Rivlin, US Patent 9,881,003, 2018.

Sublinear time classification via feature padding and hashing, Sergey Ioffe, Alexander Toshev, US Patent 9,940,552, 2018.

Ranking approach to train deep neural nets for multilabel image annotation, Yunchao Gong, King Hong Thomas Leung, Alexander Toshev, Sergey Ioffe, US Patent 9,552,549, 2017.

Object detection using deep neural networks, Christian Szegedy, Dumitru Erhan, Alexander Toshev, US Patent 9,275,308, 2016.

System and method for using segmentation to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, US Patent 9,483,701, 2016.

Object recognition, Alexander Toshev, King Hong Thomas Leung, Jiwoong Jack Sim, US Patent 8,942,468, 2015.

Perceptually-driven representation for object recognition, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 9,008,356, 2015.

Discriminative learning for object detection, Dragomir Anguelov, Alexander Toshkov Toshev, Deva K Ramanan, Xiangxin Zhu, US Patent 9,098,741, 2015.

System and method for exploiting segment co-occurrence relationships to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, Poonam Suryanarayan, US Patent 8,768,048, 2014.

Segmentation-based feature pooling for object models, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 8,467,607, 2013.

Interns

Andrew Szot, GaTech

Kanchana Ranasinghe, Stony Brook, co-advised with Jon Shlens

Dhruv Shah, Student at UC Berkeley, co-advised with Brian Ichter

Fei Xia, Robotics @ Google

Joe Campbell, Postdoc at CMU

Chengshu Li, Student at Stanford University

Kevin Chen, Apple

Fereshteh Sadeghi, DeepMind

Arsalan Mousavian, NVidia Robotics, co-advised with Jana Kosecka

Oana-Maria Camburu, Research Fellow at UCL.

Georgia Gkioxari, Assist. Prof. at Caltech, co-advised with Navdeep Jaitly

Andre Araujo, Google, co-advised with Sergey Ioffe

Jonathan Krause, Google, co-advised with Howard Zhou

Kota Yamaguchi, Assist. Prof. at Tohoku University

Ling-Ling Tao, Facebook AI

Yunchao Gong, Verkada

Jack Sim, Waymo, co-advised with Thomas Leung