Peidong Liu

[Google Scholar]

Advanced Algorithm Engineer in DJI Automotive

Shenzhen, Guangdong, China.

Email: perdon.liu@gmail.com

Biography

I currently lead the vision-language-action (VLA) model effort at the DJI Automotive Perception Group, focusing on fine-tuning or prompting VLA models to address the challenges posed by long-tail scenarios. Before that, I was primarily responsible for bird's-eye-view (BEV) lane detection and large-scale multimodal retrieval systems. If you are interested in an internship opportunity, please feel free to email me.

I obtained my M.S. in Computer Science from Tsinghua University in 2022 as an outstanding graduate. I have been fortunate to work closely with Prof. Xiaodan Liang at Sun Yat-sen University, Dr. Hang Xu at Huawei Noah's Ark Lab, and Dr. Litong Feng and Dr. Xinjiang Wang at SenseTime Research. I received my B.S. in Software Engineering from Sun Yat-sen University, summa cum laude, in 2019. My research interests lie in computer vision and vision-language models.

News

Publications

* denotes equal contribution.

CLIP4Drive: Pioneering Tail Data Retrieval for Autonomous Driving

In Submission

SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation [PDF]

Yanjie Li, Sen Yang, Peidong Liu, Shu-Tao Xia

European Conference on Computer Vision (ECCV), 2022

NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer [PDF]

Yunxiao Wang, Yanjie Li, Peidong Liu, Tao Dai, Shu-Tao Xia

European Conference on Computer Vision (ECCV), 2022

Multi-task Ranking with User Behaviors for Text-Video Search [PDF]

Peidong Liu, Dongliang Liao, Jinpeng Wang, Yangxin Wu, Gongfu Li, Shu-Tao Xia, Jin Xu

International World Wide Web Conferences (WWW, CCF-A) Companion, 2022

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search [PDF] [Talk] [Poster] [PPT] [Code]

Peidong Liu*, Gengwei Zhang*, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

International Conference on Learning Representations (ICLR), 2021

WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations [PDF]

Peidong Liu*, Zibin He*, Xiyu Yan*, Yong Jiang, Shu-Tao Xia, Feng Zheng, Maowei Hu

ACM International Conference on Multimedia (ACM MM, CCF-A) Oral, 2021

Visual Privacy Protection via Mapping Distortion [PDF] [Code]

Yiming Li*, Peidong Liu*, Yong Jiang, Shu-Tao Xia

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

Deep Flow Collaborative Network for Online Visual Tracking [PDF]

Peidong Liu, Xiyu Yan, Yong Jiang, Shu-Tao Xia

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

LDA Meets Word2Vec: A Novel Model for Academic Abstract Clustering [PDF]

Changzhou Li, Yao Lu, Junfeng Wu, Yongrui Zhang, Zhongzhou Xia, Tianchen Wang, Dantian Yu, Xurui Chen, Peidong Liu, Junyu Guo

International World Wide Web Conferences (WWW, CCF-A) Companion, 2018

Selected Awards

Research Experience in Both Academia and Industry

2022.07 - present Perception Group, DJI Automotive
Advanced Computer Vision Algorithm Engineer
  • Currently, I am responsible for building Vision-Language-Action (VLA) models for autonomous driving from scratch. This covers an extensive survey of open-source data and models and the establishment of data annotation and model fine-tuning pipelines. The latter includes prompt design, LoRA fine-tuning, and multi-node training with DeepSpeed. Fine-tuned on both open-source and proprietary datasets, the models have acquired perception, decision-making, and planning capabilities, and can now generate human-like trajectories even in long-tail scenarios
  • Developed a large multimodal model for image-text retrieval, focusing on mining long-tail data of user interest from massive video databases to support autonomous driving applications. Specifically, I leveraged a Large Language Model (LLM) and a diffusion model to generate synthetic data, augmenting the existing dataset and improving the model's performance
  • Developed multiple Bird's-Eye-View (BEV) lane detection solutions, including temporal BEV, fisheye BEV, and road topology, to support deployment in mass-production projects

    (I received a strong performance rating for this work)

  • Optimized the closed-loop data recycling process for lane detection, which covers the entire cycle of data collection, feedback, filtering, reconstruction, annotation, and related steps. Streamlining coordination and collaboration among several modules improved data recycling efficiency by over 100% (I received the 2023 Annual Efficiency Vanguard Award at DJI Automotive for this contribution)

2019.09 - 2022.06 Department of Computer Science and Technology, Tsinghua University
Master Student
Supervisor: Shu-Tao Xia
  • Proposed Memory Flow Distillation (MFD) for video semantic segmentation. MFD combines weakly supervised training, optical flow, and distillation to alleviate two issues: the scarcity of fine annotations and low inference speed. For PSPNet MobileNetV2, MFD improves performance by 10.24% mIoU and reaches real-time speed (ACM MM 2021 Oral)
  • Proposed the Deep Flow Collaborative Network (DFCNet) for online visual tracking. DFCNet runs the complex feature network only on sparse keyframes, which are selected by the proposed adaptive keyframe scheduling. DFCNet exploits both appearance features and temporal information and runs 30% faster than the baseline without compromising accuracy (ICASSP 2020)

2021.06 - 2022.05 Search Application Department, WeChat Group, Tencent
Computer Vision Algorithm Engineer Intern
  • To address low click-through and completion rates in video retrieval for WeChat Channel, as well as imprecise query-item matching, we defined a new problem: multi-target ranking for video retrieval. We extracted 800k query-document pairs from user interaction logs, used a multimodal fusion model combined with the MMoE framework as the baseline, and modeled multiple objectives, achieving a 3% improvement in the average AUC-ROC for each objective

2020.04 - 2021.03 Noah's Ark Lab, Huawei
Research Intern
Mentors: Xiaodan Liang, Hang Xu, Bochao Wang
  • Proposed CSE-Autoloss, an effective convergence-simulation driven evolutionary search algorithm for object detection loss function discovery, which achieves a 20x speedup via progressive convergence-simulation modules (ICLR 2021)

2019.07 - 2019.09 Y-Tech AI Lab, Beijing Kuaishou Technology Ltd.
AI Intern
  • Improved the accuracy of face parsing with landmarks by around 2% over the baseline UNet model

2018.11 - 2019.06 Fundamental Technique Research Group, SenseTime Research
Research Intern
Mentor: Litong Feng (Senior Researcher, Ph.D.)
  • Solely responsible for building the entire pipeline for converting PyTorch models to Caffe models, including classification models (ResNet and Inception-ResNet series) and object detection models (SSD and Faster R-CNN), among others

2018.03 - 2018.05 NUS-Tsinghua Center for Extreme Search (NExT++), NUS, Singapore
Research Assistant
Mentor: Zhaoyan Ming (Ph.D., Team Head, NExT++)
  • Implemented an algorithm to classify Southeast Asian food with complex names and meanings

2017.10 - 2018.02 Smart Mobile Computing Lab, Advanced Networking and Computing Systems Institute, SYSU
Research Assistant
Mentor: Xu Chen (Professor, School of Data and Computer Science, SYSU)
  • Modeled 30 GB of WeChat Moment articles using effective structural features
  • Applied Logistic Regression, Random Forest, and GBDT to predict information growth

2017.07 - 2017.10 Natural Language Processing Group, Guangdong Province Key Laboratory of Computational Science, SYSU
Research Assistant
Mentor: Yao Lu (Professor, School of Data and Computer Science, SYSU)
  • Participated in text analysis and text mining of medical scientific literature, including preprocessing, word vector representation with Word2Vec, dimensionality reduction with PCA, keyword extraction via TF-IDF, topic number analysis via the AP algorithm, and topic extraction via LDA
  • Parallelized the processing with Spark

Academic Service

Conference reviewer for AAAI 2022 and WWW 2022.

Education

2019.09 - 2022.06, master's student in the Department of Computer Science and Technology at Tsinghua University

2015.09 - 2019.06, undergraduate student in the School of Computer Science and Engineering at Sun Yat-sen University, rank 3/119

2018.01 - 2018.05, exchange student in the School of Computing at National University of Singapore, research intern in the NExT++ lab

Last updated in Dec 2024.