ZJUI Doctoral Student Publishes Research Findings in IJCV, a Top Journal in Artificial Intelligence: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in Diverse Open Scenes
Date: 07/10/2023 | Article: From the research team of Assist. Prof. Wang Gaoang | Photo: From the research team of Assist. Prof. Wang Gaoang

 

Recently, the latest research result from Assist. Prof. Wang Gaoang of ZJUI and his research group, "DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in Diverse Open Scenes," which presents a new dataset and a new method for cross-view multi-object tracking in open scenes, was accepted by the International Journal of Computer Vision (IJCV, IF=19.5, CCF A), a top journal in the field of artificial intelligence. The first author of the paper is Hao Shengyu, a ZJUI doctoral student admitted in 2021, and the co-first author is Liu Peiyuan, a ZJUI Class of 2023 Computer Engineering undergraduate (currently a graduate student at Tsinghua Shenzhen International Graduate School). The corresponding author is Assist. Prof. Wang Gaoang of ZJUI. The other co-authors are Jin Kaixun, a ZJUI Class of 2022 Computer Engineering undergraduate; Zhan Yibing, a researcher at Jingdong Explore Academy; Assist. Prof. Liu Zuozhu of ZJUI; Prof. Song Mingli of the College of Computer Science and Technology, Zhejiang University; and Prof. Jenq-Neng Hwang, IEEE Life Fellow, of the University of Washington.

 

 

This paper proposes a novel cross-view multi-object tracking dataset, DIVOTrack, which is captured with moving cameras, covers more realistic and diverse scenes, and contains more crowded trajectories than existing datasets, and it establishes a standardized baseline and benchmark for cross-view multi-object tracking. The paper also proposes a novel end-to-end cross-view multi-object tracking method, CrossMOT, which integrates object detection, single-view tracking, and cross-view tracking in a unified model. It is believed to be the first end-to-end model with both object detection and cross-view multi-object tracking capabilities, learning object detection, single-view, and cross-view object features simultaneously. Based on the dataset and model above, the paper establishes a standardized benchmark for evaluating cross-view multi-object tracking. Experimental results show that CrossMOT achieves high cross-view multi-object tracking accuracy and outperforms current state-of-the-art (SOTA) methods on DIVOTrack and other datasets. This work is expected to support the application of cross-view multi-object tracking technology in autonomous driving, intelligent surveillance, behavior recognition, and other fields.

 

 

Introduction to the paper

 

In recent years, single-view multi-object tracking has been extensively studied. However, because of the limitation of a single viewpoint, occluded objects are easily lost in long-term tracking scenarios. Cross-view multi-object tracking can alleviate this problem, but existing cross-view tracking datasets lack realistic and diverse tracking scenarios and contain only a limited number of moving trajectories, making it difficult to fully evaluate the effectiveness of cross-view multi-object tracking methods.

 

 


▲ Figure 1. Samples from the DIVOTrack dataset. From left to right, different views of the same scene are shown; the same person is marked with the same color and a consistent ID number across views.

 

 

To overcome the above difficulties and facilitate future research on cross-view multi-object tracking, the group proposes a novel cross-view multi-object tracking dataset called DIVOTrack. Its main characteristics are as follows:

 

1. DIVOTrack provides a variety of scenarios. It contains both outdoor and indoor scenes and various surroundings such as streets, shopping malls, buildings, plazas, and public infrastructure.

 

2. DIVOTrack provides a large collection of object trajectories with a focus on crowded environments. It contains a total of 1,690 single-view trajectories and 953 cross-view trajectories, both significantly more than in previous cross-view multi-object tracking datasets.

 

3. DIVOTrack is captured with moving cameras, enabling researchers to study cross-view multi-object tracking under camera motion.

 

 

 


▲ Figure 2. Schematic diagram of the CrossMOT framework, an end-to-end cross-view multi-object tracking method.

 

 

In addition to the DIVOTrack dataset, the paper proposes an end-to-end cross-view multi-object tracking baseline framework called CrossMOT. CrossMOT is a unified framework that combines object detection and cross-view multi-object tracking, using an integrated embedding model for object detection, single-view tracking, and cross-view tracking. CrossMOT uses decoupled multi-head embeddings to learn object detection, single-view re-identification (Re-ID), and cross-view Re-ID features simultaneously. To resolve the conflict between cross-view and single-view embeddings, locality-aware and conflict-free losses are used to improve the joint embedding. Specifically, the single-view embedding focuses on learning temporal continuity, while the cross-view embedding focuses on appearance invariance across views.
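
To make the decoupled multi-head embedding idea concrete, the sketch below shows one way such a model could be organized: a shared backbone feature is projected by separate heads for detection, single-view Re-ID, and cross-view Re-ID, each trained with its own loss. This is a minimal illustrative sketch in PyTorch, not the authors' released CrossMOT code; all names, dimensions, and loss choices are assumptions made only for illustration.

# Minimal illustrative sketch of decoupled multi-head embeddings
# (hypothetical names and dimensions; NOT the authors' released CrossMOT code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledEmbeddingHeads(nn.Module):
    def __init__(self, feat_dim=256, emb_dim=128, num_single_ids=1690, num_cross_ids=953):
        super().__init__()
        # Detection head: one objectness score per backbone feature vector.
        self.det_head = nn.Linear(feat_dim, 1)
        # Separate embedding heads keep single-view and cross-view features decoupled.
        self.single_view_head = nn.Linear(feat_dim, emb_dim)
        self.cross_view_head = nn.Linear(feat_dim, emb_dim)
        # Identity classifiers used only during training, one per embedding space.
        self.single_view_classifier = nn.Linear(emb_dim, num_single_ids)
        self.cross_view_classifier = nn.Linear(emb_dim, num_cross_ids)

    def forward(self, backbone_feat):
        det_score = self.det_head(backbone_feat)
        sv_emb = F.normalize(self.single_view_head(backbone_feat), dim=-1)
        cv_emb = F.normalize(self.cross_view_head(backbone_feat), dim=-1)
        return det_score, sv_emb, cv_emb

# Toy training step: 8 detected objects with 256-d backbone features.
model = DecoupledEmbeddingHeads()
feats = torch.randn(8, 256)
det_score, sv_emb, cv_emb = model(feats)

# Each embedding space has its own identity-classification loss, so the
# single-view and cross-view objectives do not interfere with each other.
single_ids = torch.randint(0, 1690, (8,))
cross_ids = torch.randint(0, 953, (8,))
ce = nn.CrossEntropyLoss()
loss = (
    F.binary_cross_entropy_with_logits(det_score.squeeze(-1), torch.ones(8))
    + ce(model.single_view_classifier(sv_emb), single_ids)
    + ce(model.cross_view_classifier(cv_emb), cross_ids)
)
loss.backward()

At inference time, the single-view embedding would be matched frame to frame within each camera, while the cross-view embedding would be matched across cameras, mirroring the division of labor described above; the released code at the GitHub link below is the authoritative reference.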

 

Using the dataset, benchmark and baseline proposed in this study, researchers can fairly compare cross-view multi-object tracking methods in the future, which is expected to promote the development of cross-view tracking technology.

 

The research was supported by the National Natural Science Foundation of China, the Fundamental Research Funds for the Central Universities, and the National Key Research and Development Program.

 

 

About the authors

 


Hao Shengyu is a doctoral student jointly trained by ZJUI and the College of Computer Science and Technology, Zhejiang University, supervised by Assist. Prof. Wang Gaoang. His research focuses on multi-object tracking. He has published several papers in journals and conferences such as IJCV, TMM, CVIU, and CVPR, was named an Excellent Graduate Student in 2022, and serves as a reviewer for journals and conferences such as KBS, ICASSP, and PRCV.

 

 


Liu Peiyuan, a ZJUI Class of 2023 Computer Engineering undergraduate, is currently a graduate student at Tsinghua Shenzhen International Graduate School, where his research interests include long-sequence time-series forecasting, Gaussian processes, and the fine-tuning of large language models.

 

 


Assist. Prof. Wang Gaoang, a Ph.D. supervisor at ZJUI, received his bachelor's, master's, and doctoral degrees from Fudan University, the University of Wisconsin-Madison, and the University of Washington, respectively. His research interests include computer vision, multi-modal learning, knowledge transfer, generative models, and time-series modeling. He has published more than 50 papers in internationally renowned journals and conferences, including IJCV, TIP, TMM, TCSVT, TVT, CVPR, ICCV, ECCV, ACM MM, and IJCAI.

 

 

 

Paper Information

 

Title: DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes


Link: https://arxiv.org/abs/2302.07676

 

GitHub link: https://github.com/shengyuhao/DIVOTrack

 

The CVNext Lab of Wang Gaoang's team: https://cvnext.github.io/

 
