Welcome to the CogModal Group (Cognition-Inspired Cross-Modal Intelligence Group)! A long-standing challenge for both cognitive science and artificial intelligence is understanding how humans learn knowledge and solve problems from multi-modal information, e.g., text, images, video, and audio, with relatively little supervised instruction. Cognitive science gathers empirical evidence to reveal why humans can understand the world and how they do so. From the artificial intelligence perspective, our CogModal group is interested in developing human-like AI techniques inspired by the inherent mechanisms revealed by cognitive science. To this end, our members focus on five research topics that correspond to basic human abilities in multi-modal information cognition: multi-modal information Representation, Memory, Reasoning, Generation, and Accumulation. These topics cover a wide range of tasks and applications, including Cross-modal Information Retrieval, Referring Expression, Visual Question Answering, Image/Video Captioning, Text-based Image Generation, and Vision-Language Navigation.



Join us!

Albert Einstein once said, “The eternal mystery of the world is its comprehensibility.” We are looking for you (self-motivated graduate candidates, undergraduate students, and visiting students) to join us and explore the answer together!
🌟 Email: yujing02 at iie dot ac dot cn

🌟 Zhihu link: Cognition-Inspired Cross-Modal Intelligence Articles

News

Jing Yu gave an invited talk.
Dec 2023
Jing Yu gave an invited talk “Large Multi-modal Model Technologies and Applications in Rural Vitalization” at the 2023 CCF China Blockchain Technology and Application Summit Forum. The news is available here.
1 paper has been accepted by AAAI 2024!
Dec 2023
Yuanmin Tang, Jing Yu*, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu. Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval. AAAI, 2024. The paper is available here.
1 paper has been accepted by Blockchain 2023!
Nov 2023
Zhiqi Lei, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo. Efficiency-Enhanced Blockchain-Based Client Selection in Heterogeneous Federated Learning. Blockchain, 2023.
1 paper has been accepted by ICDM 2023!
Nov 2023
Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu. BDVFL: Blockchain-based Decentralized Vertical Federated Learning. ICDM, 2023 (CCF-B).
Jing Yu organized a forum.
Aug 2023
Jing Yu organized the forum “Computer Science and Technology Workers Practicing the Spirit of Scientists: How to Make Steady Progress”. The news is available here.
Jing Yu invited Professor Bang Liu to give a talk.
Aug 2023
Jing Yu invited Professor Bang Liu to give a talk about “Natural Language Processing for Materials Science”.
Jing Yu invited Professor Qi Wu to give a talk.
Jun 2023
Jing Yu invited Professor Qi Wu to give a talk about “GPT for Vision-and-Language Navigation (VLN)”.
1 paper has been accepted by TMM 2023!
May 2023
Jiamin Zhuang, Jing Yu (corresponding author), Yang Ding, Xiangyan Qu, Yue Hu. Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment. IEEE Transactions on Multimedia (TMM) (Impact factor: 6.513). The paper is available here.
Jing Yu gave an invited talk.
Apr 2022
Jing Yu gave an invited talk “MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering” at the MSRA CVPR 2022 workshop. The slides are available here. The workshop news is available here. The photos are available here.
Yang Ding gave a poster presentation!
Apr 2022
Yang Ding gave a poster presentation of the paper “MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering” at the MSRA CVPR 2022 workshop. The photos are available here.
Welcome Siyuan Feng (2022 Master's student) to our team!
Apr 2022
Prof. Jing Yu gave an invited talk!
Mar 2022
  • Jing Yu gave an invited online talk “ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification” at the Information Engineering University. The slides are available here.
1 paper has been accepted by CVPR 2022!
Mar 2022
  • Yang Ding, Jing Yu (corresponding author), Bang Liu, Yue Hu, Mingxin Cui, Qi Wu. MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering. CVPR, 2022.
Prof. Jing Yu gave an invited talk!
Mar 2022
  • Prof. Jing Yu gave an invited talk at Shanghai University on “Introduction to Scientific Research and Paper Writing”. The slides are available here. The news about this talk can be found here.
1 paper has been accepted by WWW 2022!
Feb 2022
  • ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification.
  • WWW videos
1 paper has been accepted by ICASSP 2022!
Dec 2021
  • WLinker: Modeling Relational Triplet Extraction as Word Linking.
1 paper has been accepted by Knowledge-Based Systems (KBS) 2021!
Dec 2021
  • APER: AdaPtive Evidence-driven Reasoning Network for machine reading comprehension with unanswerable questions.
  • Impact factor: 5.921!
1 paper has been accepted by ICML 2021!
Jul 2021
  • Evolving Attention with Residual Convolutions.
1 paper has been accepted by IJCAI 2021!
Apr 2021
  • CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation.
Jing Yu gave an invited talk.
Apr 2021
  • Jing Yu gave an invited talk “Towards Cognition-Inspired Vision and Language Methods” at the CCF YOCSEF Xi'an Forum “How does Vision and Language 1+1>2?”.
  • The slides are available here.
1 paper has been accepted by EACL 2021!
Feb 2021
  • Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees.
1 paper has been accepted by Information Fusion, 2021!
Feb 2021
  • DMRFNet: Deep Multimodal Reasoning and Fusion for Visual Question Answering and Explanation Generation.
  • Impact factor: 12.975!
1 paper has been accepted by ICASSP 2021!
Jan 2021
  • MCR-NET: A Multi-Step Co-Interactive Relation Network for Unanswerable Questions on Machine Reading Comprehension.
Welcome new students!
Jan 2021
Welcome Yaochen Ren (2021 PhD student), Xinjie Lin (2018 PhD student), Minghao Jiang (2018 PhD student), and Yu Wang (2017 PhD student) to our team!
Jing Yu gave an invited talk.
Oct 2020
  • Jing Yu will give a talk on “Deep Reasoning for Visual Question Answering” at Shanghai University and online. The talk is on 15/10/2020, 10:00–12:00, UTC+8.
  • More information about the talk is available here.
1 paper has been accepted by IEEE Transactions on Image Processing!
Oct 2020
  • Learning Dual Encoding Model for Adaptive Visual Understanding in Visual Dialogue.
  • Impact factor: 9.34!
1 paper has been accepted by ACM MM 2020!
Aug 2020
  • KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue.
1 paper has been accepted by Pattern Recognition Letters (PRL)!
Aug 2020
  • Learning cross-modal correlations by exploring inter-word semantics and stacked co-attention.
1 paper has been accepted by Pattern Recognition!
Jul 2020
  • Cross-modal knowledge reasoning for knowledge-based visual question answering.
  • Impact factor: 7.196!
1 paper has been accepted by Knowledge-Based Systems!
Jul 2020
  • Cross-modal learning with prior visual relation knowledge.
  • Impact factor: 5.921!
1 paper has been accepted by ICIP 2020!
May 2020
  • Prior Visual Relationship For Visual Question Answering.
Jing Yu gave an invited talk.
May 2020
  • Jing Yu gave an invited talk on “Deep Learning-based Visual Question Answering” at Shanghai University, Shanghai, China.
  • The slides are available here.
2 papers have been accepted by IJCAI 2020!
Apr 2020
  • Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering. The code is available here.

  • DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue.

  • Acceptance rate: 592/4717 = 12.6%

1 paper has been accepted by Information Fusion!
Mar 2020
  • Multimodal feature fusion by relational reasoning and attention for visual question answering.
  • Impact factor: 12.975!
1 paper has been accepted by IEEE Transactions on Multimedia!
Jan 2020
  • Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-modal Retrieval.
  • Impact factor: 6.051!
2 papers have been accepted by AAAI 2020!
Nov 2019
  • Deep Visual Understanding Like Humans: An Adaptive Dual Encoding Model for Visual Dialogue.
  • Unsupervised Learning of Deterministic Dialogue Structure with Edge-Enhanced Graph Auto-Encoder.
1 paper has been accepted by KSEM 2019!
May 2019
  • Semantic Modeling of Textual Relationships in Cross-modal Retrieval.