Welcome to the CogModal Group (Cognition-Inspired Cross-Modal Intelligence Group)! A long-standing challenge in both cognitive science and artificial intelligence is understanding how humans learn knowledge and solve problems from multi-modal information, e.g. text, images, video, and audio, with relatively little supervised instruction. Cognitive science examines empirical evidence to reveal why humans can understand the world and how they do so. From the artificial intelligence perspective, our CogModal group is interested in developing human-like AI techniques inspired by the mechanisms that cognitive science uncovers. To this end, our members focus mainly on five research topics that correspond to basic human abilities in multi-modal information cognition: multi-modal information Representation, Memory, Reasoning, Generation, and Accumulation. These topics cover a wide range of tasks and applications, including Cross-modal Information Retrieval, Referring Expression, Visual Question Answering, Image/Video Captioning, Text-based Image Generation, and Vision-Language Navigation.
🌟 Zhihu link: Cognition-Inspired Cross-Modal Intelligent Articles
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering. Code link
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue.
Acceptance rate: 592/4717 = 12.6%