Open Access Research Article Issue
Mindstorms in natural language-based societies of mind
Computational Visual Media 2025, 11(1): 29-81
Published: 28 February 2025

Inspired by Minsky’s Society of Mind, Schmidhuber’s Learning to Think, and other more recent works, this paper proposes and advocates for the concept of natural language-based societies of mind (NLSOMs). We imagine these societies as consisting of a collection of multimodal neural networks, including large language models, which engage in a “mindstorm” to solve problems using a shared natural language interface. Here, we work to identify and discuss key questions about the social structure, governance, and economic principles for NLSOMs, emphasizing their impact on the future of AI. Our demonstrations with NLSOMs—which feature up to 129 agents—show their effectiveness in various tasks, including visual question answering, image captioning, and prompt generation for text-to-image synthesis.

Open Access Research Article Issue
Full-duplex strategy for video object segmentation
Computational Visual Media 2023, 9(1): 155-175
Published: 18 October 2022

Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, by considering a better mutual restraint scheme linking motion and appearance, allowing exploitation of cross-modal features from the fusion and decoding stage. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model’s robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion), and compares well to leading methods both for video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
