Review Article | Open Access

Foundation models meet visualizations: Challenges and opportunities

School of Software, Tsinghua University, Beijing 100084, China
Microsoft, Redmond 98052, USA

Abstract

Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models; VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models and visualizations is promising, but it also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.

Computational Visual Media
Pages 399-424
Cite this article:
Yang W, Liu M, Wang Z, et al. Foundation models meet visualizations: Challenges and opportunities. Computational Visual Media, 2024, 10(3): 399-424. https://doi.org/10.1007/s41095-023-0393-x


Received: 04 October 2023
Accepted: 15 November 2023
Published: 02 May 2024
© The Author(s) 2024.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
