Survey

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models

AI Group, WeBank Co., Ltd, Shenzhen 518000, China

Abstract

The rapid emergence of large language models (LLMs) has shifted the community from single-task-oriented natural language processing (NLP) research to a holistic, end-to-end, multi-task learning paradigm. Along this line of research, LLM-based prompting methods have attracted much attention, partly because of the technological advantages brought by prompt engineering (PE) and partly because of the underlying NLP principles that various prompting methods reveal. Traditional supervised learning usually requires training a model on labeled data before making predictions. In contrast, PE methods directly leverage the capabilities of existing LLMs (e.g., GPT-3 and GPT-4) by composing appropriate prompts, especially under few-shot or zero-shot scenarios. Given the abundance of prompting studies and the ever-evolving nature of this field, this article aims to 1) review existing PE methods from a novel perspective within the well-established framework of communication theory, 2) facilitate a deeper understanding of the developing trends of PE methods used in three typical tasks, and 3) shed light on promising research directions for future PE methods.
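
To make the contrast with supervised learning concrete, the following Python sketch (an illustration added here, not taken from the paper) composes a zero-shot and a few-shot prompt for a sentiment-classification task. The query_llm function is a hypothetical placeholder for whatever GPT-3/GPT-4 client would actually execute the prompt; no model parameters are updated in either case.

# Minimal illustrative sketch (not from the paper): composing zero-shot and
# few-shot prompts for a sentiment-classification task with a frozen LLM.
# `query_llm` is a hypothetical placeholder; swap in a real GPT-3/GPT-4 client.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM API call (not invoked below)."""
    raise NotImplementedError("Plug in your LLM provider here.")

def zero_shot_prompt(review: str) -> str:
    # Zero-shot: the task is described in natural language only, with no labeled examples.
    return (
        "Classify the sentiment of the following movie review as Positive or Negative.\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

def few_shot_prompt(review: str, demos: list[tuple[str, str]]) -> str:
    # Few-shot: a handful of labeled demonstrations are prepended to the query,
    # so the frozen LLM can infer the task format without any parameter updates.
    parts = ["Classify the sentiment of each movie review as Positive or Negative.\n"]
    for text, label in demos:
        parts.append(f"Review: {text}\nSentiment: {label}\n")
    parts.append(f"Review: {review}\nSentiment:")
    return "\n".join(parts)

if __name__ == "__main__":
    demos = [
        ("A moving story with superb acting.", "Positive"),
        ("Two hours of my life I will never get back.", "Negative"),
    ]
    review = "The plot was thin but the visuals were stunning."
    print(zero_shot_prompt(review))
    print("---")
    print(few_shot_prompt(review, demos))

In the few-shot case, the labeled demonstrations serve only as in-context examples; the underlying LLM remains frozen, which is the key difference from conventional supervised fine-tuning.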

Journal of Computer Science and Technology
Pages 984-1004
Cite this article:
Song Y-F, He Y-Q, Zhao X-F, et al. A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models. Journal of Computer Science and Technology, 2024, 39(4): 984-1004. https://doi.org/10.1007/s11390-024-4058-8


Received: 21 December 2023
Accepted: 12 April 2024
Published: 20 September 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024