Discovering API Directives from API Specifications with Text Classification

Jing-Xuan Zhang; Chuan-Qi Tao; Zhi-Qiu Huang; Xin Chen

doi:10.1007/s11390-021-0235-1

Regular Paper

Jing-Xuan Zhang^{1,2,3}, Chuan-Qi Tao^{1,2}, Zhi-Qiu Huang^{1,2}, Xin Chen^{3,4}

1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

2. Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China

3. Key Laboratory of Complex Systems Modeling and Simulation (Hangzhou Dianzi University), Ministry of Education, Hangzhou 310018, China

4. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract

Application programming interface (API) libraries are extensively used by developers. To program correctly with APIs and avoid bugs, developers should pay attention to API directives, which describe the constraints on API usage. Unfortunately, API directives take diverse forms, making it time-consuming and error-prone for developers to discover all the relevant ones. In this paper, we propose an approach that leverages text classification to discover API directives from API specifications. Specifically, given a set of training sentences from API specifications, our approach first characterizes each sentence by three groups of features. Then, to deal with the imbalanced distribution between API directives and non-directives, our approach employs an under-sampling strategy to split the imbalanced training set into several subsets and trains one classifier on each. Given a new sentence in an API specification, our approach combines the trained classifiers to predict whether it is an API directive. We have evaluated our approach on a publicly available annotated API directive corpus. The experimental results show that our approach achieves an F-measure of up to 82.08%. In addition, our approach statistically outperforms the state-of-the-art approach by up to 29.67% in terms of F-measure.
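The under-sampling and classifier-combination steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the three feature groups, the base classifier, or the combination rule, so the sketch substitutes lowercase word counts, a small naive Bayes model (`WordCountClassifier`, a hypothetical stand-in), and a simple majority vote. It also assumes that directives are the minority class.

```python
import math
import random
from collections import Counter


def split_majority(majority, minority, seed=0):
    """Cut the shuffled majority class into chunks of minority size and
    pair every chunk with all minority samples (the last chunk may be smaller)."""
    if not minority:
        raise ValueError("need at least one minority sample")
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    size = len(minority)
    return [shuffled[i:i + size] + list(minority)
            for i in range(0, len(shuffled), size)]


class WordCountClassifier:
    """Hypothetical stand-in for the paper's classifier:
    multinomial naive Bayes over lowercase tokens with add-one smoothing."""

    def fit(self, samples):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        for text, label in samples:
            self.counts[label].update(text.lower().split())
            self.docs[label] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, text):
        scores = {}
        for label in (True, False):
            total = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
            for token in text.lower().split():
                score += math.log((self.counts[label][token] + 1) / total)
            scores[label] = score
        return scores[True] > scores[False]


def train_ensemble(samples, seed=0):
    """Train one classifier per balanced subset (assumes directives are the minority)."""
    directives = [s for s in samples if s[1]]
    others = [s for s in samples if not s[1]]
    return [WordCountClassifier().fit(subset)
            for subset in split_majority(others, directives, seed)]


def predict_directive(classifiers, sentence):
    """Combine the classifiers by majority vote (an assumed combination rule)."""
    votes = sum(clf.predict(sentence) for clf in classifiers)
    return votes * 2 > len(classifiers)
```

Each classifier sees every directive sentence but only a slice of the non-directives, so no single model is dominated by the majority class, while the ensemble as a whole still uses all of the training data.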

Keywords

text classification; application programming interface (API) directive; API specification; imbalanced learning

Journal of Computer Science and Technology

Volume 36, Issue 4, July 2021

Pages 922-943

DOI: 10.1007/s11390-021-0235-1

Cite this article:

Zhang J-X, Tao C-Q, Huang Z-Q, et al. Discovering API Directives from API Specifications with Text Classification. Journal of Computer Science and Technology, 2021, 36(4): 922-943. https://doi.org/10.1007/s11390-021-0235-1

Received: 18 December 2019

Accepted: 09 June 2021

Published: 05 July 2021