Parsing AUC Result-Figures in Machine Learning Specific Scholarly Documents for Semantically-enriched Summarization
Published in Applied Artificial Intelligence, 2022
Iqra Safder, Hafsa Batool, Raheem Sarwar, Farooq Zaman, Naif Radi Aljohani, Raheel Nawaz, Mohamed Gaber, Saeed-Ul Hassan
Our dataset is a subset of the corpus used by FigureSeer (Siegel et al. 2016), which contains over 22,000 full-text documents from top computer science conferences (CVPR, ICML, ACL, CHI, and AAAI) indexed by Semantic Scholar. From these, we randomly selected 1,000 documents and obtained 12,146 figures belonging to different classes (graph plots, bar charts, flow charts, etc.). We also extracted 5,804 subfigures from these documents. Finally, we used a random sample of 1,000 line graphs for parsing, which yielded 1,272 axes, 2,183 legend entries, and plots.
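
The random document selection described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_documents`, the placeholder identifiers in `corpus_ids`, and the seed value are all assumptions made for illustration, and the corpus size is rounded to 22,000.

```python
import random


def sample_documents(doc_ids, k, seed=0):
    """Return k distinct document IDs drawn uniformly at random.

    A fixed seed makes the draw reproducible (hypothetical helper).
    """
    rng = random.Random(seed)  # local generator, global RNG state untouched
    return rng.sample(list(doc_ids), k)


# Placeholder identifiers standing in for the real corpus (assumption).
corpus_ids = [f"doc_{i:05d}" for i in range(22000)]

# Draw 1,000 documents, as in the selection step described above.
selected = sample_documents(corpus_ids, k=1000, seed=42)
```

Sampling without replacement guarantees that no document is selected twice, and recording the seed lets the same subset be regenerated later.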