Robust Web Data Extraction Based on Weighted Path-layer Similarity
Published in Journal of Computer Information Systems, 2022
Peng Gao, Hao Han
Some techniques focus on generating robust extraction locators directly. OXPath extended XPath with more semantic actions (e.g., clicking, form filling) and markers.22 Because it is more machine-readable, the approach improved robustness, but it also loses some compatibility and carries a higher learning cost. Leotta et al.23,24 proposed algorithms that generate robust XPath locators for automated web application testing. These works focus on extracting UI components, such as an input button in a form field. They rank the robustness of element attributes using heuristic rules while generating the XPath. Our previous work calculated the tree edit distance between two HTML trees, so the calculation time for some webpages became relatively large.3
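To make the idea of ranking element attributes by robustness concrete, the following is a minimal sketch and not the algorithm from the cited works. The priority order in `ATTRIBUTE_PRIORITY` and the helper names `xpath_literal` and `build_locator` are assumptions chosen for illustration only; the sketch also omits the check, needed in practice, that the resulting locator matches exactly one element on the page.

```python
from typing import Mapping, Sequence

# Hypothetical ranking: attributes listed earlier are assumed to change
# less often across page revisions. This order is illustrative only.
ATTRIBUTE_PRIORITY: Sequence[str] = (
    "id", "name", "title", "alt", "value", "type", "class",
)


def xpath_literal(value: str) -> str:
    """Quote a string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # The value contains both quote types, so join the pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def build_locator(
    tag: str,
    attributes: Mapping[str, str],
    priority: Sequence[str] = ATTRIBUTE_PRIORITY,
) -> str:
    """Return an XPath that uses the highest-ranked non-empty attribute."""
    for name in priority:
        value = attributes.get(name)
        if value:
            return f"//{tag}[@{name}={xpath_literal(value)}]"
    # No ranked attribute is available: fall back to the bare tag name.
    return f"//{tag}"


if __name__ == "__main__":
    button = {"class": "btn primary", "name": "submit", "type": "submit"}
    print(build_locator("input", button))
```

Under this assumed ranking, the example button is located by its `name` attribute rather than its `class`, on the assumption that styling classes change more often than form field names.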