Data Science Skills and Graduate Certificates: A Quantitative Text Analysis
Published in Journal of Computer Information Systems, 2022
Haoqiang Jiang, Catherine Chen
This section describes the data-collection steps and data-processing methods. The data were processed and analyzed in Python 3.7.6 using the Jupyter development environment. spaCy is an open-source software library for natural language processing (NLP) in Python, designed for large-scale information extraction tasks. In this study, phrases were matched using the case-insensitive matching of PhraseMatcher, one of the classes in spaCy. To avoid selecting overlapping keywords, spaCy was used to parse phrases into tokens, and the position index of each token within the document was then used to exclude overlapping matches.
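The matching step described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the keyword list, the `SKILL` label, the sample sentence, and the helper names are hypothetical, and the longest-match-first rule for resolving overlaps is an assumption, since the excerpt only says that token position indices were used. The `PhraseMatcher.add(key, patterns)` call shown here is the form used in recent spaCy releases.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical keyword list; the study's actual keywords are not given in the excerpt.
KEYWORDS = ["machine learning", "learning", "Data Mining", "python"]


def build_matcher(nlp, keywords):
    # attr="LOWER" compares lowercased token text, so matching is case-insensitive.
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("SKILL", [nlp.make_doc(keyword) for keyword in keywords])
    return matcher


def non_overlapping_matches(doc, matcher):
    # Each match is (match_id, start, end) in token indices, with end exclusive.
    # Assumption: longer matches win, and ties go to the earlier match.
    matches = sorted(matcher(doc), key=lambda m: (-(m[2] - m[1]), m[1]))
    used = set()  # token indices already claimed by a kept match
    kept = []
    for _, start, end in matches:
        indices = range(start, end)
        if used.isdisjoint(indices):
            used.update(indices)
            kept.append(doc[start:end])
    # Return the kept spans in document order.
    return sorted(kept, key=lambda span: span.start)


# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
matcher = build_matcher(nlp, KEYWORDS)
doc = nlp("Courses cover Machine Learning, data mining and Python.")
skills = [span.text for span in non_overlapping_matches(doc, matcher)]
print(skills)
```

In this sketch the shorter keyword "learning" is dropped because its token is already covered by "Machine Learning". spaCy also ships `spacy.util.filter_spans`, which applies a similar longest-span-first rule and could replace the manual loop.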
AdaBLEU: A Modified BLEU Score for Morphologically Rich Languages
Published in IETE Journal of Research, 2021
Shweta Chauhan, Philemon Daniel, Archita Mishra, Abhay Kumar
To calculate the AdaBLEU metric, we need to extract the part-of-speech (POS) tags and dependency parsing (DP) tags of the sentences. For this, we used the spaCy library [26]. spaCy is an open-source software library for natural language processing in Python.
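As a rough illustration of the tag extraction mentioned in this excerpt, the sketch below reads per-token attributes from a parsed spaCy `Doc`. It assumes that "DP tags" refers to dependency-parse labels; the helper name `pos_and_dep_tags` is hypothetical, and how AdaBLEU combines these tags is not shown here.

```python
import spacy


def pos_and_dep_tags(doc):
    """Return (text, coarse POS tag, dependency label) for each token.

    `doc` must come from a pipeline with a tagger and a parser;
    otherwise the tag attributes are empty strings.
    """
    return [(token.text, token.pos_, token.dep_) for token in doc]
```

With a trained pipeline installed (for example `en_core_web_sm`, an assumption about the setup), the helper would be called as `pos_and_dep_tags(spacy.load("en_core_web_sm")("Dogs bark loudly."))`. The exact tags returned depend on the model and its version.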