Text Analysis
Published in Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane, Big Data and Social Science, 2020
Evgeny Klochikhin, Jordan Boyd-Graber
When the examples x are individual words and the labels y represent the grammatical function of a word (e.g., whether a word is a noun, verb, or adjective), the task is called part-of-speech tagging. This level of analysis can be useful for discovering simple patterns in text: distinguishing between when “hit” is used as a noun (a Hollywood hit) and when “hit” is used as a verb (the car hit the guard rail).
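The "hit" example above can be made concrete with a toy sketch. It is not a trained tagger and not the method described in the chapter; it is a single hand-written heuristic, assumed here purely for illustration, that looks at the neighbouring words to decide whether "hit" is a noun or a verb. The function name `tag_hit` and the determiner list are hypothetical.

```python
# Toy illustration only: one hand-written context rule, not a real POS tagger.
DETERMINERS = {"a", "an", "the"}  # assumed mini-lexicon


def tag_hit(tokens):
    """Return (index, tag) pairs for every occurrence of the word 'hit'."""
    lowered = [t.lower() for t in tokens]
    tags = []
    for i, word in enumerate(lowered):
        if word != "hit":
            continue
        # A determiner right after "hit" suggests an object noun phrase follows,
        # so "hit" is probably acting as a verb.
        next_is_det = i + 1 < len(lowered) and lowered[i + 1] in DETERMINERS
        # A determiner earlier in the phrase suggests "hit" may close a noun phrase.
        prior_det = any(w in DETERMINERS for w in lowered[:i])
        if next_is_det:
            tags.append((i, "VERB"))
        elif prior_det:
            tags.append((i, "NOUN"))
        else:
            tags.append((i, "VERB"))
    return tags


noun_example = tag_hit("a Hollywood hit".split())
verb_example = tag_hit("the car hit the guard rail".split())
print(noun_example, verb_example)
```

A rule like this breaks quickly on real text, which is why part-of-speech tagging is normally framed as a supervised learning problem over examples x and labels y, as the excerpt describes.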
Artificial Intelligence for Document Image Analysis
Published in Sk Md Obaidullah, KC Santosh, Teresa Gonçalves, Nibaran Das, Kaushik Roy, Document Processing Using Machine Learning, 2019
Himadri Mukherjee, Payel Rakshit, Ankita Dhar, Sk Md Obaidullah, KC Santosh, Santanu Phadikar, Kaushik Roy
Part-of-speech tagging is the annotation of each word in a sentence with its part of speech. The part of speech often conveys useful information that can be used for analysis. Two different parts of speech for the same word are presented as follows:
A textual data-driven method to identify and prioritise user preferences based on regret/rejoicing perception for smart and connected products
Published in International Journal of Production Research, 2022
Yinfeng Du, Dun Liu, Hengxin Duan
Data preprocessing of textual reviews is required to provide high-quality experimental data for subsequent research. Generally, data preprocessing includes text deduplication, word segmentation, part-of-speech tagging, stop-word removal, etc. Word segmentation is the process of recombining consecutive word sequences into phrase sequences according to a certain specification. For Chinese word segmentation, many algorithms have been developed, including the maximum matching method, the optimal matching method, and the bidirectional matching method. We employ Jieba text segmentation for word segmentation in this paper. Part-of-speech tagging aims to assign an appropriate part of speech (i.e. noun, verb, adjective, etc.) to each word resulting from word segmentation. Stop words, such as punctuation, tone words, etc., are words that make no contribution to the textual characteristics. Usually, stop words occur in large numbers and can be deleted. This paper uses the HIT stop-word list to delete these useless words. After textual preprocessing according to the operations above, valid comments are finally obtained.
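The preprocessing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the paper uses Jieba and the HIT stop-word list, whereas this sketch swaps in a plain forward maximum matching segmenter over a tiny made-up dictionary and a made-up stop-word set, and it omits the part-of-speech tagging step. All data and function names are hypothetical.

```python
# Hypothetical toy resources; a real system would load a full dictionary
# and a published stop-word list.
DICTIONARY = {"手机", "屏幕", "很", "好", "电池", "不", "耐用"}
STOP_WORDS = {"很", "的", "！"}


def deduplicate(reviews):
    """Drop repeated reviews while keeping first-seen order."""
    return list(dict.fromkeys(reviews))


def forward_max_match(text, dictionary):
    """Segment text greedily, always taking the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Keep only tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# "The phone screen is very good" (twice) and "The battery does not last!"
reviews = ["手机屏幕很好", "手机屏幕很好", "电池不耐用！"]
cleaned = [
    remove_stop_words(forward_max_match(r, DICTIONARY), STOP_WORDS)
    for r in deduplicate(reviews)
]
print(cleaned)
```

Forward maximum matching is greedy, so it can pick the wrong split when dictionary words overlap; the bidirectional variant mentioned above compares a forward and a backward pass to reduce such errors.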
Real-valued syntactic word vectors
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2020
The results on the performance of the word vectors confirm the observation made by Schnabel, Labutov, Mimno, and Joachims (2015): ‘different tasks favour different embeddings’. The best results on the word-similarity benchmark are obtained from word2vec with the Skip-gram architecture. The results obtained from RSV are as good as those of the other methods. SENNA yields the highest accuracy on the part-of-speech tagging task. Compared with SENNA, RSV shows relatively weaker performance on part-of-speech tagging, but in general the results obtained from RSV are comparable to or higher than those obtained from the other methods. We see a large variation in the results on the named-entity recognition task. The best result on this task is obtained from GloVe, which is distinctly higher than other methods such as RSV. This differs from the results on part-of-speech tagging, where GloVe performs very weakly but RSV is comparably good. The performance of RSV on the dependency parsing task is as good as that of other methods such as GloVe. We see that word2vecf (SGRAMF) does not necessarily result in higher performance on the dependency parsing task, although it uses the dependency context.
A methodology to integrate maintenance management systems and BIM to improve building management
Published in Science and Technology for the Built Environment, 2022
Pedram Nojedehi, William O’Brien, H. Burak Gunay
The process of classifying words into their parts of speech (e.g. noun, verb, adjective) and labeling them accordingly is known as part-of-speech tagging, or POS tagging (Bird, Klein, and Loper 2009). POS-tagging algorithms identify the verbs in each WO, and these verbs normally represent the actions taken or occurring in the WO descriptions (Figure 5D). Hence, the output of a part-of-speech tagging algorithm can be used to classify the WOs based on the action in each WO.
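The idea of grouping WOs by their verbs can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it takes WO descriptions that have already been tagged by some POS tagger, assumes Penn Treebank style tags in which verb tags start with "VB", and uses invented WO identifiers and descriptions. The function name `group_by_action` is hypothetical.

```python
from collections import defaultdict


def group_by_action(tagged_work_orders):
    """Map each verb (lowercased) to the IDs of the WOs whose text contains it."""
    groups = defaultdict(list)
    for wo_id, tagged_tokens in tagged_work_orders.items():
        for word, tag in tagged_tokens:
            # Assumption: verb tags begin with "VB" (VB, VBD, VBG, ...).
            if tag.startswith("VB"):
                groups[word.lower()].append(wo_id)
    return dict(groups)


# Hypothetical, pre-tagged WO descriptions.
tagged = {
    "WO-1": [("Replace", "VB"), ("air", "NN"), ("filter", "NN")],
    "WO-2": [("Repair", "VB"), ("the", "DT"), ("valve", "NN")],
    "WO-3": [("Replace", "VB"), ("broken", "JJ"), ("lamp", "NN")],
}
groups = group_by_action(tagged)
print(groups)
```

In this sketch a WO with several verbs appears under each of them, and a WO with no verb is left ungrouped; how such cases should be handled depends on the classification scheme in use.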