Privacy towards GIS Based Intelligent Tourism Recommender System in Big Data Analytics
Published in Siddhartha Bhattacharyya, Václav Snášel, Indrajit Pan, Debashis De, Hybrid Computational Intelligence, 2019
Abhaya Kumar Sahoo, Chittaranjan Pradhan, Siddhartha Bhattacharyya
Tourism research of this kind proceeds in two phases: data collection and data mining. During the collection phase, online textual data are gathered from social media and useful hidden information is extracted from them. The data mining phase integrates data preprocessing and pattern discovery sub-phases. In the first phase, online data are collected through web crawling technology: a web crawler is a set of programs that downloads web pages, extracts URLs from the HTML, and fetches those pages in turn [15]. In the second phase, data mining techniques extract hidden patterns from the large volume of collected text through data preprocessing and pattern discovery. The data preprocessing sub-phase applies data cleaning, tokenization and word stemming to the tourism-related data, whereas the pattern discovery sub-phase uses latent Dirichlet allocation (LDA), sentiment analysis, statistical analysis, clustering and different regression models to discover useful patterns [9].
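A minimal sketch of the collection and preprocessing steps described above, assuming the requests, BeautifulSoup (bs4) and NLTK libraries are available; the seed URL and the stop-word list are placeholders, not part of the original study.

```python
import re
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup          # pip install beautifulsoup4
from nltk.stem import PorterStemmer    # pip install nltk

def crawl(seed_url, max_pages=10):
    """Download pages, extract URLs from the HTML, and fetch them in turn."""
    seen, queue, corpus = set(), deque([seed_url]), []
    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        corpus.append(soup.get_text(" ", strip=True))     # raw page text
        for a in soup.find_all("a", href=True):           # extract new URLs
            queue.append(urljoin(url, a["href"]))
    return corpus

def preprocess(text, stopwords=frozenset({"the", "a", "and", "of", "to"})):
    """Data cleaning, tokenization and word stemming for one document."""
    stemmer = PorterStemmer()
    tokens = re.findall(r"[a-z]+", text.lower())          # clean + tokenize
    return [stemmer.stem(t) for t in tokens if t not in stopwords]

# Placeholder seed URL; the preprocessed documents would then feed LDA,
# sentiment analysis, clustering or regression in the pattern discovery sub-phase.
docs = [preprocess(t) for t in crawl("https://example.com/travel-reviews")]
```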
Application of technology
Published in Mike Tooley, Engineering GCSE, 2012
Search engines such as Excite, HotBot and Google use automated software called Web crawlers or spiders. These programs move from Web site to Web site, logging each site title, URL, and at least some of its text content. The object is to hit millions of Web sites and to stay as current with them as possible. The result is a long list of Web sites placed in a database that users search by typing in a keyword or phrase.
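The indexing-and-lookup idea can be illustrated with a small sketch (not the software these engines actually use): each crawled page contributes its title, URL and some of its text to a database, which users then search by keyword.

```python
import sqlite3

# In-memory index: one row per crawled page (title, URL, part of its text).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (title TEXT, url TEXT, content TEXT)")

def log_page(title, url, content):
    """What the crawler records for each site it visits."""
    db.execute("INSERT INTO pages VALUES (?, ?, ?)", (title, url, content[:2000]))

def search(keyword):
    """Keyword lookup over the stored pages, as a user query would be."""
    like = f"%{keyword}%"
    return db.execute(
        "SELECT title, url FROM pages WHERE title LIKE ? OR content LIKE ?",
        (like, like),
    ).fetchall()

log_page("Example Domain", "https://example.com", "This domain is for use in examples.")
print(search("example"))
```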
Applications of Technology
Published in Roger Timings, Basic Manufacturing, 2006
Search engines such as Excite, HotBot and Google use automated software called Web crawlers or spiders. These programs move from Web site to Web site, logging each site title, URL, and at least some of its text content. The object is to hit millions of Web sites and stay as current with them as possible. The result is a long list of Web sites placed in a database that users search by typing in a keyword or phrase.
Extraction of affective responses from customer reviews: an opinion mining and machine learning approach
Published in International Journal of Computer Integrated Manufacturing, 2020
Z. Li, Z. G. Tian, J. W. Wang, W. M. Wang
Since the affective responses are extracted from customers’ reviews, the first step is to collect those reviews and preprocess the review texts. Collection can be performed with a web crawler: a program or script that captures information on the Internet according to certain rules, retrieving relevant web resources and data in a targeted way and at scale. Because online stores carry a large number of reviews, the web pages of online shopping sites are selected as the sources of online reviews in this paper. The information crawled from these pages contains not only customer reviews but also irrelevant content such as advertisements and HTML code; an extraneous-removal step discards this irrelevant content and keeps only the customer reviews. To establish the relationship between review texts and affective responses, the reviews also need to be pre-processed, because they are usually unstructured. The pre-processing section consists of two parts: linguistic processing and the vectorisation of review texts.
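A brief sketch of the extraneous-removal and vectorisation steps, assuming BeautifulSoup and scikit-learn; the CSS selector for review elements and the sample HTML are hypothetical, since real shop pages differ.

```python
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_reviews(html, review_selector="div.review-text"):
    """Extraneous removal: drop scripts, styles and embedded ads, keep review text.
    The selector is a placeholder, not the markup of any particular store."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "iframe"]):   # discard non-content markup
        tag.decompose()
    return [node.get_text(" ", strip=True) for node in soup.select(review_selector)]

def vectorise(reviews):
    """Linguistic processing + vectorisation: TF-IDF bag-of-words features."""
    vectoriser = TfidfVectorizer(lowercase=True, stop_words="english")
    return vectoriser.fit_transform(reviews), vectoriser

html = ("<div class='review-text'>Great fit, very comfortable.</div>"
        "<div class='review-text'>The strap broke after a week.</div>")
matrix, vec = vectorise(extract_reviews(html))
print(matrix.shape)   # (number of reviews, vocabulary size)
```

The resulting feature matrix is what a downstream machine learning model would relate to affective responses.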
Discovering the relationship of disasters from big scholar and social media news datasets
Published in International Journal of Digital Earth, 2019
Liang Zheng, Fei Wang, Xiaocui Zheng, Binbin Liu
Many of the subclass disaster types seldom occur in reality or are rarely discussed in scholarly papers and news articles. Including all 300-plus subclasses of possible disaster types in the search would complicate the data processing, and, more importantly, not all of them are relevant to the subsequent disaster chain research. Therefore, first of all, every disaster-type subclass is searched separately through Baidu Scholar and the number of search results for each keyword is tallied. A web crawler is used to obtain the data: a web crawler can be regarded as an Internet bot that systematically browses the World Wide Web, normally for the purpose of web indexing, based on certain defined rules. It can be a program or script developed in different computer languages (Guo 2017); the Python programming language is adopted in our research. Some of the results are picked out and listed in Table 2.
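The tallying step might look roughly like the sketch below. It is only an illustration: Baidu Scholar has no public API, so the search URL, query parameter and the pattern used to read the hit count off the result page are all assumptions, and the subclass keywords shown are sample values rather than the study's list.

```python
import re
import time
import requests

# Hypothetical search endpoint and hit-count pattern; real markup differs.
SEARCH_URL = "https://xueshu.baidu.com/s"
HIT_PATTERN = re.compile(r"([\d,]+)\s*条相关结果")   # "... N related results"

def count_hits(keyword):
    """Query one disaster-subclass keyword and return the reported hit count."""
    resp = requests.get(SEARCH_URL, params={"wd": keyword}, timeout=10)
    match = HIT_PATTERN.search(resp.text)
    return int(match.group(1).replace(",", "")) if match else 0

subclasses = ["flash flood", "storm surge", "debris flow"]   # sample subclasses only
tally = {}
for kw in subclasses:
    tally[kw] = count_hits(kw)
    time.sleep(1)   # rate-limit successive requests
print(tally)
```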