Real-Time Search in the Sensor Internet
Published in Ioanis Nikolaidis, Krzysztof Iniewski, Building Sensor Networks, 2017
Multimedia search engines use crawling techniques similar to those of general search engines, but they concentrate on embedded multimedia files rather than on text content. The selection policy of an image crawler controls which extracted images are kept for the subsequent indexing step. Images that are too small or completely transparent are often discarded because they usually serve only the visual design of a website. Topical search engines use focused crawlers with a special selection policy that restricts the analysis to websites relevant to a specific topic. Some follow only URLs where the link description contains topic-matching keywords or where the website itself is about the topic, while others download all websites, analyze them, and discard the irrelevant ones.
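The two selection policies described above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the `ImageInfo` record, the `MIN_SIDE_PX` threshold of 32 pixels, and the function names `keep_image` and `should_follow` are all hypothetical, and the image metadata is assumed to have been extracted already by some earlier crawling step.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical threshold; the source only says "too small", not how small.
MIN_SIDE_PX = 32


@dataclass
class ImageInfo:
    """Metadata assumed to be available for each extracted image."""
    url: str
    width: int
    height: int
    max_alpha: int  # highest alpha value of any pixel; 0 means fully transparent


def keep_image(img: ImageInfo, min_side: int = MIN_SIDE_PX) -> bool:
    """Image selection policy: drop tiny or completely transparent images."""
    if img.width < min_side or img.height < min_side:
        return False  # likely an icon, bullet, or spacer used for layout
    if img.max_alpha == 0:
        return False  # completely transparent, nothing to index
    return True


def should_follow(link_text: str, topic_keywords: Iterable[str]) -> bool:
    """Focused-crawler policy: follow a link only if its description
    contains at least one topic keyword (simple case-insensitive match)."""
    words = {w.strip(".,;:!?").lower() for w in link_text.split()}
    return any(k.lower() in words for k in topic_keywords)


images = [
    ImageInfo("logo.png", 16, 16, 255),
    ImageInfo("spacer.gif", 200, 100, 0),
    ImageInfo("photo.jpg", 640, 480, 255),
]
kept = [img.url for img in images if keep_image(img)]
```

With these assumed inputs, only `photo.jpg` passes the image policy. A real crawler would likely use a richer relevance test than exact keyword matching, for example stemming or a trained classifier.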
An ontology learning based approach for focused web crawling using combined normalized pointwise mutual information and Resnik algorithm
Published in International Journal of Computers and Applications, 2022
P. R. Joe Dhanith, B. Surendiran
A web crawler is a software bot that navigates the web, which it treats as a directed graph, using the breadth-first search (BFS) algorithm until all pages have been obtained or no storage space remains. Because of the dynamic growth of the internet, it is not possible to retrieve all web pages. To overcome this problem, the focused web crawler [1] was proposed for seeking, acquiring, indexing, and maintaining websites on a particular set of subjects that represents a comparatively small section of the Web. Compared with general-purpose crawlers, focused crawlers substantially reduce the time and space resources required and better satisfy user needs. The full page text and the link structure of the web page are the two main components of focused crawlers.
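The breadth-first crawl described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the web is modeled as an in-memory dictionary `links` that maps each URL to its outgoing links, `max_pages` stands in for the storage limit, and the function name `bfs_crawl` is hypothetical. No network access is performed.

```python
from collections import deque


def bfs_crawl(links, seed, max_pages):
    """Visit pages in breadth-first order over a directed link graph.

    links     -- dict mapping a URL to the list of URLs it links to (assumed input)
    seed      -- URL where the crawl starts
    max_pages -- stand-in for the available storage space
    """
    stored = []            # pages "downloaded" so far, in visit order
    seen = {seed}          # URLs already queued, to avoid revisiting
    queue = deque([seed])  # FIFO frontier gives breadth-first order

    # Stop when the frontier is empty (all reachable pages obtained)
    # or when the storage budget is exhausted.
    while queue and len(stored) < max_pages:
        url = queue.popleft()
        stored.append(url)  # placeholder for download-and-store
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return stored


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
full = bfs_crawl(graph, "a", max_pages=10)    # all reachable pages
limited = bfs_crawl(graph, "a", max_pages=3)  # stops when storage is full
```

A focused crawler would differ mainly in how the frontier is filled: instead of enqueuing every unseen link, it would enqueue only links judged relevant to the topic, based on the page text and link structure.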