Critical Analysis of Web Crawler (Spider) Algorithms
Minou Parhizkar 0527553
Abstract - A web crawler is a program or automated script that browses the World Wide Web in a methodical, automated manner. The objective of this paper is to make a critical analysis of the algorithms used by Web robots. It examines and evaluates the different approaches and methods that Web search engines use to catalog information.
Index Terms - Web crawler, Web search engines, SEO
I. INTRODUCTION
Software that finds information on the Web and returns the sites that provide it is considered a search engine or web crawler. Everyone who uses the Internet uses web crawlers, indirectly at least. Each time you search the Internet using a service like Alta Vista, Excite, or Lycos, you use an index that is based on the output of a web crawler. Web crawlers, also known as spiders, robots, or wanderers, are software programs that automatically search the Web. Search engines use robots to find what is on the Web, and then construct an index of the pages that have been found.
Search engines use spiders to index websites. When you submit your web pages to a search engine by completing its required submission page, the search engine's spider indexes your site. A spider is an automated program that is run by the search engine system. The spider visits a web site, reads the content on the site and the site's meta tags, and follows the links that the site connects to. The spider then returns all that information to a central depository, where the data is indexed. It will visit each link you have on your website and index those sites as well. Some spiders only index a certain number of pages on your site.
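The visit, read, follow-links, and return-to-depository cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the names `PageParser`, `crawl`, and `TOY_WEB` are hypothetical, the "web" is an in-memory dictionary instead of real HTTP requests, and the `max_pages` limit stands in for spiders that only index a certain number of pages.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects visible text, named meta tags, and outgoing links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content", "")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(fetch, start_url, max_pages=10):
    """Breadth-first crawl. fetch(url) returns HTML, or None if unreachable."""
    depository = {}            # the "central depository" of collected pages
    queue = deque([start_url])
    seen = {start_url}         # avoid visiting the same URL twice
    while queue and len(depository) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:       # broken link: skip it
            continue
        parser = PageParser()
        parser.feed(html)
        depository[url] = {
            "text": " ".join(parser.text),
            "meta": parser.meta,
            "links": parser.links,
        }
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return depository


# Hypothetical three-page site used in place of real network access.
TOY_WEB = {
    "/home": '<html><head><meta name="keywords" content="spiders, crawling">'
             '</head><body>Welcome <a href="/about">About</a> '
             '<a href="/blog">Blog</a></body></html>',
    "/about": '<body>About us <a href="/home">Home</a></body>',
    "/blog": '<body>Blog posts <a href="/missing">Broken</a></body>',
}

store = crawl(TOY_WEB.get, "/home", max_pages=10)
```

A real spider would replace `TOY_WEB.get` with an HTTP fetch, resolve relative links, and respect each site's crawling rules; the breadth-first queue and the `seen` set are the parts that carry over.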
A spider's index is almost like a book: it contains a table of contents, the actual content, and the links and references for all the sites the spider finds during its search. A spider can index up to a million pages per day.
Example: Google Spider
When you ask a search engine to find information, it is actually looking through the index it has created, not searching the Web itself. Different search engines produce different rankings because not every search engine uses the same algorithm to search through its index.
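The point that a query is answered from the index, not from the live Web, can be illustrated with a small inverted index. This is a sketch under simple assumptions: `tokenize`, `build_index`, `search`, and the sample `PAGES` are hypothetical names and data, and a query matches only pages that contain every query word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, using only the index."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page texts, as a crawler might have stored them.
PAGES = {
    "/home": "Web crawlers index the web",
    "/about": "Spiders follow links on the web",
    "/blog": "Search engines rank pages",
}

index = build_index(PAGES)
hits = search(index, "web spiders")
```

Because `search` never touches `PAGES`, a page that changed after it was crawled is still matched against its old text until the spider visits it again.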
One of the things that a search engine algorithm looks for is the frequency and location of keywords on a Web page, but it can also detect artificial keyword stuffing, or spamdexing. The algorithms then analyze the way pages link to other pages on the Web. By checking how pages link to each other, an engine can determine what a page is about, for example when the keywords of the linked pages are similar to the keywords on the page itself.
Most of the top-ranked search engines are crawler-based, while some may be based on human-created directories. The people behind the search engines want the same thing every webmaster wants: traffic to their site. Since their content is mainly links to other sites, their goal is to make the search engine find the sites most relevant to the search query and display the best of these at the top of the results. To do this, they use a complex set of rules called algorithms. When a search query is submitted to a search engine, sites ...
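The keyword checks described above (frequency, location, and detection of stuffing) can be sketched as a toy scoring function. Real ranking algorithms are proprietary and far more complex; the function `keyword_score`, the title weight of 3, and the 20% density threshold are all assumptions chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(title, body, keyword, title_weight=3.0, max_density=0.2):
    """Score a page for one keyword; return 0.0 if it looks stuffed.

    Location: a hit in the title counts title_weight times a body hit.
    Stuffing: if the keyword makes up more than max_density of all
    words on the page, the page is treated as spamdexing.
    """
    keyword = keyword.lower()
    title_words = tokenize(title)
    body_words = tokenize(body)
    title_hits = title_words.count(keyword)
    body_hits = body_words.count(keyword)
    total = len(title_words) + len(body_words)
    density = (title_hits + body_hits) / total if total else 0.0
    if density > max_density:
        return 0.0  # assumed penalty for keyword stuffing
    return title_weight * title_hits + body_hits


honest = keyword_score(
    "Web crawler basics",
    "A crawler visits pages and follows links to other pages",
    "crawler",
)
stuffed = keyword_score(
    "crawler crawler",
    "crawler crawler crawler cheap crawler",
    "crawler",
)
```

With these assumed settings the honest page scores above zero, while the stuffed page is rejected outright because the keyword makes up most of its words. Link analysis would add a further signal on top of this per-page score.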
Posted on July 9, 2010.