Googlebot

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine.

The name refers to two different types of web crawler: a desktop crawler that simulates a desktop user and a mobile crawler that simulates a mobile user.

Googlebot
Original author(s): Google
Type: Web crawler
Website: Googlebot FAQ

Behavior

A website will probably be crawled by both Googlebot Desktop and Googlebot Mobile. However, starting in September 2020, all sites were switched to mobile-first indexing, meaning that Google primarily crawls the web using a smartphone Googlebot. Although the subtype of Googlebot can be identified from the user-agent string in the request, both crawler types obey the same product token (user-agent token) in robots.txt, so a developer cannot selectively target either Googlebot Mobile or Googlebot Desktop using robots.txt.
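The following sketch illustrates the two points above: a server can tell the subtypes apart from the user-agent string, but robots.txt rules are matched against the single token "Googlebot", so both subtypes receive the same answer. The user-agent strings are illustrative, modelled on the formats Google publishes ("W.X.Y.Z" stands in for the Chrome version), and the "Mobile" substring test is a simplifying heuristic, not an official classification rule. Python's standard urllib.robotparser is used here; its matching is simpler than Google's own robots.txt parser.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Illustrative user-agent strings modelled on the formats Google publishes;
# "W.X.Y.Z" is a placeholder for the Chrome version, which changes over time.
DESKTOP_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36"
)
SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Hypothetical robots.txt with a single group for the Googlebot token.
ROBOTS_TXT = "User-agent: Googlebot\nDisallow: /private/\n"


def googlebot_subtype(user_agent: str) -> Optional[str]:
    """Return 'smartphone', 'desktop', or None if the UA is not Googlebot."""
    if "Googlebot/" not in user_agent:  # excludes e.g. "Googlebot-Image/1.0"
        return None
    # Heuristic: the smartphone crawler advertises a mobile browser.
    return "smartphone" if "Mobile" in user_agent else "desktop"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Evaluate robots.txt the way both subtypes are matched: by one token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Whatever the subtype, the robots.txt token is simply "Googlebot".
    token = "Googlebot" if googlebot_subtype(user_agent) else user_agent
    return parser.can_fetch(token, url)


for ua in (DESKTOP_UA, SMARTPHONE_UA):
    print(googlebot_subtype(ua), is_allowed(ua, "https://example.com/private/a"))
```

Both iterations print False for the disallowed path: the subtype differs, but the robots.txt decision cannot.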

Google provides various methods that enable website owners to manage the content displayed in Google's search results. If a webmaster chooses to restrict the information on their site available to Googlebot or another spider, they can do so with the appropriate directives in a robots.txt file, or by adding a robots meta tag (for example, <meta name="googlebot" content="nofollow">) to the web page. Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".
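Because any client can send a user-agent string containing "Googlebot", the host address is the more reliable signal. A common check is a reverse DNS lookup on the requesting IP, followed by a forward lookup on the resulting host name to confirm it maps back to the same IP. The sketch below assumes that approach; the accepted domain suffixes (".googlebot.com" and ".google.com") are an assumption based on Google's published guidance, and the function name is hypothetical. The resolvers are injectable so the logic can be exercised without network access; note that socket.gethostbyname_ex only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Signature shared by socket.gethostbyaddr and socket.gethostbyname_ex:
# both return (hostname, alias_list, ip_address_list).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    user_agent: str,
    reverse_lookup: Resolver = socket.gethostbyaddr,
    forward_lookup: Resolver = socket.gethostbyname_ex,
    allowed_suffixes: Iterable[str] = (".googlebot.com", ".google.com"),
) -> bool:
    """Check a request that claims to be Googlebot with reverse + forward DNS."""
    # 1. The user agent must at least claim to be Googlebot.
    if "Googlebot" not in user_agent:
        return False
    # 2. Reverse DNS: the IP must resolve to a host in an allowed domain.
    try:
        host, _aliases, _addresses = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    host = host.rstrip(".").lower()
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    # 3. Forward DNS: the host name must resolve back to the same IP,
    #    otherwise anyone controlling a PTR record could pass step 2.
    try:
        _name, _aliases, addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

In production the defaults perform real DNS queries; in tests, fake resolvers returning fixed tuples can stand in for them.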

Currently, Googlebot follows HREF links and SRC links. There is increasing evidence that Googlebot can execute JavaScript and parse content generated by Ajax calls as well. There are many theories about how advanced Googlebot's JavaScript processing is, with some holding that it has only a minimal ability derived from custom interpreters. Googlebot uses a web rendering service (WRS) based on the Chromium rendering engine (version 74 as of 7 May 2019). Googlebot discovers pages by harvesting every link on every page it can find. Unless prohibited by a nofollow directive, it then follows these links to other web pages. New web pages must be linked from other known pages on the web in order to be crawled and indexed, or be submitted manually by the webmaster.
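The link-harvesting step described above can be sketched with Python's standard html.parser: collect every href and src attribute, resolve it against the page URL, skip links marked rel="nofollow", and drop all links if a robots or googlebot meta tag says nofollow. This is a simplified, hypothetical harvester (the class and function names are illustrative); it does not execute JavaScript, so links created by scripts, which Googlebot's rendering service can see, are missed here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect href/src URLs, skipping links marked rel="nofollow"."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.page_nofollow = False  # set by <meta name="robots" content="nofollow">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # valueless attributes map to None
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            if "nofollow" in (attributes.get("content") or "").lower():
                self.page_nofollow = True
            return
        # A per-link nofollow hint removes just this link.
        if "nofollow" in (attributes.get("rel") or "").lower().split():
            return
        for attr in ("href", "src"):
            value = attributes.get(attr)
            if value:
                # Resolve relative URLs against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html: str, base_url: str) -> list[str]:
    parser = LinkHarvester(base_url)
    parser.feed(html)
    parser.close()
    # A page-level nofollow means none of the page's links are followed.
    return [] if parser.page_nofollow else parser.links
```

Each harvested URL would then be queued for crawling, which is how pages that are linked from already-known pages get discovered.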

A problem that webmasters with low-bandwidth web hosting plans[citation needed] have often noted with Googlebot is that it can consume an enormous amount of bandwidth.[citation needed] This can cause websites to exceed their bandwidth limit and be taken down temporarily, which is especially troublesome for mirror sites that host many gigabytes of data. Google provides Search Console, which allows website owners to throttle the crawl rate.
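Throttling a crawl rate amounts to enforcing a minimum interval between requests to the same host. The sketch below is a generic, hypothetical minimum-interval throttle written from the crawler's side; it is not Google's mechanism, and the class name and parameters are illustrative. The clock and sleep functions are injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Optional


class CrawlThrottle:
    """Enforce a minimum delay between successive requests to one host."""

    def __init__(
        self,
        max_requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        if max_requests_per_second <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / max_requests_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None  # earliest time of next request

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay
```

The cap translates directly into bandwidth. With hypothetical figures of 2 requests per second and an average response of 100 KB, a crawler would transfer about 2 × 100 KB × 86,400 s ≈ 17.3 GB per day; halving the rate halves that figure.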

How often Googlebot crawls a site depends on its crawl budget. Crawl budget is an estimate of how often a website is typically updated.[citation needed] Internally, Googlebot's development team (the Crawling and Indexing team) uses several more precisely defined terms that together cover what "crawl budget" stands for. Since May 2019, Googlebot has used the latest Chromium rendering engine, which supports ECMAScript 6 features. This makes the bot more "evergreen" and ensures that it does not rely on a rendering engine that is outdated compared with current browser capabilities.
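To make the idea of "an estimate of how often a website is updated" concrete, the sketch below derives a revisit interval from past crawl observations: total observed time divided by the number of visits that found changed content, clamped to a minimum and maximum. This is a deliberately crude, hypothetical estimator (it cannot see multiple changes between two visits, so it tends to overestimate the interval) and is not Google's crawl-budget algorithm; the names and bounds are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Visit:
    hours_since_previous: float  # time elapsed since the previous crawl
    changed: bool                # did the content differ from last time?


def estimate_revisit_hours(
    visits: Sequence[Visit],
    min_hours: float = 1.0,
    max_hours: float = 24.0 * 30,
) -> float:
    """Suggest hours until the next crawl from observed change frequency."""
    total_hours = sum(v.hours_since_previous for v in visits)
    changes = sum(1 for v in visits if v.changed)
    if total_hours <= 0:
        return min_hours  # no history yet: check again soon
    if changes == 0:
        return max_hours  # never seen a change: back off to the cap
    mean_hours_between_changes = total_hours / changes
    return max(min_hours, min(max_hours, mean_hours_between_changes))
```

For three daily visits where two found changes, the estimate is 72 / 2 = 36 hours; a page that never changed is pushed out to the 30-day cap.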

Mediabot

Mediabot is the web crawler that Google uses to analyze page content so that Google AdSense can serve contextually relevant advertising on a web page. Mediabot identifies itself with the user agent string "Mediapartners-Google/2.1".
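Because Mediabot identifies itself as "Mediapartners-Google", a robots.txt group can address it separately from other crawlers. The sketch below assumes a hypothetical robots.txt that keeps all crawlers out of a /members/ area except Mediabot, and evaluates it with Python's standard urllib.robotparser. Google's own parser may differ in edge cases, so treat this as an illustration of group matching rather than a guarantee of Google's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow means "allow everything".
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /members/
"""


def can_fetch(user_agent: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # robotparser compares the part of the UA before the first "/",
    # so "Mediapartners-Google/2.1" matches the Mediapartners-Google group.
    return parser.can_fetch(user_agent, url)


print(can_fetch("Mediapartners-Google/2.1", "https://example.com/members/page"))
print(can_fetch("Googlebot", "https://example.com/members/page"))
```

The first call matches the Mediabot group and is allowed; the second falls through to the "*" group and is disallowed.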

Unlike other crawlers, Mediabot does not follow links to discover new crawlable URLs; instead, it only visits URLs that include the AdSense code. Where that content resides behind a login, the crawler can be given login credentials so that it can crawl the protected content.

Inspection Tool Crawlers

InspectionTool is the crawler used by Search testing tools such as the Rich Results Test and the URL Inspection tool in Google Search Console. Apart from its user agent and user-agent token, it mimics Googlebot.
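Since InspectionTool otherwise behaves like Googlebot, its user agent is the practical way to keep testing-tool traffic separate from regular crawls in server logs. The sketch below tallies log entries by crawler. The "Google-InspectionTool/1.0" format is an assumption based on the token Google documents, the Mediabot string is the one given above, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def classify_google_agent(user_agent: str) -> str:
    """Map a user-agent string to a coarse crawler label."""
    # Check the more specific tokens before the generic Googlebot one.
    if "Google-InspectionTool" in user_agent:
        return "inspection-tool"
    if "Mediapartners-Google" in user_agent:
        return "mediabot"
    if "Googlebot/" in user_agent:
        return "googlebot"
    return "other"


def tally(user_agents: Iterable[str]) -> Counter:
    """Count requests per crawler label, e.g. from an access log."""
    return Counter(classify_google_agent(ua) for ua in user_agents)


sample_log = [
    "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)",
    "Mediapartners-Google/2.1",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "curl/8.0",
]
print(tally(sample_log))
```

Excluding the "inspection-tool" bucket keeps manual tests in Search Console from inflating apparent Googlebot crawl activity.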

An independently published guide to the crawlers details four distinct crawler agents, identified from web server directory index data: one non-Chrome crawler and three Chrome-based crawlers.
