What Is Robots.txt?
Quick Definition
Robots.txt is a text file that tells search engine bots which pages or sections of a website they should not crawl.
The robots.txt file is one of the first things search engine bots check when visiting your website. Located at yoursite.com/robots.txt, it contains directives that tell crawlers which parts of your site they can access and which they should avoid. It uses a simple syntax with User-agent (which bot), Disallow (which paths to skip), and Allow (exceptions to disallow rules).
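As a rough illustration of the syntax described above, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and asks which URLs a crawler may fetch. The rules, the paths (/admin/ and /admin/help/) and the helper name build_parser are assumptions invented for this example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /admin/ for every bot, except the help pages.
# Python's parser applies rules in file order (first match wins), while
# Google picks the most specific matching rule. Listing Allow first gives
# the same answer under both interpretations.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in (
    "https://yoursite.com/admin/settings",
    "https://yoursite.com/admin/help/faq",
    "https://yoursite.com/blog/post",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{url}: {verdict}")
```

Under these assumed rules, only the /admin/settings URL is reported as blocked. Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee.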
Common uses include blocking search engines from crawling admin areas, staging environments, duplicate content (like print versions of pages), internal search result pages, and private user account areas. You can also use it to point search engines to your sitemap file.
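The uses listed above might look like the following in practice. This is a minimal sketch: the paths (/search, /print/, /account/) and the sitemap URL are placeholders chosen for illustration, and the file is parsed with Python's standard urllib.robotparser module (site_maps() requires Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering internal search, print versions,
# and account pages, plus a pointer to the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /print/
Disallow: /account/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are collected separately from the crawl rules.
sitemaps = parser.site_maps()
print("Sitemaps:", sitemaps)

# Disallow values are path prefixes, so /search also covers /search?q=...
search_blocked = not parser.can_fetch("Googlebot", "https://yoursite.com/search?q=shoes")
product_allowed = parser.can_fetch("Googlebot", "https://yoursite.com/products/shoes")
print("Internal search blocked:", search_blocked)
print("Product page allowed:", product_allowed)
```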
It's important to understand that robots.txt is a polite request, not a security measure. Well-behaved bots like Googlebot respect it, but malicious bots may ignore it entirely. Sensitive content should be protected with authentication, not robots.txt.
Misconfigured robots.txt files are one of the most common technical SEO mistakes. A single misplaced directive can accidentally block your entire site from being crawled, or prevent search engines from accessing the CSS and JavaScript files they need to render your pages properly.
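One way to catch these two mistakes before they reach production is a small automated check. The sketch below is only an assumption about how such a check could look: audit_robots is a hypothetical helper, the asset paths are made up, and Python's standard parser does not understand wildcard patterns, so it covers plain path prefixes only.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, asset_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return a list of problems found for the given crawler (empty if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    # A blocked homepage usually means the whole site is blocked.
    if not parser.can_fetch(agent, "/"):
        problems.append("homepage is blocked")
    # Crawlers need CSS and JavaScript to render pages.
    for url in asset_urls:
        if not parser.can_fetch(agent, url):
            problems.append(f"asset is blocked: {url}")
    return problems


# Hypothetical asset paths used only for this example.
ASSETS = ["/assets/site.css", "/assets/app.js"]

whole_site_blocked = "User-agent: *\nDisallow: /\n"
assets_blocked = "User-agent: *\nDisallow: /assets/\n"
healthy = "User-agent: *\nDisallow: /admin/\n"

for name, text in [
    ("whole site blocked", whole_site_blocked),
    ("assets blocked", assets_blocked),
    ("healthy", healthy),
]:
    print(name, "->", audit_robots(text, ASSETS))
```

Running a check like this in a deployment pipeline would flag a stray Disallow: / before it goes live, although it does not replace testing the live file with the search engine's own tools.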
Why It Matters
Robots.txt directly controls what search engines can and cannot see on your website. A well-configured file helps search engines focus their limited crawl budget on your most important pages. A misconfigured one can make your entire website invisible to Google.
For large websites, robots.txt is essential for crawl budget management: keeping bots away from low-value URLs means they spend more time crawling the pages that matter.
Real-World Examples
A company's new developer accidentally added Disallow: / to robots.txt, blocking Google from their entire site and causing traffic to drop 90% before anyone noticed
An e-commerce site blocked their faceted navigation URLs via robots.txt, saving thousands of pages of crawl budget for their actual product pages
A multi-site WordPress installation used robots.txt to prevent staging site content from being indexed by search engines
A SaaS platform blocked /app/ and /account/ paths to prevent internal dashboard pages from appearing in search results
Related Terms
Technical SEO
Technical SEO is the process of optimizing your website's infrastructure so that search engines can crawl, index, and render your pages efficiently.
Crawl Budget
Crawl budget is the number of pages a search engine bot will crawl on your website within a given period of time.
Sitemap
A sitemap is a file or web page that lists the pages of a website, helping search engines discover and index content more efficiently.
Indexing
Indexing is the process by which search engines analyze and store information about web pages in their database, making them available in search results.