
Test that project homepage is accessible by crawlers

Thong Kuah requested to merge regression_test_project_homepage_robots_txt into master

What does this MR do?

Test that the project homepage is accessible by crawlers. We want the project homepage to be fully accessible so that GitLab.com public projects are indexed properly.

We simulate a crawler by parsing robots.txt and blocking access to the paths listed in its Disallow lines.

Basically a regression test for !39520 (merged).
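
For illustration, the middleware approach could look roughly like the sketch below. This is a minimal sketch, not the actual helper added by this MR; the class name and the keyword argument are assumptions.

# Hypothetical sketch (illustrative name, not the helper added in this MR):
# a Rack middleware that reads public/robots.txt and rejects any request
# whose path matches a Disallow rule, so a feature spec fails when the page
# under test depends on a path that crawlers cannot reach.
class BlockRobotsTxtDisallowedPaths
  def initialize(app, robots_txt: Rails.root.join('public', 'robots.txt'))
    @app = app
    @disallowed = File.readlines(robots_txt).map do |line|
      match = line.match(/\ADisallow:\s*(\S+)/)
      %r{\A#{Regexp.escape(match[1])}} if match
    end.compact
  end

  def call(env)
    path = Rack::Request.new(env).path

    if @disallowed.any? { |regexp| regexp.match?(path) }
      [403, { 'Content-Type' => 'text/plain' }, ["Blocked by robots.txt: #{path}"]]
    else
      @app.call(env)
    end
  end
end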

Screenshots

With:

$ git diff
diff --git a/public/robots.txt b/public/robots.txt
index f3fe51a25b0..12ceba88395 100644
--- a/public/robots.txt
+++ b/public/robots.txt
@@ -15,6 +15,7 @@
 User-Agent: *
 Disallow: /autocomplete/users
 Disallow: /search
+Disallow: /api
 Disallow: /admin
 Disallow: /profile
 Disallow: /dashboard

spec/features/projects/show/user_sees_readme_spec.rb fails with:

projects_show_user_sees_readme_obeying_robots_txt_shows_the_project_readme
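
The failure name suggests the example runs inside an "obeying robots.txt" context. For reference, one way such blocking could be wired into the app that Capybara serves for :js specs is sketched below; Capybara.app and Rack::Builder are real APIs, but the middleware name is the hypothetical one from the sketch above, not necessarily how this MR implements it.

# Sketch only: wrap the app Capybara boots with the blocking middleware so
# feature specs exercise the same paths a crawler is allowed to request.
Capybara.app = Rack::Builder.app do
  use BlockRobotsTxtDisallowedPaths # hypothetical middleware from the sketch above
  run Rails.application
end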

P.S. I tried the naive approach with WebMock, but it does not have access to the served application, so we have to use the middleware approach instead. Here's what I tried:

diff --git a/spec/features/projects/show/user_sees_readme_spec.rb b/spec/features/projects/show/user_sees_readme_spec.rb
index 250f707948e..b8a04fefdb8 100644
--- a/spec/features/projects/show/user_sees_readme_spec.rb
+++ b/spec/features/projects/show/user_sees_readme_spec.rb
@@ -7,6 +7,17 @@
   let_it_be(:project) { create(:project, :repository, :public) }
 
   it 'shows the project README', :js do
+    allowed_uris = lambda { |uri|
+      hostcheck = webmock_allowed_hosts.include?(uri.host) ||
+        WebMock::Util::URI.is_uri_localhost?(uri)
+
+      pathcheck = uri.path =~ /\/api\//
+      puts uri.to_s if !pathcheck
+
+      hostcheck
+    }
+    WebMock.disable_net_connect!(allow: allowed_uris)
+
     visit project_path(project)
     wait_for_request
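
This fails because WebMock only intercepts HTTP clients running inside the Ruby test process, whereas the browser driven in a :js spec talks to the Capybara-served application directly. A Rack middleware sits inside that served application, so it sees every request the page triggers, regardless of which client issues it.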

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

