I ran the resulting HTML of the project page through Google's Rich Results Test, and it looks like there's an issue with the URL field in our breadcrumb items.
I'm guessing it wants this to be a full URL with the host and everything. WDYT @fjsanpedro?
Exactly @pslaughter! At first, I thought about using absolute URLs, but that would require changing all views that include URLs in the breadcrumb and also creating RuboCop cops to enforce absolute URLs. Nevertheless, after reading https://webuniverse.io/relative-urls-in-structured-data/, it seems Google converts relative URLs into absolute ones.
I noticed in a SO question that this requirement for absolute URLs could even be a recent or inconsistent thing.
Anyways, I think it's worth pushing this to prod and seeing what happens. If it turns out that we do need absolute URLs, we can handle that in a follow-up. I'll create an issue for this now just in case.
The main challenge with using absolute URLs is that we have to change all the views that call add_to_breadcrumbs and replace the relative paths with absolute ones. We also have to change all the views that directly call @breadcrumb_link = and replace those values with absolute URLs.
We can change all those routes now, but how can we prevent relative URLs from being added instead of absolute ones in the future? We could create a RuboCop cop that checks all calls to add_to_breadcrumbs and ensures the second param does not contain _path( (we can only check statically, not dynamically).
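To illustrate the kind of static check described above, here is a minimal sketch in plain Ruby. It is not an actual RuboCop cop: the `relative_breadcrumb_offenses` helper and the `BREADCRUMB_WITH_PATH` pattern are hypothetical names, and the sample view lines are assumptions about what such calls might look like.

```ruby
# Hypothetical static check (plain Ruby, not a real RuboCop cop).
# Flags lines that call add_to_breadcrumbs with a *_path helper.
BREADCRUMB_WITH_PATH = /add_to_breadcrumbs\b.*_path\b/

# Returns [line_number, stripped_line] pairs for every offending line.
def relative_breadcrumb_offenses(source)
  source.each_line.with_index(1).filter_map do |line, number|
    [number, line.strip] if line.match?(BREADCRUMB_WITH_PATH)
  end
end

# Assumed example view content, for illustration only.
view = <<~HAML
  - add_to_breadcrumbs _("Issues"), project_issues_path(@project)
  - add_to_breadcrumbs _("Issues"), project_issues_url(@project)
HAML

relative_breadcrumb_offenses(view).each do |number, line|
  puts "line #{number}: #{line}"
end
```

As noted, a check like this only sees the source text. A URL assembled dynamically, or stored in a variable before being passed in, would slip through.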
Another way would be to manually add the base_url aka https://gitlab.com if we detect they're relative when we add them to the breadcrumb stack.
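A rough sketch of that second option, assuming a hypothetical `absolutize_breadcrumb_url` helper and a hard-coded `BASE_URL` (in a real app the base URL would come from configuration):

```ruby
require "uri"

# Assumed base URL, for illustration only.
BASE_URL = "https://gitlab.com"

# Hypothetical helper: returns the URL unchanged if it already has a
# scheme and host, otherwise resolves it against the base URL.
def absolutize_breadcrumb_url(url, base_url = BASE_URL)
  uri = URI.parse(url)
  return url if uri.absolute? && uri.host

  URI.join(base_url, url).to_s
end

puts absolutize_breadcrumb_url("/gitlab-org/gitlab")
puts absolutize_breadcrumb_url("https://gitlab.com/gitlab-org")
```

Doing the conversion at the point where entries are pushed onto the breadcrumb stack would leave the individual views untouched, at the cost of hiding the conversion from whoever writes the view.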
@fjsanpedro in all the examples in Google's docs and on Schema.org the URLs are absolute. I know that with relative URLs the testing tools can throw errors when you just paste in code, because the tool auto-fills the domain with its own. In theory it should work OK on production, but I'd err on the side of caution and go with absolute.
Thanks @taylordanrw for your answer. The problem is that building the breadcrumb is quite a messy workflow distributed across many different views. In each of these views, developers can specify elements to insert in the breadcrumb, and usually (almost always) the relative path is used. What I want to express is that turning the URLs into absolute ones is not an easy task.
I've searched for websites using this schema type and I've seen both approaches.
Is there a way to ensure that it works properly without using the testing tools (I also get errors there if I query URLs directly)? If we can and we realize that it's better to use absolute urls we can tackle this issue, otherwise, we can leave it as it is.
@fjsanpedro thank you for the code snippet and example.
This is a grey area, as Google doesn't specify an acceptance criterion for absolute v relative. However, in all examples it uses absolute.
Looking through anecdotal articles on the internet around this, there is a general consensus that it should work. However, with relative URLs it is left to the interpretation of the crawler.
Looking at https://gitlab.com/gitlab-org/gitlab in greater detail, this is failing the schema both on Google validation tools, and external validation tools. It is also failing the URL with Search Console when inspected, noting that the Schema implementation is invalid and ineligible for rich results - citing the relative URLs as Invalid URL in the field id.
As the URLs themselves on the page are absolute, and the section looks relatively templated, maybe as an input vs. output trade-off we can look at Microdata versus JSON? This way we can use full absolute URLs and validate the schema/see any of the potential rich result benefits, without potentially extensive re-engineering to get absolute URLs into the JSON, with the trade-off being that we would not use Google's preferred format.
@taylordanrw they look absolute, but if you inspect the HTML you'll see that they're also relative. Therefore, I'm afraid using Microdata here doesn't solve it either. Both the HTML breadcrumb and the JSON-LD markup draw from the same information, meaning that if we change how the HTML URLs are created (from relative to absolute), we will automatically have absolute URLs in the markup schema.
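To make the shared-source point concrete, here is a minimal sketch of generating BreadcrumbList markup from a list of breadcrumb entries. The `breadcrumb_json_ld` helper, the `[name, url]` pair layout, and the `item` / `@id` shape are assumptions for illustration, not the actual implementation; the example URLs are assumed as well.

```ruby
require "json"

# Hypothetical helper: builds schema.org BreadcrumbList data from
# [name, url] pairs, the same pairs an HTML breadcrumb could render.
def breadcrumb_json_ld(breadcrumbs)
  items = breadcrumbs.each_with_index.map do |(name, url), index|
    {
      "@type" => "ListItem",
      "position" => index + 1, # positions are 1-based
      "item" => { "@id" => url, "name" => name }
    }
  end

  {
    "@context" => "https://schema.org",
    "@type" => "BreadcrumbList",
    "itemListElement" => items
  }
end

# Assumed example entries; whatever URL form is stored here
# (relative or absolute) ends up verbatim in the markup.
crumbs = [
  ["GitLab.org", "https://gitlab.com/gitlab-org"],
  ["GitLab", "https://gitlab.com/gitlab-org/gitlab"]
]

puts JSON.pretty_generate(breadcrumb_json_ld(crumbs))
```

Because the markup just echoes the stored URL, fixing the URLs once at the source fixes both the HTML and the structured data.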
@fjsanpedro Ah heck, I thought we had a low-effort workaround there. For the time being, would it be possible to roll back the live implementation?
@taylordanrw I'm afraid not this time. Nevertheless, given that we're not totally sure about relative urls and that the fix could be live in a couple of days in prod, I guess we "should not be too worried about this"?
@taylordanrw the MR fixing that problem has been merged, so in a couple of days the change will be live.
Regarding this topic, I have received an alert from GCS warning me that the id (the URL) field was invalid, so definitely, even when Google crawls the site, the breadcrumb URLs need to be absolute.
@fjsanpedro WDYT of leaving this comment on the relevant blog post and the Stack Overflow reference, which seem to say otherwise?
@fjsanpedro this is something I flagged in a previous comment.
@pslaughter I would love for someone to update Stack Overflow. From an SEO's perspective, there's been many a time I've been sent a 2004 SF thread claiming X, when Google has no definitive literature on the matter!
@fjsanpedro generally speaking, as long as the JSON-LD code snippet pasted into the testing tools has absolute URLs, it will verify, and then on production you just deploy relative URLs. The reason that relative URLs break in testing and verification tools is that, given only a code snippet, the tool tries to inject its own domain into them.
TLDR - if you can link me to an example of the JSON for breadcrumbs you want to deploy, I can test and verify it.