Browserker should crawl navigations in the order they were found


Problem to solve

When Browserker crawls a website, it does it using a breadth-first search approach. This ensures that the crawler scans wide before it scans deep.

When the algorithm crawls a page and finds 10 new navigations, in what order should they be processed? Currently, the order is deterministic but not explicitly defined (specifics below). This issue proposes that the order should follow what the browser returns, which the author presumes to be the order in which the navigations appear in the DOM.

The current order could potentially cause users to wonder why some links on a page were scanned and some were not (particularly when the max actions constraint is applied).
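To illustrate why processing order matters once a cap applies, here is a minimal sketch of a breadth-first crawl over an in-memory site map. The `crawl` function, the `links` map, and the `maxActions` parameter are hypothetical names for illustration and do not reflect Browserker's actual implementation.

```go
package main

import "fmt"

// crawl walks a hypothetical in-memory site map breadth-first and visits at
// most maxActions pages. Links on a page are queued in the order they are
// listed, so the listing order decides which pages make it under the cap.
func crawl(links map[string][]string, start string, maxActions int) []string {
	visited := []string{}
	seen := map[string]bool{start: true}
	queue := []string{start}

	for len(queue) > 0 && len(visited) < maxActions {
		page := queue[0]
		queue = queue[1:]
		visited = append(visited, page)

		for _, next := range links[page] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return visited
}

func main() {
	site := map[string][]string{
		"/":  {"/a", "/b", "/c"},
		"/a": {"/a/1"},
	}
	// With a cap of 3, "/c" is never visited even though it is on the first page.
	fmt.Println(crawl(site, "/", 3)) // prints [/ /a /b]
}
```

If the three links on `/` were queued in a different order, a different link would be the one left out, which is the kind of surprise described above.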

Intended users

User experience goal

Users should be able to predict which navigations will get crawled, as Browserker will process navigations based on their order in the document.

Proposal

Process unvisited navigations based on when they were discovered.

Further details

Navigations that have not been processed are found by their unvisited state. This query for navigations has no explicit ordering, so ordering is left up to the database. The database iterates "keys in lexicographically sorted order", which means that NavigationIDs are used for the ordering. Because IDs are random (or hashed) byte values, the resulting order is effectively random with respect to the order of discovery.

What is the type of buyer?

Gold/Ultimate
