Browserker should crawl navigations in the order they were found
Problem to solve
When Browserker crawls a website, it uses a breadth-first search, so the crawler scans wide before it scans deep.
When the algorithm crawls a page and finds 10 new navigations, in what order should they be processed? Currently, the order is deterministic but not explicitly defined (specifics below). This issue proposes that the order should follow what the browser returns, which the author presumes to be the order in which the navigations appear in the DOM.
The current order could leave users wondering why some links on a page were scanned and others were not (particularly when the max actions constraint is applied).
Intended users
User experience goal
Users should be able to predict which navigations will get crawled, as Browserker will process navigations based on their order in the document.
Proposal
Process unvisited navigations based on when they were discovered.
Further details
Navigations that have not been processed are found by their unvisited state. The query for these navigations has no explicit ordering, so the ordering is left up to the database. The database iterates "keys in lexicographically sorted order", which means that NavigationIDs determine the order. Because IDs are random (or hashed) byte values, the resulting order is effectively arbitrary and unrelated to the order in which the navigations were discovered.
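To illustrate the behaviour described above, here is a minimal Go sketch. It is not Browserker's actual code: the `navigation` type, the `Seq` field, and the functions `hashedKey`, `sequencedKey`, `discover` and `iterate` are all hypothetical names introduced for this example, and the database is simulated by sorting keys in memory. It compares iteration over hashed IDs with one possible fix, which is an assumption rather than a decided design: prefixing each key with a big-endian discovery sequence number so that lexicographic order matches discovery order.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// navigation is a hypothetical stand-in for a navigation found on a page.
type navigation struct {
	URL string
	Seq uint64 // assumed counter recording the order of discovery
}

// hashedKey mimics an ID derived from a hash: its byte order has no
// relationship to the order in which the navigation was discovered.
func hashedKey(n navigation) string {
	sum := sha256.Sum256([]byte(n.URL))
	return string(sum[:])
}

// sequencedKey prefixes the hashed ID with the big-endian discovery
// sequence, so that lexicographic key order equals discovery order.
func sequencedKey(n navigation) string {
	var prefix [8]byte
	binary.BigEndian.PutUint64(prefix[:], n.Seq)
	return string(prefix[:]) + hashedKey(n)
}

// discover assigns sequence numbers in the order the URLs were returned,
// for example the order in which the browser reported them.
func discover(urls []string) []navigation {
	navs := make([]navigation, len(urls))
	for i, u := range urls {
		navs[i] = navigation{URL: u, Seq: uint64(i)}
	}
	return navs
}

// iterate simulates a store that returns entries in lexicographically
// sorted key order, and reports the URLs in that order.
func iterate(navs []navigation, key func(navigation) string) []string {
	sorted := make([]navigation, len(navs))
	copy(sorted, navs)
	sort.Slice(sorted, func(i, j int) bool {
		return key(sorted[i]) < key(sorted[j]) // byte-wise string comparison
	})
	urls := make([]string, len(sorted))
	for i, n := range sorted {
		urls[i] = n.URL
	}
	return urls
}

func main() {
	navs := discover([]string{"/home", "/about", "/products", "/contact"})
	fmt.Println("hashed IDs:    ", iterate(navs, hashedKey))
	fmt.Println("sequenced keys:", iterate(navs, sequencedKey))
}
```

With hashed keys, the printed order depends only on the hash values. With sequenced keys, the order matches the order passed to `discover`. The big-endian encoding matters here: it keeps byte-wise comparison consistent with numeric comparison of the sequence numbers.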