Garbage collection with no sorting can improve performance at the cost of memory usage
# Final Release Note
After executing a [VIEW "STP_GCOL_NOSORT":1](https://docs.yottadb.com/ProgrammersGuide/commands.html#keywords-in-view-command), subsequent garbage collections avoid sorting. VIEW "STP_GCOL_NOSORT":0 switches to garbage collections with sorting. As a consequence of the new keyword, VIEW "STP_GCOL" must now be specified with the full name of the option; abbreviations such as VIEW "STP" no longer work.
Avoiding sorting may improve garbage collection times, likely at the cost of increased memory usage. [$VIEW("SPSIZESORT")](https://docs.yottadb.com/ProgrammersGuide/functions.html#argument-keywords-of-view) performs a garbage collection, and returns a comma-separated list of 2 integers which indicate the memory usage (in bytes) of the stringpool with the unsorted and the sorted approaches respectively. If an application finds the unsorted value to be within its memory limits, it will likely benefit from the reduced runtime by switching to the unsorted approach (see https://gitlab.com/YottaDB/DB/YDB/-/issues/1145#note_2507097811 for examples). \$VIEW("SPSIZE") must now be specified with the full name of the option; abbreviations such as \$VIEW("SPS") no longer work.
The environment variable [ydb_stp_gcol_nosort](https://docs.yottadb.com/AdminOpsGuide/basicops.html#environment-variables) can be set to 0 or to 1 (or any positive integer value) to have applications start with the sorted or the unsorted approach respectively; the sorted approach is the default if the variable is not defined. Subsequent to startup, VIEW "STP_GCOL_NOSORT" commands can change the approach. [$VIEW("STP_GCOL_NOSORT")](https://docs.yottadb.com/ProgrammersGuide/functions.html#argument-keywords-of-view) returns a value of 0 if garbage collections use the sorted approach and 1 otherwise. Previously, garbage collections always used the sorted approach. [#1145]
# Description
Using [pin](https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html), a dynamic binary instrumentation tool, I found that sorting (`stpg_sort()`) takes about 50% of the time taken by garbage collection (`stp_gcol()`) for most kinds of user workloads. That raised the question of whether sorting can be eliminated altogether, thereby improving garbage collection runtimes.
The reason we need sorting is to detect whether `mstr`s (the internal data structure corresponding to a user-visible M string) have overlap.
For example, `set x=$j(1,100),y=x` creates 2 mstrs, 1 for x and 1 for y where both point to the SAME 100 bytes in the stringpool (due to the `y=x`) resulting in only 100 bytes used up in the stringpool. On the other hand, `set x=$j(1,100),y=$j(1,100)` creates 2 mstrs, 1 for x and 1 for y where both mstrs point to DIFFERENT 100 byte regions in the stringpool resulting in a 200 byte usage in the stringpool.
In the first example, sorting the mstrs helps us identify, based on the addresses they point to, that they overlap. We can then preserve the overlap across the garbage collection, keeping the memory requirement at just 100 bytes afterwards. If sorting is eliminated, the overlap goes undetected and the memory need after the garbage collection would be 200 bytes.
Thus, avoiding sorting would result in lower runtimes for garbage collection (approximately 50% lower, since sorting accounts for about half of the garbage collection time) but might require more memory (how much more depends on how much the application uses overlapped strings).
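The trade-off in the two `set` examples above can be illustrated with a small toy model. This is only a sketch in Python, not the actual `stp_gcol()` logic: the `(start, length)` tuple representation of an mstr and the function names `compacted_size_sorted` and `compacted_size_unsorted` are hypothetical, and the model assumes that the sorted approach copies each overlapping byte range once while the unsorted approach copies every mstr independently.

```python
from typing import List, Tuple

# Toy stand-in for an mstr: (start address in the stringpool, length in bytes).
# This representation is an assumption made for illustration only.
Mstr = Tuple[int, int]


def compacted_size_sorted(mstrs: List[Mstr]) -> int:
    """Bytes needed after compaction when mstrs are sorted by address,
    so that overlapping regions are detected and kept only once."""
    total = 0
    covered_end = 0  # end (exclusive) of the region already accounted for
    for start, length in sorted(mstrs):
        end = start + length
        if start >= covered_end:
            total += length              # disjoint from everything seen so far
            covered_end = end
        elif end > covered_end:
            total += end - covered_end   # partial overlap: only the new tail
            covered_end = end
        # otherwise the mstr lies inside an already counted region
    return total


def compacted_size_unsorted(mstrs: List[Mstr]) -> int:
    """Bytes needed when overlap is not detected: every mstr gets its own copy."""
    return sum(length for _, length in mstrs)


# set x=$j(1,100),y=x : both mstrs point to the SAME 100 bytes
shared = [(0, 100), (0, 100)]
# set x=$j(1,100),y=$j(1,100) : mstrs point to DIFFERENT 100 byte regions
distinct = [(0, 100), (100, 100)]

print(compacted_size_sorted(shared), compacted_size_unsorted(shared))
print(compacted_size_sorted(distinct), compacted_size_unsorted(distinct))
```

In this model the shared case needs 100 bytes with sorting and 200 bytes without, while the distinct case needs 200 bytes either way, which is why the extra memory cost depends on how heavily an application relies on overlapped strings.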
Because of this extra memory requirement, the unsorted approach might not be viable for every application, so it cannot be an unconditional change. It has to be an application choice. To that end, we also need to provide an M command and/or env vars to control whether garbage collection uses sorting or not. Sorting should remain the default, as that is how garbage collection has always been done so far.
# Draft Release Note
The M command `VIEW "STP_GCOL_NOSORT":1` signals future garbage collections to happen without sorting and `VIEW "STP_GCOL_NOSORT":0` reverts to garbage collections with sorting. `VIEW "STP_GCOL"` must now be specified with the full name of the option; abbreviations such as `VIEW "STP"` will no longer work.
Avoiding sorting will likely improve garbage collection runtimes (~ 50%) but could result in increased memory needs depending on the application. `$VIEW("SPSIZESORT")` returns a comma-separated list of 2 integers which indicate the memory usage (in bytes) of the stringpool if one used the unsorted approach and the sorted approach respectively. If the application finds the unsorted value to be within its memory limits, it will likely benefit from the reduced runtime of switching to the unsorted approach (see https://gitlab.com/YottaDB/DB/YDB/-/issues/1145#note_2507097811 for examples). `$VIEW("SPSIZE")` must now write out the full name of the option; abbreviations such as `$VIEW("SPS")` will no longer work.
The env var `ydb_stp_gcol_nosort` can also be set to 0 or 1 (or any positive integer value) to choose the sorted or unsorted approach respectively by non-M (and M) applications. The env var initializes the approach at process startup and can be overridden by later `VIEW "STP_GCOL_NOSORT"` commands. The `$VIEW("STP_GCOL_NOSORT")` intrinsic function returns a value of 0 if garbage collections use the sorted approach and 1 otherwise. Sorted garbage collections are the default if neither the env var nor VIEW commands are specified. Previously, garbage collections always happened with the sorted approach and there was no unsorted approach choice.