[IMPROVEMENT] Superior data design.
While debugging over the last few weeks, I slowly came to realize that the way we had organized our `finfo` and `pinfo` collections was wrong.
A simple explanation of what was going on up until now would be:
- We had a global `finfo` array.
- We had a global `pinfo` array.
- Each `pinfo` had an internal array which would hold indexes pointing at the global `finfo` array's elements.
When a file is encountered on a process `pid` with file descriptor `fd`:
- Search the global `finfo` array to see if the file has been seen before.
  - If not, add it.
- Have the element at index `fd` of the corresponding `pinfo`'s array point to the index of the previously existing or newly added file in the global `finfo` array.
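
The lookup described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `path` field, the fixed capacities `MAX_FILES` and `MAX_FDS`, the `PROCESS_INFO` type name, and the helpers `find_finfo` and `record_open` are all hypothetical. Only `finfo`, `pinfo`, `FILE_INFO`, and `fd` come from the description.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES 64   /* hypothetical capacity of the global finfo array */
#define MAX_FDS   16   /* hypothetical per-process fd limit */

typedef struct {
    char path[256];    /* assumed field: what identifies a file */
} FILE_INFO;

typedef struct {
    int pid;
    int finfo_idx[MAX_FDS];  /* finfo_idx[fd] = index into the global finfo array */
} PROCESS_INFO;

static FILE_INFO finfo[MAX_FILES];
static int finfo_count = 0;

/* Linear search of the global array; returns the index or -1. */
static int find_finfo(const char *path)
{
    for (int i = 0; i < finfo_count; i++)
        if (strcmp(finfo[i].path, path) == 0)
            return i;
    return -1;
}

/* Old behaviour: reuse an existing entry or append a new one,
   then make the process's slot for fd point at it. */
static int record_open(PROCESS_INFO *p, int fd, const char *path)
{
    assert(fd >= 0 && fd < MAX_FDS);
    int idx = find_finfo(path);
    if (idx < 0) {
        assert(finfo_count < MAX_FILES);
        idx = finfo_count++;
        strncpy(finfo[idx].path, path, sizeof finfo[idx].path - 1);
        finfo[idx].path[sizeof finfo[idx].path - 1] = '\0';
    }
    p->finfo_idx[fd] = idx;
    return idx;
}
```

Note that every process stores raw indexes into the global array, which is the coupling the rest of this message is about.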
Later we found out that we need to skip files opened for writing during the first step, the search. That's easy to see if you imagine the following scenario:
- Open a file for writing with fd = `n`.
- Before closing `n`, open the same file for reading.
Associating the read with the `finfo` created by the write is wrong because, as we've discussed before, we don't care about a file's state prior to a write. We thus have to treat them as two separate entities.
We then realized that files in the global array that are opened for writing and not yet written must be ignored by searches. We added a `bool was_hash_printed` field to record that information, in an attempt to patch the problem.
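
The patch amounts to a filter inside the search loop. Here is a sketch, assuming the flag stays false while a write is still pending and that files are matched by a `path` field. Both of those, along with the helper name `find_readable`, are assumptions for illustration; only `was_hash_printed` and `FILE_INFO` are named above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char path[256];          /* assumed field */
    bool was_hash_printed;   /* assumed: false while a write is still pending */
} FILE_INFO;

/* Return the first matching entry that is not a pending write, or NULL. */
static FILE_INFO *find_readable(FILE_INFO *finfo, size_t count, const char *path)
{
    for (size_t i = 0; i < count; i++) {
        if (!finfo[i].was_hash_printed)
            continue;   /* pending write: a read must not be matched to it */
        if (strcmp(finfo[i].path, path) == 0)
            return &finfo[i];
    }
    return NULL;
}
```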
This worked! But later, while thinking about ways to replace the `finfo` structure with a hashmap or binary tree for performance purposes, I realized that this poor design gave us an extra constraint to think about.
The new data structure had to provide reference validity (references to existing elements must stay valid as the structure changes) so that the internal `pinfo` data structure would still have a means to point into it.
Furthermore, we couldn't simply search the global `finfo` array, since the to-be-written files had no hash associated with them. Did we have no way to tell those files apart? We absolutely did: their `fd`! That is something we didn't store inside the `FILE_INFO` structure that the `finfo` array is composed of.
Another fact worth noting is that, for a given process, we only care about the files that are opened for writing or for reading & writing, definitely not the ones opened for plain reading. Conversely, as previously discussed, in the first step we only want to search files opened for plain reading. Do you see a pattern here?
As a result of the above, I propose this change: make `pinfo`'s internal data structure a map (in our case, a simple pair of arrays, one containing the `fd` fields and one the `finfo` fields) from `fd`s to the `FILE_INFO`s currently open for any sort of writing, and only add them to the global `finfo` array once they are `close(2)`d.
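
A minimal sketch of the proposed layout, to make the idea concrete. The capacities, the `path` field, the `PROCESS_INFO` type name, and the handler names `on_open_for_write` and `on_close` are hypothetical. The pair-of-arrays map and the move to the global array on close are what the proposal describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPEN  16   /* hypothetical limit on files open for writing per process */
#define MAX_FILES 64   /* hypothetical capacity of the global finfo array */

typedef struct { char path[256]; } FILE_INFO;   /* assumed field */

typedef struct {
    int       fds[MAX_OPEN];     /* keys: fds currently open for writing */
    FILE_INFO files[MAX_OPEN];   /* values: files[i] belongs to fds[i] */
    size_t    count;
} PROCESS_INFO;

static FILE_INFO finfo[MAX_FILES];   /* global array: finished files only */
static size_t finfo_count = 0;

/* Open with write access: track the file in the process-local map only. */
static void on_open_for_write(PROCESS_INFO *p, int fd, const char *path)
{
    assert(p->count < MAX_OPEN);
    p->fds[p->count] = fd;
    strncpy(p->files[p->count].path, path, sizeof p->files[0].path - 1);
    p->files[p->count].path[sizeof p->files[0].path - 1] = '\0';
    p->count++;
}

/* close(2): if fd was open for writing, publish its entry to the global array.
   Returns 1 if an entry was moved, 0 if fd was not tracked. */
static int on_close(PROCESS_INFO *p, int fd)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->fds[i] != fd)
            continue;
        assert(finfo_count < MAX_FILES);
        finfo[finfo_count++] = p->files[i];
        /* swap-remove: order inside the map does not matter */
        p->count--;
        p->fds[i] = p->fds[p->count];
        p->files[i] = p->files[p->count];
        return 1;
    }
    return 0;
}
```

With this shape the global array never contains a pending write, so the search needs no filtering flag, and no process holds an index into the global array, which leaves it free to become a hashmap or tree later.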