Fix Antipattern in Database Access: Misuse of database.many(size=x)
The expected pattern to use here is something like:
session = self.database.session()
query = self.querySomethingThatReturnsManyItems(someParam=somParam)
self.database.execute(session, query)
class SomeStreamingResultsObject:
def __init__(self, database, session):
self.database = database
self.session = session
def many(self):
rows = self.database.many(session, size=20)
return list(map(lambda x: ObjectRecord(x), rows))
return SomeStreamingResultsObject(self.database, session)
Then in the caller expect that you may have to operate over several results.many() requests.
Instead the pattern used in most of the code is the following:
session = self.database.session()
query = self.querySomethingThatReturnsManyItems(someParam=somParam)
self.database.execute(session, query)
rows = self.database.many(session, size=20)
return list(map(lambda x: ObjectRecord(x), rows))
We need to audit and fix the following issues in all of our database query APIs:
-
Forced limitation of record count unavoidable by the caller. -
Callers' code assumes the returned records are all the records it might need to iterate over.
Without fixing this antipattern we run the risk of information never being discoverable if too many records make it into a table. For example, querying for "all emails" actually only queries for the first 5000. This limitation is significantly more generous than elsewhere in our system where limits may be 100 or 20 depending on what is queried.
The easy fix would be to make a generic helper that wraps the fetching via many() and simply return that object everywhere. This is a project best suited to a student due to how pervasive the issue is and how good of an opportunity it is to understand the database subsystem in Occam.