Refactor Bulk download invoice to use REST APIs
The following discussion from !399 should be addressed:
@oswaldo: Soooooo, I did some load testing with production credentials to see how long it takes to run for a month's data, and below are some inferences:
Initially, the job queried the accounts for a given entity and then iterated through each account to fetch its invoices for the given time period.
Apparently, the query to return accounts for the US entity takes way too long (I waited almost 20 minutes just to query the account IDs; this could be because there are too many accounts in this entity). The query ran quickly for other entities, though, which have fewer accounts.
Thus, I have modified the querying logic to:
- Query invoices during the time period
- Check if the account of the invoice belongs to the given entity. If not, skip.
- If yes, download the PDF file for the invoice.
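
The modified flow above could be sketched roughly as follows. This is only an illustration, not the code from the merge request: `Invoice`, `download_invoices_for_entity`, and the `account_entity` and `download_pdf` callables are hypothetical names standing in for whatever the REST client actually exposes. The per-account cache is an extra assumption, added so that an account's entity is only looked up once.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable


@dataclass
class Invoice:
    # Hypothetical minimal invoice record; real field names may differ.
    id: str
    account_id: str
    invoice_date: date


def download_invoices_for_entity(
    invoices: Iterable[Invoice],
    account_entity: Callable[[str], str],
    download_pdf: Callable[[Invoice], bytes],
    entity: str,
) -> Dict[str, bytes]:
    """Return {invoice id: PDF bytes} for invoices whose account is in `entity`.

    `invoices` is assumed to be already restricted to the requested period.
    """
    entity_by_account: Dict[str, str] = {}  # assumption: cache entity lookups
    pdfs: Dict[str, bytes] = {}
    for invoice in invoices:
        if invoice.account_id not in entity_by_account:
            entity_by_account[invoice.account_id] = account_entity(invoice.account_id)
        if entity_by_account[invoice.account_id] != entity:
            continue  # account belongs to another entity: skip
        pdfs[invoice.id] = download_pdf(invoice)
    return pdfs
```

Passing the lookups in as callables keeps the sketch independent of any particular API client, and makes it easy to exercise with in-memory stubs.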
It took 40 minutes to download invoices for 32 days; total records processed: 2603.
The average number of invoices per month is 1663, with a maximum of 2438.
We can set the expectation with the Finance team that the report will be available within an hour - the request is for a one-month duration, as mentioned in the issue (https://gitlab.com/gitlab-org/customers-gitlab-com/issues/504).
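
As a rough sanity check on the one-hour expectation, the numbers above can be extrapolated. This is a back-of-the-envelope sketch: it assumes run time scales linearly with the number of records processed, and the `estimated_minutes` helper is hypothetical.

```python
# Observed load test: 2603 records processed in 40 minutes.
OBSERVED_RECORDS = 2603
OBSERVED_MINUTES = 40


def estimated_minutes(records: int) -> float:
    """Extrapolate run time, assuming it scales linearly with record count."""
    return records * OBSERVED_MINUTES / OBSERVED_RECORDS


average_month = estimated_minutes(1663)  # roughly 25.6 minutes
busiest_month = estimated_minutes(2438)  # roughly 37.5 minutes
print(f"average month: {average_month:.1f} min, busiest month: {busiest_month:.1f} min")
```

Under that assumption, both the average and the busiest month fit within the hour with some headroom.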
Let me know your thoughts.
Edit: I would have preferred to do a join in this situation, but given the limitations, that won't be possible :(