What framework do we use for the A/B testing of features on .com?

What framework should we use for the A/B testing of features on .com?

The release team has built this: https://gitlab.com/gitlab-org/gitlab-ee/-/feature_flags

Questions:

The new feature flag feature is not being dogfooded yet. In the code base the Flipper framework is still being used for rolling out new features, but this works on an all-on/all-off basis. Is the new feature flag feature ready to be used yet?
How does the percentage rollout work for .com? There are no tables in the database that associate users with features. Is it being done at the load balancer level? Is a coherent user experience ensured? Once a user is selected for either the A-set or the B-set, will they stay in that set for a specific feature on all future requests for as long as that test runs?
Do we need a table for identifying which user is in which set for data analysis?
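
To make the "coherent user experience" question concrete: one common way to keep a user in the same set without a database table is to derive the set from a hash of the user ID and the test name. The sketch below only illustrates that general technique. It is not a description of how the feature flags feature or Flipper actually assigns users, and `assign_variant`, `percentage_b` and the test name are hypothetical.

```python
import hashlib


def assign_variant(user_id: int, test_name: str, percentage_b: int = 50) -> str:
    """Return "A" or "B" for a user, deterministically (hypothetical helper).

    The same (user_id, test_name) pair always maps to the same bucket,
    so no per-user state has to be stored to keep the assignment stable.
    """
    key = f"{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash onto a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "B" if bucket < percentage_b else "A"


if __name__ == "__main__":
    # Repeated calls for the same user and test give the same answer.
    print(assign_variant(42, "new_onboarding"))
    print(assign_variant(42, "new_onboarding"))
```

With this approach the assignment can be recomputed for data analysis, but only as long as the percentage and the test name do not change during the test. Whether that is enough, or whether an explicit table is still needed, is part of the question.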

One possible solution is to create a table for A/B testing purposes that references the users table and has a separate boolean column for every test, in order to keep track of which users are enrolled in which tests. The first time a user is enrolled in a certain test, this would be stored in the table. This data could then be used both in the code and for data analysis.
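
A minimal sketch of such a table, using SQLite from Python purely for illustration. Note that it deviates from the proposal above in one respect: instead of one boolean column per test it stores one row per user and test, which avoids a schema change for every new test. All table, column and function names here (`ab_test_enrollments`, `test_name`, `variant`, `enroll`) are assumptions, not existing GitLab schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Stand-in for the real users table.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)"
    )
    # Hypothetical enrollment table: one row per (user, test).
    conn.execute(
        """
        CREATE TABLE ab_test_enrollments (
            user_id     INTEGER NOT NULL REFERENCES users(id),
            test_name   TEXT    NOT NULL,
            variant     TEXT    NOT NULL CHECK (variant IN ('A', 'B')),
            enrolled_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (user_id, test_name)
        )
        """
    )


def enroll(conn: sqlite3.Connection, user_id: int, test_name: str, variant: str) -> str:
    """Store the variant on first enrollment and return the stored variant."""
    # The primary key makes a second insert for the same user and test a no-op,
    # so the first assignment wins.
    conn.execute(
        "INSERT OR IGNORE INTO ab_test_enrollments (user_id, test_name, variant) "
        "VALUES (?, ?, ?)",
        (user_id, test_name, variant),
    )
    row = conn.execute(
        "SELECT variant FROM ab_test_enrollments WHERE user_id = ? AND test_name = ?",
        (user_id, test_name),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO users (id, username) VALUES (1, 'alice')")
    print(enroll(conn, 1, "new_onboarding", "A"))
    # A later attempt to enroll the same user differently keeps the first variant.
    print(enroll(conn, 1, "new_onboarding", "B"))
```

For analysis, the enrollment table can be joined against the users table or event data on `user_id` and filtered by `test_name`.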