Settings sync bug when synchronizing between different web browsers

Description

As a Web IDE user, I want to use my installed extensions on more than one machine and/or web browser.

Acceptance Criteria

  • If I install or uninstall an extension in Web Browser A, the changes are reflected in Web Browser B. Browser B doesn't undo the changes from Browser A.
  • If I change the Web IDE user settings in Web Browser A, the changes are reflected in Web Browser B. Browser B doesn't undo the changes from Browser A.

Technical requirements

Implement a service worker that acts as a proxy for the Settings Sync API. The Settings Sync API can only store one resource snapshot per resource type. The Settings Sync API proxy extends the backend API by allowing more than one snapshot to be stored per resource type.

There will be "local snapshots" stored in an IndexedDB database and "remote snapshots" stored in the server-side database. If web browser A attempts to access a snapshot that is no longer available in the backend database, it will look for the same resource in the "local snapshots" database.

flowchart TD
    A[Firefox] -->|GET /resource/extensions/123| B[Settings Sync API proxy]
    B --> C{Is resource cached locally?}
    C -->|No| D[Send request to backend]
    D --> G[Return backend resource]
    C -->|Yes| E[Return cached resource]
    E --> F[Delete resource from cache]
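
The flow above can be sketched as follows. This is a minimal, synchronous sketch under assumptions, not the proof-of-concept code: `Snapshot`, `LocalSnapshotStore`, and `getResource` are hypothetical names, the store is an in-memory stand-in for the IndexedDB database, and `fetchFromBackend` stands in for the request to the Settings Sync API. A real service worker would do both lookups asynchronously inside a `fetch` event handler.

```typescript
// Hypothetical shape of a synced resource snapshot.
interface Snapshot {
  ref: string; // ID the backend returned when the snapshot was saved
  content: string; // serialized settings or extensions list
}

// In-memory stand-in for the IndexedDB "local snapshots" database.
class LocalSnapshotStore {
  private items: Record<string, Snapshot> = {};

  put(resourceType: string, snapshot: Snapshot): void {
    this.items[resourceType + "/" + snapshot.ref] = snapshot;
  }

  // Returns the cached snapshot, if any, and deletes it from the cache.
  take(resourceType: string, ref: string): Snapshot | undefined {
    const key = resourceType + "/" + ref;
    const found: Snapshot | undefined = this.items[key];
    delete this.items[key];
    return found;
  }
}

// Stand-in for the GET request to the backend; returns undefined when
// the backend no longer has the snapshot.
type BackendFetch = (resourceType: string, ref: string) => Snapshot | undefined;

// Proxy logic for GET /resource/<resourceType>/<ref>.
function getResource(
  store: LocalSnapshotStore,
  fetchFromBackend: BackendFetch,
  resourceType: string,
  ref: string,
): Snapshot | undefined {
  const cached = store.take(resourceType, ref);
  if (cached !== undefined) {
    return cached; // "Yes" branch: serve locally, entry already deleted
  }
  return fetchFromBackend(resourceType, ref); // "No" branch
}
```

Because the cached entry is deleted once it has been served, a second request for the same ID falls through to the backend.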

Issue investigation


When syncing settings between different web browsers (Chrome and Firefox, or Firefox and Safari), the behavior of Settings Sync can be incorrect. For example, if the user installs an extension in the Web IDE in Chrome and then opens the Web IDE in Firefox, the extension is also installed in Firefox (correct behavior). However, uninstalling the extension is not propagated between web browsers (incorrect behavior).

From a technical standpoint, this error happens under the following conditions. The mermaid diagram after the list illustrates how the Settings Sync client interacts with the API and where the problem occurs.

  1. Client A saves a snapshot of the latest settings sending a POST request to the backend.
  2. The POST request responds with a UUID.
  3. Client A assigns the UUID to the snapshot of the latest settings in local storage.
  4. User closes client A and performs steps 1 to 3 in client B.
  5. User opens client A.
  6. Client A attempts to fetch the latest data it saved by sending a GET request with the UUID received in step 2.
  7. Client A receives data that is not equal to the data it saved initially.
  8. As a result, client A assumes that it isn't synced with the server and overwrites the latest settings sent by client B.

flowchart TD
    A[VS Code machine] -->|get last data I saved in the cloud with UUID 123| B{Does it exist and\nit's the same as my local data?}
    B --> |yes| C[get latest data saved in the cloud]
    C --> D[apply latest data]
    B --> |no| E[I'm out of sync]
    E --> F[Save my data in the cloud]
    F --> G[Latest sync is overwritten :sad-face:]
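
The sequence above can be reproduced with a small simulation. This is a sketch under assumptions: `SingleSnapshotBackend` and `SyncClient` are hypothetical, heavily simplified models of the backend and of the Settings Sync client (not VS Code's actual implementation), and the counter-based IDs stand in for the UUIDs the API returns.

```typescript
// Hypothetical backend that keeps one snapshot per resource type.
class SingleSnapshotBackend {
  private latest: { ref: string; content: string } | undefined = undefined;
  private counter = 0;

  // POST: replaces the stored snapshot and returns its new ID.
  save(content: string): string {
    this.counter += 1;
    this.latest = { ref: "uuid-" + this.counter, content: content };
    return this.latest.ref;
  }

  // GET by ID: only the most recent snapshot can still be found.
  getByRef(ref: string): string | undefined {
    if (this.latest !== undefined && this.latest.ref === ref) {
      return this.latest.content;
    }
    return undefined;
  }

  getLatest(): string | undefined {
    return this.latest === undefined ? undefined : this.latest.content;
  }
}

// Hypothetical, simplified model of the sync client in one browser.
class SyncClient {
  local: string;
  private backend: SingleSnapshotBackend;
  private lastRef: string | undefined = undefined;

  constructor(backend: SingleSnapshotBackend, local: string) {
    this.backend = backend;
    this.local = local;
  }

  // Steps 1 to 3: save local settings and remember the returned ID.
  push(): void {
    this.lastRef = this.backend.save(this.local);
  }

  // Steps 6 to 8, following the flowchart.
  syncOnOpen(): void {
    const lastSaved =
      this.lastRef === undefined ? undefined : this.backend.getByRef(this.lastRef);
    if (lastSaved !== undefined && lastSaved === this.local) {
      const latest = this.backend.getLatest();
      if (latest !== undefined) {
        this.local = latest; // in sync: apply the latest data
      }
    } else {
      this.push(); // "I'm out of sync": re-upload local data
    }
  }
}

// Runs the scenario and returns what the backend holds at the end.
function reproduceOverwrite(): string | undefined {
  const backend = new SingleSnapshotBackend();
  const clientA = new SyncClient(backend, "theme=dark");
  const clientB = new SyncClient(backend, "theme=light");
  clientA.push(); // steps 1 to 3 in client A
  clientB.push(); // step 4: client B replaces A's snapshot
  clientA.syncOnOpen(); // steps 5 to 8: A cannot find its snapshot
  return backend.getLatest();
}
```

In this model, `reproduceOverwrite()` returns client A's `"theme=dark"`, which means the settings saved by client B are lost.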

How to reproduce

The user follows these steps:

  1. Opens the Web IDE in the Chrome web browser and installs an extension.
  2. Closes Chrome.
  3. Opens the Web IDE in Firefox.
  4. The extension installed in Chrome will be installed in Firefox (correct behavior).
  5. Uninstalls the extension in Firefox and closes this web browser.
  6. Opens the Web IDE again in Chrome.
  7. The extension is not uninstalled in Chrome. Instead, it is registered as "Installed" again (wrong behavior).

The correct behavior is that the extension is also uninstalled in Chrome.

Settings

The user follows these steps:

  1. Opens the Web IDE in the Chrome web browser and changes the color theme.
  2. Closes Chrome.
  3. Opens the Web IDE in Firefox.
  4. The color theme applied in Chrome is also applied in Firefox (correct behavior).
  5. Changes the color theme in Firefox and closes this web browser.
  6. Opens the Web IDE again in Chrome.
  7. The previously applied theme is still applied in Chrome and it is stored again in the backend (wrong behavior).

Demos

settings_sync_before_720p.mov

Root cause

When implementing the Settings Sync REST API, we made the design decision of storing a single resource snapshot per user per resource type in the backend to avoid having a significant impact on the database speed and size. For example, if client A saves a resource of type extensions by sending a POST request to the API, and later client B does the same, client A's resource will be overwritten in the backend.

This bug happens because VS Code's Settings Sync client expects the backend API to store at least one resource snapshot per user per resource type per machine. If the Settings Sync client in Chrome obtains an empty response for a resource that it previously stored in the backend, it will assume that the resource was never sent to the server and it will try synchronizing again without respecting the snapshots saved by Firefox.

Potential solutions

I explored two potential solutions for this problem: relaxing database restrictions and implementing a service worker cache.

Relaxing database restrictions

We can remove the one-resource-snapshot-per-user-per-resource-type constraint from the database. This way, different web browsers won't overwrite each other's resource snapshots.

Advantages

  • This solution is compatible with all types of deployment strategies in the GitLab application. To understand why this characteristic is important, read the "disadvantages" section of the next potential solution.
  • It's a solution based on well documented practices in the GitLab application.

Disadvantages

  • It requires a database migration that removes an index and applies a new one. Based on my experience implementing the Settings Sync REST API, adding and removing indexes in an application like GitLab is a notoriously complex operation (see https://docs.gitlab.com/ee/development/database/adding_database_indexes.html).
  • We have to define business logic to prevent unconstrained growth of the vs_code_settings database table.
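
As an illustration of the second point, one possible rule is to keep only the most recent snapshots per user and resource type. This sketch is an assumption, not a proposal from the investigation: the `SnapshotRow` fields, the `selectRowsToPrune` helper, and the `keep` limit are hypothetical and don't reflect the actual columns of the vs_code_settings table.

```typescript
// Hypothetical row shape; the real table columns may differ.
interface SnapshotRow {
  id: number;
  userId: number;
  settingType: string;
  updatedAt: number; // epoch milliseconds
}

// Returns the rows that exceed the `keep` most recent snapshots
// for each (userId, settingType) pair.
function selectRowsToPrune(rows: SnapshotRow[], keep: number): SnapshotRow[] {
  const limit = Math.max(0, keep);

  // Group rows by user and resource type.
  const groups: Record<string, SnapshotRow[]> = {};
  for (const row of rows) {
    const key = row.userId + ":" + row.settingType;
    if (groups[key] === undefined) {
      groups[key] = [];
    }
    groups[key].push(row);
  }

  // In each group, everything after the newest `limit` rows is prunable.
  const toPrune: SnapshotRow[] = [];
  for (const key in groups) {
    const newestFirst = groups[key]
      .slice()
      .sort((a, b) => b.updatedAt - a.updatedAt);
    for (const row of newestFirst.slice(limit)) {
      toPrune.push(row);
    }
  }
  return toPrune;
}
```

A rule like this bounds the table at `keep` rows per user per resource type, at the cost of dropping snapshots from browsers that haven't synced recently.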

Settings sync cache service worker

There is a proof of concept and a demo for this solution in Draft: feat: Settings sync cache proxy (!294 - closed). This solution uses a service worker that caches the resources synced by a client in an IndexedDB database. When that client attempts to obtain a resource using the resource ID, the service worker serves it from the cache instead of requesting it from the backend.
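
The caching side of this idea can be sketched as follows. This is a simplified illustration, not the code in !294: `CachedSnapshot`, `LocalCache`, `BackendSave`, and `saveThroughProxy` are hypothetical names, a plain object stands in for the IndexedDB database, and `saveToBackend` stands in for the POST request that returns the resource ID.

```typescript
// Hypothetical shape of a cached resource snapshot.
interface CachedSnapshot {
  ref: string; // ID returned by the backend
  content: string; // payload the client sent
}

// Plain object standing in for the IndexedDB database.
type LocalCache = Record<string, CachedSnapshot>;

// Stand-in for the POST request to the backend; returns the new ID.
type BackendSave = (resourceType: string, content: string) => string;

// Forwards the save to the backend, then stores a local copy under the
// returned ID so this client can read it back later by ID.
function saveThroughProxy(
  cache: LocalCache,
  saveToBackend: BackendSave,
  resourceType: string,
  content: string,
): string {
  const ref = saveToBackend(resourceType, content);
  cache[resourceType + "/" + ref] = { ref: ref, content: content };
  return ref;
}
```

Keying the cache by resource type and ID means another browser's later save, which replaces the backend copy, doesn't affect what this browser can read back.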

Advantages

  • It doesn't require implementing and running a database migration, which is an expensive operation at GitLab's scale.
  • It keeps the Settings Sync implementation simple and it doesn't increase the data storage requirements because data is stored in the client.
  • It's a simple solution that doesn't require adding more dependencies to the Web IDE. It relies entirely on native, well-supported web browser technologies.

Disadvantages

The main disadvantage of this solution is also a blocker. I'll quote my explanation in !294 (comment 1831918309). We can't deploy a service worker in the GitLab application without

I've been investigating how to deploy a service worker in the GitLab application, and the outcomes of the investigation have changed my mind about the feasibility of this approach. This service worker has strict requirements regarding how the application should serve it.

  1. Serve the service worker JavaScript asset from the root path / so the service worker can intercept HTTP requests to the REST API.
  2. Serve the service worker along with a Service-Worker-Allowed HTTP header to tell the browser to relax the service worker's scope security restrictions.

From my investigation, I found that serving a service worker from the root path is not possible without significant compromises in maintainability. It would also affect the performance of the application because the service worker file won't benefit from the asset caching feature of CDNs.

We can't add the Service-Worker-Allowed header to CDN assets either. We could add the header if assets were served from the Rails HTTP server, but that's not always the case.
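
To make the scope restriction concrete: by default, a browser only lets a service worker control URLs under the directory its script is served from, and the header browsers recognize for widening that limit is `Service-Worker-Allowed`. The sketch below approximates that check. It is a simplification under assumptions: `defaultMaxScope` and `isScopeAllowed` are hypothetical helpers, the asset paths in the comments are made up, and only same-origin absolute paths are handled.

```typescript
// Default maximum scope: the directory that contains the script.
function defaultMaxScope(scriptPath: string): string {
  return scriptPath.slice(0, scriptPath.lastIndexOf("/") + 1);
}

// Approximates the browser's check when a page calls
// navigator.serviceWorker.register(scriptPath, { scope }).
// `serviceWorkerAllowed` is the value of the Service-Worker-Allowed
// response header, assumed here to be an absolute path.
function isScopeAllowed(
  scriptPath: string,
  scope: string,
  serviceWorkerAllowed?: string,
): boolean {
  const maxScope =
    serviceWorkerAllowed !== undefined
      ? serviceWorkerAllowed
      : defaultMaxScope(scriptPath);
  // The requested scope must sit under the maximum scope.
  return scope.indexOf(maxScope) === 0;
}

// A script under a hypothetical /assets/webpack/ directory can't claim "/":
//   isScopeAllowed("/assets/webpack/sw.js", "/")      -> false
// unless the response carries "Service-Worker-Allowed: /":
//   isScopeAllowed("/assets/webpack/sw.js", "/", "/") -> true
// A script served from the root path needs no header:
//   isScopeAllowed("/sw.js", "/")                     -> true
```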

Update: After further investigation, we've identified a maintainable approach to load service workers in the GitLab application. See #320 (comment 1850715744) for further details.

This issue is blocked by the work to provide basic support for service workers in the GitLab application: PWA Service Workers + Caching (gitlab#23218).

Edited by Enrique Alcántara