Skip to content

Implement sync client #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Implement sync client #70

wants to merge 4 commits into from

Conversation

simolus3
Copy link
Contributor

@simolus3 simolus3 commented Apr 23, 2025

This implements the sync client logic in the core extension. Client SDKs will still be responsible for opening sockets to the sync service, but the core extension is now responsible for driving that logic. The central interface here is the powersync_control SQL function, which is invoked by clients for a request to start a sync stream or lines received by the sync service. In response, the core extension responds with a list of instruction for the SDK (like e.g. updating the sync status).

We expect that implementing the stream client here will:

  1. Improve performance around handling BSON lines: Only our JavaScript SDK supports BSON at the moment. At the moment, that SDK has to deserialize the BSON line itself and re-encode it as JSON before sending it to the core extension. By implementing the deserialization in the core extension, we avoid that inefficiency.
  2. Make the protocol easier to evolve: Subsequent changes would mostly have to implemented here instead of at each client SDK.
  3. Make it easier to adopt the RSocket protocol across our SDKs.
  4. Not done here: Eventually support the BLOB column type, which requires a BSON-based transport that can now be handled here for all our SDKs.

The most important are structured like this:

  • core/src/bson contains a no-std implementation of the BSON format with serde support. The bson crate unfortunately only supports std environments and I couldn't find a decent alternative. We don't support all BSON types, but that should be fine since we only need the types supported by JSON (as well as byte arrays).
  • core/src/sync contains logic to implement the sync client:
    • line.rs defines the sync lines as structs supporting deserialization.
    • storage_adapter.rs is used as an interface for some common database operations. It's a bit simpler than the BucketStorage interface in the SDK since we can mostly just call other core functions directly without going through SQL.
    • sync_status.rs implements the SyncStatus interface from our other SDKs and logic to update it along with sending update notifications to the client.
    • interface.rs defines the powersync_control SQL function driving the sync client.
    • streaming_sync.rs contains the actual state machine for the sync client. I've implemented it as an async coroutine that is polled every time powersync_control is called, which allows an implementation that looks similar to the ones across our other SDKs.

@simolus3 simolus3 force-pushed the sync-state-machine branch from 756f514 to 2ba3209 Compare April 29, 2025 13:00
@simolus3 simolus3 changed the title WIP: Implement sync client, allow BLOB types WIP: Implement sync client Apr 29, 2025
@simolus3 simolus3 changed the title WIP: Implement sync client Implement sync client May 8, 2025
@simolus3 simolus3 marked this pull request as ready for review May 8, 2025 15:07
@simolus3 simolus3 requested a review from rkistner May 8, 2025 15:08
Copy link
Contributor

@rkistner rkistner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading through the process flow and associated state management now, and just want to check that I understand it correctly:

  1. A SqlController with SyncClient is created per connection when the extension is registered. This keeps a ClientState, which is Idle/empty, until "start" even is received.
  2. When the "start" even is received, it initializes a SyncIterationHandle, stored n the ClientState, and containing a StreamingSyncIteration. A mutex on the ClientState protects against instantiating multiple copies of this at the same time. (Is this correct? Not sure if I understand the purpose of the "Mutex" here correctly). Once a "stop" event is received, this is disposed.
  3. The StreamingSyncIteration contains a StorageAdapter, which includes some prepared statements. These are all disposed automatically when the StreamingSyncIteration is disposed via the "stop" event.

It seems like this gives us a nice way to persist prepared statements, instead of re-creating them every time. I've never measured the performance overhead, but I imagine there are places where persisting the prepared statements could reduce the performance overhead quite a bit. One disadvantage being that we'd still need to cater for the older APIs, which don't have a nice place to persist those statements. Either way, that is not important for the initial iteration here - we can do those optimizations later.

@simolus3
Copy link
Contributor Author

A SqlController with SyncClient is created per connection when the extension is registered. This keeps a ClientState, which is Idle/empty, until "start" even is received.

Exactly.

A mutex on the ClientState protects against instantiating multiple copies of this at the same time. (Is this correct? Not sure if I understand the purpose of the "Mutex" here correctly).

TLDR: There's only a single SyncClient per connection, but it's mutated when powersync_control is called. I thought we needed some kind of protection against powersync_control being called on multiple threads at the same time, but those can't affect the same connection unless SQLite is misused. So we can remove the mutex.

It's a bit of a trick to make SyncClient easier to use. We want to mutate the internal state of the client in powersync_control, so we need a mutable reference to that.
I think it would be sound to cast the ctx.user_data() we use in the function to a mutable sync client directly, and then avoid the mutex here. After all, it's bound to a single SQLite connection (which must only be used across threads if the threading more is serialized, in which case a SQLite-managed mutex on the connection will prevent concurrent powersync_control calls).

I think I'll remove the mutex logic, it's fairly complex and doesn't provide any advantage for us here.

The StreamingSyncIteration contains a StorageAdapter, which includes some prepared statements. These are all disposed automatically when the StreamingSyncIteration is disposed via the "stop" event.

Yes.

It seems like this gives us a nice way to persist prepared statements, instead of re-creating them every time.

FWIW it's kind of important that the statements don't outlive the sync iteration itself. SQLite doesn't let you close connections with pending prepared statements, so we need to give SDKs an opportunity to release resources in a clean manner.

@simolus3 simolus3 force-pushed the sync-state-machine branch from eb13fb5 to 4fe042d Compare June 3, 2025 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants