2024 04 24
:::info Next meeting: 2024-05-06 18:00 CET (6pm) :::
:::info Current state: https://pad.funkwhale.audio/s/T-yx14DsH#Episodes-endpoint Last meeting: https://pad.funkwhale.audio/s/6mWuDexgz#Episode-GUIDs :::
| Test case | 🎉 Static fetch-hash / generated GUID 🎉 | Dynamic fetch-hash | GUID from the feed |
|---|---|---|---|
| RSS feed without episode GUIDs (old). IDs need to be generated once. | ✔ | ✔ | ✔ |
| RSS feed with 2 duplicate GUIDs (old) | ✔ | ✔ | ✖ |
| GUID changes for a given episode in the RSS feed (old). Deduplication endpoint necessary, deduplicate client-side? | ✔ | ✖ | ✔ |
- listen to episode
- phone dies
- podcast publisher changes episode guid
- we get a new phone & connect to the database again; it gets everything from the server
--> The server doesn't know the new GUID, and the phone doesn't know the old one, so here we'd get a duplicate. When the client sends the 'create episode' call to the server, the server recognises the possible duplicate and
- creates the episode
- responds to the client with a question: do you want to deduplicate?
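The flow above could be sketched roughly as follows. This is only an illustration: `FeedStore`, `create_episode` and the `possible_duplicates` response field are hypothetical names, not part of any agreed spec, and the exact-title comparison is a placeholder for whatever similarity check is eventually chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Episode:
    guid: str
    title: str


@dataclass
class FeedStore:
    """Hypothetical in-memory store for the episodes of a single feed."""
    episodes: Dict[str, Episode] = field(default_factory=dict)

    def create_episode(self, guid: str, title: str) -> dict:
        # Placeholder check: same title (ignoring case/whitespace), different GUID.
        wanted = title.strip().casefold()
        candidates: List[str] = [
            e.guid
            for e in self.episodes.values()
            if e.guid != guid and e.title.strip().casefold() == wanted
        ]
        # The episode is created regardless of whether a duplicate is suspected.
        self.episodes[guid] = Episode(guid, title)
        # The response doubles as the question "do you want to deduplicate?".
        return {"created": guid, "possible_duplicates": candidates}
```

A client receiving a non-empty `possible_duplicates` list could then decide, per user, whether to merge the episodes.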
Discussion about deduplication from 2023-05-30
What's the expectation regarding how the GUIDs should align? We have discussed a deduplication endpoint; this would be a good use case for that.
How can the client know how to deduplicate if it doesn't receive any metadata from the server, only a GUID? It is impossible to compare with the RSS feed if the client has nothing to compare against. AP: compare the title; if it is mostly similar, then check media type, date & duration. If those are the same as well, then we identify the episode as a duplicate. The title is usually the most useful field; the media file URL (enclosure) will most likely change.
What data should the server store for deduplication?
- Title
- Release date
- Enclosure URL?
- Duration?
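A minimal sketch of the comparison described above (title first, then media type, date and duration), using the candidate fields from the list. The class name, the 0.9 title threshold and the 2-second duration tolerance are assumptions made for illustration; the notes do not fix any of these values.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional

TITLE_THRESHOLD = 0.9       # assumed value for "mostly similar"
DURATION_TOLERANCE_S = 2    # assumed tolerance, in seconds


@dataclass(frozen=True)
class EpisodeMeta:
    title: str
    release_date: date
    media_type: str
    duration_s: Optional[int] = None
    enclosure_url: Optional[str] = None  # stored, but not compared (likely to change)


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().casefold(), b.strip().casefold()).ratio()


def is_probable_duplicate(a: EpisodeMeta, b: EpisodeMeta) -> bool:
    # Step 1: the title is the most useful signal.
    if title_similarity(a.title, b.title) < TITLE_THRESHOLD:
        return False
    # Step 2: media type and release date must match.
    if a.media_type != b.media_type or a.release_date != b.release_date:
        return False
    # Step 3: compare durations only when both sides know them.
    if a.duration_s is not None and b.duration_s is not None:
        return abs(a.duration_s - b.duration_s) <= DURATION_TOLERANCE_S
    return True
```

The enclosure URL is deliberately left out of the comparison, since the notes expect it to change most often.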
Other question, for future: which fields should be checked for similarity (when the client synchronises) - to avoid noise.
- podcaster publishes both DE & EN episodes. In 3 feeds: EN-only, DE-only, mixed. Episode guids created as different ones between
- What if you switch from DE-only to EN-only feed?
Deduplication MUST always only happen within a single feed, not across feeds. If the feed URL changes, it doesn't necessarily mean that the feed's GUID changes. AP:
- if podcast publisher knows what they're doing and has redirect, episode GUIDs remain the same
- if not:
  - if episode guid in the feed is still the same, we keep as is
  - if episode guids in feed are different, we would consider them as 'new' episodes and execute deduplication
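As a rough sketch of the rule above, a client or server could partition the episodes fetched from the moved feed into those whose GUID is already known (kept as is) and those to treat as new and hand to deduplication. The function name and the plain-string GUIDs are assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple


def split_after_feed_move(
    known_guids: Set[str], fetched_guids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (kept, new) GUIDs for one feed after its URL changed.

    Only GUIDs of the same feed/subscription are compared, since
    deduplication must never happen across feeds.
    """
    kept: List[str] = []
    new: List[str] = []
    for guid in fetched_guids:
        # Known GUIDs are kept as is; unknown ones become deduplication candidates.
        (kept if guid in known_guids else new).append(guid)
    return kept, new
```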
If you switch, within same feed/subscription, from type a (e.g. video) to type b (e.g. audio).
If you merge episodes, this is a per-user action. Does this mean that there is duplication between users? Indeed. The extra episode metadata would then affect storage space. Database structuring can be done smartly, e.g. have a single table per user with all data, or split the data between multiple tables. As long as the API specs are respected, the server can do whatever it wants. E.g. a single-user server would not separate all this; if you're doing a multi-user database you'll want to be smarter. In the latter case, deduplication might create a single new episode that's only used by one user. There are multiple ways to do it.
We can put a preamble recommendation to server implementers to do
- 2023-07-11 episode data fields - server can calculate GUID
- have to specify: has to be 'globally' unique within a feed
- when the server receives instructions from a client to create a new episode or patch one, and the client sends an updated GUID, the server should probably check that it is actually unique within the feed
- We should establish clearly separate naming; separate from feed GUID. Proposal: sync-UUID
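The uniqueness check from the list above might look like this on the server side. It is a sketch under assumptions: `SyncIdRegistry`, `DuplicateSyncId` and the use of random UUIDs when the client supplies no ID are illustrative choices, not decisions recorded in these notes.

```python
import uuid
from typing import Dict, Optional, Set


class DuplicateSyncId(Exception):
    """Raised when a sync-UUID is already in use within the same feed."""


class SyncIdRegistry:
    """Hypothetical registry tracking sync-UUIDs per feed."""

    def __init__(self) -> None:
        self._ids_by_feed: Dict[str, Set[str]] = {}

    def register(self, feed_id: str, sync_id: Optional[str] = None) -> str:
        ids = self._ids_by_feed.setdefault(feed_id, set())
        if sync_id is None:
            # Server-calculated ID (assumption: a random UUID4).
            sync_id = str(uuid.uuid4())
        if sync_id in ids:
            # Client-supplied ID collides with an existing one in this feed.
            raise DuplicateSyncId(f"{sync_id} already exists in feed {feed_id}")
        ids.add(sync_id)
        return sync_id
```

Uniqueness is enforced per feed only, so the same ID in two different feeds is accepted in this sketch.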
Conclusion:(?) we want to keep a separate sync guid? Yes, because:
- when clients have different deduplication standards
- it can help multi-user servers to have single episode entries and use the GUID for sync. (Although if we allow the client to generate sync-UUIDs, they would become unique across the server.)
Is it really necessary to tell the server about an episode as soon as it is found? Or can it wait until there's a meaningful action (e.g. play)? That would save many calls to the server, and many sync-IDs that have to be stored in the client. Potential problem: if episode was played already on client A, and then client pu -> Solution: allow the client to request 'changed since' data from the server.
Should we support feeds that don't keep historical data? E.g. for episodes that are no longer in the feed but already have the 'Downloaded' status: 'Downloaded' should also be considered an action that triggers informing the server of the episode.
Conclusion:
- don't push all episodes to sync server
- if client syncs episode timestamp, but doesn't know the episode sync-ID yet, then client needs a way to request additional information about the episode to locally match received sync ID and local episodes
- should have 'native' calls "I played episode but don't have a sync ID yet" and "I played episode with this sync ID"
- "lazy synchronization": only sync episode data after interaction with episode -> sync-ID fields stay empty until interaction happens
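The two 'native' calls and the lazy filling of sync IDs could be sketched from the client's side as below. All payload keys (`action`, `feed`, `sync_id`, `episode`, `position`) and function names are hypothetical; the notes do not define a wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalEpisode:
    title: str
    guid: Optional[str]            # GUID from the feed, if any
    sync_id: Optional[str] = None  # stays empty until the first interaction


def build_play_request(feed_id: str, ep: LocalEpisode, position_s: int) -> dict:
    """Build the payload for a 'played' action."""
    request = {"action": "play", "feed": feed_id, "position": position_s}
    if ep.sync_id is not None:
        # "I played episode with this sync ID"
        request["sync_id"] = ep.sync_id
    else:
        # "I played episode but don't have a sync ID yet":
        # send enough metadata for the server to match or create the episode.
        request["episode"] = {"guid": ep.guid, "title": ep.title}
    return request


def apply_server_response(ep: LocalEpisode, response: dict) -> None:
    """Store the sync ID returned by the server after the first interaction."""
    ep.sync_id = response["sync_id"]
```

After the first response is applied, later requests for the same episode carry only the sync ID.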