Skip to content

Conversation

@mrjo118
Copy link
Contributor

@mrjo118 mrjo118 commented Aug 26, 2025

Background

When syncing to the file system sync in combination with an external sync service such as Syncthing, new files with a timestamp older than the last sync timestamp may get silently ignored (instead of being pulled into Joplin) if the external sync service syncs at the same time as the Joplin sync.

This is due to the basic delta algorithm's reliance on file timestamps, which presents a problem described well in this post https://discourse.joplinapp.org/t/photos-sometimes-not-syncing-syncthing-android-mac/47083/4:

The main difference between filesystem sync and other types of sync is that with the latter, you've got a single source that is used by all devices, and the whole synchronisation mechanism is built around it. In the case of filesystem sync, there is no single source, but rather the filesystem sync path on each device acts as its own source. When there is just a single source, Joplin can assume that its state is always complete. However, with filesystem sync, there is no such guarantee, and this eventually leads to the issue at hand.

This PR resolves the issue by introducing a modified sync algorithm for the basic delta. This is now utilised by WebDAV sync where the url is a localhost url, and by file system sync where a new 'Detect changes based on any timestamp change' setting is enabled. This setting is enabled by default for new profiles or Joplin installations, but if the user is already using file sync at the point of upgrading, it will be set to false to avoid a forced re-scan of all items.

When use for the first time, it will need to do a re-scan of all local notes in order to store the file timestamps of all the sync target items. However, once this is done, items will not be re-scanned unless the timestamp of a file in the sync target directory has changed. When using the new algorithm, it is backwards compatible if the setting is turned back off, and is compatible with other clients which do not have the setting enabled or are on older version of Joplin without the new setting. The requirement for a full or partial re-scan on enabling of the setting, also applies when re-enabling the setting if switching back and forth between enabled and disabled and syncing changes between toggling the setting.

This fixes the longstanding issue #6517

Changes

  1. A new 'Detect changes based on any timestamp change' setting has been added to the advanced sync settings. This is only available when using file system sync
  2. The basic delta function (for both old and new behaviour) has been changed to only scan files at the root of the sync target directory which have a 32 character long filename and the md extension. Whilst not necessary for the old algorithm, it is unnecessary to process any other items, so the logic might as well be shared. For the new algorithm, this change will prevent items in the sync target directory which are not readable by Joplin (eg. Syncthing metadata files) from being fetched on every sync
  3. A new database column in the sqlite database has been added to store the modified time of files in the sync target directory. This is defaulted to 0 for all existing local items. When the enhanced basic delta algorithm is first used, the modified time of all relevant files in the sync target directory are stored against the local items via the sync_items table
  4. When using a localhost url in combination with WebDAV sync, or when the setting is enabled in combination with file system sync, the basic delta function uses a new algorithm. This new algorithm does not rely on timestamps increasing to determine whether new changes need to be synced, but will consider an earlier timestamp to be a new change as well, because the change in the previous point allows current file timestamps to be tracked. After detecting files with changes to timestamp in either direction, the existing logic in the rest of the sync algorithm will determine whether local changes actually need to be overwritten, or if there is a conflict

At a basic level it works like so:

  • For each eligible item, check whether a matching item exists locally. If it does, set its remote_item_updated_time in memory
  • If the remote_item_updated_time matches the modified time on the file, the item has already been synced, so will note be fetched
  • If the remote_item_updated_time does not match the modified time on the file, the item needs to be fetched. Then the rest of the sync algorithm will update the local item if the updated_time within the remote md is later than in the local one, or if it is earlier, create a conflict etc
  • If no matching item exists locally, the item needs to be fetched. Then the rest of the sync algorithm will create the local item
  • Local deletions are managed based on the lack of a matching item remote item which exists locally. No change has been made to this behaviour

Limitations

The only way a change can get ignored by the sync using this new algorithm is if 2 Joplin clients sync conflicting changes to the same file at exactly the same timestamp on the filesystem (this assumes that the sync service will sync the modified time of files across all clients, which is the case for Syncthing). However, the reliability of the new algorithm is constrained by the filesystem of the storage used for the sync target directory. At precision's of 1 ms - 10 ms this is very unlikely to happen, in comparison to the likelihood of changes getting missed when using the old algorithm. At a precision of seconds however (used on some older filesystems; for modification time, the precision is just 2 seconds for FAT32), this may be less reliable. On desktop, this could be easily solved by ensuring that the drive used for the sync target is formatted to a filesystem with a high timestamp precision, but for mobile, using onboard storage may be required to get the best reliability in some cases. However, newer versions of Android support exFAT for external storage, which can have up to 10 ms precision, but this could depend on the implementation of the file storage driver whether it is utilised.

On Windows syncing with a dir on an NTFS formatted drive, when checking the values in the remote_item_updated_time of the sqlite db I could see that the modified_date of the files have a 1 ms precision. On Android 16 on emulator syncing to on board storage, when checking the values in the remote_item_updated_time of the sqlite db I could see that the modified_date of the files have a 2 ms precision. The precision does seem to be OS dependent in addition to file system dependent however, as when I tested on an older Android 10 (real) device, the timestamp accuracy for file system sync to both internal and external (formatted to FAT32) storage was at least 1 second (I expect it was 2 seconds, at least in the case of the FAT32 storage).

I asked AI, and it looks like if you’re on Android 11 or above you can get millisecond precision for the device storage at least. So Android 10 is the last Android version to have that constraint at OS level (which is 1 second timestamp precision). If using WebDAV, the precision of the modified timestamp is also dependant on the WebDAV server being used. For example, when using RoundSync to serve a WebDAV server pointing to a Box.com account, the available precision was only 1 second, compared to 1 millisecond if using file system sync to internal storage on the same device. Further still, if using something like RoundSync or rclone to sync to a third party cloud service, the cloud service may also limit the precision of the modified timestamp.

In spite of the potential for the precision of the timestamp to be limited however, even with a 1 second precision, I suspect this is still likely to be significantly more reliable than using the current algorithm in combination with a mirrored sync target, because the risk of changes not being synced is only if 2 Joplin clients sync the same item at exactly the same second, rather than being dependent on whether a Joplin sync completes on one client before the external sync service has synced changes (not necessarily for the same items) originating from another client.

Testing

  • When the sync algorithm is first utilised, I have verified that it will update the remote_item_updated_time column for every sync item to match the corresponding modified date on the file in the sync target directory. This ensures that after the initial scan, items which have not changed will no longer be fetched on each run of the sync. This also means that if the initial state of the profile has missed changes already, these will be addressed on the first sync with the setting enabled
  • I have ensured that syncing using the new algorithm is compatible when syncing with older Joplin clients without the new setting, and after enabling the new setting, if I disable it again, no re-scan is required and the sync will continue to work as before
  • With the new setting enabled or disabled, I have ensured that all creates, updates and fetches are cleared after 1 or 2 manual syncs, when syncing changes in both directions between 2 clients, regardless of whether the file timestamp in the sync target directory is earlier, equal to or later than the timestamp being compared to
  • I have ensured that with the new setting enabled, when following the reproduction steps in the description of Sync silently ignores new files with old timestamps #6517, I am able to receive the incoming item, and I am then able to sync every future change to the item both ways between 2 clients. In addition to creating new md files based on the production steps, I can also modify an existing md file, increment the updated_time within the file and make a change to the note contents, and this results in the change being detected by the sync and updated locally
  • I have verified that the new setting is only available in the UI for the file system sync target
  • I have verified note creation, update and deletion between 2 clients both ways works correctly, with only one client having the new setting enabled. I have also verified conflicting note updates and deletions correctly generate conflicts in the UI, matching the same behaviour with or without the new setting disabled
  • I have ensured that if additional files are added to the root of the sync target directory, they are ignored by the sync unless the filename is 32 chars long with an md extension
  • The basic delta creates info log entries in the format:
    Synchronizer: BasicDelta: Report: {"timestamp":1756248482745,"older":27,"newer":0,"equal":1}
    I have verified with the new code, the log entry returns a count of only the valid md files in the root directory (split between older, newer and equal). This is because any additional files, including the mandatory info.json, are now excluded from the output of the basic delta. When the new setting is enabled, the count will be distributed differently across older, newer and equal, but the total count is equivalent, and the log entry includes '(enhanced)' text in it

Videos:

These demonstrate syncing between the code in this PR as a dev build, and the Joplin 3.4.5 production build

Part 1:

electron_BGYcgejvt3-00.00.00.000-00.03.08.681-seg1.mp4

Part 2:

electron_BGYcgejvt3-00.03.08.681-00.06.15.446-seg2.mp4

@laurent22 laurent22 added the v3.5 label Aug 26, 2025
@tomasz1986
Copy link

Note though, that bypassing this check has the caveat that any files in the main Joplin directory on the sync target that are not readable by Joplin, will be fetched on every sync unless they are manually removed. Because of this, I have made this change conditional upon enabling a new 'Ignore timestamp when fetching new items' setting, with advice in the description of when it is useful to enable, and it's caveat.

Is thew new setting really necessary? What do you mean exactly by files "not readable by Joplin"?

@mrjo118
Copy link
Contributor Author

mrjo118 commented Aug 26, 2025

What do you mean exactly by files "not readable by Joplin"?

Any files present which are not created by Joplin.
Sometimes users try to put files onto the sync target manually. Currently if they did this, after the first attempt to sync one of these files, it would get ignored by Joplin, as the context timestamp would then surpass the updated_time on the file

@tomasz1986
Copy link

Honestly, instead of adding the setting, wouldn't it be better to just throw an error and ask those users delete such items? I mean, as far as I am aware, there is no valid reason to fool around the sync target folder and add unrelated things in there manually. I would assume only a tiny fraction of Joplin users would do such thing anyway... or at least I hope so. 😅

@mrjo118
Copy link
Contributor Author

mrjo118 commented Aug 26, 2025

Didn't want to mess with the existing behaviour too much, as I wouldn't want this to generate more support tickets.

To me it made sense to have no change in behaviour by default, which has a safeguard for rookie users. Then advanced users can enable this new setting, in the advanced section of sync settings, so they can have better support for Syncthing and other external sync tools. But we'll see what Laurent thinks.

@tomasz1986
Copy link

Thinking about this again, Syncthing also creates its own .stfolder/.stversions directories, .stignore file, and other temporary files when downloading data. Are these going to be a problem as well?

@mrjo118
Copy link
Contributor Author

mrjo118 commented Aug 26, 2025

Thinking about this again, Syncthing also creates its own .stfolder/.stversions directories, .stignore file, and other temporary files when downloading data. Are these going to be a problem as well?

Additional directories in the root Joplin dir are ok. But additional files will be a problem.

Possible options to exclude these external files would be to only check for files with names that are 32 chars not including the extension, or only checking files with the .md extension

@tomasz1986
Copy link

only checking files with the .md extension

I think this would be ideal (because if you put your own MD files inside this particular folder, you really are asking for trouble 😅).

@mrjo118
Copy link
Contributor Author

mrjo118 commented Aug 26, 2025

only checking files with the .md extension

I think this would be ideal (because if you put your own MD files inside this particular folder, you really are asking for trouble 😅).

Looks like there is an existing function which checks both that the file is 32 chars and has an md extension. That is pretty specific, so with that filtering in place I think now it does make the new setting redundant, so I've removed it.

I have updated the PR description to reflect this

@personalizedrefrigerator personalizedrefrigerator added the sync sync related issue label Aug 27, 2025
@mrjo118 mrjo118 changed the title All: Prevent Joplin sync ignoring new objects when syncing with an external sync service concurrently All: Fixes #6517: Prevent Joplin sync ignoring changes when syncing with an external sync service concurrently Aug 28, 2025
@mrjo118
Copy link
Contributor Author

mrjo118 commented Aug 28, 2025

I have now found a way to also deal with fixing updates to server objects not getting synced, which involves a toggleable alteration to the basic delta algorithm which is also backwards compatible with the old one, and with existing Joplin clients. So if this gets merged, it will fix the issue in it's entirety 😃

@mrjo118 mrjo118 changed the title All: Fixes #6517: Prevent Joplin sync ignoring changes when syncing with an external sync service concurrently All: Fixes #6517: Add new setting to prevent Joplin sync ignoring changes when syncing with an external sync service concurrently Aug 28, 2025
@laurent22
Copy link
Owner

Thank you for working on this but the fact that it's a rather technical setting means very few people will actually use this, yet the change brings a lot of extra complexity in terms of maintenance. I'm not sure what to do here. I guess it's not possible to enable this by default?

@tomasz1986
Copy link

I guess it's not possible to enable this by default?

I also wanted to ask the same question. For clarity, I have been running Joplin with this code for about 11 days now, and everything seems to be working as expected.

@mrjo118 Is there any reason or circumstances where one would be advised not to enable this setting? Filesystem sync is broken without it, so it is basically required there. If that is indeed the case, is there any actual reason to even have an option to disable the new behaviour?

@mrjo118
Copy link
Contributor Author

mrjo118 commented Sep 13, 2025

@laurent22 @tomasz1986 Functionally I believe that the new algorithm is sound, regardless of whether using a mirrored sync target or a single source of truth. In the case of using a hosted webdav server, the new algorithm is as good as the existing one, as the limitation around the timestamp precision also affects the existing algorithm. My concern is that enabling the new algorithm by default for everyone using file system sync and webdav is going to cause disruption for users with large data sets, because a full rescan is required on initial enablement, in order to obtain the modified timestamps of each file on file system of the server. In particular, for users of file system sync on Android, this could be a very slow process due to severe performance limitations with Android SAF introduced in Android 11+.

I do think this would be ok to be enabled by default for new instances of Joplin or new profiles though. @laurent22 what do you think about if I added some logic to default the new setting to true, but explicitly set it to false if file system / webdav sync is already setup on the initial migration of the new db schema?

@tomasz1986
Copy link

tomasz1986 commented Sep 13, 2025

Just my opinion, but taking all that into consideration, I would still like to suggest to just enable it for everyone (and possibly even remove the setting) regardless of the slow scan after the upgrade (which, if I understand correctly, will be required only once).

This is because if the setting is hidden under "advanced configuration", the majority of users won't even know it exists, and the issue will still affect them (and in many cases, without them even realising that something is wrong, leading to conflicted copies or accusing Joplin and its synchronisation of being "unreliable" 😅).

@mrjo118
Copy link
Contributor Author

mrjo118 commented Sep 13, 2025

Yes, the rescan is only required once. For file system sync, I would be open to forcing the new sync algorithm if Laurent is ok with that. However for webdav sync there is probably quite a large userbase, the majority of which will receive no gain from this change. Maybe I could enforce the new algorithm if the target is webdav and the host is localhost, otherwise use the existing algorithm?

Then either make the new setting apply to file system sync only and default to enabled for new profiles only, or remove the setting entirely and enforce the new algorithm for everyone using file system sync, depending on what @laurent22 is comfortable with

@mrjo118 mrjo118 changed the title All: Fixes #6517: Add new setting to prevent Joplin sync ignoring changes when syncing with an external sync service concurrently All: Fixes #6517: Prevent Joplin sync ignoring changes when syncing with an external sync service concurrently for file system and WebDAV sync Sep 15, 2025
@mrjo118
Copy link
Contributor Author

mrjo118 commented Sep 16, 2025

@laurent22 I have now updated the PR as follows:

  1. I have simplified the description of the 'Detect changes based on any timestamp change' setting, changed it to only be an option for file system sync and changed it to be stored in the database, to allow setting a conditional default value at the point of migration. It is now defaulted to enabled for new profiles and Joplin installations, but if the user is currently using file system sync, when they upgrade Joplin, the setting will be set to false to avoid a full re-scan of items. I have tested that this works correctly by upgrading existing profiles with file system sync setup, WebDAV sync setup, and creating a new profile while running the new code
  2. The new 'enhanced' basic delta algorithm is now used for WebDAV automatically if the url is a localhost url, otherwise WebDAV sync will use the existing basic delta algorithm (as we don't want to force a full re-scan for a feature which will not benefit those users). I have manually tested that using both a localhost url and my ip address directly (therefore not a localhost url) for a local WebDAV server correctly does or does not use the enhanced algorithm, as can be determined from the logs

If you would prefer to remove the setting altogether and force a full re-scan of items on upgrade for file system sync users, please let me know

@HugoPlatzer
Copy link

I would really like to see this merged, because as of now, file-based sync (using Syncthing) often does not manage to sync photos in note attachments. I guess this happens mostly with large attachments and generally not with plain-text notes.

This IMHO is a pretty serious issue on an otherwise great note-taking app as sync cannot be trusted to synchronize everything - once a photo has been ignored, no clicking of the Sync button on either Android or desktop Joplin will help - only a full database rebuild causes it to recognize something is missing.

@mrjo118
Copy link
Contributor Author

mrjo118 commented Oct 3, 2025

@laurent22 I have now updated the PR as follows:

  1. I have simplified the description of the 'Detect changes based on any timestamp change' setting, changed it to only be an option for file system sync and changed it to be stored in the database, to allow setting a conditional default value at the point of migration. It is now defaulted to enabled for new profiles and Joplin installations, but if the user is currently using file system sync, when they upgrade Joplin, the setting will be set to false to avoid a full re-scan of items. I have tested that this works correctly by upgrading existing profiles with file system sync setup, WebDAV sync setup, and creating a new profile while running the new code

  2. The new 'enhanced' basic delta algorithm is now used for WebDAV automatically if the url is a localhost url, otherwise WebDAV sync will use the existing basic delta algorithm (as we don't want to force a full re-scan for a feature which will not benefit those users). I have manually tested that using both a localhost url and my ip address directly (therefore not a localhost url) for a local WebDAV server correctly does or does not use the enhanced algorithm, as can be determined from the logs

If you would prefer to remove the setting altogether and force a full re-scan of items on upgrade for file system sync users, please let me know

@laurent22 The PR is much bigger now, but it's mostly test units in file-api.test. I've added test coverage for both the existing and new basic delta algorithm to increase confidence (there was zero test coverage for the existing algorithm, so this is an improvement). The logic for the new algorithm has not changed since the first version I implemented, which @tomasz1986 has been using for over a month now via the built binaries which I sent him

@tomasz1986
Copy link

I can confirm that I have been testing this PR on 4 desktop PCs and 1 Android device for about a month now, and everything seems to be working as expected (without the need for any scripts that change file timestamps into the future, etc. as is the case without this commit).

@laurent22
Copy link
Owner

How much of a problem is it to do a full sync for existing profiles? I wonder if we could enable it for them too since it's a one-off and anyway if someone is using filesystem sync they should probably enable the feature, even if it means having to wait a bit the first time

@laurent22
Copy link
Owner

(by the way there are a few conflicts)

@mrjo118
Copy link
Contributor Author

mrjo118 commented Oct 9, 2025

How much of a problem is it to do a full sync for existing profiles?

The only problem is them having to wait the first time, nothing else. Based on that the change is now targeted to only affect the users who will reap benefit from it, I think it makes sense to remove the configuration setting, so it is enabled for all file system sync users.

I will change this and sort out the conflicts.

@mrjo118
Copy link
Contributor Author

mrjo118 commented Oct 9, 2025

Changes now done

@laurent22 laurent22 merged commit 1111bde into laurent22:dev Oct 16, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sync sync related issue v3.5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants