@@ -109,10 +109,64 @@ class MediaInfo:
 
 class PreviewUrlResource(DirectServeJsonResource):
     """
-    Generating URL previews is a complicated task which many potential pitfalls.
-
-    See docs/development/url_previews.md for discussion of the design and
-    algorithm followed in this module.
+    The `GET /_matrix/media/r0/preview_url` endpoint provides a generic URL preview
+    API that outputs Open Graph (https://ogp.me/) responses (with some
+    Matrix-specific additions).
+
+    This approach has trade-offs compared with other designs:
+
+    * Pros:
+      * Simple and flexible; can be used by any client at any point.
+    * Cons:
+      * If each homeserver provides one of these independently, all the homeservers in a
+        room may needlessly DoS the target URI.
+      * The URL metadata must be stored somewhere, rather than just using Matrix
+        itself to store the media.
+      * Matrix cannot be used to distribute the metadata between homeservers.
+
+    When Synapse is asked to preview a URL, it does the following:
+
+    1. Checks the URL against a blacklist (defined as `url_preview_url_blacklist` in the
+       config).
+    2. Checks the URL against an in-memory cache and returns the result if it exists. (This
+       is also used to de-duplicate processing of multiple in-flight requests at once.)
+    3. Kicks off a background process to generate a preview:
+       1. Checks the URL and timestamp against the database cache and returns the result
+          if it has not expired and was successful (a 2xx return code).
+       2. Checks if the URL matches an oEmbed (https://oembed.com/) pattern. If it
+          does, updates the URL to download.
+       3. Downloads the URL, stores it in a file via the media storage provider
+          and saves the local media metadata.
+       4. If the media is an image:
+          1. Generates thumbnails.
+          2. Generates an Open Graph response based on image properties.
+       5. If the media is HTML:
+          1. Decodes the HTML via the stored file.
+          2. Generates an Open Graph response from the HTML.
+          3. If a JSON oEmbed URL was found in the HTML via autodiscovery:
+             1. Downloads the URL, stores it in a file via the media storage provider
+                and saves the local media metadata.
+             2. Converts the oEmbed response to an Open Graph response.
+             3. Overrides any Open Graph data from the HTML with data from oEmbed.
+          4. If an image exists in the Open Graph response:
+             1. Downloads the URL, stores it in a file via the media storage
+                provider and saves the local media metadata.
+             2. Generates thumbnails.
+             3. Updates the Open Graph response based on image properties.
+       6. If the media is JSON and an oEmbed URL was found:
+          1. Converts the oEmbed response to an Open Graph response.
+          2. If a thumbnail or image is in the oEmbed response:
+             1. Downloads the URL, stores it in a file via the media storage
+                provider and saves the local media metadata.
+             2. Generates thumbnails.
+             3. Updates the Open Graph response based on image properties.
+       7. Stores the result in the database cache.
+    4. Returns the result.
+
+    Entries in the in-memory cache expire after 1 hour.
+
+    Expired entries in the database cache (and their associated media files) are
+    deleted every 10 seconds. The default expiration time is 1 hour from download.
     """
 
     isLeaf = True
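Step 2 of the docstring (an in-memory cache that also de-duplicates in-flight requests, with entries expiring after 1 hour) can be illustrated with a small `asyncio` sketch. This is not Synapse's implementation, which is built on Twisted; `PreviewCache`, `get_preview`, the `generate` callback and `demo` are hypothetical names chosen for the example.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple

# The docstring says in-memory entries expire after 1 hour.
ONE_HOUR_S = 60 * 60


class PreviewCache:
    """Hypothetical in-memory preview cache (not Synapse's actual class).

    Concurrent requests for the same URL share one in-flight task, and a
    finished task's result is reused until the entry is older than expiry_s.
    """

    def __init__(self, expiry_s: float = ONE_HOUR_S) -> None:
        self._expiry_s = expiry_s
        # URL -> (monotonic time the entry was created, task producing the preview)
        self._entries: Dict[str, Tuple[float, "asyncio.Future[dict]"]] = {}

    async def get_preview(
        self, url: str, generate: Callable[[str], Awaitable[dict]]
    ) -> dict:
        now = time.monotonic()
        entry = self._entries.get(url)
        if entry is None or now - entry[0] >= self._expiry_s:
            # Cache miss or stale entry: start one background task for this URL.
            # Stale entries are simply overwritten here rather than swept.
            entry = (now, asyncio.ensure_future(generate(url)))
            self._entries[url] = entry
        # shield() stops a cancelled caller from cancelling the shared task
        # that other callers may still be waiting on.
        return await asyncio.shield(entry[1])


async def demo() -> Tuple[List[dict], List[str]]:
    """Two concurrent requests for the same URL trigger one generation."""
    calls: List[str] = []

    async def fake_generate(url: str) -> dict:
        calls.append(url)
        await asyncio.sleep(0.01)  # stand-in for downloading and parsing
        return {"og:url": url}

    cache = PreviewCache()
    results = await asyncio.gather(
        cache.get_preview("https://example.com", fake_generate),
        cache.get_preview("https://example.com", fake_generate),
    )
    return list(results), calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running `demo()` returns the same preview for both callers while `calls` records a single generation. Note that this sketch also caches a failed task for the lifetime of its entry; a fuller version would have to decide whether failures should be retried sooner.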
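The database-cache rules in the docstring (sub-step 1 of step 3, the oEmbed override in step 3.5, and the expiry paragraph at the end) reduce to a few small checks. The sketch below assumes a hypothetical `CachedPreview` row with a millisecond `download_ts` and a per-row `expiry_ms`; these names are illustrative, not Synapse's database schema. "Successful" is read as `200 <= code < 300`, following the docstring's "2xx" wording.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# The docstring gives a default expiry of 1 hour from download.
ONE_HOUR_MS = 60 * 60 * 1000

OpenGraph = Dict[str, object]


@dataclass
class CachedPreview:
    """Hypothetical row shape for the database cache (not Synapse's schema)."""

    url: str
    response_code: int
    download_ts: int  # milliseconds since the epoch
    og: OpenGraph
    expiry_ms: int = ONE_HOUR_MS  # default: 1 hour from download

    @property
    def expires_ts(self) -> int:
        return self.download_ts + self.expiry_ms


def usable_cached_preview(
    row: Optional[CachedPreview], now_ms: int
) -> Optional[OpenGraph]:
    """Reuse a cached preview only if it is unexpired and was a 2xx response."""
    if row is None or now_ms >= row.expires_ts:
        return None
    if not 200 <= row.response_code < 300:
        return None
    return row.og


def merge_oembed(html_og: OpenGraph, oembed_og: OpenGraph) -> OpenGraph:
    """Keys from the oEmbed-derived response override those from the HTML."""
    merged = dict(html_og)
    merged.update(oembed_og)
    return merged


def split_expired(
    rows: List[CachedPreview], now_ms: int
) -> Tuple[List[CachedPreview], List[CachedPreview]]:
    """One pass of a periodic cleanup (the docstring says every 10 seconds).

    Returns (kept, expired); a real job would also delete the expired rows'
    media files.
    """
    kept = [r for r in rows if now_ms < r.expires_ts]
    expired = [r for r in rows if now_ms >= r.expires_ts]
    return kept, expired
```

Keeping these as pure functions of `now_ms` makes the expiry boundary easy to test without a clock or a database.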