Fix unexpected behaviour of generate_subcatalogs#241
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #241 +/- ##
===========================================
+ Coverage 93.73% 93.91% +0.18%
===========================================
Files 30 32 +2
Lines 3749 3963 +214
===========================================
+ Hits 3514 3722 +208
- Misses 235 241 +6
Continue to review full report at Codecov.
|
lossyrob
left a comment
There was a problem hiding this comment.
This is looking good - thank you! A couple of thoughts:
The way this works now, the generate_subcatalogs recurses down to the child catalogs and then crawls up it's parent tree to cache parent catalog IDs. This ends up iterating over the catalog multiple times - both down the tree with for child in self.get_children(): and during the while loop you introduce. Can you structure it in such a way that it reduced the iterations? I think passing forward the subcat_id_to_cat as an optional parameter might do the trick.
Also, could you provide one or more unit tests that would cover the case mentioned in #240? Might be worth trying to test any edge cases that may arise around catalog merging - e.g. what happens if two subcatalogs have the same ID? This is often the case for subcatalogs that are based on date - so a subcatalog of day "15" might exist for a subcatalog of month "1", but also for month "2". I'm not sure how this code will behave in this scenario.
3b66b84 to
5651517
Compare
|
Thanks a lot for having a look! I have followed your suggestion and removed the redundant iteration over the catalog by passing over the parent catalog IDs. I have changed the subsequent loop over items slightly: I use the list of parent IDs to check whether subcatalogs need to be added, but I crawl down the template levels using the subcatalogs' I will add unit tests with something like #240 and other cases! |
|
I have added unit tests for the following edge cases:
|
lossyrob
left a comment
There was a problem hiding this comment.
Nice work, well tested. Just a couple of questions on a bit of code and I think this is good to go!
| id_iter = reversed(parent_ids) | ||
| if all(['{}'.format(id) == next(id_iter, None) | ||
| for id in reversed(item_parts.values())]): | ||
| continue |
There was a problem hiding this comment.
The '{}'.format(id) here is redundant, and could just be id, yeah?
Is the reversing to try and reduce iterations as the likely case that the more root-leaning parent IDs would match? If so the iteration of the reverse between the id_iter and item_parts would cancel this out I think - and I believe this is equivalent to the forward-iterating case, so perhaps the reverse's can be dropped?
There was a problem hiding this comment.
The '{}'.format(id) here is redundant, and could just be id, yeah?
Actually the values in the dictionary returned by layout_template.get_template_values(item) are not necessarily strings (they have the actual property type), so they need to be converted in order to compare to the IDs in the catalog structure. I have used the same syntax as further below (line 551 in the original version of the code) for consistency.
Is the reversing to try and reduce iterations as the likely case that the more root-leaning parent IDs would match? If so the iteration of the reverse between the id_iter and item_parts would cancel this out I think - and I believe this is equivalent to the forward-iterating case, so perhaps the reverse's can be dropped?
Here I want to to check whether the template sub-catalog structure matches the actual one, but on the item-leaning side. Reversing allows me to align the current and the desired structures using the item position as a reference. Suppose the items are in the catalog /my-catalog/my-collection/2020/12/01 and the template is ${year}/${month}/${day} (i.e. parent_ids = ['my-catalog', 'my-collection', '2020', '12', '01'] and item_parts.values() = ['2020', '12', '01']): by reversing both the structure and the template I can verify that they match on the outermost side, thus allowing me to skip to the next element.
There was a problem hiding this comment.
Actually the values in the dictionary returned by layout_template.get_template_values(item) are not necessarily strings
Ah! Of course. My mistake.
Here I want to to check whether the template sub-catalog...
Ok, that makes sense now. The parent IDs can be longer than the item_parts, and you're only exhausting the item parts - that was the part I was missing. Would you mind adding some comments to the effect of the comment above, as it'll make things a bit easier to parse for future readers?
There was a problem hiding this comment.
Good point, sorry for the cryptic code :)
Fixes #240