Commit 4172652
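feat!: Implement RequestManagerTandem, remove add_request from RequestList, accept any iterable in RequestList constructor (#777)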
> Tandem, or in tandem, is an arrangement in which two or more animals,
machines, or people are lined up one behind another, all facing in the
same
direction.[[1]](https://en.wikipedia.org/wiki/Tandem#cite_note-OED-1)
Tandem can also be used more generally to refer to any group of persons
or objects working together, not necessarily in
line.[[1]](https://en.wikipedia.org/wiki/Tandem#cite_note-OED-1)
(https://en.wikipedia.org/wiki/Tandem)
- Inspired by
https://github.com/apify/crawlee/blob/4c95847d5cedd6514620ccab31d5b242ba76de80/packages/basic-crawler/src/internals/basic-crawler.ts#L1154-L1177
and related code in the same class
- In my opinion, it implements the feature more cleanly and without
polluting `BasicCrawler` (...any further)
- The motivation for the feature is twofold:
  1. Apify Actor development - an Actor commonly receives a
  `requestListSources` input from the user, which may be fairly complex
  (regexp-based extraction from remote URL lists) and which is usually
  parsed using `apify.RequestList.open`. At the same time, the Actor wants
  to use the built-in `RequestQueue`.
  2. Sitemap parsing (#248) - similar to 1, but not coupled to the Apify
  platform - we want to read URLs from a sitemap in the background, but
  the URLs should still go through the standard request queue.
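
The tandem idea from the motivation above can be sketched with the standard library alone. This is a minimal illustration, not Crawlee's implementation: `ListLoader`, `DedupQueue`, `Tandem`, and `crawl` are hypothetical names invented for this example, and the ordering policy (serve the queue first, then pull from the loader) is an assumption.

```python
from __future__ import annotations

import asyncio
from collections import deque
from collections.abc import Iterable


class ListLoader:
    """Hypothetical read-only URL source (stands in for a request list)."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._pending = deque(urls)

    async def fetch_next(self) -> str | None:
        return self._pending.popleft() if self._pending else None


class DedupQueue:
    """Hypothetical read-write queue that ignores URLs it has already seen."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()
        self._seen: set[str] = set()

    async def add(self, url: str) -> None:
        if url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    async def fetch_next(self) -> str | None:
        return self._pending.popleft() if self._pending else None


class Tandem:
    """Combines a read-only loader with a queue; all URLs are served by the queue."""

    def __init__(self, loader: ListLoader, queue: DedupQueue) -> None:
        self._loader = loader
        self._queue = queue

    async def add(self, url: str) -> None:
        # New URLs found during crawling go straight to the queue.
        await self._queue.add(url)

    async def fetch_next(self) -> str | None:
        while True:
            url = await self._queue.fetch_next()
            if url is not None:
                return url
            # Queue is empty: move the next URL from the loader into it.
            url = await self._loader.fetch_next()
            if url is None:
                return None  # both sources are exhausted
            await self._queue.add(url)


async def crawl(seed_urls: Iterable[str]) -> list[str]:
    tandem = Tandem(ListLoader(seed_urls), DedupQueue())
    visited: list[str] = []
    while (url := await tandem.fetch_next()) is not None:
        visited.append(url)
        if url == "https://example.com/":
            # Simulate a link discovered while handling the page.
            await tandem.add("https://example.com/about")
    return visited


visited = asyncio.run(
    crawl(["https://example.com/", "https://example.com/docs", "https://example.com/"])
)
print(visited)
```

The loader stays read-only throughout; deduplication and newly discovered URLs are handled only by the queue, which is the separation the breaking changes below rely on.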
## Breaking changes
- `RequestList` does not support `.drop()`, `.reclaim_request()`,
`.add_request()` and `.add_requests_batched()` anymore
  - `RequestManagerTandem` with a `RequestQueue` should be used for this
  use case; `await list.to_tandem()` can be used as a shortcut
- The `RequestProvider` interface has been renamed to `RequestManager`
and moved to the `crawlee.request_loaders` package
- `RequestList` has been moved to the `crawlee.request_loaders` package
- The `BasicCrawler.get_request_provider` method has been renamed to
`BasicCrawler.get_request_manager` and it does not accept the `id` and
`name` arguments anymore
- The `request_provider` parameter of `BasicCrawler.__init__` has been
renamed to `request_manager`
## TODO
- [x] new tests
- [x] fix existing tests
---------
Co-authored-by: Vlada Dusek <[email protected]>
1 parent 3dc1c7d commit 4172652
File tree
24 files changed (+773, -496 lines)
- docs
  - guides
    - code/request_storage
  - introduction/code
  - upgrading
- src/crawlee
  - basic_crawler
  - request_loaders
  - storages
- tests/unit
  - basic_crawler
  - beautifulsoup_crawler
  - http_crawler
  - parsel_crawler
  - playwright_crawler
  - storages