Skip to content

Conversation

@sudo-suhas
Copy link
Contributor

@sudo-suhas sudo-suhas commented Nov 23, 2022

  • Add HTTP extractor, a generic extractor that uses a script capable of
    emitting any type of asset.
  • Fix config error field key to support nested fields.
  • Disallow import of os module from scripts.
  • Remove unnecessary init method on application extractor.
  • Add test helper to assert slice of protos.

Depends on #434, that needs to be merged first.
Closes #413

@sudo-suhas sudo-suhas force-pushed the http-extractor branch 4 times, most recently from d9f4497 to 06586f3 Compare November 28, 2022 11:18
@sudo-suhas sudo-suhas marked this pull request as ready for review November 28, 2022 11:18
@sudo-suhas sudo-suhas force-pushed the http-extractor branch 5 times, most recently from bfecd3f to 4be7c88 Compare December 8, 2022 06:11
- Add HTTP extractor, a generic extractor that uses a script capable of
  emitting any type of asset.
- Fix config error field key to support nested fields.
- Disallow import of os module from scripts.
- Remove unnecessary init method on application extractor.
- Add test helper to assert slice of protos.
@StewartJingga
Copy link
Contributor

overall LGTM, but I found a case where user might want to make another call for each item to get more metadata.

Let's say we want to fetch jobs metadata from https://example.com/jobs. That endpoint will give us list of jobs that we will convert into Meteor assets. Unfortunately, to get other metadata of those jobs such as lineage, we need to make another call to https://example.com/jobs/job-id for each job. This will not be possible right now in current state.

Is it possible to have that capability in the http extractor? And yes this is outside of this PR's scope, but asking it here in case there is actually a way to handle that.

@sudo-suhas

@sudo-suhas
Copy link
Contributor Author

The case you mentioned is definitely valid but we will need to think about it a bit more and implement it carefully. There could be security implications; at this point, it is not entirely clear what the scope of recursive HTTP calls and handling would be. There is no easy way to expose it currently. I would consider it outside the scope of this PR and prefer addressing it later.

@StewartJingga StewartJingga self-requested a review January 11, 2023 02:35
@sudo-suhas sudo-suhas merged commit 2a2143b into main Jan 11, 2023
@sudo-suhas sudo-suhas deleted the http-extractor branch January 11, 2023 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add HTTP Extractor

3 participants