
Conversation

@deshansh
Contributor

@deshansh deshansh commented Oct 7, 2024

Description

  • Split the export_data function into export_data_csv and export_data_json, and added extra configuration options via **kwargs

Issues

Testing

  • Added tests to check the export functionality as well as the extra configuration options

Checklist

  • CI passed

@deshansh deshansh changed the title Split and added extra configuration to export_data function feat: Split and added extra configuration to export_data function Oct 7, 2024
@vdusek vdusek requested review from janbuchar and vdusek October 8, 2024 11:39
Collaborator

@vdusek vdusek left a comment

Thanks for the contribution! I have a few comments...

Regarding the removal of export_data, I'm not sure we should do that. Removing it would break parity with the JS Crawlee and introduce a breaking change (in which case, you'd need to use the feat! commit prefix).

Maybe we can keep the export_data and call the appropriate methods within it, like this:

async def export_data(
    self,
    path: str | Path,
    *,
    content_type: Literal['csv', 'json'] = 'csv',
    dataset_id: str | None = None,
    dataset_name: str | None = None,
    **kwargs: Any,
) -> None:
    if content_type == 'csv':
        await self.export_data_csv(path, dataset_id=dataset_id, dataset_name=dataset_name, **kwargs)
    elif content_type == 'json':
        await self.export_data_json(path, dataset_id=dataset_id, dataset_name=dataset_name, **kwargs)
    else:
        raise ValueError(f'Unsupported content type: {content_type}.')

renovate bot and others added 16 commits October 19, 2024 11:53
- closes apify#563

@tlinhart any chance you could test this and confirm it fixes the issue?
- This adds a `get_key_value_store(id, name)` context helper to
`BasicCrawlingContext`
- Also, push_data calls are held until the request handler terminates
successfully (same as in JS version)
    - This is necessary for adaptive crawling (apify#249), among other things
- rm unused pipeline
- use env var for node versions (same as we do for python)
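
The "push_data calls are held until the request handler terminates successfully" behaviour mentioned above can be illustrated with a rough, hypothetical buffer (not the actual Crawlee internals):

```python
from typing import Any, Callable


class DeferredPusher:
    """Holds pushed items and commits them only on handler success."""

    def __init__(self, commit: Callable[[list[dict]], Any]) -> None:
        self._buffer: list[dict] = []
        self._commit = commit

    def push_data(self, item: dict) -> None:
        # Items are held in memory, not written to the dataset yet.
        self._buffer.append(item)

    def run_handler(self, handler: Callable[['DeferredPusher'], Any]) -> None:
        self._buffer.clear()
        try:
            handler(self)
        except Exception:
            self._buffer.clear()  # handler failed: discard the held items
            raise
        self._commit(self._buffer)  # handler succeeded: flush to the dataset


stored: list[dict] = []
pusher = DeferredPusher(stored.extend)
pusher.run_handler(lambda p: p.push_data({'url': 'https://example.com'}))
```

If the handler raises, the buffered items are discarded, so a failed (and later retried) request never writes partial data.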
@janbuchar janbuchar requested a review from vdusek October 31, 2024 15:24
@vdusek vdusek changed the title feat: Split and added extra configuration to export_data function feat: Split and add extra configuration to export_data function Oct 31, 2024
@vdusek vdusek changed the title feat: Split and add extra configuration to export_data function feat: Split and add extra configuration to export_data method Oct 31, 2024
@janbuchar janbuchar requested a review from vdusek October 31, 2024 16:22
Collaborator

@vdusek vdusek left a comment

LGTM

@vdusek vdusek merged commit 6751635 into apify:master Oct 31, 2024
19 checks passed
@souravjain540
Collaborator

Hi @deshansh, can you please send me your email or ping me on the Apify Discord?


Development

Successfully merging this pull request may close these issues.

Implement/document a way how to pass extra configuration to json.dump()

4 participants