Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
42335bf
Add CI GH Action test_with_pyarrow_latest
albertvillanova Jul 7, 2022
f252601
Trigger CI on push for testing
albertvillanova Jul 7, 2022
67dde3a
Install git OS dependency
albertvillanova Jul 7, 2022
cfc2765
Fix seqeval installation on Python 3.6
albertvillanova Jul 7, 2022
dedbd9e
Do not install already present Git
albertvillanova Jul 7, 2022
244ca80
Add CI GH Action test_with_pyarrow_6
albertvillanova Jul 7, 2022
0ba62e7
Skip test_changed_dataset_card for testing
albertvillanova Jul 7, 2022
a866d96
Add CI GH Action test_on_windows_with_pyarrow_latest
albertvillanova Jul 7, 2022
209c3e6
Skip test_changed_dataset_card for testing
albertvillanova Jul 7, 2022
d5fcce7
Set env variables in the GH Action
albertvillanova Jul 7, 2022
6f60f9b
Disable audio test
albertvillanova Jul 7, 2022
6af0e2c
Add CI GH Action test_on_windows_with_pyarrow_6
albertvillanova Jul 7, 2022
a6856ea
Use strategy for OS
albertvillanova Jul 7, 2022
2621e7f
Fix _resolve_single_pattern_locally for Windows
albertvillanova Jul 7, 2022
4a603fc
Fix
albertvillanova Jul 7, 2022
23beaef
Use strategy for PyArrow version
albertvillanova Jul 7, 2022
ccdf337
Revert fixing _resolve_single_pattern_locally for Windows
albertvillanova Jul 7, 2022
f337ffa
Revert trigger CI on push
albertvillanova Jul 7, 2022
193a737
Add CI GH Action check_code_quality
albertvillanova Jul 7, 2022
dd702ed
Make test dependent on check_code_quality
albertvillanova Jul 7, 2022
a2a4621
Use Python 3.7 on Windows
albertvillanova Jul 7, 2022
9615fd3
Fix _resolve_single_pattern_locally for Windows
albertvillanova Jul 7, 2022
780236f
Skip test_dummy_dataset_serialize_s3 on Windows when runned by GitHub…
albertvillanova Jul 7, 2022
0546772
Rename to GH Action Test to CI
albertvillanova Jul 7, 2022
699a95e
Skip test_dummy_dataset_serialize_s3 also on DatasetDict
albertvillanova Jul 7, 2022
ae3a426
Merge remote-tracking branch 'upstream/main' into ci-gh-actions
albertvillanova Jul 7, 2022
791b062
Clean file
albertvillanova Jul 7, 2022
3d18045
Skip test_changed_dataset_card on GH Actions
albertvillanova Jul 7, 2022
5b9451d
Force CI re-run
albertvillanova Jul 7, 2022
b41592c
Set checkout commit at PR head
albertvillanova Jul 7, 2022
d36662d
Revert "Set checkout commit at PR head"
albertvillanova Jul 7, 2022
c7f3e0e
Checkout all history for all branches
albertvillanova Jul 7, 2022
ed18815
Revert skip test_changed_dataset_card
albertvillanova Jul 7, 2022
6546153
Fix revert skip test_changed_dataset_card
albertvillanova Jul 8, 2022
3fecef7
Replace libsndfile1-dev with libsndfile1
albertvillanova Jul 8, 2022
2217769
Add sox as OS dependency on Linux
albertvillanova Jul 8, 2022
499f92d
Remove fail-fast policy to allow merging in red
albertvillanova Jul 12, 2022
f78e6ba
Remove old CircleCI
albertvillanova Jul 12, 2022
04eefe0
Remove old CircleCI deploy
albertvillanova Jul 12, 2022
d266caa
Clean GH Actions workflows dir
albertvillanova Jul 12, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 0 additions & 107 deletions .circleci/config.yml

This file was deleted.

81 changes: 0 additions & 81 deletions .circleci/deploy.sh

This file was deleted.

76 changes: 76 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
name: CI

on:
pull_request:
branches:
- main

env:
HF_SCRIPTS_VERSION: main
HF_ALLOW_CODE_EVAL: 1

jobs:

check_code_quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.6"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: |
black --check --line-length 119 --target-version py36 tests src benchmarks datasets metrics
isort --check-only tests src benchmarks datasets metrics
flake8 tests src benchmarks datasets metrics

test:
needs: check_code_quality
Copy link
Member Author

@albertvillanova albertvillanova Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_code_quality job must complete successfully before the test jobs will run.

This saves CI running time (billing and environmental impact).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would still run the tests even if the code quality check doesn't succeed. This will avoid having to review a first time to tell the user to fix code quality, and then later review a second time to fix the tests

Copy link
Member Author

@albertvillanova albertvillanova Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhoestq I'm not sure of understanding your point.

  • First, contributors should run tests locally and fix them locally.
  • If they forget, then the CI tells them there is a code quality issue, so that they can fix it by themselves, push and trigger the CI again (which once the code quality issue is fixed, will then run all the tests as well)
  • Normally, maintainers should review only green PRs (unless pingged for help): it's the contributor responsibility to make it pass the CI
  • And finally, I think (let me know if you do not think the same) a very common issue while contributing is running tests locally but forgetting running the code quality check (after multiple pushes): in these cases, it does not make sense to run all the tests (expensive) just because there is a non-used import, for example (that is not catch by make style, BTW, but for the more expensive make quality only); once the code quality issue is fixed, the new push will make all tests run anyway

Copy link
Member Author

@albertvillanova albertvillanova Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the specific use case you mention, the reviewer should say to the contributor: you should run locally both the code quality checks and the tests, before pushing

strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
pyarrow_version: [latest, 6.0.1]
runs-on: ${{ matrix.os }}
steps:
- name: Install OS dependencies
if: ${{ matrix.os == 'ubuntu-latest' }}
run: |
sudo apt-get -y update
sudo apt-get -y install libsndfile1 sox
Copy link
Member Author

@albertvillanova albertvillanova Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous OS dependency libsndfile1-dev is replaced with libsndfile1.
The sox dependency is added (required by MP3 tests). This makes passing 2 more tests than before.

- uses: actions/checkout@v3
with:
fetch-depth: 0
Comment on lines +46 to +48
Copy link
Member Author

@albertvillanova albertvillanova Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note fetch-depth: 0 means to fetch all history commits for all branches and tags (instead of default, just the merge commit of current branch to main branch).

The reason for this is to be able to collect the test tests/test_dataset_cards.py::test_changed_dataset_card, because these are parametrized with the function get_changed_datasets and its implementation requires both current branch and origin/main branch: origin/main...HEAD

diff_output = check_output(["git", "diff", "--name-only", "origin/main...HEAD"], cwd=repo_path)

Otherwise, an error is raised:

_________________ ERROR collecting tests/test_dataset_cards.py _________________
tests/test_dataset_cards.py:53: in <module>
    @pytest.mark.parametrize("dataset_name", get_changed_datasets(repo_path))
tests/test_dataset_cards.py:35: in get_changed_datasets
    diff_output = check_output(["git", "diff", "--name-only", "origin/main...HEAD"], cwd=repo_path)
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/subprocess.py:356: in check_output
    **kwargs).stdout
/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/subprocess.py:438: in run
    output=stdout, stderr=stderr)
E   subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', 'origin/main...HEAD']' returned non-zero exit status 128.

because of:

fatal: ambiguous argument 'origin/main...HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

- name: Set up Python 3.6
if: ${{ matrix.os == 'ubuntu-latest' }}
uses: actions/setup-python@v4
with:
python-version: 3.6
- name: Set up Python 3.7
if: ${{ matrix.os == 'windows-latest' }}
uses: actions/setup-python@v4
with:
python-version: 3.7
- name: Upgrade pip
run: python -m pip install --upgrade pip
- name: Pin setuptools-scm
if: ${{ matrix.os == 'ubuntu-latest' }}
run: echo "installing pinned version of setuptools-scm to fix seqeval installation on 3.6" && pip install "setuptools-scm==6.4.2"
- name: Install dependencies
run: |
pip install .[tests]
pip install -r additional-tests-requirements.txt --no-deps
- name: Install latest PyArrow
if: ${{ matrix.pyarrow_version == 'latest' }}
run: pip install pyarrow --upgrade
- name: Install PyArrow ${{ matrix.pyarrow_version }}
if: ${{ matrix.pyarrow_version != 'latest' }}
run: pip install pyarrow==${{ matrix.pyarrow_version }}
- name: Test with pytest
run: |
python -m pytest -n 2 --dist loadfile -sv ./tests/
30 changes: 0 additions & 30 deletions .github/workflows/test-audio.yml

This file was deleted.

6 changes: 3 additions & 3 deletions tests/test_arrow_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -3119,9 +3119,9 @@ def test_pickle_dataset_after_transforming_the_table(in_memory, method_and_param


@pytest.mark.skipif(
os.name == "nt" and os.getenv("CIRCLECI") == "true",
reason='On Windows CircleCI, it raises botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5555/test"',
) # TODO(QL): find what's wrong with CircleCI
os.name == "nt" and (os.getenv("CIRCLECI") == "true" or os.getenv("GITHUB_ACTIONS") == "true"),
reason='On Windows CircleCI or GitHub Actions, it raises botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5555/test"',
) # TODO: find what's wrong with CircleCI / GitHub Actions
Comment on lines +3122 to +3124
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in CircleCI, this test must be skipped in GitHub Actions as well to avoid an error.

@require_s3
def test_dummy_dataset_serialize_s3(s3, dataset):
mock_bucket = s3_test_bucket_name
Expand Down
6 changes: 3 additions & 3 deletions tests/test_dataset_dict.py
Original file line number Diff line number Diff line change
Expand Up @@ -665,9 +665,9 @@ def test_datasetdict_from_text_split(split, text_path, tmp_path):


@pytest.mark.skipif(
os.name == "nt" and os.getenv("CIRCLECI") == "true",
reason='On Windows CircleCI, it raises botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5555/test"',
) # TODO(QL): find what's wrong with CircleCI
os.name == "nt" and (os.getenv("CIRCLECI") == "true" or os.getenv("GITHUB_ACTIONS") == "true"),
reason='On Windows CircleCI or GitHub Actions, it raises botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5555/test"',
) # TODO: find what's wrong with CircleCI / GitHub Actions
Comment on lines +668 to +670
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@require_s3
def test_dummy_dataset_serialize_s3(s3, dataset):
dsets = DatasetDict({"train": dataset, "test": dataset.select(range(2))})
Expand Down