
Investigate the build issues, focusing on tests #1471

@TomFinley

Description

At the time of writing, our build system is plagued by a large number of failing tests and other build issues. This hurts our agility, since an otherwise valid PR can fail the test checks for spurious reasons that have nothing to do with the change, and it in turn wastes significant resources. The goal is to reduce the rate of spurious test failures.

However, we are somewhat vexed by a lack of information on why these test failures occur. In particular, trying to reproduce test failures locally has, at least in my experience, very limited success. For example, in my own investigation into the random failures of MulticlassTreeFeaturizedLRTest on macOS debug, I was able to produce a test failure only twice out of some hundreds of runs on a MacBook, and what information I was able to gather was limited.

In the seeming absence of the ability to reliably produce test failures outside of the build machines, we need more information.

  1. Publish the test logs as an artifact of the build so that we can gather more information. Random build failures: Publish the test logs #1473.

  2. Make the error messages from tests, when failures do occur, contain some actually useful information (see the sketch after this list). Random build failures: Make test failure output on numerical comparisons semi-useful #1477.

  3. Create a catalog of failures that occur in builds that in principle should have succeeded (e.g., builds of master). This is partly to validate the assumption that tests are the primary problem, and partly to get a sense of which tests are problematic. Random build failures: Catalog the failures #1474.
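To illustrate item 2, a numerical comparison failure is far more diagnosable from a log alone when the message reports which element differed, the expected and actual values, and the tolerance that was violated. The helper below is purely a sketch; NumericAssert and its parameters are hypothetical and not part of the codebase:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not an API from this repo; it only illustrates the kind
// of message that makes a numerical test failure actionable from a log.
public static class NumericAssert
{
    public static void AllClose(
        IReadOnlyList<double> expected, IReadOnlyList<double> actual,
        double relTol = 1e-6, double absTol = 1e-10)
    {
        if (expected.Count != actual.Count)
            throw new Exception($"Length mismatch: expected {expected.Count}, actual {actual.Count}.");

        for (int i = 0; i < expected.Count; i++)
        {
            double diff = Math.Abs(expected[i] - actual[i]);
            double allowed = absTol + relTol * Math.Abs(expected[i]);
            if (diff > allowed)
            {
                // Report where the comparison failed, both values, and how badly
                // the tolerance was violated, not just "values differ".
                throw new Exception(
                    $"Mismatch at index {i}: expected {expected[i]:R}, actual {actual[i]:R}, " +
                    $"|diff| = {diff:E3} > allowed {allowed:E3} (relTol = {relTol}, absTol = {absTol}).");
            }
        }
    }
}
```

Something along these lines, combined with the published logs from item 1, would let us diagnose a tolerance failure without having to reproduce it locally.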

The preceding is purely information gathering, but at the same time there are some positive steps that can be taken in parallel with the above.

  1. We already know of some troublesome tests. These should be investigated for the "usual suspects," e.g., failure to set random seeds to a fixed value, having a variable number of threads in training processes, etc., which are known but innocent sources of run-to-run variance (see the sketch after this list).

  2. That the tests seem to fail so readily on the build machines yet are vexingly difficult to make fail locally suggests that something about the build environment is different; perhaps a different architecture or different performance characteristics expose timing issues or race conditions that are simply not observed on our more performant developer machines. It may therefore be worthwhile to reproduce the test environment machines exactly (down to the environment, processor, memory, everything) to see if that turns up any clues.

  3. Most vague, but still useful: the nature of the failures, while mysterious, has not been entirely devoid of clues as to potential causes. I may write more about them in a comment later.
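For item 1, the sketch below shows the two usual suspects in miniature. The names (Shuffle, SumLosses) are hypothetical and not taken from the test code; the point is only that an unseeded Random and a machine-dependent degree of parallelism each introduce run-to-run variance that a fixed seed and a pinned thread count remove:

```csharp
using System;
using System.Linq;

// Hedged illustration only; none of these names come from the ML.NET test code.
public static class DeterminismSuspects
{
    // Suspect 1: an unseeded Random reshuffles the data differently on every run.
    // Passing a fixed seed (e.g. 42) makes the shuffle reproducible.
    public static int[] Shuffle(int[] data, int? seed = null)
    {
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        return data.OrderBy(_ => rng.Next()).ToArray();
    }

    // Suspect 2: a floating-point reduction whose parallelism follows the machine
    // (e.g. ProcessorCount) can sum in a different order on a 2-core build agent
    // than on an 8-core developer box, shifting the result by a few ulps.
    public static double SumLosses(double[] losses, int maxThreads = 1)
    {
        return losses
            .AsParallel()
            .WithDegreeOfParallelism(maxThreads) // pin this to remove one source of drift
            .Sum();
    }
}
```

Running the suspect tests with both of these knobs pinned, and then unpinned, would tell us fairly quickly whether either one explains the nondeterminism.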

/cc @Zruty0 @eerhardt

Labels

Build (Build related issue), bug (Something isn't working), test (related to tests)
