Skip to content

Conversation

@hardys
Copy link

@hardys hardys commented Jul 22, 2019

Some of the underlying assumptions of the rhcos.Image abstraction make it difficult when dealing with a platform that requires different images for bootstrap and cluster machines:

  • The same image is used for bootstrap, masters, and workers
  • That image information is included in the generated machines manifests and also to terraform

In the case of the baremetal IPI platform, we use the QEMU image to launch a VM bootstrap node, then the OpenStack image is used to deploy masters via terraform and later workers via the baremetal-operator.

This PR doesn't modify the data consumed by terraform or the BMO for masters/workers yet, that is planned via follow-up PRs that complete #2060

This is the first step towards fixing #2037

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 22, 2019
@hardys hardys force-pushed the issue_2037 branch 2 times, most recently from 7ab054f to ecd3cf0 Compare July 22, 2019 17:26
@hardys
Copy link
Author

hardys commented Jul 22, 2019

cc @stbenjam - this part is working for me now, further PRs to enable bootstrap hosted Ironic (based on openshift-metal3/kni-installer#100 work) coming soon, but feedback welcome.

@hardys
Copy link
Author

hardys commented Jul 22, 2019

@russellb please could you add the platform/baremetal label as this is a prerequisite to bootstrap hosted Ironic ref #2060

@russellb russellb added the platform/baremetal IPI bare metal hosts platform label Jul 22, 2019
@russellb russellb requested a review from abhinavdahiya July 22, 2019 17:46
@stbenjam
Copy link
Member

Otherwise I think this looks great, will be happy to start seeing the image stuff disappear from the platform config.

@abhinavdahiya
Copy link
Contributor

Some of the underlying assumptions of the rhcos.Image abstraction make it difficult when dealing with a platform that requires different images for bootstrap and cluster machines:

* The same image is used for bootstrap, masters, and workers

* That image information is included in the generated machines manifests and also to terraform

The goal of the image in the machines is the Image type native to the platform, example AMI on AWS, Azure Image on Azure, GCP Image on GCP. the terraform is given a form of the boot-image that can be converted to this platform dependent image. This allows terraform to do the part of creating these image objects that later on in the future can be done by the cluster using the release-image when creating these machines.

@metal3ci
Copy link

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/915/

@hardys
Copy link
Author

hardys commented Jul 23, 2019

The goal of the image in the machines is the Image type native to the platform, example AMI on AWS, Azure Image on Azure, GCP Image on GCP ...

Ack, understood - it's just that this single-image abstraction won't work with the current state of baremetal IPI support - perhaps this is something of a special case, but I think the only alternative would be to get a new RHCOS build which supports both Ironic config drive partitions, and qemu fw_cfg driver for Ignition, I'm not sure if that is viable based on previous discussions on this topic.

I'll address the other comments and hopefully we can make progress with this approach for now, thanks for the prompt review! :)

@hardys
Copy link
Author

hardys commented Jul 23, 2019

Review comments addressed, thanks for the feedback!

@metal3ci
Copy link

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/918/

@hardys
Copy link
Author

hardys commented Jul 23, 2019

CI failure looks unrelated AFAICS

No public route53 zone found matching name \"origin-ci-int-aws.dev.rhcloud.com\"

It seems we've got some DNS issues impacting the AWS jobs atm, will retest when those are confirmed resolved.

hardys pushed a commit to hardys/rhcos-downloader that referenced this pull request Jul 23, 2019
The URL provided internally by the installer is the full path
to the openstack qcow file, and the baseURI is supposed to be an
internal detail, so we need to handle this case for the planned
switch to use installer generated image references:

openshift/installer#2037
openshift/installer#2061

The previous baseURI method is maintained for backwards compatibilty
and can be removed later when the installer changes are complete,
and at that point we probably want to remove the latest symlinks
as I think we need to deal with explicit image references in the
providerSpec, then have the BMO (and terraform) look at the mirror
location for the locally downloaded/compressed version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: the openstack image cannot be used to boot a VM using QEMU in libvirt backend? Is there a technical limitation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhinavdahiya The reason is that the libvirt backend uses the qemu_fw_cfg driver to pass the Ignition config, and the only image which has this enabled is the QEMU one:

https://github.com/openshift/installer/blob/master/pkg/terraform/exec/plugins/vendor/github.com/dmacvicar/terraform-provider-libvirt/libvirt/domain.go#L243

It might be that in future we can rework that to enable passing the config via a config-drive so that we can use the openstack image in both places, but I've not investigated that yet, or had any discussion with the terraform-provider-libvirt community

@stbenjam
Copy link
Member

/retest

@stbenjam
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2019
@stbenjam
Copy link
Member

/test e2e-openstack

@hardys
Copy link
Author

hardys commented Jul 25, 2019

/assign @staebler

@hardys
Copy link
Author

hardys commented Jul 26, 2019

@abhinavdahiya Thanks for the previous review, comments addressed so would appreciate another review when you get a moment, anything else which needs addressing before this can merge?

@abhinavdahiya
Copy link
Contributor

/unassign @staebler

@abhinavdahiya
Copy link
Contributor

For the baremetal platform we require two different bootimages,
the QEMU one for the libvirt based bootstrap VM, and the OpenStack
one that contains the necessary Ironic config drive support to pass
data to ignition.

So we rework the Image abstraction by adding a new BootstrapImage
type/asset, which returns the same as Image in all cases except for
the baremetal platform.

This also aligns with the tfvars renamed in openshift#2044 and allows us to
pass the rhcos.QEMU image via BootstrapImage to terraform but leaves
the OpenStack image URL available for future use to deploy masters
via follow-up PRs that implement issue openshift#2060 and also correctly set
the worker machineset providerSpec for the baremetal-operator.

Related: openshift#2037
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 29, 2019
@stbenjam
Copy link
Member

@abhinavdahiya I've updated the graph (@hardys is on PTO, so watching these PR's for him)

@russellb
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2019
@metal3ci
Copy link

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/958/

@stbenjam
Copy link
Member

/test e2e-aws

@stbenjam
Copy link
Member

@abhinavdahiya This is ready for another look, I've updated the dep graph.

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, hardys, russellb, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit d89a59f into openshift:master Jul 31, 2019
@openshift-ci-robot
Copy link
Contributor

@hardys: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 44fea9e link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. platform/baremetal IPI bare metal hosts platform size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants