@Vincent056 Vincent056 commented Jul 2, 2024

This PR adds the ability to run AIDE 0.18 config checks and migration.

AIDE 0.18 is compiled as an additional binary that FIO uses to check the migrated config. The new binary resides in the container image, and we launch a new pod to check the AIDE config migration every time we need to re-init.

The config check is added to the existing upgrade case. We check for the migration-check-needed annotation and, if it is set, launch the pod. The pod checks the AIDE migration, issues a warning if needed, and returns the result as a config map. The config map controller picks up the change and removes the migration-check-needed annotation.

We will try to perform the migration if a user-defined config is detected. If we cannot pre-migrate the config, we issue warning messages in the log and set a failed annotation key on the FIO instance.

Annotation keys added:

	// AideConfigMigrationIgnoreAnnotationKey tells us to ignore the deprecated config options
	AideConfigMigrationIgnoreAnnotationKey = "file-integrity.openshift.io/migration-ignore-deprecation"
	// AideConfigAutoMigrationFailedAnnotationKey tells us that the migration failed
	AideConfigAutoMigrationFailedAnnotationKey = "file-integrity.openshift.io/migration-failed"
	// AideConfigMigrationCheckDisabledAnnotationKey tells us that the migration check is disabled
	AideConfigMigrationCheckDisabledAnnotationKey = "file-integrity.openshift.io/migration-check-disabled"

Example of an event message:

7m17s       Warning   FileIntegrityAIDEConfigMigration   fileintegrity/example-fileintegrity            Migration check failed: Detected error config during the migration check: Invalid configureline error

AIDE Configuration Changes

  • This list only covers options that are deprecated or removed, along with their potential replacements.

Removed Features in AIDE v0.17

  • ignore_list

    • Attributes whose changes are ignored in the report.
  • report_attributes

    • Attributes always printed in the report for changed files. If an attribute is both ignored and forced, it is not considered for file change but will be printed in the final report if the file has otherwise changed.
  • verbose (type: number, range: 0 - 255, default: 5)

    • Removed, use log_level and report_level options instead.

New Features in AIDE v0.17

  • database_in (type: URL, default: see --version output, added in AIDE v0.17)
  • log_level (type: log level, default: warning)
    • Specifies the log level. Log messages are written to stderr. If there are multiple log_level lines, the first one is used. The --log-level or -L command line option overwrites this option.
  • report_level (type: report level, default: changed_attributes)

New Features in AIDE v0.16

  • report_ignore_added_attrs (type: attribute expression, default: empty)

    • Attributes whose addition is ignored in the report.
  • report_ignore_removed_attrs (type: attribute expression, default: empty)

    • Attributes whose removal is ignored in the report.
  • report_ignore_e2fsattrs (type: string, default: 0)

    • List (no delimiter) of ext2 file attributes to be ignored in the report.
  • report_force_attrs (type: attribute expression, default: empty)

    • Attributes always printed in the report for changed files.

Deprecated Features in AIDE v0.18 (to be removed in AIDE v0.20)

  • @@ifdef VARIABLE

    • Same as @@if defined VARIABLE.
  • @@ifndef VARIABLE

    • Same as @@if not defined VARIABLE.
  • @@ifhost HOSTNAME

    • Same as @@if hostname HOSTNAME.
  • @@ifnhost HOSTNAME

    • Same as @@if not hostname HOSTNAME.
  • Special attributes

    • S
      • Check for growing size. (DEPRECATED since AIDE v0.18, will be removed in AIDE v0.20)
      • Use growing+s attributes instead.

Removed Features in AIDE v0.19

  • database
    • The URL from which the database is read. Only one of these lines is allowed. If there are multiple database lines, the first is used.
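
As a rough illustration of how the renames listed above could be applied line by line, here is a minimal sketch. It is not the migration code from this PR: `migrateLine` and `directiveRenames` are hypothetical names, the sketch only handles the one-to-one renames (the `@@if...` macros and `database` to `database_in`), and it merely warns about removed options such as `verbose` rather than guessing a `log_level` or `report_level` value. It does not touch group names.

```go
package main

import (
	"fmt"
	"strings"
)

// directiveRenames maps deprecated macros to the replacements listed above.
// Each prefix includes the trailing space so only whole directive names match.
var directiveRenames = []struct{ from, to string }{
	{"@@ifdef ", "@@if defined "},
	{"@@ifndef ", "@@if not defined "},
	{"@@ifhost ", "@@if hostname "},
	{"@@ifnhost ", "@@if not hostname "},
}

// migrateLine rewrites one config line. It returns the (possibly unchanged)
// line plus a warning for options that have no one-to-one replacement.
func migrateLine(line string) (out, warning string) {
	trimmed := strings.TrimSpace(line)
	for _, r := range directiveRenames {
		if strings.HasPrefix(trimmed, r.from) {
			return r.to + strings.TrimPrefix(trimmed, r.from), ""
		}
	}
	key, value, isOption := strings.Cut(trimmed, "=")
	if !isOption {
		return line, "" // selection lines, comments, blank lines
	}
	switch strings.TrimSpace(key) {
	case "database":
		return "database_in=" + strings.TrimSpace(value), ""
	case "verbose":
		return line, "verbose was removed; use log_level and report_level instead"
	case "ignore_list", "report_attributes":
		return line, strings.TrimSpace(key) + " was removed in AIDE v0.17"
	}
	return line, ""
}

func main() {
	for _, line := range []string{
		"database=file:@@{DBDIR}/aide.db.gz",
		"verbose=6",
		"@@ifnhost build01",
		"/hostroot/etc/    CONTENT_EX",
	} {
		out, warning := migrateLine(line)
		fmt.Printf("%-40s warning=%q\n", out, warning)
	}
}
```

Returning a warning instead of silently rewriting `verbose` keeps the decision with the user, which matches the intent of surfacing migration problems through events and annotations.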

@openshift-ci openshift-ci bot requested review from BhargaviGudi and mkumku July 2, 2024 14:14

openshift-ci bot commented Jul 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Vincent056


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2024
@Vincent056 Vincent056 force-pushed the aide-migrate branch 4 times, most recently from 2a2f984 to 2e48413 (July 2, 2024 17:12)
@Vincent056

/retest

@Vincent056 Vincent056 force-pushed the aide-migrate branch 5 times, most recently from 573bd4f to 2d23d2a (July 12, 2024 18:44)

Vincent056 commented Jul 12, 2024

We will save the migrated config under the aide-0.18.conf key of the ConfigMap, next to the original aide.conf:

kind: ConfigMap
apiVersion: v1
metadata:
  name: example-fileintegrity
  namespace: openshift-file-integrity
  uid: 3e68694f-fed1-403f-805f-813eec85ff38
  resourceVersion: '68362036'
  creationTimestamp: '2024-06-28T06:22:38Z'
  labels:
    file-integrity.openshift.io/aide-conf: ''
    file-integrity.openshift.io/owner: example-fileintegrity
  annotations:
    kubernetes.io/description: hi
  managedFields:
    - manager: Mozilla
      operation: Update
      apiVersion: v1
      time: '2024-07-12T06:37:37Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubernetes.io/description': {}
    - manager: file-integrity-operator
      operation: Update
      apiVersion: v1
      time: '2024-07-12T18:35:21Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:aide-0.18.conf': {}
          'f:aide.conf': {}
        'f:metadata':
          'f:labels':
            .: {}
            'f:file-integrity.openshift.io/aide-conf': {}
            'f:file-integrity.openshift.io/owner': {}
data:
  aide-0.18.conf: |-
    @@define DBDIR /hostroot/etc/kubernetes
    @@define LOGDIR /hostroot/etc/kubernetes
    database_in=file:@@{DBDIR}/aide.db.gz
    database_out=file:@@{DBDIR}/aide.db.gz.new
    gzip_dbout=yes
    log_level=notice
    report_level=added_removed_attributes
    report_url=file:@@{LOGDIR}/aide.log.new
    report_url=stdoutx
    PERMS = p+u+g+acl+selinux+xattrs
    CONTENTEX=sha512+ftype+p+u+g+n+acl+selinux+xattrs

    /hostroot/boot/        CONTENTEX
    /hostroot/root/\..* PERMS
    /hostroot/root/   CONTENTEX
    !/hostroot/root/\.kube
    !/hostroot/usr/src/
    !/hostroot/usr/tmp/

    /hostroot/usr/    CONTENTEX

    # OpenShift specific excludes
    !/hostroot/opt/
    !/hostroot/var
    !/hostroot/etc/NetworkManager/system-connections/
    !/hostroot/etc/mtab$
    !/hostroot/etc/.*~
    !/hostroot/etc/kubernetes/static-pod-resources
    !/hostroot/etc/kubernetes/test
    !/hostroot/etc/kubernetes/aide.*
    !/hostroot/etc/kubernetes/manifests
    !/hostroot/etc/kubernetes/kubelet-ca.crt
    !/hostroot/etc/docker/certs.d
    !/hostroot/etc/selinux/targeted
    !/hostroot/etc/openvswitch/conf.db
    !/hostroot/etc/kubernetes/cni/net.d
    !/hostroot/etc/kubernetes/cni/net.d/*
    !/hostroot/etc/machine-config-daemon/currentconfig$
    !/hostroot/etc/machine-config-daemon/node-annotation.json*
    !/hostroot/etc/pki/ca-trust/extracted/java/cacerts$
    !/hostroot/etc/cvo/updatepayloads
    !/hostroot/etc/cni/multus/certs
    !/hostroot/etc/kubernetes/compliance-operator
    !/hostroot/etc/kubernetes/node-feature-discovery

    # Catch everything else in /etc
    /hostroot/etc/    CONTENTEX
  aide.conf: |-
    @@define DBDIR /hostroot/etc/kubernetes
    @@define LOGDIR /hostroot/etc/kubernetes
    database=file:@@{DBDIR}/aide.db.gz
    database_out=file:@@{DBDIR}/aide.db.gz.new
    gzip_dbout=yes
    verbose=6
    report_url=file:@@{LOGDIR}/aide.log.new
    report_url=stdoutx
    PERMS = p+u+g+acl+selinux+xattrs
    CONTENT_EX = sha512+ftype+p+u+g+n+acl+selinux+xattrs

    /hostroot/boot/        CONTENT_EX
    /hostroot/root/\..* PERMS
    /hostroot/root/   CONTENT_EX
    !/hostroot/root/\.kube
    !/hostroot/usr/src/
    !/hostroot/usr/tmp/

    /hostroot/usr/    CONTENT_EX

    # OpenShift specific excludes
    !/hostroot/opt/
    !/hostroot/var
    !/hostroot/etc/NetworkManager/system-connections/
    !/hostroot/etc/mtab$
    !/hostroot/etc/.*~
    !/hostroot/etc/kubernetes/static-pod-resources
    !/hostroot/etc/kubernetes/test
    !/hostroot/etc/kubernetes/aide.*
    !/hostroot/etc/kubernetes/manifests
    !/hostroot/etc/kubernetes/kubelet-ca.crt
    !/hostroot/etc/docker/certs.d
    !/hostroot/etc/selinux/targeted
    !/hostroot/etc/openvswitch/conf.db
    !/hostroot/etc/kubernetes/cni/net.d
    !/hostroot/etc/kubernetes/cni/net.d/*
    !/hostroot/etc/machine-config-daemon/currentconfig$
    !/hostroot/etc/machine-config-daemon/node-annotation.json*
    !/hostroot/etc/pki/ca-trust/extracted/java/cacerts$
    !/hostroot/etc/cvo/updatepayloads
    !/hostroot/etc/cni/multus/certs
    !/hostroot/etc/kubernetes/compliance-operator
    !/hostroot/etc/kubernetes/node-feature-discovery

    # Catch everything else in /etc
    /hostroot/etc/    CONTENT_EX

@Vincent056

@rhmdnd this should be ready for some reviews

@Vincent056 Vincent056 force-pushed the aide-migrate branch 4 times, most recently from 24fed09 to 130dbd0 (July 12, 2024 19:22)
@rhmdnd rhmdnd left a comment

Took an initial pass, but I still need to come back and look through the config map controller and file integrity controller logic.

Posting the feedback I have for now.

COPY . .

RUN git clone https://github.com/autoconf-archive/autoconf-archive.git && \
    cp autoconf-archive/m4/*.m4 /usr/share/aclocal/

Contributor

Is this a dependency for building AIDE 0.18?

Contributor Author

Yes. I think RHEL might have an RPM for this, but this builder does not.

aide-0.18:
	rm -rf aide && \
	git clone https://github.com/aide/aide.git && \
	cd aide && \

Contributor

We have a build directory where we stash other related tools like controller-gen and kustomize. We could keep the entire repository in there so it gets cleaned up using make clean. Otherwise, we might need to go in and clean it up manually if we encounter a build failure.

err = waitForAIDEConfigMigrationEvent(t, f, namespace, testName, "17")
if err != nil {
	t.Errorf("Expected event to be found")
}

Contributor

Nice - thanks for adding this. So the remaining bit to test would be running AIDE 0.18.0, yeah? At this point, the test is asserting that the configuration should work with 0.18.0, but we're still running 0.16.0.

Contributor Author

correct

@Vincent056

Took an initial pass, but I still need to come back and look through the config map controller and file integrity controller logic.

Posting the feedback I have for now.

Thanks for the detailed review!

@Vincent056 Vincent056 force-pushed the aide-migrate branch 2 times, most recently from 03f6255 to a2ffd69 (July 22, 2024 06:39)
@Vincent056 Vincent056 force-pushed the aide-migrate branch 2 times, most recently from 8b33a9f to e8d41fc (July 22, 2024 23:18)
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2024
This PR adds the ability to run AIDE 0.18 config checks and migration.

AIDE 0.18 will be compiled as a binary and used by FIO in a container. We
will try to perform the migration if a user-defined config is
detected. We will issue warning messages in the log, and set a failed annotation key
on the FIO instance if we are not able to pre-migrate the config.
@Vincent056

/retest

@Vincent056 Vincent056 requested a review from rhmdnd July 23, 2024 01:08
@Vincent056

/retest


openshift-ci bot commented Jul 23, 2024

@Vincent056: all tests passed!


@rhmdnd rhmdnd left a comment

Looking good - a few more comments inline now that I've had more time to walk through the controller code and the tests.

for {
	select {
	case <-migrateCtx.Done():
		DBG("Migration loop cancelled by the main routine!")

Contributor

In this case - has the operation been aborted due to an error? Can, or should, the user do anything to restart the migration?

	return
} else if aideResult == 17 {
	// This is an AIDE config line error.
	newErr := fmt.Sprintf("Detected configuration error during the migration check for AIDE 0.18: %s, output: %s ", common.GetAideErrorMessage(aideResult), output)

Contributor

Cool - this error moves us in the right direction I think by giving users the information they need to update AIDE configs. Curious to see how this renders in the logs with an actual AIDE error.

ReinitDaemonSetPrefix = "aide-ini"

// AideConfigMigrationIgnoreAnnotationKey tells us to ignore the deprecated config options
AideConfigMigrationIgnoreAnnotationKey = "file-integrity.openshift.io/migration-ignore-deprecation"

Contributor

nit: For additional clarity, we're not ignoring potential FIO deprecations, but AIDE deprecations. We could update that in the annotation to be more explicit:

AideConfigMigrationIgnoreAnnotationKey = "file-integrity.openshift.io/migration-ignore-aide-deprecation"

func (r *ReconcileConfigMap) handleMigrationCheckLog(cm *corev1.ConfigMap, logger logr.Logger) (reconcile.Result, error) {
	owner, err := common.GetConfigMapOwnerName(cm)
	if err != nil {
		logger.Error(err, "Malformed ConfigMap: Could not get owner. Cannot retry.")

Contributor

In this case, we need the user to go adjust the ConfigMap manually, right? Can we include the error in the log output, or does it become unruly?

delete(annotation, common.IntegrityLogErrorAnnotationKey)
}

delete(annotation, common.IntegrityMigrationUpdateAnnotationKey)

Contributor

We delete this annotation because the migration has been successful?


RUN make build

RUN make aide-0.18

Contributor

Another alternative we talked about earlier was to build this using a separate ubi8 base image container and then copying it over, too.

Or, at least in this case, we could just copy it out of the fedora 40 image and see how that works.

# install operator binary
COPY --from=builder /go/src/github.com/openshift/file-integrity-operator/build/bin/manager ${OPERATOR}
COPY build/bin /usr/local/bin
COPY --from=builder /go/src/github.com/openshift/file-integrity-operator/build/bin/aide-0.18 /usr/sbin/aide-0.18

Contributor

Then we don't need to pollute the builder container image with utilities and code unrelated to golang, or building FIO.

Contributor

I think we'll also need to update the default AIDE configuration.

https://github.com/openshift/file-integrity-operator/blob/master/pkg/controller/fileintegrity/config_defaults.go#L15-L61

The way I understand the current logic is that FIO will continue to lay down AIDE configurations that are not compatible with 0.18.0. I think we'll want to change that as soon as possible, so long as the default AIDE configuration for 0.18.0 we want to use works on 0.16.0. That way we don't have to maintain the migration logic any longer than we absolutely have to.

return
}

// Append operatorPods to pods

Contributor

nit: unrelated change?

return nil
}

func waitForFIMigrationErrAnnotation(t *testing.T, f *framework.Framework, namespace, name, expectedMessage string) error {

Contributor

nit: we could turn this into an assertion helper, so the caller doesn't have to check the returned error in order to fail the test.

func assertAIDEMigrationEmitsAnAlert(...)

And then a similar one below:

func assertAIDEMigrationIsSuccessful(...)

But, I acknowledge this would be potentially different from existing conventions and we could do that in a separate PR if we decide to make that change across the entire functional test suite.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 23, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 23, 2024
@openshift-merge-robot

PR needs rebase.


@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 22, 2024
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 23, 2024

openshift-ci bot commented Dec 23, 2024

@openshift-bot: Closed this PR.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.


4 participants