
AndrewCharlesHay
Contributor

@AndrewCharlesHay AndrewCharlesHay commented Jul 2, 2025

What does it do ?

Addresses #5603

Motivation

This PR improves the robustness of the Cloudflare provider in external-dns by handling missing DNS records more gracefully during update and delete operations:

  • When attempting to update a DNS record that does not exist, the provider now falls back to creating the record instead of failing.
  • When attempting to delete a DNS record that does not exist, the provider now logs a warning and treats it as a successful (no-op) operation rather than an error.
These changes prevent unnecessary failures and improve recovery in scenarios where records may have been removed or are temporarily missing, such as during controller upgrades or resource reconciliation. Additional tests are included to verify these behaviors.
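
The two fallbacks described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `recordClient`, `errRecordNotFound`, `updateOrCreate`, `deleteIfPresent`, and `memoryClient` are hypothetical names, and the real provider would have to detect a "not found" response from the Cloudflare API in its own way.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errRecordNotFound is a hypothetical sentinel error used only in this sketch.
var errRecordNotFound = errors.New("record not found")

// recordClient is a hypothetical, minimal stand-in for the provider's DNS client.
type recordClient interface {
	Create(name, content string) error
	Update(name, content string) error
	Delete(name string) error
}

// updateOrCreate updates a record and falls back to creating it when the
// update reports that the record does not exist.
func updateOrCreate(c recordClient, name, content string) error {
	err := c.Update(name, content)
	if errors.Is(err, errRecordNotFound) {
		log.Printf("record %q not found on update, creating it instead", name)
		return c.Create(name, content)
	}
	return err
}

// deleteIfPresent deletes a record and treats "not found" as a successful no-op.
func deleteIfPresent(c recordClient, name string) error {
	err := c.Delete(name)
	if errors.Is(err, errRecordNotFound) {
		log.Printf("record %q not found on delete, nothing to do", name)
		return nil
	}
	return err
}

// memoryClient is an in-memory implementation used only to exercise the sketch.
type memoryClient struct{ records map[string]string }

func (m *memoryClient) Create(name, content string) error {
	m.records[name] = content
	return nil
}

func (m *memoryClient) Update(name, content string) error {
	if _, ok := m.records[name]; !ok {
		return fmt.Errorf("update %s: %w", name, errRecordNotFound)
	}
	m.records[name] = content
	return nil
}

func (m *memoryClient) Delete(name string) error {
	if _, ok := m.records[name]; !ok {
		return fmt.Errorf("delete %s: %w", name, errRecordNotFound)
	}
	delete(m.records, name)
	return nil
}

func main() {
	c := &memoryClient{records: map[string]string{}}
	// The record does not exist yet, so the update falls back to a create.
	fmt.Println(updateOrCreate(c, "a.example.com", "1.2.3.4"))
	// Deleting a record that was never created is treated as success.
	fmt.Println(deleteIfPresent(c, "missing.example.com"))
}
```

Errors other than "not found" are returned unchanged in both helpers, so genuine API failures would still surface to the caller.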

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. provider Issues or PRs related to a provider labels Jul 2, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 2, 2025
@ivankatliarchuk
Contributor

I think I'm struggling to understand how this resolves the linked issue. Was this tested on a running cluster?

@mloiseleur
Collaborator

Same for me :). From what I understand, it becomes more flexible:

  1. switch from update to create on not found records
  2. not enforce deletion on not found records

The PR looks technically correct and LGTM.

@Starttoaster Do you think you can test this PR with your CI script that reliably reproduces this issue?
(See CONTRIBUTING doc if you need help on how to test a PR).

@AndrewCharlesHay AndrewCharlesHay changed the title from "fix(cloudflare): remove unneeded deletion and recreation" to "fix(cloudflare): improve handling of missing records on update/delete" Jul 3, 2025
@AndrewCharlesHay
Contributor Author

I think I'm struggling to understand how this resolves the linked issue. Was this tested on a running cluster?

@ivankatliarchuk Thank you for the feedback! Sorry for the confusion—the title needs updating to better reflect the scope. This PR doesn't completely resolve the linked issue; rather, it improves the handling of missing records during update and delete operations, which reduces errors and provides clearer logging about what is actually happening. It helps make the provider more robust in some edge cases, but the underlying issue with resource discovery during simultaneous controller upgrades remains.

I did not test this on a running cluster, as I don’t currently have access to one. The changes are covered by new and existing unit tests, but real-world validation would definitely be valuable.

Contributor

@ivankatliarchuk ivankatliarchuk left a comment

It's worth adding extra tests.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

### Fixed

- Fixed the type of `.extraContainers` from `object` to `list` (array). ([#5564](https://github.com/kubernetes-sigs/external-dns/pull/5564)) _@svengreb_
- Improved Cloudflare provider error handling with graceful fallback for update/delete operations. ([#5604](https://github.com/kubernetes-sigs/external-dns/pull/5604)) _@andrewcharleshay_
Contributor

We are not updating the Helm chart, so I'm not sure this is required.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2025
- Add blank line before heading in CHANGELOG.md (MD022)
- Break long line in cloudflare.md (MD013)
@ivankatliarchuk
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2025
Comment on lines +58 to +66
listZonesError error
zoneDetailsError error
listZonesContextError error
dnsRecordsError error
createError error
updateError error
deleteError error
customHostnamesError error
regionalHostnamesError error
Contributor

Most likely the preferred approach is to create a mockCfError type and embed it:

type mockCloudFlareClient struct {
    ...
    mockCfError
}
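
A minimal sketch of what the embedding suggestion could look like. The field names follow the fields under review, while `createRecord`, `errCreateFailed`, and the reduced `mockCloudFlareClient` shown here are assumptions for illustration and not the real test mock.

```go
package main

import (
	"errors"
	"fmt"
)

// errCreateFailed is a hypothetical error value used only in this sketch.
var errCreateFailed = errors.New("failed to create record")

// mockCfError groups the injectable errors in one place.
type mockCfError struct {
	createError error
	updateError error
	deleteError error
}

// mockCloudFlareClient is a reduced stand-in for the test mock.
// Embedding mockCfError promotes its fields, so m.createError still resolves.
type mockCloudFlareClient struct {
	records map[string]string
	mockCfError
}

// createRecord is a hypothetical method that honours the injected error.
func (m *mockCloudFlareClient) createRecord(name, content string) error {
	if m.createError != nil {
		return m.createError
	}
	m.records[name] = content
	return nil
}

func main() {
	m := &mockCloudFlareClient{
		records:     map[string]string{},
		mockCfError: mockCfError{createError: errCreateFailed},
	}
	fmt.Println(m.createRecord("a.example.com", "1.2.3.4"))
}
```

Because the fields are promoted, existing checks such as `m.createError != nil` would keep compiling while the client struct itself stays shorter.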

Contributor

Or

type mockCloudFlareClient struct {
    errors map[error]int // or anything similar
}

Contributor Author

listZonesError, zoneDetailsError, listZonesContextError, and dnsRecordsError were preexisting. If you want an errors object to encapsulate all of them, then it would require a decent change to existing code that isn't part of this PR. I think that should be part of a separate issue.

Contributor

This method is too blunt and should be avoided. Many of these errors could simply be replaced with "magic values." While including some errors is acceptable, there's no need to include so many.

Usually, instead of

if m.createError != nil {
    return cloudflare.DNSRecord{}, m.createError
}

something like

if some.Id == "existing-record-failed-creation" {
      return cloudflare.DNSRecord{}, fmt.Errorf("failed to create record")
}

or

if err := m.errorFunc(); err != nil {
    return cloudflare.DNSRecord{}, err
}

There's no need to refactor what is already there; let's just not make things more complicated.

We could also:

  1. Group related errors in a struct:
    Create a nested struct for error types, e.g., Errors struct { ... }, to keep the main struct cleaner.

  2. Use a map for error injection:
    Use a map[string]error where the key is the operation name, making it easier to add new error types without changing the struct.

  3. Refactor with interfaces:
    Use interfaces and custom mock implementations for more complex or dynamic error simulation.
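
As a rough illustration of option 2, a map keyed by operation name might look like the sketch below. All names here (`mockClient`, `errs`, `failure`, `create`, `remove`, `errInjected`) are hypothetical and not taken from the existing test code.

```go
package main

import (
	"errors"
	"fmt"
)

// errInjected is a hypothetical error value used only in this sketch.
var errInjected = errors.New("injected failure")

// mockClient is a hypothetical mock that injects errors by operation name.
type mockClient struct {
	records map[string]string
	errs    map[string]error // key: operation name, e.g. "create" or "delete"
}

// failure returns the error injected for op, or nil if none was set.
// Reading from a nil map is safe in Go and yields the zero value.
func (m *mockClient) failure(op string) error {
	return m.errs[op]
}

func (m *mockClient) create(name, content string) error {
	if err := m.failure("create"); err != nil {
		return err
	}
	m.records[name] = content
	return nil
}

func (m *mockClient) remove(name string) error {
	if err := m.failure("delete"); err != nil {
		return err
	}
	delete(m.records, name)
	return nil
}

func main() {
	m := &mockClient{
		records: map[string]string{},
		errs:    map[string]error{"delete": errInjected},
	}
	fmt.Println(m.create("a.example.com", "1.2.3.4")) // no error injected for "create"
	fmt.Println(m.remove("a.example.com"))            // returns the injected error
}
```

With this shape, a new failure mode only needs a new map key in the test that wants it, rather than another field on the mock struct.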

These two errors are not even in use:

customHostnamesError   error
regionalHostnamesError error

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2025

@Starttoaster

It's not super clear to me how this pull request addresses the Issue I created at all. Unless I'm mistaken, this pull request seems irrelevant. And if so, I'd like it to dereference my Issue as it gives the false impression that the Issue is being addressed. The problem was where External DNS incorrectly decided to delete records; not an issue where it failed to delete or update a record.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 29, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
chart cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. docs needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. provider Issues or PRs related to a provider size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants