-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fix(cloudflare): improve handling of missing records on update/delete #5604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(cloudflare): improve handling of missing records on update/delete #5604
Conversation
I think I'm struggling to understand how this resolve the linked issue. Was this tested on running cluster? |
Same for me :). From what I understand, it becomes more flexible:
The PR looks technically correct and LGTM. @Starttoaster Do you think you can test this PR in your CI script that create reliably this issue ? |
@ivankatliarchuk Thank you for the feedback! Sorry for the confusion—the title needs updating to better reflect the scope. This PR doesn't completely resolve the linked issue; rather, it improves the handling of missing records during update and delete operations, which reduces errors and provides clearer logging about what is actually happening. It helps make the provider more robust in some edge cases, but the underlying issue with resource discovery during simultaneous controller upgrades remains. I did not test this on a running cluster, as I don’t currently have access to one. The changes are covered by new and existing unit tests, but real-world validation would definitely be valuable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth to add extra tests
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
charts/external-dns/CHANGELOG.md
Outdated
### Fixed | ||
|
||
- Fixed the type of `.extraContainers` from `object` to `list` (array). ([#5564](https://github.com/kubernetes-sigs/external-dns/pull/5564)) _@svengreb_ | ||
- Improved Cloudflare provider error handling with graceful fallback for update/delete operations. ([#5604](https://github.com/kubernetes-sigs/external-dns/pull/5604)) _@andrewcharleshay_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We not updating helm chart, so not sure if this is requird
- Add blank line before heading in CHANGELOG.md (MD022) - Break long line in cloudflare.md (MD013)
/lgtm |
listZonesError error | ||
zoneDetailsError error | ||
listZonesContextError error | ||
dnsRecordsError error | ||
createError error | ||
updateError error | ||
deleteError error | ||
customHostnamesError error | ||
regionalHostnamesError error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most likely a preffered approach is to create mockCfError
and
type mockCloudFlareClient struct {
...
mockCfError
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or
type mockCloudFlareClient struct {
errors map[error]int # or anything similar
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listZonesError, zoneDetailsError, listZonesContextError, and dnsRecordsError were preexisting. If you want an errors
object to encapsulate all of them, then it would require a decent change to existing code that isn't part of this PR. I think that should be part of a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is too blunt and should be avoided. Many of these errors could simply be replaced with "magic values." While including some errors is acceptable, there's no need to include so many.
Usually this is done in a way, instead of
if m.createError != nil {
return cloudflare.DNSRecord{}, m.createError
}
something like
if some.Id == "existing-record-failed-creation" {
return cloudflare.DNSRecord{}, fmt.Errorf("failed to create record")
}
or
if m.errorFunc() != nill
return .., m.errorFunc()
Not necessary need to refactor what is already there, just let's not make things more complicated.
What we could have as well
-
Group related errors in a struct:
Create a nested struct for error types, e.g., Errors struct { ... }, to keep the main struct cleaner. -
Use a map for error injection:
Use a map[string]error where the key is the operation name, making it easier to add new error types without changing the struct. -
Refactor with interfaces:
Use interfaces and custom mock implementations for more complex or dynamic error simulation.
This two errors not even in use
customHostnamesError error
regionalHostnamesError error
listZonesError error | ||
zoneDetailsError error | ||
listZonesContextError error | ||
dnsRecordsError error | ||
createError error | ||
updateError error | ||
deleteError error | ||
customHostnamesError error | ||
regionalHostnamesError error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is too blunt and should be avoided. Many of these errors could simply be replaced with "magic values." While including some errors is acceptable, there's no need to include so many.
Usually this is done in a way, instead of
if m.createError != nil {
return cloudflare.DNSRecord{}, m.createError
}
something like
if some.Id == "existing-record-failed-creation" {
return cloudflare.DNSRecord{}, fmt.Errorf("failed to create record")
}
or
if m.errorFunc() != nill
return .., m.errorFunc()
Not necessary need to refactor what is already there, just let's not make things more complicated.
What we could have as well
-
Group related errors in a struct:
Create a nested struct for error types, e.g., Errors struct { ... }, to keep the main struct cleaner. -
Use a map for error injection:
Use a map[string]error where the key is the operation name, making it easier to add new error types without changing the struct. -
Refactor with interfaces:
Use interfaces and custom mock implementations for more complex or dynamic error simulation.
This two errors not even in use
customHostnamesError error
regionalHostnamesError error
It's not super clear to me how this pull request addresses the Issue I created at all. Unless I'm mistaken, this pull request seems irrelevant. And if so, I'd like it to dereference my Issue as it gives the false impression that the Issue is being addressed. The problem was where External DNS incorrectly decided to delete records; not an issue where it failed to delete or update a record. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What does it do ?
Addresses #5603
Motivation
This PR improves the robustness of the Cloudflare provider in external-dns by handling missing DNS records more gracefully during update and delete operations:
When attempting to update a DNS record that does not exist, the provider now falls back to creating the record instead of failing.
When attempting to delete a DNS record that does not exist, the provider now logs a warning and treats it as a successful (no-op) operation rather than an error.
These changes prevent unnecessary failures and improve recovery in scenarios where records may have been removed or are temporarily missing, such as during controller upgrades or resource reconciliation. Additional tests are included to verify these behaviors.
More