Description
What would you like to be added:
AWS provider should treat permission/authorization errors as hard errors instead of soft errors, causing external-dns to terminate and readiness probes to fail.
Currently, AWS permission errors (AccessDenied, UnauthorizedOperation) are wrapped as provider.SoftError, so the process continues running and /healthz stays healthy.
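As a rough illustration of the proposed classification (a sketch, not the actual external-dns code): permission errors are returned unwrapped so the caller treats them as fatal, and everything else keeps the soft-error wrapping. Here `errSoft` stands in for provider.SoftError, and `codedError`, `apiError`, `classify`, and `isSoft` are hypothetical names made up for this example. It assumes the AWS error exposes its code through an `ErrorCode() string` method, as smithy.APIError does in the AWS SDK for Go v2.

```go
package main

import (
	"errors"
	"fmt"
)

// errSoft is a local stand-in for provider.SoftError (assumption: the real
// sentinel is matched with errors.Is in the controller loop).
var errSoft = errors.New("soft error")

// codedError is a hypothetical interface mirroring the ErrorCode() method
// that AWS SDK errors expose via smithy.APIError.
type codedError interface {
	error
	ErrorCode() string
}

// apiError is a toy error type used only to exercise this sketch.
type apiError struct{ code, msg string }

func (e apiError) Error() string     { return e.code + ": " + e.msg }
func (e apiError) ErrorCode() string { return e.code }

// permissionCodes holds the two codes named in this issue; a real change
// might need to cover more.
var permissionCodes = map[string]bool{
	"AccessDenied":          true,
	"UnauthorizedOperation": true,
}

// isPermissionError reports whether err, or any error it wraps, carries one
// of the permission-related codes.
func isPermissionError(err error) bool {
	var ce codedError
	if errors.As(err, &ce) {
		return permissionCodes[ce.ErrorCode()]
	}
	return false
}

// classify returns permission errors unchanged (hard) and wraps every other
// non-nil error with the soft sentinel.
func classify(err error) error {
	if err == nil || isPermissionError(err) {
		return err
	}
	return fmt.Errorf("%w: %v", errSoft, err)
}

// isSoft reports whether err was wrapped with the soft sentinel.
func isSoft(err error) bool { return errors.Is(err, errSoft) }

func main() {
	denied := classify(apiError{code: "AccessDenied", msg: "not authorized"})
	throttled := classify(apiError{code: "Throttling", msg: "rate exceeded"})
	fmt.Println("AccessDenied soft:", isSoft(denied))
	fmt.Println("Throttling soft:", isSoft(throttled))
}
```

With this split, a caller that exits on any non-soft error would terminate on AccessDenied while still retrying transient failures such as throttling.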
Why is this needed:
Permission errors are non-recoverable and require manual intervention.
In real-world setups (IRSA/Pod Identity, EC2 instance profiles), identity association changes often require a fresh pod/instance to pick up the new mapping/credentials. Failing fast accelerates that refresh path and surfaces the error to operators and monitoring.
Current behavior creates silent failures - DNS records aren't updated but the service appears healthy.
By failing fast on permission errors:
- Kubernetes restarts the pod automatically
- Operators get immediate feedback about misconfigurations
- Monitoring systems properly detect issues
This aligns with "fail fast" principles for non-recoverable errors and improves operational visibility.
If the proposed direction makes sense, I’d be happy to open a PR.