Operation in multi region clusters #4349

@james-callahan

Describe the feature you are requesting
We run Kubernetes workers in multiple AWS regions. The AWS Load Balancer Controller should, at a minimum, not break in that setup.

Motivation
Currently, as soon as a worker from another region joins our cluster, the AWS Load Balancer Controller fails to maintain the target groups and logs the following errors in a loop:

{"level":"error","ts":"2025-09-23T05:59:20Z","msg":"Requesting network requeue due to error from ReconcileForNodePortEndpoints","tgb":{"name":"k8s-traefik-traefik-e9678d08bb","namespace":"traefik"},"error":"operation error EC2: DescribeInstances, https response error StatusCode: 400, RequestID: 28fa5b8e-d578-4153-8044-bb0feb3d39c3, api error InvalidInstanceID.NotFound: The instance ID 'i-0c7d8b13101f076cf' does not exist"}
{"level":"error","ts":"2025-09-23T05:59:20Z","msg":"Reconciler error","controller":"targetGroupBinding","controllerGroup":"elbv2.k8s.aws","controllerKind":"TargetGroupBinding","TargetGroupBinding":{"name":"k8s-traefik-traefik-e9678d08bb","namespace":"traefik"},"namespace":"traefik","name":"k8s-traefik-traefik-e9678d08bb","reconcileID":"8c1edc9e-fa43-4064-92c0-b13ddf9ead8d","error":"operation error EC2: DescribeInstances, https response error StatusCode: 400, RequestID: 28fa5b8e-d578-4153-8044-bb0feb3d39c3, api error InvalidInstanceID.NotFound: The instance ID 'i-0c7d8b13101f076cf' does not exist"}

Describe the proposed solution you'd like

Option 1.

Allow aws-load-balancer-controller to be given a node label that selects the subset of nodes it watches and maintains. For example, I could pass --node-labels=topology.kubernetes.io/region=us-east-1 so that the controller works only on matching nodes, and then run multiple controllers, one per region.
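
To illustrate the filtering Option 1 implies, here is a minimal sketch in plain Go. The --node-labels flag does not exist today; it is the flag proposed above. The function name matchesNodeLabels and the comma-separated key=value selector format are assumptions made for illustration, not the controller's API.

```go
package main

import "strings"

// matchesNodeLabels reports whether a node's labels satisfy every key=value
// pair in selector (for example "topology.kubernetes.io/region=us-east-1").
// Hypothetical helper: the selector format is an assumption for this sketch.
// An empty selector matches every node, preserving today's behaviour.
func matchesNodeLabels(nodeLabels map[string]string, selector string) bool {
	for _, pair := range strings.Split(selector, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		key, want, ok := strings.Cut(pair, "=")
		if !ok {
			// Malformed pair without "=": treat as a non-match.
			return false
		}
		if got, present := nodeLabels[key]; !present || got != want {
			return false
		}
	}
	return true
}
```

A controller started with the us-east-1 selector would then skip any node whose topology.kubernetes.io/region label is missing or names a different region, instead of passing that node's instance ID to EC2.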

Option 2.

When parsing the node's instance ID at https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/fee9ed8a31a314752d80128ebd9b25c8003a9e5c/pkg/k8s/node_utils.go#L29C2-L29C17, also split out the region and check the node for eligibility.
Note that a providerID looks like, e.g., aws:///us-east-1d/i-045c4ea06a4a3e01e, so it already carries the availability zone, from which the region can be derived.
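
A rough sketch of what Option 2 could look like, using only the Go standard library. The names regionFromProviderID and nodeInRegion are hypothetical and are not taken from node_utils.go. The sketch also assumes the providerID always has the aws:///<zone>/<instance-id> shape shown above and that the region is the leading part of the zone name.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// regionPrefix matches the region at the start of a zone name, e.g.
// "us-east-1" in "us-east-1d". This pattern is an assumption about zone
// naming, not something taken from the controller's source.
var regionPrefix = regexp.MustCompile(`^[a-z]{2}(-[a-z]+)+-[0-9]+`)

// regionFromProviderID extracts the region from a providerID such as
// "aws:///us-east-1d/i-045c4ea06a4a3e01e". Hypothetical helper.
func regionFromProviderID(providerID string) (string, error) {
	const prefix = "aws:///"
	if !strings.HasPrefix(providerID, prefix) {
		return "", errors.New("providerID does not start with aws:///")
	}
	zone, instanceID, ok := strings.Cut(strings.TrimPrefix(providerID, prefix), "/")
	if !ok || zone == "" || instanceID == "" {
		return "", errors.New("providerID is missing the zone or instance ID")
	}
	region := regionPrefix.FindString(zone)
	if region == "" {
		return "", errors.New("cannot derive a region from zone " + zone)
	}
	return region, nil
}

// nodeInRegion reports whether the node behind providerID belongs to the
// region the controller operates in. Unparseable IDs are not eligible.
func nodeInRegion(providerID, controllerRegion string) bool {
	region, err := regionFromProviderID(providerID)
	return err == nil && region == controllerRegion
}
```

With a check like this, a controller running for us-east-1 could skip a node whose providerID names a zone in another region before calling DescribeInstances, rather than failing the whole reconcile with InvalidInstanceID.NotFound.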

Describe alternatives you've considered

Stop using aws-load-balancer-controller.

Contribution Intention (Optional)

- [ ] Yes, I am willing to contribute a PR to implement this feature
- [ ] No, I cannot work on a PR at this time
