mdbooth
Contributor

@mdbooth mdbooth commented Jun 27, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Loadbalancers take many minutes to become available after creation. We currently wait synchronously for each loadbalancer to become available immediately after creating it. When creating multiple control plane loadbalancers, this increases total cluster installation time, because we don't create a loadbalancer until the previous one is fully initialised. It would be more time-efficient to create them all first, then wait for them to become available.

This PR splits loadbalancer creation into 2 phases:

  • Get or create
  • Reconcile

We ensure that we do the waiting in the reconcile phase, after all getOrCreates have executed.
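
The change in ordering described above can be illustrated with a minimal sketch. This is not the CAPA code: `recorder`, `reconcileSequential` and `reconcileTwoPhase` are hypothetical names, and the fake client only logs calls rather than talking to AWS. It shows the old per-loadbalancer create-then-wait loop next to the two-phase ordering.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for the AWS ELB client.
// It only records the order in which calls are made.
type recorder struct {
	calls []string
}

func (r *recorder) create(name string) {
	r.calls = append(r.calls, "create:"+name)
}

func (r *recorder) waitAvailable(name string) {
	r.calls = append(r.calls, "wait:"+name)
}

// reconcileSequential mirrors the old ordering: each loadbalancer is
// created and then waited on before the next one is created.
func reconcileSequential(r *recorder, names []string) {
	for _, n := range names {
		r.create(n)
		r.waitAvailable(n)
	}
}

// reconcileTwoPhase mirrors the new ordering: every loadbalancer is
// created first, and only then do we wait for each to become available.
func reconcileTwoPhase(r *recorder, names []string) {
	for _, n := range names {
		r.create(n)
	}
	for _, n := range names {
		r.waitAvailable(n)
	}
}

func main() {
	names := []string{"primary", "secondary"}

	seq, two := &recorder{}, &recorder{}
	reconcileSequential(seq, names)
	reconcileTwoPhase(two, names)

	fmt.Println("sequential:", seq.calls)
	fmt.Println("two-phase:", two.calls)
}
```

Note that both variants run on a single goroutine. The time saving comes from AWS initialising the loadbalancers in the background while we are still issuing creates.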

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5568

Special notes for your reviewer:

This PR doesn't introduce any parallel execution in CAPA. It simply changes the order of operations so that loadbalancer initialisations can proceed concurrently in AWS.

Apart from a minor refactor to enable re-use, the existing "ensure two load balancers are reconciled" test remains unchanged and continues to pass.

We add a new test, "ensure two load balancers are created concurrently". This is also a general creation test, which did not previously exist. Concurrency is verified by asserting that both WaitUntilLoadBalancerAvailableWithContext calls happen after both CreateLoadBalancer calls.
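
As a rough illustration of that ordering assertion, the sketch below checks a recorded call log. It is an assumption-laden toy, not the test in this PR: the helper `waitsFollowCreates` and the `create:`/`wait:` log format are hypothetical, and the real test asserts ordering on mocked AWS client calls instead.

```go
package main

import (
	"fmt"
	"strings"
)

// waitsFollowCreates reports whether every "wait:" entry in the call log
// appears after the last "create:" entry. The log format is hypothetical.
func waitsFollowCreates(calls []string) bool {
	lastCreate, firstWait := -1, len(calls)
	for i, c := range calls {
		switch {
		case strings.HasPrefix(c, "create:"):
			lastCreate = i
		case strings.HasPrefix(c, "wait:") && i < firstWait:
			firstWait = i
		}
	}
	return lastCreate < firstWait
}

func main() {
	sequential := []string{"create:primary", "wait:primary", "create:secondary", "wait:secondary"}
	twoPhase := []string{"create:primary", "create:secondary", "wait:primary", "wait:secondary"}

	fmt.Println(waitsFollowCreates(sequential)) // false
	fmt.Println(waitsFollowCreates(twoPhase))   // true
}
```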

We don't add a test for classic load balancers, as support for these already seems to be problematic. Note that 2 classic loadbalancers cannot work because getOrCreateClassicLoadBalancer (formerly reconcileClassicLoadBalancer) does not take an lbSpec argument and therefore hardcodes the primary load balancer. You could presumably mix a classic primary and v2 secondary LB, but I did not want to add coverage for an esoteric test case.
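
To make the lbSpec point concrete, here is a deliberately simplified sketch. All types and functions in it (`lbSpec`, `cluster`, `getOrCreateClassic`, `getOrCreateV2`) are hypothetical stand-ins and do not match the real CAPA signatures. It only shows why a function that receives no spec argument can never act on the secondary load balancer.

```go
package main

import "fmt"

// lbSpec is a hypothetical, heavily simplified load balancer spec.
type lbSpec struct {
	Name string
}

// cluster is a hypothetical container for the primary and optional
// secondary control plane load balancer specs.
type cluster struct {
	Primary   lbSpec
	Secondary *lbSpec
}

// getOrCreateClassic takes no spec argument, so the only load balancer
// it can ever refer to is the primary one.
func getOrCreateClassic(c cluster) string {
	return c.Primary.Name
}

// getOrCreateV2 receives the spec explicitly, so the caller decides
// whether it operates on the primary or the secondary.
func getOrCreateV2(spec lbSpec) string {
	return spec.Name
}

func main() {
	c := cluster{
		Primary:   lbSpec{Name: "primary"},
		Secondary: &lbSpec{Name: "secondary"},
	}

	fmt.Println(getOrCreateClassic(c))
	fmt.Println(getOrCreateV2(c.Primary), getOrCreateV2(*c.Secondary))
}
```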

I believe I also spotted a latent bug in classic load balancer reconciliation where it will always reconcile a HealthCheck even if it is already set. I added a comment about this but did not attempt to fix it because it is unrelated.

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Control plane load balancers are created concurrently, reducing cluster
installation time when specifying a secondary control plane load balancer.

@k8s-ci-robot k8s-ci-robot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes), kind/feature (Categorizes issue or PR as related to a new feature), and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) labels Jun 27, 2025
@k8s-ci-robot k8s-ci-robot requested review from dlipovetsky and nrb June 27, 2025 10:18
@k8s-ci-robot k8s-ci-robot added the needs-priority and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files) labels Jun 27, 2025
@mdbooth mdbooth changed the title from "Create multiple control plane loadbalancers concurrently" to "✨Create multiple control plane loadbalancers concurrently" Jun 27, 2025
@nrb
Contributor

nrb commented Jul 1, 2025

> We don't add a test for classic load balancers, as support for these already seems to be problematic.

Correct. We're phasing out support for classic load balancers as much as possible.

> You could presumably mix a classic primary and v2 secondary LB, but I did not want to add coverage for an esoteric test case.

This was definitely possible a few releases ago, but after Go dropped the older ciphers that classic ELBs supported, we defaulted to NLBs.

I added the support for a second LB, and supporting classic there was never the intention.

So I think this test coverage is sufficient, and the change is a pretty easy speed improvement.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Jul 1, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: cfffb0adac6701c3d65da975687cef7d94153bd2

@k8s-ci-robot k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files) label Jul 1, 2025
@mdbooth
Contributor Author

mdbooth commented Jul 1, 2025

This is still running after 33 hours! Wonder if we can kick it somehow.

/test pull-cluster-api-provider-aws-build-docker

@mdbooth
Contributor Author

mdbooth commented Jul 2, 2025

/test pull-cluster-api-provider-aws-build-docker

@k8s-ci-robot k8s-ci-robot merged commit b3a6721 into kubernetes-sigs:main Jul 2, 2025
27 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v2.8 milestone Jul 2, 2025
@mdbooth mdbooth deleted the issue-5568 branch July 2, 2025 09:58
Successfully merging this pull request may close these issues.

Parallel creation of multiple loadbalancers
3 participants