youngnick
Contributor

What type of PR is this?
/kind gep

What this PR does / why we need it:
This adds the design rationale and API design for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.

Which issue(s) this PR fixes:

Updates #1494

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2025
Contributor

@howardjohn left a comment

👍

//
// +unionDiscriminator
// +kubebuilder:validation:Enum=HTTP;GRPC
ExtAuthProtocol string `json:"protocol"`
Contributor

Legitimate question, not a suggestion: do we need both protocols? I would think there would be two reasons to support both:

  1. Implementations can only support one, in which case I would expect there to maybe be a feature per type or something?
  2. Ext_auth servers only support one or the other

If neither of these I would think we would just support gRPC which is more powerful(?).

Even if one of these is true, I still wonder if we want both. Essentially we already have a fairly tricky GEP, introducing an external authorization API, and are putting two completely distinct protocols into it. If 2 protocols, why not 3 or 4 (this is rhetorical, let's please not do that!! 🙂 )

Contributor Author

The main reason for me is that the gRPC mode is only supported on Envoy, while at least a couple of other dataplanes (Traefik and HAProxy) can support the HTTP semantics. Doing this in this way means that dataplanes that don't support gRPC can make do with HTTP. It also allows for great use of extension servers, since they don't only need to support ext_auth. These two specific protocols are also included in Envoy's ext_auth filter anyway, so Envoy-based implementations can do either.

Probably does make sense to include a feature for each protocol, that's a good idea, thanks.

Contributor

After some discussion with folks I have heard that authz-server implementations also often use HTTP since it's simpler. So it seems both (1) and (2) are true.

I did also notice there is seemingly no spec at all for HTTP authz server... maybe we need to step up to fill in that gap (unless I just missed it, in which case we should link to it)

Contributor

It absolutely needs a spec. Working on it. 😐

Contributor

I think we should look to moving ext_authz specifications to a proxy neutral location. One option that has been suggested is https://github.com/cncf/xds (yes, ext_authz isn't xDS but it is at proto-level in a related family and CNCF is as good a home as any). This has also come up for ext_proc.

I had the same reaction as @howardjohn at first. I think I'm leaning towards both HTTP / gRPC as discussed here, but I'd point out that one of the goals of standardization is to create a portable ecosystem. E.g. I can swap out Envoy for Traefik and things should continue to work. This won't happen if the ext_authz server is gRPC-based and Traefik only supports HTTP. I wonder if HTTP should be core and gRPC extended, since HTTP seems the lowest common denominator. This would unfortunately have the side effect of encouraging growth in the less optimal HTTP ext_authz protocol.

Contributor

@htuch Completely agreed both that the spec needs to move to a neutral home, and that HTTP should be core with gRPC extended. I'm not completely sold on cncf/xds, but I'm not completely opposed either.

Contributor Author

As it stands, both HTTP and gRPC are extended, with features to mark their support. We can revisit if we make either of them Core at a later date, as part of graduation discussion. I'd rather not try and make this decision now (we're making enough already).

Speaking as a Gateway API end user (hopefully it's OK to provide feedback here), HTTP based external authorization is important to us because it's what the software we use already supports.

For example, it's possible to share an oauth2-proxy instance and App Registration between a bunch of applications/serverless functions that have a common user base using ingress-nginx external auth annotations. The alternative is each one getting its own oauth2-proxy instance running in proxy mode. Not an absolute dealbreaker but we would need to go from a few oauth2-proxy pods currently to several dozen + a way to spin them up automatically.

Of the major external SSO providers / authenticating proxies I know of (oauth2-proxy, authentik, authelia, keycloak), I don't think any support gRPC external authentication? On the ingress side, ingress-nginx and Traefik Forward Auth only support HTTP. For custom auth use cases, I think developers are generally a lot more familiar with HTTP than gRPC and find it easier to work with.

Contributor Author

Thanks for that feedback, @cccs-abwolfe, that's really useful. I think I'll leave both as options in here for now.
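
To make the HTTP mode discussed in this thread concrete, here is a minimal sketch of an HTTP authorization server in Go. Since, as noted above, there is no written spec for the HTTP protocol yet, this assumes the convention shared by existing forward-auth setups: the proxy sends the request headers to the auth server and treats a 2xx response as "allow" and anything else as "deny". The token value, the `/check` path, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// authHandler is a toy HTTP authorization endpoint. It allows the request
// only when the Authorization header carries a hypothetical demo token.
func authHandler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Authorization") == "Bearer demo-token" {
		w.WriteHeader(http.StatusOK) // 2xx: assumed to mean "allow"
		return
	}
	w.WriteHeader(http.StatusUnauthorized) // non-2xx: assumed to mean "deny"
}

// check simulates the proxy side: it forwards the Authorization header to
// the auth endpoint and returns the status code the endpoint answered with.
func check(authz string) int {
	req := httptest.NewRequest(http.MethodGet, "/check", nil)
	if authz != "" {
		req.Header.Set("Authorization", authz)
	}
	rec := httptest.NewRecorder()
	authHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(check("Bearer demo-token"), check(""))
}
```

A real deployment would serve `authHandler` with `http.ListenAndServe` and point the filter's backend reference at it; the in-process recorder is only used here to keep the sketch self-contained.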

// The backends must speak the selected protocol (GRPC or HTTP) on the
// referenced port.
//
// If the backend service requires TLS, use BackendTLSPolicy to tell the
Contributor

Not suggesting any changes here, just a comment: this has interesting implications as we consider filter vs policy. Any backend policies, not just TLS, could clearly apply here. But anything that is a filter could not, as there is no way to insert a filter.

So 1 point in favor of policies I suppose.

Member

Focusing on TLS: I think that's already becoming slightly complex/messy, in that you are unable to define that your ext auth server requires TLS in the same config where you define it.

I'd expect an API that lets me define the extension server, whether I use TLS to connect, and the failureMode (open/close) in the same config. BackendTLSPolicy can potentially override that wrt TLS?

Contributor

Istio takes a slightly different approach to the comment and the PR:

  • Users define extension servers at a top level. This includes the address, common config, etc
  • Policies reference the extension, and that is all they need to know
  • Traffic level controls for how to reach it (like TLS) are handled by standard policies modifying Service/backends/etc.

It's good and bad. It's nice for a user to just be able to say "I want keycloak" and not need to configure the details of keycloak over and over. But if you just have 1 usage, then that abstraction is overhead.

That being said, I do think that re-creating each policy on each extension is not likely a great pattern long term as we will eventually have many more backend policies (first and third party)

Contributor Author

Yes, that was my thought as well. For example, Envoy Gateway has a standard stanza that is used to control connections to any backend, setting connection timeouts, pooling behaviors, load balancing behaviors, etc, that is included in each SecurityPolicy. I don't want this Filter to end up including lots of backend settings that will also end up in Backend controls in other ways.
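
The "define the extension once, reference it by name" pattern described for Istio above can be sketched as follows. This is only an illustration of the indirection, not Istio's or Gateway API's actual types: `ExtensionProvider`, `AuthPolicy`, `resolve`, and the address are all hypothetical.

```go
package main

import "fmt"

// ExtensionProvider is a hypothetical top-level definition of an auth
// server: address and protocol are configured once, in one place.
type ExtensionProvider struct {
	Name     string
	Address  string
	Protocol string // "HTTP" or "GRPC"
}

// AuthPolicy is a hypothetical policy that only names the provider it uses.
type AuthPolicy struct {
	Target      string
	ProviderRef string
}

// resolve looks up the provider a policy refers to.
func resolve(providers map[string]ExtensionProvider, p AuthPolicy) (ExtensionProvider, error) {
	prov, ok := providers[p.ProviderRef]
	if !ok {
		return ExtensionProvider{}, fmt.Errorf("unknown provider %q", p.ProviderRef)
	}
	return prov, nil
}

func main() {
	providers := map[string]ExtensionProvider{
		"keycloak": {Name: "keycloak", Address: "keycloak.auth.svc:8080", Protocol: "HTTP"},
	}
	// Two policies reuse the same provider without repeating its details.
	for _, p := range []AuthPolicy{
		{Target: "route-a", ProviderRef: "keycloak"},
		{Target: "route-b", ProviderRef: "keycloak"},
	} {
		prov, err := resolve(providers, p)
		fmt.Println(p.Target, prov.Address, err)
	}
}
```

With a single usage the extra lookup is pure overhead, which is the trade-off raised in the comment above.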

// included field is intended.
//
// +optional
GRPCAuthConfig *GRPCAuthConfig `json:"grpc,omitempty"`
Contributor

Likely the validation would come in the real Go change but I would assume we need to validate you don't set grpc+http and that it matches the protocol?

Contributor Author

Sure. It's only for UX, not functionality, because using a union discriminator on the enum does the same thing - if you put HTTP config in the struct when you choose GRPC, then it has no effect.
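
As a rough illustration of the UX validation discussed in this thread, the check could look like the sketch below. The struct and function names (`ExtAuthFilter`, `HTTPAuthConfig`, `validateExtAuth`) are assumptions for this example, not the GEP's final Go types, and in the real API the rule would more likely be expressed as CRD validation than as a Go function.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the per-protocol config structs.
type GRPCAuthConfig struct{}
type HTTPAuthConfig struct{}

// ExtAuthFilter is a hypothetical union: Protocol is the discriminator and
// at most the matching member should be set.
type ExtAuthFilter struct {
	Protocol string // "HTTP" or "GRPC"
	GRPC     *GRPCAuthConfig
	HTTP     *HTTPAuthConfig
}

// validateExtAuth rejects configs whose populated member does not match the
// discriminator, instead of silently ignoring the stray member.
func validateExtAuth(f ExtAuthFilter) error {
	switch f.Protocol {
	case "GRPC":
		if f.HTTP != nil {
			return errors.New("http config set but protocol is GRPC")
		}
	case "HTTP":
		if f.GRPC != nil {
			return errors.New("grpc config set but protocol is HTTP")
		}
	default:
		return fmt.Errorf("unsupported protocol %q", f.Protocol)
	}
	return nil
}

func main() {
	bad := ExtAuthFilter{Protocol: "GRPC", HTTP: &HTTPAuthConfig{}}
	fmt.Println(validateExtAuth(bad))
}
```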


Using a Filter also includes ordering (because Filters are an ordered list),
although this exact behavior is currently underspecified. This change will also
need to clarify. Ordering is particularly important for Auth use cases, because
Contributor

While I agree ordering is important for auth, I am not certain making it a filter meaningfully solves this. It seems like you pretty much always want auth before any of the current filters, and it's really only the ordering amongst other 3rd party extensions that is important, which may likely not even be filters. The one thing that makes sense among the core filters, IMO, is ordering wrt CORS.

Contributor Author

I thought this too, until people insisted I build it otherwise on Contour. People sometimes want to transform headers in some way before auth, for example, usually to fix broken things.

Contributor Author

Also, a filter gives us a way to define this semantic, where Policy does not - without adding even more complexity into Policy Attachment like ordering or something.
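
The "transform headers before auth" case from this thread can be illustrated with a toy ordered filter chain. This is a sketch of why list order matters, not of how any dataplane implements filters; the `filter` type, the helper names, and the `X-Legacy-Token` header are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// filter is a toy request filter: it may mutate headers or reject the request.
type filter func(h http.Header) error

// renameHeader moves a value from one header to another, e.g. to repair a
// client that sends credentials under the wrong name.
func renameHeader(from, to string) filter {
	return func(h http.Header) error {
		if v := h.Get(from); v != "" {
			h.Set(to, v)
			h.Del(from)
		}
		return nil
	}
}

// requireHeader stands in for the auth step: deny if the header is absent.
func requireHeader(name string) filter {
	return func(h http.Header) error {
		if h.Get(name) == "" {
			return errors.New("denied: missing " + name)
		}
		return nil
	}
}

// run applies the filters in list order and stops at the first rejection.
func run(h http.Header, filters ...filter) error {
	for _, f := range filters {
		if err := f(h); err != nil {
			return err
		}
	}
	return nil
}

// newRequestHeaders builds the headers of a hypothetical broken client.
func newRequestHeaders() http.Header {
	h := http.Header{}
	h.Set("X-Legacy-Token", "abc")
	return h
}

func main() {
	fix := renameHeader("X-Legacy-Token", "Authorization")
	auth := requireHeader("Authorization")
	fmt.Println(run(newRequestHeaders(), fix, auth)) // transform, then auth
	fmt.Println(run(newRequestHeaders(), auth, fix)) // auth, then transform
}
```

The same two filters allow the request in one order and deny it in the other, which is the semantic an ordered filter list can express.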

//
// +optional
// +kubebuilder:validation:MaxLength=64
AllowedRequestHeaders []string `json:"allowedHeaders,omitempty"`
Contributor

There are a lot of configuration options in ext_authz.proto. In the proposal, a few have been selected, e.g. allowed request headers (but not disallowed?), body settings, etc. but not other features (e.g. clearing route cache to influence service selection - very Envoy specific but also a very useful capability). What rubric is used to determine which capabilities to include and exclude?

Contributor Author

@youngnick Jul 1, 2025

I've tried to keep it to "things I've definitely seen people ask for that also exist across multiple implementations". Part of the graduation requirement is for multiple implementations, preferably with multiple dataplanes, to implement this, so it's important that we try to keep it both small to start with and as portable as possible.

We can definitely add more stuff when people need it, this is me trying to pick a minimum viable feature set.

Edit: Also important to remember this is an experimental API, so we can (and should) add more to it for common use cases, as long as they fit the above requirements.

@kflynn
Contributor

kflynn commented Jun 30, 2025

After @howardjohn's entirely appropriate gentle mockery for the lack of proper specification for this protocol, I've abused the GEP machinery a bit to get the doc off my laptop into a state and place where at least other folks can read it: #3892 (or https://deploy-preview-3892--kubernetes-sigs-gateway-api.netlify.app/geps/gep-9999/ if you just want the formatted version).

(As stated in that document, I don't think that Gateway API is actually the correct place for this -- it was just a low-friction way to let others see it.)

This commit adds the design rationale and API design
for phase 1 of the Auth GEP, adding a Filter to HTTPRoute.

Signed-off-by: Nick Young <[email protected]>
HTTPRouteExtAuthGRPCProtocol HTTPRouteExtAuthProtocol = "GRPC"
HTTPRouteExtAuthHTTPProtocol HTTPRouteExtAuthProtocol = "HTTP"
)
// HTTPExtAuthFilter defines a filter that modifies requests by sending
Contributor

How do we get consistent interop with features like frontend mTLS? Ideally we can present the frontend mTLS extracted identity to the ext_authz server. The ext_authz protocol has a notion of attribute context, will this be required as part of compliance across proxy implementations? I thought I had opened a thread on this but don't see it.

Contributor Author

I agree this is a useful use case, and we should look at adding the ability to say "populate X-Forwarded-Client-Cert if a client cert was used" - but I think we can come back and add that later. I agree that users will definitely want it, but I would like to get super basic functionality in first, and add things as users ask.

Contributor

It will be inevitably needed for a solution combining these features but yeah, it's fine if you want to downscope for now, SG.

@youngnick
Contributor Author

Okay, reviewing this, I think I've answered all the outstanding things, it seems that this is ready to move to Implementable, so I'll remove the hold and wait for an LGTM from someone. Next steps will be for me to actually add the APIs to the Go types and then move to Experimental. But I can do that after 22 July.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2025
Member

@shaneutt left a comment

My thoughts are mostly: what are the pros/cons of ONLY doing phase 1 for now, and allowing the policy approach be its own separate enhancement later?

Comment on lines +137 to +142
* We introduce a new Policy object that can be targeted at either the
Gateway or HTTPRoute levels. In either of these cases, it _defaults_ the settings
for the HTTPRoute Filter across all HTTPRoute matches that roll up to the object.

These two parts will be done in two separate changes - Filter first, then
Policy after.
Member

@shaneutt Jul 22, 2025

@youngnick I'm curious whether you considered doing just the filter for the first iteration (and not including the policy components)? Would that provide enough value to move forward, while also keeping the scope really tight so we can hopefully get some velocity?

Contributor Author

That's the plan, yes. I don't think I would have been able to get this moving without a plan for the Policy though.

Member

@shaneutt Jul 23, 2025

I suppose what I meant was, should we pull OUT the policy for now. However I didn't mean for this to be a blocking comment and it's going to Experimental so I'm OK with seeing where this goes.

Contributor Author

To be clear, the first round of API changes to be included in this release will NOT include the Policy. That will happen in release 1.5 or later.
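
For context on what "defaults the settings" could mean once the Policy phase lands, here is a speculative sketch. The Policy is not designed yet, so the precedence shown (a route's own filter first, then an HTTPRoute-level policy, then a Gateway-level policy) is purely an assumption for illustration, as are the `ExtAuthSettings` and `effectiveAuth` names.

```go
package main

import "fmt"

// ExtAuthSettings is a hypothetical, trimmed-down stand-in for the
// filter's configuration.
type ExtAuthSettings struct {
	Backend string
}

// effectiveAuth returns the first non-nil settings, most specific first.
// The ordering is an assumed precedence, not something the GEP specifies.
func effectiveAuth(routeFilter, routePolicy, gatewayPolicy *ExtAuthSettings) *ExtAuthSettings {
	for _, s := range []*ExtAuthSettings{routeFilter, routePolicy, gatewayPolicy} {
		if s != nil {
			return s
		}
	}
	return nil
}

func main() {
	gw := &ExtAuthSettings{Backend: "shared-authz"}
	own := &ExtAuthSettings{Backend: "team-authz"}

	fmt.Println(effectiveAuth(nil, nil, gw).Backend) // route inherits the default
	fmt.Println(effectiveAuth(own, nil, gw).Backend) // route's own filter wins
}
```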

@robscott
Member

Thanks @youngnick! Will leave hold to give you a chance to reply to @shaneutt's comment above.

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2025
@youngnick
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit f0ae2cc into kubernetes-sigs:main Jul 23, 2025
13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants