Higress v2.2.0 cannot update Gateway status: status.addresses is missing the type field, causing validation to fail (AWS ELB hostname scenario)


Higress v2.2.0 Fails to Update Gateway Status with AWS ELB Hostnames

On AWS EKS, running Higress v2.2.0 with Gateway API v1.4.0 can leave the Higress controller unable to update the status of a Gateway resource. The Gateway then remains in an Unknown state, and the controller logs show errors for the failed status update.

The core problem is how Higress handles external addresses when the Gateway is exposed through an AWS Elastic Load Balancer (ELB) hostname. The controller writes the ELB hostname to the status.addresses field of the Gateway resource, but it supplies only the value (the hostname itself) and does not set the type to Hostname, which is a requirement of the Gateway API v1.4.0 specification. The Kubernetes API server therefore rejects the status update during validation.

Root Cause Analysis

The root cause stems from a discrepancy between the expected format of the status.addresses field in Gateway API v1.4.0 and the logic used by the Higress v2.2.0 controller. The Gateway API v1.4.0 specification mandates that each address in the status.addresses array must include both a type and a value. When the value is a hostname, the type must be set to Hostname. The Higress controller, in this version, appears to be missing the logic to correctly set the type field when retrieving the address from the associated LoadBalancer Service's status.loadBalancer.ingress.
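To make the mismatch concrete, here is a minimal Python sketch that checks a Gateway object for status.addresses entries lacking a type. It assumes the object is a plain dictionary shaped like the JSON form of a Gateway. The function name find_untyped_addresses and the ELB hostname are hypothetical placeholders, not taken from the issue or from Higress.

```python
from typing import Any


def find_untyped_addresses(gateway: dict[str, Any]) -> list[str]:
    """Return the values of status.addresses entries that have no 'type'."""
    addresses = gateway.get("status", {}).get("addresses") or []
    return [a.get("value", "") for a in addresses if not a.get("type")]


# Hypothetical status payloads; the hostname is a made-up placeholder.
ELB_HOSTNAME = "example-1234567890.us-east-1.elb.amazonaws.com"

# Shape described in this article as failing validation: value only, no type.
rejected = {"status": {"addresses": [{"value": ELB_HOSTNAME}]}}

# Shape described as compliant: both type and value are present.
accepted = {
    "status": {"addresses": [{"type": "Hostname", "value": ELB_HOSTNAME}]}
}

print(find_untyped_addresses(rejected))  # lists the untyped hostname
print(find_untyped_addresses(accepted))  # empty list
```

A check like this only illustrates the shape of the data; the actual validation is performed by the Kubernetes API server against the installed Gateway API CRDs.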

The community discussion suggests that the controller might be using an older API logic that only wrote the value, which was sufficient in previous versions of the Gateway API but is no longer compliant with the stricter validation introduced in v1.4.0.

Solution

The primary solution involves ensuring that the Higress controller correctly sets the type field to Hostname when updating the Gateway status with an ELB hostname. This requires modifications to the Higress controller's code to comply with the Gateway API v1.4.0 specification.
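The kind of mapping such a fix needs can be sketched as follows. This is an illustration in Python, not the Higress implementation (the controller itself is written in Go, and its code is not shown here). The helper names address_from_ingress and addresses_from_service are hypothetical, and preferring hostname over ip when both are present is an assumption made for this sketch. The field names hostname and ip come from the Service's status.loadBalancer.ingress entries, and Hostname and IPAddress are Gateway API address types.

```python
from typing import Any, Optional


def address_from_ingress(entry: dict[str, Any]) -> Optional[dict[str, str]]:
    """Map one status.loadBalancer.ingress entry to a typed Gateway address."""
    # Assumption for this sketch: prefer the hostname when both fields are set.
    if entry.get("hostname"):
        return {"type": "Hostname", "value": entry["hostname"]}
    if entry.get("ip"):
        return {"type": "IPAddress", "value": entry["ip"]}
    return None  # nothing usable in this entry


def addresses_from_service(service: dict[str, Any]) -> list[dict[str, str]]:
    """Build a status.addresses list from a LoadBalancer Service object."""
    ingress = (
        service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    )
    typed = (address_from_ingress(entry) for entry in ingress)
    return [address for address in typed if address is not None]


# Hypothetical Service status as an AWS ELB would populate it (hostname only).
elb_service = {
    "status": {
        "loadBalancer": {
            "ingress": [
                {"hostname": "example-1234567890.us-east-1.elb.amazonaws.com"}
            ]
        }
    }
}

print(addresses_from_service(elb_service))
```

The point of the sketch is that every emitted entry carries an explicit type, so a hostname is never written as a bare value.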

Until an official fix is released in a newer version of Higress, you can consider the following workaround:

Workaround (Potentially Risky, Use with Caution):

While not recommended for production environments due to potential compatibility issues, you could theoretically downgrade the Gateway API CRDs to a version prior to 1.4.0. However, this approach might introduce other problems and is not a long-term solution. It's crucial to thoroughly test this in a non-production environment before considering it.

Recommended Approach:

The best course of action is to monitor the Higress project for updates and apply the official fix when it becomes available. Keep an eye on the Higress GitHub repository for new releases and patch notes.

Practical Tips and Considerations

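In short, the Gateway stays in the Unknown state because the API server rejects a status update whose address has a value but no type. Once the controller writes the ELB hostname together with the type Hostname, the update can pass validation and the Gateway status will accurately reflect the state of your Kubernetes cluster.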