Credit goes to the author of the initial article that pointed me in the right direction.

My particular use case was running external-dns in a GKE cluster to watch for services that want to publish records to a Route 53 zone. I already had the roles and everything else set up as described in the article above, save for a few missing items:

First, we needed a custom external-dns Docker image. At the very least, I needed curl and jq installed in order to fetch and parse anything coming from GCP. I ended up basing it on the Bitnami external-dns Docker image, since it ships with bash, apt, etc., which makes installing extra packages straightforward. Here's that custom Docker image:

FROM bitnami/external-dns:0.15.1

USER root

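# apt-get has a stable CLI for scripted use; the package lists are removed to keep the layer small
RUN apt-get update && \
    apt-get install -y \
        jq \
        curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*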

USER 1001

Dockerfile example for external-dns

Second, I needed to mount two files into the container before it starts up: the ~/.aws/config file and the credentials script. Thankfully, both the official external-dns chart and the Bitnami external-dns chart allow us to mount custom ConfigMaps into the container. Note that in the Bitnami external-dns Docker image, $HOME is set to /, so you'll want to put your AWS config file at /.aws/config. That file contains the reference to your credentials script. I usually create a separate Helm chart that installs before the upstream Helm chart in order to inject these ConfigMaps.

Also, note that you'll need to make the credentials script executable in the container. To do that, set defaultMode to 0755 on the volume that holds the script.

extraVolumeMounts:
  - name: aws-credentials-script
    mountPath: /usr/local/bin/
  - name: aws-config
    mountPath: /.aws/config
    subPath: config
extraVolumes:
  - name: aws-config
    configMap:
      name: aws-config
  - name: aws-credentials-script
    configMap:
      name: aws-credentials-script
      # Makes the mounted script executable
      defaultMode: 0755

Example of what you would pass into the external-dns Helm chart
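
For reference, here's a rough sketch of what the AWS config file itself might contain. It uses the AWS credential_process setting to hand credential lookup over to an external script. The script path below is an assumption (the real file name depends on what you called the script in your ConfigMap), and the snippet only writes the file to a scratch directory so you can inspect it before putting it under the config key of the aws-config ConfigMap.

```shell
#!/bin/sh
# Sketch: generate a minimal AWS config that delegates credential lookup
# to an external script via credential_process.
# ASSUMPTION: this path is a placeholder, not necessarily the real script name.
SCRIPT_PATH="/usr/local/bin/aws-credentials.sh"

# Write to a scratch directory so the file can be checked before it goes
# into the aws-config ConfigMap under the "config" key.
config_dir="$(mktemp -d)"
cat > "${config_dir}/config" <<EOF
[default]
credential_process = ${SCRIPT_PATH}
EOF

cat "${config_dir}/config"
```

Whatever path you use has to match where the script actually lands inside the container, i.e. the mountPath plus the file name.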

Third and most importantly, I needed to replace the way that we were authenticating to AWS. YMMV in your use case, but what I noticed is that when we used the aws CLI tool to grab STS web identity tokens, the pod's memory usage spiked and the pod eventually got OOMKilled, no matter how much memory we threw at the problem. The external-dns process itself is very, very lightweight, so all signs pointed to the aws CLI tool being the culprit.

Snippet of the memory usage graph for the pod. The spikes are the periods when the pod was actually running rather than sitting in CrashLoopBackOff after being OOMKilled. The red line is the memory limit set for the pod.

Here's that final file that we ended up using within a ConfigMap (note: this is in a Helm chart, so you'll want to replace the .Values.* bits if you're using something like Kustomize or whatever else):

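apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-credentials-script
  labels:
    {{- include "external-dns-bootstrap.labels" . | nindent 4 }}
data:
  # Placeholder key: use whatever file name your AWS config points at
  aws-credentials.sh: |
    #!/bin/bash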
    AUDIENCE="{{ .Values.oidc_audience }}"
    ROLE_ARN="arn:aws:iam::{{ .Values.aws_account_id }}:role/{{ .Values.aws_iam_role }}"

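    # Ask the metadata server for an identity token (JWT) for the default service account
    jwt_token=$(curl -sH "Metadata-Flavor: Google" "http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=${AUDIENCE}&format=full&licenses=FALSE")
    # Decode the JWT payload (the second dot-separated segment) into JSON
    jwt_decoded=$(jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "$jwt_token")

    # The token's subject is used as the STS role session name
    jwt_sub=$(echo -n "$jwt_decoded" | jq -r '.sub')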

    # Exchange the Google identity token for temporary AWS credentials via STS
    credentials=$(curl -sH "Accept: application/json" "https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&RoleArn=${ROLE_ARN}&RoleSessionName=${jwt_sub}&WebIdentityToken=${jwt_token}&Version=2011-06-15" | jq '.AssumeRoleWithWebIdentityResponse.AssumeRoleWithWebIdentityResult.Credentials | .Version=1')

    # Convert the numeric Expiration into an ISO 8601 timestamp
    credentials=$(echo "$credentials" | jq '.Expiration |= todate')

    echo "$credentials"

Final file
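
One thing to keep in mind about the payload-decoding step: JWT segments are base64url-encoded with the padding stripped, which a standard Base64 decoder isn't guaranteed to accept. If the jq filter ever chokes on a token, a more defensive decode might look something like the sketch below. The function name and the sample token are made up for illustration (the token is the usual demo payload with a dummy signature, not a real credential).

```shell
#!/bin/sh
# Sketch: decode the payload (second segment) of a JWT without relying on jq.
decode_jwt_payload() {
  # Take the second dot-separated segment and map the base64url characters
  # back to the standard Base64 alphabet ('_' -> '/', '-' -> '+').
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')

  # Restore the '=' padding that JWTs strip off.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac

  printf '%s' "$payload" | base64 -d
}

# Sample token for illustration only; the signature part is a dummy value.
sample_jwt="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.dummy-signature"

decoded=$(decode_jwt_payload "$sample_jwt")
printf '%s\n' "$decoded"
```

The decoded JSON can still be piped to jq -r '.sub' to pull out the subject, same as in the script above.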

Memory usage went from ~1.8Gi and rising, all the way down to 20Mi.

Hopefully this helps some wayward ops person out there somewhere. 😄

