Globally available NGINX proxy using Kubernetes federation
Please note that since the release of the next-generation Open Targets Platform, the REST API is no longer available.
Head to the new Platform documentation for details on how you can access our data.
Even though each Open Targets release is made globally available in 3 regions, we decided early on to mask this complexity from the user and publish our API from a single URL at https://api.opentargets.io.
We achieve this by load balancing our traffic from https://api.opentargets.io to three reverse proxies in three different regions. Here is a tutorial on how to recreate the same setup, using Kubernetes on Google Cloud Platform.
Motivation
Publishing our platform to a global audience, we have always strived to maintain a fast and responsive user experience. How fast our webapp is depends directly on how fast our API is; in particular, how fast it can respond to each query from the webapp. This is measured by latency, the average time that passes from the moment our webapp requests the data to the moment it receives it. Latencies are a product of i) how long it takes to compute the data and ii) how far it has to travel. The farther a signal has to travel, the longer it takes and the larger the latency.
After optimizing how long it takes to compute the data, the most effective way to achieve low latencies is therefore to have a large number of servers, distributed around the world, that serve our data. However, lots of servers mean lots of work to keep them up, synchronized and easy to maintain, which can be quite hard for a team as small as ours.
While we found other great ready-made solutions to deploy our webapp, we had to build our own solution for our API and backend. In an effort to keep operational work to a minimum, we chose to use performant cloud solutions, specifically Google Cloud compute and Google App Engine.
Tutorial
We will address how we automatically deploy our data and API in three regions, using continuous integration, in another blog post.
Here we are going to describe how we tie it all together and mask this complexity from the user, while exposing a single https://api.opentargets.io URL.
We wanted to load balance and allocate the traffic from https://api.opentargets.io to three reverse proxies in three different regions (one in the United States, one in Europe and one in Asia), which in turn direct the traffic to our backend replicas, each running in a separate regional App Engine project.
The choice of proxy was obvious to us: we have always been fans of nginx and have used it as a reverse proxy in a variety of scenarios.
We chose to orchestrate the nginx instances using a Kubernetes federated cluster, which allows us to manipulate the configuration of nginx in a geographically distributed way, and with minimal ops work.
Prerequisites
You need gcloud and kubectl command-line tools installed and set up to run deployment commands.
For production, you also need to install kubefed.
Warning! You need to download the latest version of kubefed yourself. It doesn't come with the google-cloud-sdk, although it is easy to assume it does and that its version is in sync with the generally available Kubernetes version. Don't make your life miserable: always install the latest version directly.
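For reference, installing a standalone kubefed release looks roughly like the sketch below; the version number and download URL are illustrative, not a pinned recommendation, so check the Kubernetes federation release notes for the current ones.
# download a standalone kubefed release (version and URL are illustrative;
# check the federation release notes for the latest one)
curl -LO https://storage.googleapis.com/kubernetes-federation-release/release/v1.9.0-alpha.3/federation-client-linux-amd64.tar.gz
tar -xzvf federation-client-linux-amd64.tar.gz
# the archive ships the kubefed binary under federation/client/bin
sudo cp federation/client/bin/kubefed /usr/local/bin/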
1. Make a DNS zone
I created the apiproxy.opentargets.io zone on Google Cloud DNS.
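If you need to create the zone from scratch, that can also be done from the command line; the managed zone name apiproxy-zone below is just an example:
gcloud --project open-targets-prod dns managed-zones create apiproxy-zone \
    --dns-name="apiproxy.opentargets.io." \
    --description="DNS zone for the Open Targets API proxy federation"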
2. Deploy a GKE cluster
Make the clusters, one node each. The n1-standard-1 machine type is the default. They all need the Google Cloud DNS scope attached to their instances, so you must create them from the command line.
gcloud --project open-targets-prod container clusters create apiproxy-eu \
--zone=europe-west1-d --quiet \
--num-nodes=1 \
--scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite \
> /dev/null 2>&1 &
gcloud --project open-targets-prod container clusters create apiproxy-useast \
--zone=us-east1-d --quiet \
--num-nodes=1 \
--scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite
gcloud --project open-targets-prod container clusters create apiproxy-jp \
--zone=asia-northeast1-b --quiet \
--num-nodes=1 \
--scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite
You have to force the use of client certificate credentials, otherwise gcloud will use the temporary user token in your kubectl contexts and render the federation inoperable after a few minutes.
gcloud --project open-targets-prod config set container/use_client_certificate True
export CLOUDSDK_CONTAINER_USE_CLIENT_CERTIFICATE=True
Then you can type the suggested commands:
gcloud container clusters get-credentials apiproxy-useast --zone us-east1-d --project open-targets-prod
gcloud container clusters get-credentials apiproxy-jp --zone asia-northeast1-b --project open-targets-prod
gcloud container clusters get-credentials apiproxy-eu --zone europe-west1-d --project open-targets-prod
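A quick sanity check before federating is to list the contexts that get-credentials has just added:
kubectl config get-contexts -o name
# expect three entries of the form gke_open-targets-prod_<zone>_apiproxy-<region>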
3. Federate the clusters
Initiate the cluster federation by launching:
kubefed init apiproxyfed --host-cluster-context=gke_open-targets-prod_europe-west1-d_apiproxy-eu \
--dns-provider="google-clouddns" \
--dns-zone-name="apiproxy.opentargets.io."
You also have to create the default namespace in the federation control plane, since kubefed does not do this for you:
kubectl --context=apiproxyfed create namespace default
From here onwards we assume that kubectl uses the apiproxyfed context if not otherwise specified.
Rename the cluster contexts to be DNS-compatible (no long names with underscores):
kubectl config rename-context gke_open-targets-prod_europe-west1-d_apiproxy-eu apiproxy-eu
kubectl config rename-context gke_open-targets-prod_asia-northeast1-b_apiproxy-jp apiproxy-jp
kubectl config rename-context gke_open-targets-prod_us-east1-d_apiproxy-useast apiproxy-useast
Join the renamed clusters to the federation control plane:
kubefed join us-apiproxy --cluster-context="apiproxy-useast" --host-cluster-context="apiproxy-eu"
kubefed join eu-apiproxy --cluster-context="apiproxy-eu" --host-cluster-context="apiproxy-eu"
kubefed join jp-apiproxy --cluster-context="apiproxy-jp" --host-cluster-context="apiproxy-eu"
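Before going further, it is worth checking that all three clusters have registered with the federation control plane and eventually report a Ready status:
kubectl --context=apiproxyfed get clusters
# eu-apiproxy, us-apiproxy and jp-apiproxy should all show Ready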
To test that it's all working, you can try spinning up 3 nginx replicas across the 3 regions. Notice that Deployment currently lives under the extensions/v1beta1 API group.
kubectl create deployment nginx --image=nginx && kubectl scale deployment nginx --replicas=3
You should see the deployment in each of the three clusters, with one replica each. Check with:
kubectl --context=apiproxy-eu get svc,po,deploy
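To check all three regions in one go, you can also loop over the renamed contexts:
for ctx in apiproxy-eu apiproxy-useast apiproxy-jp; do
    echo "== $ctx =="
    kubectl --context=$ctx get po,deploy
done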
Should you use another namespace?
If you want to try spinning up another version of the proxy (e.g. to test a new nginx.conf while keeping production up), this is a good time to create another namespace and repeat all the steps below.
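As a sketch, a hypothetical staging namespace would be created against the federation context and then passed to every subsequent command with -n staging:
kubectl --context=apiproxyfed create namespace staging
# then repeat the configmap, secret, deployment, service and ingress steps as
#   kubectl --context=apiproxyfed -n staging create ...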
Use your server.key and server.chained.crt files for your domain:
kubectl --context apiproxyfed create secret tls opentargetsio-certs --key server.key --cert server.chained.crt
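The federation control plane propagates secrets to the member clusters, so after a short wait the certificate should also be visible in each region, for example:
kubectl --context=apiproxy-eu get secret opentargetsio-certs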
4. Deploy your API proxy
We can deploy the config map and the services to the federated cluster, but the deployments will be region-specific (so that each points to the right regional backend).
kubectl config use-context apiproxyfed
Create the default nginx configmap, and the location specific one:
kubectl create configmap nginxconf --from-file=nginx.conf
kubectl create configmap backends --from-file=eu=backend.eu.conf --from-file=us=backend.us.conf --from-file=jp=backend.jp.conf
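The backend.*.conf files themselves aren't listed in this post; the only thing that changes per region is where nginx forwards the traffic. As a rough idea of their shape, a minimal backend.eu.conf could look like the sketch below (the upstream name and App Engine hostname are made-up placeholders, and your real file will depend on your nginx.conf):
cat > backend.eu.conf <<'EOF'
# sketch only: point this region's proxy at its regional App Engine backend
upstream api_backend {
    server eu-api-backend.example.appspot.com:443;
}
EOF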
Take a look at it:
kubectl describe configmap backends
We spin up a NodePort service together with a single global Ingress, rather than a LoadBalancer, because a LoadBalancer would give us three separate IPs. First we reserve the IP:
# create the IP
gcloud --project open-targets-prod compute addresses create api-ingress --global
Then spin up the nginx deployment, service and ingress:
kubectl create -f apiproxy_deploy.yaml
kubectl create -f apiproxy_svc.yaml
kubectl create -f apiproxy_ing.yaml
where, respectively, apiproxy_deploy.yaml, apiproxy_svc.yaml and apiproxy_ing.yaml are:
# apiproxy_deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: apiproxy-eu
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: apiproxy-eu
        app: apiproxy
        region: eu
    spec:
      volumes:
        - name: nginxconf-volume
          configMap:
            name: nginxconf
        - name: backends-volume
          configMap:
            name: backends
            items:
              - key: proxy
                path: proxy.conf
              - key: eu
                path: backend.conf
      containers:
        - image: nginx:1.13-alpine
          name: nginx
          livenessProbe:
            httpGet:
              path: /_heartbeat
              port: 80
            initialDelaySeconds: 30
            timeoutSeconds: 1
          readinessProbe:
            httpGet:
              path: /_ah/health
              port: 80
              httpHeaders:
                - name: Host
                  value: api.opentargets.io
            initialDelaySeconds: 30
            timeoutSeconds: 10
          volumeMounts:
            - mountPath: /etc/nginx/nginx.conf
              name: nginxconf-volume
              subPath: nginx.conf
            - mountPath: /etc/nginx/conf.d/
              name: backends-volume
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
# apiproxy_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: apiproxysvc-node
  labels:
    name: apiproxysvc-node
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
      nodePort: 30036
  selector:
    app: apiproxy
# apiproxy_ing.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: apiproxying-tls
  annotations:
    # kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "api-ingress"
spec:
  backend:
    serviceName: apiproxysvc-node
    servicePort: 80
  tls:
    - secretName: opentargetsio-certs
NOTE: it can take a long time before everything is working and all the "backends" look green and healthy in the GKE dashboard. Go fix yourself a coffee.
You can monitor the progress directly from the console.
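You can also follow it from the command line; the GCE ingress controller annotates the per-cluster ingresses with the health of each backend:
kubectl --context=apiproxy-eu describe ing apiproxying-tls
# look for the ingress.kubernetes.io/backends annotation turning to HEALTHY
# and for the address to match the reserved api-ingress IP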
Final steps
You should verify that the proxy listens (preferably from different geographical locations):
curl http://YOURAPI.URL/
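For a slightly richer check, curl can also report the status code and round-trip time; the sketch below assumes your nginx.conf serves the /_heartbeat endpoint used by the liveness probe above:
curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' http://YOURAPI.URL/_heartbeat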
When you are happy with its functionality, you can also enable CDN, with a command similar to:
BACKEND=$(kubectl get ing apiproxying-tls -o json | jq -j '.metadata.annotations."ingress.kubernetes.io/backends"' | jq -j 'keys[0]')
gcloud --project open-targets-prod compute backend-services update $BACKEND --enable-cdn
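You can then double-check that the flag has taken effect (enableCDN is the field name in the backend-service resource):
gcloud --project open-targets-prod compute backend-services describe $BACKEND \
    --global --format='value(enableCDN)'
# should print True once CDN is enabled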
Hope this tutorial helps you test out the Kubernetes federation functionality! Let me know in the comments if you have trouble.