Please note that since the release of the next-generation Open Targets Platform, the REST API is no longer available.
Head to the new Platform documentation for details on how you can access our data.
Even though each Open Targets release is made globally available in three regions, we decided early on to hide this complexity from the user and publish our API from a single URL at https://api.opentargets.io.
We achieve this by load balancing our traffic from https://api.opentargets.io to three reverse proxies in three different regions. Here is a tutorial on how to recreate the same setup using Kubernetes on Google Cloud Platform.
Publishing our platform to a global audience, we have always strived to maintain a fast and responsive user experience. How fast our webapp is depends directly on how fast our API is, and in particular on how fast it can respond to each query from the webapp. This is measured by latency, the average time that passes from the moment our webapp requests the data to the moment it receives it. Latency depends on i) how long it takes to compute the data and ii) how far the data has to travel. The farther a signal has to travel, the longer it takes and the larger the latency.
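To put a rough number on the distance term: light in optical fibre travels at roughly 200,000 km/s, about two thirds of its speed in vacuum. For a server 10,000 km away (an illustrative distance, roughly Western Europe to Japan), the round trip alone has a lower bound of

```latex
t_{\mathrm{RTT}} \;\ge\; \frac{2d}{v}
  = \frac{2 \times 10\,000\ \mathrm{km}}{200\,000\ \mathrm{km/s}}
  = 0.1\ \mathrm{s} = 100\ \mathrm{ms}
```

before any computation, routing or queueing delay is added. Serving the same request from 500 km away lowers that floor to 5 ms, which is why putting servers close to users pays off.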
After optimizing how long it takes to compute the data, the most effective way to achieve low latencies is therefore to have a large number of servers, distributed around the world, that serve our data. However, lots of servers mean lots of work to keep them up, synchronized and easy to maintain, which can be quite hard for a team as small as ours.
While we found other great ready-made solutions to deploy our webapp, we had to build our own solution for our API and backend. In an effort to keep operational work to a minimum, we chose performant cloud solutions, specifically Google Compute Engine and Google App Engine.
We will describe how we automatically deploy our data and API in three regions using continuous integration in another blog post.
Here we are going to describe how we tie it all together and mask this complexity from the user, while exposing a single https://api.opentargets.io URL.
We wanted to load balance and allocate the traffic from https://api.opentargets.io to three reverse proxies in three different regions (one in the United States, one in Europe and one in Asia), which in turn direct the traffic to our backend replicas, each running in a separate regional App Engine project.
The choice of proxy was obvious to us: we have always been fans of nginx and have used it as a reverse proxy in a variety of scenarios.
We chose to orchestrate the nginx instances using a Kubernetes federated cluster, which allows us to manage the nginx configuration in a geographically distributed way, with minimal ops work.
For production, you also need to install kubefed.
Warning: you need to download the latest version yourself. It doesn't come with the google-cloud-sdk, although you will think it does and thus assume its version is in sync with the generally available Kubernetes version.
Don't make your life miserable: always install the latest version directly.
1. Make a DNS zone
I created the apiproxy.opentargets.io zone on Google Cloud DNS.
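If you prefer the command line to the console, a minimal Bash sketch of the same step could look like the following. The managed-zone name `apiproxy` and the function name are our own choices (managed-zone names cannot contain dots), and the DNS name ends with a dot because it has to be fully qualified:

```shell
# Sketch: create the Cloud DNS zone that kubefed will write federation records into.
# Assumptions: "apiproxy" is a hypothetical managed-zone name (zone names may not
# contain dots); the DNS name must end with a dot to be fully qualified.
PROJECT="open-targets-prod"
ZONE_NAME="apiproxy"
DNS_NAME="apiproxy.opentargets.io."

create_federation_zone() {
  gcloud --project "${PROJECT}" dns managed-zones create "${ZONE_NAME}" \
    --dns-name="${DNS_NAME}" \
    --description="Federation zone for the API proxy"
  # Print the name servers Cloud DNS assigned, so the zone can be delegated from its parent.
  gcloud --project "${PROJECT}" dns managed-zones describe "${ZONE_NAME}" \
    --format="value(nameServers)"
}
```

Run `create_federation_zone` once. If you want the records that kubefed writes into this zone to resolve publicly, the parent opentargets.io zone presumably needs NS records pointing at the printed name servers.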
2. Deploy a GKE cluster
Create the three clusters, one node each. The n1-standard-1 machine type is the default. All of them need the Cloud DNS read-write scope (https://www.googleapis.com/auth/ndev.clouddns.readwrite) attached to their instances, so you must create them from the command line.
gcloud --project open-targets-prod container clusters create apiproxy-eu \
  --zone=europe-west1-d --quiet \
  --num-nodes=1 \
  --scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite \
  > /dev/null 2>&1 &
gcloud --project open-targets-prod container clusters create apiproxy-useast \
  --zone=us-east1-d --quiet \
  --num-nodes=1 \
  --scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite
gcloud --project open-targets-prod container clusters create apiproxy-jp \
  --zone=asia-northeast1-b --quiet \
  --num-nodes=1 \
  --scopes cloud-platform,storage-ro,logging-write,monitoring-write,service-control,service-management,https://www.googleapis.com/auth/ndev.clouddns.readwrite
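Because the first create command runs in the background, it can help to wait for it and then check that all three clusters report a RUNNING status before moving on. A minimal Bash sketch, assuming the `apiproxy-` name prefix used above; the function name is our own:

```shell
# Sketch: wait for the backgrounded create command, then list the proxy clusters.
# The "apiproxy-" prefix matches the three cluster names used above.
list_proxy_clusters() {
  wait  # returns once background jobs started from this shell have finished
  gcloud --project open-targets-prod container clusters list \
    --filter="name~^apiproxy-"
}
```

Call `list_proxy_clusters` from the same shell session that launched the create commands, since `wait` only sees jobs started by that shell.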
You have to force the use of client certificates; otherwise kubectl will use the temporary user token of your kubectl contexts, which renders the federation inoperable after a few minutes.
gcloud --project open-targets-prod config set container/use_client_certificate True
export CLOUDSDK_CONTAINER_USE_CLIENT_CERTIFICATE=True
Then you can type the suggested commands:
gcloud container clusters get-credentials apiproxy-useast --zone us-east1-d --project open-targets-prod
gcloud container clusters get-credentials apiproxy-jp --zone asia-northeast1-b --project open-targets-prod
gcloud container clusters get-credentials apiproxy-eu --zone europe-west1-d --project open-targets-prod
3. Federate the clusters
Initiate the cluster federation by launching:
kubefed init apiproxyfed --host-cluster-context=gke_open-targets-prod_europe-west1-d_apiproxy-eu \
  --dns-provider="google-clouddns" \
  --dns-zone-name="apiproxy.opentargets.io."
You also have to create the default namespace in the federation, since kubefed does not do this for you:
kubectl --context=apiproxyfed create namespace default
From here onwards we assume that kubectl uses the apiproxyfed context unless otherwise specified.
Rename the cluster contexts to DNS-compatible names (the generated ones are long and contain underscores):
kubectl config rename-context gke_open-targets-prod_europe-west1-d_apiproxy-eu apiproxy-eu
kubectl config rename-context gke_open-targets-prod_asia-northeast1-b_apiproxy-jp apiproxy-jp
kubectl config rename-context gke_open-targets-prod_us-east1-d_apiproxy-useast apiproxy-useast
Join the renamed clusters to the federation control plane:
kubefed join us-apiproxy --cluster-context="apiproxy-useast" --host-cluster-context="apiproxy-eu"
kubefed join eu-apiproxy --cluster-context="apiproxy-eu" --host-cluster-context="apiproxy-eu"
kubefed join jp-apiproxy --cluster-context="apiproxy-jp" --host-cluster-context="apiproxy-eu"
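Before deploying anything, you can check that the federation control plane knows about all three members. A minimal Bash sketch, assuming the cluster names passed to `kubefed join` above; the function name is our own, and it only checks registration (plain `kubectl --context=apiproxyfed get clusters` also shows each cluster's readiness):

```shell
# Sketch: verify that the three clusters joined above are registered with the
# federation control plane (context "apiproxyfed").
check_federation_members() {
  local members cluster
  members="$(kubectl --context=apiproxyfed get clusters -o name)"
  for cluster in us-apiproxy eu-apiproxy jp-apiproxy; do
    # "-o name" prints "<resource>/<name>", so match the name at the end of a line.
    if ! grep -q "/${cluster}\$" <<<"${members}"; then
      echo "cluster ${cluster} has not joined the federation" >&2
      return 1
    fi
  done
  echo "all three clusters joined"
}
```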
To test that it's all working, you can try spinning up three nginx replicas across the three regions. Notice that deployment is currently under
kubectl create deployment nginx --image=nginx && kubectl scale deployment nginx --replicas=3
You should see the deployment in each of the three clusters, with one replica each. Check with:
kubectl --context=apiproxy-eu get svc,po,deploy
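The command above only inspects the EU cluster. To look at all three members in one go, a small Bash loop over the renamed contexts is enough; the function name is our own:

```shell
# Sketch: list services, pods and deployments in each member cluster in turn.
show_test_deployment() {
  local ctx
  for ctx in apiproxy-eu apiproxy-useast apiproxy-jp; do
    echo "== ${ctx}"
    kubectl --context="${ctx}" get svc,po,deploy
  done
}
```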
Should you use another namespace?
If you want to try spinning up another version of the proxy (e.g. with a new nginx.conf, while keeping production up), this is a good time to create another namespace and repeat all the steps below.
Then create a TLS secret from the server.key and server.chained.crt files for your domain:
kubectl --context apiproxyfed create secret tls opentargetsio-certs --key server.key --cert server.chained.crt
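A mismatched key and certificate is an easy mistake with chained certificate files, and the load balancer only complains much later. A minimal Bash sketch to check the pair, assuming the file names used above and an OpenSSL version that provides `openssl pkey`; the function name is our own:

```shell
# Sketch: check that the private key matches the first certificate in the chain
# by comparing the public keys openssl derives from each file.
cert_matches_key() {
  local cert="${1:-server.chained.crt}" key="${2:-server.key}"
  local cert_pub key_pub
  cert_pub="$(openssl x509 -in "${cert}" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "${key}" -pubout)" || return 1
  [[ "${cert_pub}" == "${key_pub}" ]]
}
```

For example, `cert_matches_key && echo "key and certificate match"`. Comparing public keys rather than RSA moduli keeps the check independent of the key type.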
4. Deploy your API proxy
We can deploy the config maps and the services to the federated cluster, but the deployments will be region-specific (so that each points to its regional backend).
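The contents of the backend.<region>.conf files are not shown in this post. As an illustration only, here is a Bash sketch that writes one nginx `upstream` file per region. The upstream name `api_backend` and the appspot.com hostnames are hypothetical placeholders, and we assume each regional deployment mounts its own key as /etc/nginx/conf.d/backend.conf, as the EU deployment does, with the shared proxy configuration doing the actual `proxy_pass`:

```shell
# Sketch: write one nginx upstream file per region for the "backends" configmap.
# Assumptions: "api_backend" and the appspot.com hostnames are hypothetical
# placeholders; the shared proxy configuration is expected to proxy_pass to them.
write_backend_confs() {
  local region
  for region in eu us jp; do
    cat > "backend.${region}.conf" <<EOF
# Mounted as /etc/nginx/conf.d/backend.conf in the ${region} proxy pods.
upstream api_backend {
    server open-targets-api-${region}.appspot.com:443;
}
EOF
  done
}
```

Keeping only the upstream in the per-region file means nginx.conf and the proxy rules stay identical everywhere, and each region differs by a single line.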
kubectl config use-context apiproxyfed
Create the default nginx configmap and the location-specific one. The deployment below also mounts a shared proxy.conf from the proxy key, so include it as well:
kubectl create configmap nginxconf --from-file=nginx.conf
kubectl create configmap backends --from-file=proxy=proxy.conf --from-file=eu=backend.eu.conf --from-file=us=backend.us.conf --from-file=jp=backend.jp.conf
Take a look at it:
kubectl describe configmap backends
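We expose the proxies through a NodePort service and a global Ingress (which GKE implements as a single HTTP(S) load balancer), because a service of type LoadBalancer would give us three separate IPs, one per cluster. First we reserve a global static IP: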
# create the IP
gcloud --project open-targets-prod compute addresses create api-ingress --global
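Once it is reserved, you can read the address back, for instance to point the api.opentargets.io DNS record at it. A minimal Bash sketch; the function name is our own:

```shell
# Sketch: print the global static IP reserved above (named "api-ingress").
show_ingress_ip() {
  gcloud --project open-targets-prod compute addresses describe api-ingress \
    --global --format="value(address)"
}
```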
Then spin up the nginx deployment, service and ingress:
kubectl create -f apiproxy_deploy.yaml
kubectl create -f apiproxy_svc.yaml
kubectl create -f apiproxy_ing.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: apiproxy-eu
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: apiproxy-eu
        app: apiproxy
        region: eu
    spec:
      volumes:
      - name: nginxconf-volume
        configMap:
          name: nginxconf
      - name: backends-volume
        configMap:
          name: backends
          items:
          - key: proxy
            path: proxy.conf
          - key: eu
            path: backend.conf
      containers:
      - image: nginx:1.13-alpine
        name: nginx
        livenessProbe:
          httpGet:
            path: /_heartbeat
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /_ah/health
            port: 80
            httpHeaders:
            - name: Host
              value: api.opentargets.io
          initialDelaySeconds: 30
          timeoutSeconds: 10
        volumeMounts:
        - mountPath: /etc/nginx/nginx.conf
          name: nginxconf-volume
          subPath: nginx.conf
        - mountPath: /etc/nginx/conf.d/
          name: backends-volume
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
apiVersion: v1
kind: Service
metadata:
  name: apiproxysvc-node
  labels:
    name: apiproxysvc-node
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
    nodePort: 30036
  selector:
    app: apiproxy
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: apiproxying-tls
  annotations:
    # kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "api-ingress"
spec:
  backend:
    serviceName: apiproxysvc-node
    servicePort: 80
  tls:
  - secretName: opentargetsio-certs
NOTE: it can take a long time before this is working and all the "backends" look green and healthy in the GKE dashboard. Go fix yourself a coffee.
You can monitor the progress directly from the console.
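The GCE ingress controller also records backend health as an `ingress.kubernetes.io/backends` annotation on the ingress, the same annotation used further down to find the backend name. It holds a JSON map from backend-service name to a state such as HEALTHY. A minimal Bash sketch to print it; we assume the annotation is visible on the member cluster's copy of the ingress (EU by default), so adjust the context if it is not. The function name is our own:

```shell
# Sketch: print the backend health map that the GCE ingress controller stores
# as an annotation on the ingress (defaults to the EU member cluster).
ingress_backend_health() {
  local ctx="${1:-apiproxy-eu}"
  kubectl --context="${ctx}" get ingress apiproxying-tls \
    -o jsonpath='{.metadata.annotations.ingress\.kubernetes\.io/backends}'
  echo  # jsonpath output has no trailing newline
}
```

Re-run it until every backend reports HEALTHY.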
You should verify that the proxy answers requests, preferably from several geographical locations.
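One way to check this is to request a known endpoint through the reserved IP and time the response. A minimal Bash sketch, assuming the proxy answers /_ah/health for the api.opentargets.io host (the path and Host header the readiness probe above uses); the function name is our own. Because `--resolve` pins the hostname to the given IP, the check also works before DNS points at the new load balancer:

```shell
# Sketch: request the health endpoint through the ingress IP and report the
# HTTP status and total time. --resolve pins api.opentargets.io to that IP.
check_proxy() {
  local ip="$1"
  curl --silent --show-error --output /dev/null \
    --resolve "api.opentargets.io:443:${ip}" \
    --write-out 'HTTP %{http_code} in %{time_total}s\n' \
    https://api.opentargets.io/_ah/health
}
```

Running `check_proxy <reserved-ip>` from machines in different regions gives a quick view of the latency each set of users would see.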
When you are happy with its functionality, you can also enable Cloud CDN with a command similar to:
BACKEND=$(kubectl get ing apiproxying-tls -o json | jq -r '.metadata.annotations."ingress.kubernetes.io/backends"' | jq -r 'keys[0]')
gcloud --project open-targets-prod compute backend-services update "$BACKEND" --global --enable-cdn
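To confirm the change, you can read the flag back from the backend service. A minimal Bash sketch; the function name is our own:

```shell
# Sketch: show whether Cloud CDN is enabled on the given global backend service.
cdn_enabled() {
  local backend="$1"
  gcloud --project open-targets-prod compute backend-services describe "${backend}" \
    --global --format="value(enableCDN)"
}
```

After the update, `cdn_enabled "$BACKEND"` should print True.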
Hope this tutorial helps you test out the Kubernetes federation functionality! Let me know in the comments if you have trouble.