Hi,
I'm Tom.

I'm a software engineer. I live in Shanghai.


Attack Vectors in OAuth 2.0

Yesterday I was checking out which OAuth grant type best fits mobile applications. It turns out PKCE (Proof Key for Code Exchange) has become the de facto standard.

The issue with implicit grant

The flow for implicit grant is fairly simple.

In step 1, the user agent, i.e. the mobile application, normally opens the browser to redirect the resource owner to the authorization server. The security implication here is that using a web view inside the application could expose the resource owner's credentials to the application.

The attack vector lies in step 3: after the resource owner authorizes the access, the browser redirects the resource owner back to the application. On most mobile platforms this is done via a custom URL scheme, such as “com.foo.bar://callback-url”, to which applications bind handlers. So there’s a chance that a malicious application also binds a handler to this URL and captures the token in the URL fragment.

Don’t ignore the state parameter

Another attack vector is that a malicious user places his own access token in the URL fragment. The user’s session is then replaced with the attacker’s session, and any resource the user creates will actually show up in the attacker’s account.

Therefore it is crucial for the client application to verify the state parameter after the redirect, to ensure that the session is the one the resource owner initiated. The client may generate the state and store it in a cookie or session before redirecting to the authorization server.
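To make the flow concrete, here is a minimal sketch of the state round trip in Go. The helper names (newState, verifyState) are my own, not from any OAuth library; the client would store the generated value in a cookie or server-side session before the redirect, then compare it against what the authorization server echoes back.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newState returns a cryptographically random state value to store in a
// cookie or session before redirecting to the authorization server.
func newState() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// verifyState compares the state echoed back on the redirect with the
// stored value, in constant time to avoid timing side channels.
func verifyState(stored, received string) bool {
	return subtle.ConstantTimeCompare([]byte(stored), []byte(received)) == 1
}

func main() {
	state, _ := newState()
	fmt.Println(verifyState(state, state))      // a matching round trip verifies
	fmt.Println(verifyState(state, "tampered")) // a mismatched state is rejected
}
```

If the check fails, the client should abort the flow rather than accept the token or code from the redirect.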

This is also the case for the authorization code grant without client_secret (which is the trend for replacing the implicit grant). The attacker authorizes with the authorization server using his own credentials; the redirect happens as usual, but the attacker intercepts it, copies the URL, and sends it to the target user. A victim who is already logged in clicks the link, receives the attacker’s code, and has his session replaced.

PKCE to the rescue

The implicit grant and the authorization code grant are both vulnerable if the resource owner is redirected to a malicious client. But can we enforce some checks in the authorization server to make sure the request comes from a valid client?

PKCE solves this by introducing a proof key for code exchange. It is an enhancement on top of the authorization code grant, so the workflow is about the same, except that the client needs to generate a secret code_verifier, send the derived code_challenge in the authorization request, and present the code_verifier when exchanging the code for a token.

So the idea here is that the client sends the challenge in step 1 and proves it holds the key used to generate that challenge when exchanging the code for a token in step 3. A malicious client is thus unable to redeem a code without the key, which only the legitimate client holds.
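The verifier/challenge pair can be sketched in a few lines of Go. This follows the S256 method from RFC 7636 (base64url-encoded SHA-256 of the verifier); the function names are mine, not from any library.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier generates the code_verifier the client keeps secret
// until the token exchange (step 3).
func newVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challenge derives the code_challenge sent with the authorization
// request (step 1), using the S256 method: BASE64URL(SHA256(verifier)).
func challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	v, _ := newVerifier()
	fmt.Println("code_verifier: ", v)
	fmt.Println("code_challenge:", challenge(v))
}
```

Since SHA-256 is one-way, intercepting the challenge in step 1 gives an attacker nothing useful for step 3.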

 

Exception Handling: Go vs. Java

After reading the post Why Go Gets Exceptions Right, I have some thoughts and would like to write them down here.

The first thing that comes to mind is why we’re using exceptions in the first place. The answer seems clear: we want to signal the caller of our function that something went wrong.

So how do we do it? In Java we have exceptions, especially checked exceptions declared with the throws keyword in the method signature. Checked means they are checked by the compiler at compile time, as a way to inform the caller that certain exceptions are expected to be thrown from the method. Go, with its ability to return multiple values, informs the caller by returning an error. It is part of the function signature, the contract between the function and the caller: the caller should receive the result and the error correctly, and an error variable that is received but never used won’t even compile.

The second thing I’m wondering is whether Go really handles exceptions better than Java. Let’s see how Go and Java handle them differently.

We can categorize exceptions into the following categories, language-agnostically (they’re all concepts from Java, but are interchangeable for this discussion):

Checked exceptions

As mentioned above, checked exceptions are enforced in both Java and Go, and the caller should handle them properly: either with try-catch in Java, or by checking whether err is nil in Go. In Go you can ignore the error by assigning it to _; similarly, in Java you can catch the exception and do nothing about it.
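The two idioms side by side can be sketched in Go. The parsePort function is a hypothetical example of mine; its error return plays the role of a throws clause, and the two call sites show handling versus deliberately ignoring the error.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort returns an error as part of its signature, the Go
// counterpart of declaring a checked exception with throws.
func parsePort(s string) (int, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return p, nil
}

func main() {
	// Handle the error: the analogue of a try-catch block.
	if p, err := parsePort("8080"); err == nil {
		fmt.Println("port:", p) // prints "port: 8080"
	}

	// Ignore the error with _: the analogue of an empty catch block.
	p, _ := parsePort("not-a-port")
	fmt.Println("ignored, got zero value:", p)
}
```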

Runtime exceptions

Runtime exceptions are exceptions you can’t always detect at compile time, such as a null pointer exception or an index out of bounds exception. In Java, as runtime exceptions are unchecked, you don’t need to explicitly declare them with throws. In Go, you don’t need to include an error in the return values. The caller has no idea what could go wrong and basically does nothing about it, unless it explicitly try-catches in Java or defer-recovers in Go. Otherwise, in Java the exception bubbles up the call stack until some handler catches it, and in Go the panic climbs up the stack of the current goroutine until some recover happens. If neither happens by the time the main method is reached, the program crashes.
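The defer-recover mechanism can be illustrated with an out-of-bounds access. This is a sketch of my own (safeIndex is not a standard function): the deferred closure converts the runtime panic into an ordinary error, roughly what catching an unchecked exception does in Java.

```go
package main

import "fmt"

// safeIndex turns a runtime panic (index out of range) into an error
// value using defer-recover.
func safeIndex(xs []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return xs[i], nil // panics when i is out of range
}

func main() {
	xs := []int{1, 2, 3}
	if v, err := safeIndex(xs, 1); err == nil {
		fmt.Println("value:", v) // prints "value: 2"
	}
	if _, err := safeIndex(xs, 10); err != nil {
		fmt.Println(err) // the out-of-range panic was recovered
	}
}
```

Without the deferred recover, the second call would crash the whole program, just as an uncaught RuntimeException would terminate a Java thread.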

Error

Errors are unpreventable and unrecoverable exceptions, like an out of memory error. In Java you don’t catch them, and in Go you should just let the program panic.

So viewed from this angle, the ways Go and Java handle exceptions are almost the same. That’s why I don’t think Go is better than Java in this regard. On the contrary, there are things I don’t like about error handling in Go.

 

Microservice health check in Kubernetes

TL;DR

A service should provide a standard endpoint for health checking and monitoring. The specification for the endpoint should conform to the requirements elaborated in the Requirements section.

Background

What is a health check

A health check detects the healthy status of a service, reporting whether the service is able to handle requests or whether the service is in a bad state and should be restarted.

Why health check is needed

High availability

There are many cases when a service is started/restarted

Under these circumstances, if a request is forwarded to a service that is still in the middle of its starting or restarting process, it will probably fail. So we need to make sure a service is healthy and able to accept requests before adding it to the load balancer (Kubernetes Service); this reduces service downtime and achieves high availability.

Service stability

A service running for a long period of time may fall into a bad state in which it is unable to handle requests properly. In this case, the service needs to be prevented from receiving requests until it recovers, either via a restart or manual intervention. This keeps the service as a whole stable.

Monitoring

A big part of the DevOps responsibility is to monitor and maintain the health of running services. If a service goes down, appropriate actions should be taken to bring the service back to life. The health check informs DevOps whether the service is malfunctioning.

Clients of health checks

Downsides of health check

As health checks are done periodically rather than in real time, there can still be a time gap before the unhealthy state becomes known to the clients. To mitigate this, a reasonable checking period should be set.

Requirements

What should be checked

As the definition of healthy may vary from service to service, depending on the service’s application logic, there can be many levels of healthy:

Each service may define its own criteria; however, the result of these checks should be definite, i.e. the service is either healthy or not healthy, with no middle state.

How to expose health check to clients

How the health check responds to clients

Status code

Response body

The response body can be empty; however, attaching additional information about what was checked and the result of each check is preferred.

Security/Access control

The health check should be private and limited to internal access, however if it is open to public access:

Implementation

Examples

Service OK

$ curl -XGET http://127.0.0.1:9000/health
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
 
{
    "status": "UP"
}

Service Unavailable

$ curl -XGET http://127.0.0.1:9000/health
HTTP/1.1 503 Service Unavailable
Content-Type: application/json; charset=utf-8
 
{
    "status": "DOWN"
}

Authenticated access

$ curl -XGET http://127.0.0.1:9000/health -H 'Authorization: Basic ZnNfbm9ybWFsOkBDZ0JkSjZOKz9TbmQhRytIJEI3'
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
 
{ 
  "status":"UP",
  "fooService":{ 
    "status":"UP",
    "description":"Foo service"
  },
  "mysql":{ 
    "status":"UP",
    "description":"MySQL Database",
    "hello":1
  }
}

Libraries

Java

Spring Boot Actuator

Go

N/A

Client Integration

Kubernetes integration

Please refer to https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.

Readiness Probe

readinessProbe: # checks if the service is in a healthy state; the pod is removed from the service/load balancer endpoints if the probe fails
    httpGet:
        path: /health
        port: 9000
    initialDelaySeconds: 10 # start checking 10s after the pod starts; keep this minimal so the service receives requests as soon as it is ready
    periodSeconds: 10 # hit the health check API every 10 seconds
    timeoutSeconds: 3 # if the response takes longer than 3 seconds, the check counts as failed
    failureThreshold: 3 # after 3 consecutive failures the pod is considered unready and removed from the endpoints
    successThreshold: 1 # a single success marks the pod as ready again

Liveness Probe

livenessProbe: # checks if the pod is in a bad state; the pod is restarted if the probe fails
    httpGet:
        path: /health
        port: 9000
    initialDelaySeconds: 180 # start checking 180s after the pod starts; should be longer than the service start time. Some services take minutes to start, so we set a large value here.
    periodSeconds: 10 # hit the health check API every 10 seconds
    timeoutSeconds: 3 # if the response takes longer than 3 seconds, the check counts as failed
    failureThreshold: 3 # after 3 consecutive failures the pod is considered to be in a bad state and is restarted
    successThreshold: 1 # a single success marks the pod as healthy again

Prometheus integration

Prometheus polls the health API constantly and stores the results in its time series database. If the health check metrics match a predefined alert rule, an alert is triggered.

Scrape config

- job_name: 'health-check'
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  kubernetes_sd_configs:
  - role: service
 
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_healthcheck]
      regex: true
      action: keep
    - source_labels: [__meta_kubernetes_service_name]
      target_label: service
    - source_labels: [__address__]
      regex: (.*)(:80)?
      target_label: __param_target
      replacement: ${1}/health
    - source_labels: [__param_target]
      regex: (.*)
      target_label: instance
      replacement: ${1}
    - source_labels: []
      regex: .*
      target_label: __address__
      replacement: blackbox-exporter-service:9115  # Blackbox exporter.

Service annotation

Add the prometheus.io/healthcheck annotation to a Kubernetes Service so that it can be discovered by the health check job.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/healthcheck: "true"
  name: foo-service
  namespace: foo
  labels:
    app: foo-service
spec:
  ports:
  - port: 80
    targetPort: 8000
    protocol: TCP
  selector:
    app: foo

Blackbox exporter config

Configure an http_2xx module to scrape the health API.

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []  # Defaults to 2xx
      method: GET
      headers: {}
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      fail_if_matches_regexp: []
      fail_if_not_matches_regexp: []
 