The theory behind this testing approach is based on Little's Law and the Universal Scalability Law (USL), which are explained in more detail in Coda's blog post.
One thing that article leaves out is how to go about load testing, so in this document I'll go over how to load test variations on infrastructure configurations.
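For quick reference, the two results that post leans on (these are the standard statements of both laws, nothing specific to this setup): Little's Law relates average concurrency, throughput, and latency, and the USL models throughput at concurrency N in terms of single-client throughput λ, a contention coefficient α, and a coherence coefficient β.

```latex
% Little's Law: mean number in flight = throughput \times mean latency
N = X \cdot R

% Universal Scalability Law: throughput at concurrency N
X(N) = \frac{\lambda N}{1 + \alpha (N - 1) + \beta N (N - 1)}
```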
You'll need a few key tools:

- `ghz` (like gigahertz... get it?), a load testing tool for gRPC
- `usl`; install the CLI with:

  ```shell
  go get -u github.com/codahale/usl
  ```
The rest is relatively straightforward, assuming you have successfully deployed your gRPC application onto your platform.
Things to consider:
- Be aware of the resource demands of the application you're testing. gRPC ping and helloworld applications don't take much processing power, so if you scale on processing power your limiting factor will be connections, not CPU.
- If your application calls out to another service, you will also be load testing that service.
- There will be an upper bound on how many requests a single `ghz` process can make at a time. (I honestly am unclear what this would be.)
Once the application is ready and your auto-scaling infrastructure is in place, you'll use `ghz`. The flags you'll want are the following:
- `--connections`: the number of connections to open at once
- `-c`: the concurrency
- `-z`: the duration of the load test
- `--proto`: the path to the `.proto` file related to your service
- `--insecure`: for now, since we don't have a story around cert-management
- `--call`: the name of the gRPC service method you'll be load testing

When load testing, `--connections` and `-c` should be the same, as they will be our N value. For example:

```shell
ghz -c 3 --connections 3 -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80
```
- Create an instance
- ssh into the instance:

  ```shell
  gcloud compute ssh instance-1 --zone <zone>
  ```

  You can find a list of zones here: https://cloud.google.com/compute/docs/regions-zones/, e.g. `gcloud compute ssh instance-1 --zone us-west1-b`
- Install git:

  ```shell
  sudo apt-get update && sudo apt-get install -y git
  ```
- Install go:
  - download the appropriate archive from https://golang.org/dl/; for instance, as of this document the latest stable go version was https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz

    ```shell
    wget "https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz"
    sudo tar -C /usr/local -xzf <go-tar-file>
    ```
  - add go to the path by adding

    ```shell
    export PATH=$PATH:/usr/local/go/bin
    ```

    to `$HOME/.profile`
- Then install ghz, similar to installing go:
  - download the appropriate archive from the releases page https://github.com/bojand/ghz/releases, e.g.

    ```shell
    wget "https://github.com/bojand/ghz/releases/download/v0.37.0/ghz_0.37.0_Linux_x86_64.tar.gz"
    ```
  - untar to the same location as go, e.g.

    ```shell
    sudo tar -C /usr/local -xzf ghz_0.37.0_Linux_x86_64.tar.gz
    ```
- Install the usl tool to calculate max throughput:

  ```shell
  go get -u github.com/codahale/usl
  ```

  More info about the usl command can be found at https://godoc.org/github.com/codahale/usl/cmd/usl
To start using ghz you'll need to get the protos for your service onto the machine. It may be simplest to copy the `.proto` file contents into a new file on the instance.

Once you have all of this you can start to hit your service with `ghz`. You'll also need to grab the public IP and port of the service; these will vary by installation. If you're using a service mesh, you'll also need the domain of the service you plan on hitting, which goes in the `--authority` flag.
For instance, when deploying the knative grpc example here I did the following:

```shell
ghz -c N --connections N -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80
```

The output is verbose, but what you want is the requests per second. You may also want to note down the average latency if you're putting this in a spreadsheet of various configurations, which I recommend.
Summary:
Count: 200
Total: 181.57 ms
Slowest: 69.60 ms
Fastest: 26.09 ms
Average: 32.01 ms
Requests/sec: 1101.53
Response time histogram:
26.093 [1] |∎
30.444 [52] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
34.794 [78] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
39.145 [40] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
43.495 [1] |∎
47.846 [0] |
52.196 [2] |∎
56.547 [5] |∎∎∎
60.897 [3] |∎∎
65.248 [2] |∎
69.598 [2] |∎
Latency distribution:
10% in 28.48 ms
25% in 30.08 ms
50% in 33.23 ms
75% in 35.43 ms
90% in 38.89 ms
95% in 55.45 ms
99% in 69.60 ms
Status code distribution:
[Unavailable] 3 responses
[PermissionDenied] 3 responses
[OK] 186 responses
[Internal] 8 responses
Error distribution:
[8] rpc error: code = Internal desc = Internal error.
[3] rpc error: code = PermissionDenied desc = Permission denied.
[3] rpc error: code = Unavailable desc = Service unavailable.
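Note that 14 of the 200 responses in this sample run were not OK; the requests/sec figure counts those too, so check the error rate before recording a data point. Using the status counts from the output above (plain shell arithmetic, nothing ghz-specific):

```shell
# Non-OK responses from the status code distribution above:
# 3 Unavailable + 3 PermissionDenied + 8 Internal, out of 200 total.
errors=$((3 + 3 + 8))
total=200
echo "error rate: $((errors * 100 / total))%"
```

This prints `error rate: 7%`; a run with a high error rate will overstate the useful throughput for that N.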
You'll want to keep a `.csv` file, with no header, in the format `concurrency (N),requests per second`:

```
1,955.16
2,1878.91
3,2688.01
4,3548.68
5,4315.54
6,5130.43
7,5931.37
8,6531.08
```
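Filling in those rows by hand means one ghz run per concurrency level, so it's worth scripting the sweep. A dry-run sketch: it only prints the ghz command for each N (pipe the lines to `sh`, or drop the quoting and `echo`, to actually execute), with the target and flags copied from the example above.

```shell
# Build one ghz invocation per concurrency level. N drives both -c and
# --connections so the two stay equal, as required above.
cmds=$(for n in 1 2 3 4 5 6 7 8; do
  echo "ghz -c $n --connections $n -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80"
done)
printf '%s\n' "$cmds"
```

Each run's requests/sec then becomes one `N,rps` row in `data.csv`.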
Then feed the data to usl with `usl -in data.csv`. This will output something like:

```
Model:
α: 0.008550 (constrained by contention effects)
β: 0.000030
peak: X=181, Y=217458.30
```
X is the concurrency at peak throughput and Y is the requests/sec at peak throughput. You'll want to record these, along with any notes about being constrained by contention effects.
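The peak that usl reports falls straight out of the model: with throughput X(N) = λN / (1 + α(N−1) + βN(N−1)), setting dX/dN = 0 gives the peak concurrency, and plugging in the fitted α and β from the sample output above reproduces the reported X=181.

```latex
N^{*} = \sqrt{\frac{1-\alpha}{\beta}}
      = \sqrt{\frac{1 - 0.008550}{0.000030}}
      \approx 181.8 \;\Rightarrow\; X = 181
```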