The theory behind this testing approach is based on Little's Law and the Universal Scalability Law (USL), which are explained in more detail in Coda's blog post.
One thing that article leaves out is how to go about load testing, so in this document I'll go over how to load test variations on infrastructure configurations.
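For quick reference, the two results that post leans on (these are the standard statements of both laws, nothing specific to this setup): Little's Law relates average concurrency, throughput, and latency, and the USL models throughput at concurrency N in terms of single-client throughput λ, a contention coefficient α, and a coherence coefficient β.

```latex
% Little's Law: mean number in flight = throughput \times mean latency
N = X \cdot R

% Universal Scalability Law: throughput at concurrency N
X(N) = \frac{\lambda N}{1 + \alpha (N - 1) + \beta N (N - 1)}
```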
You'll need a few key tools:

- `ghz` (like gigahertz... get it?), a load testing tool for gRPC
- `usl`; install the CLI with:

  ```shell
  go get -u github.com/codahale/usl
  ```
The rest is relatively straightforward, assuming you have successfully deployed your gRPC application onto your platform.
Things to consider:
- Be aware of the resource demands of the application you're testing. gRPC ping and helloworld applications don't take much processing power, so if you scale on processing power your limiting factor will be connections, not CPU.
- If your application calls out to another service, you will also be load testing that service.
- There will be an upper bound on how many requests a single `ghz` process can make at a time. (I honestly am unclear what this would be.)
Once the application is ready and your auto-scaling infrastructure is in place, you'll use `ghz`. The flags you'll want are the following:
- `--connections`: the number of connections to open at once
- `-c`: the concurrency
- `-z`: the duration of the load test
- `--proto`: the path to the `.proto` file related to your service
- `--insecure`: for now, since we don't have a story around cert-management
- `--call`: the name of the gRPC service method you'll be load testing

When load testing, `--connections` and `-c` should be the same, as they will be our N value. For example:

```shell
ghz -c 3 --connections 3 -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80
```
- Create an instance
- ssh into the instance:

  ```shell
  gcloud compute ssh instance-1 --zone <zone>
  ```

  You can find a list of zones here: https://cloud.google.com/compute/docs/regions-zones/, e.g. `gcloud compute ssh instance-1 --zone us-west1-b`
- Install git:

  ```shell
  sudo apt-get update && sudo apt-get install -y git
  ```
- Install go:
  - download the appropriate archive from https://golang.org/dl/; for instance, as of this document the latest stable go version was https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz

    ```shell
    wget "https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz"
    sudo tar -C /usr/local -xzf <go-tar-file>
    ```
  - add go to the path by adding

    ```shell
    export PATH=$PATH:/usr/local/go/bin
    ```

    to `$HOME/.profile`
- Then install ghz, similar to installing go:
  - download the appropriate archive from the releases page https://github.com/bojand/ghz/releases, e.g.

    ```shell
    wget "https://github.com/bojand/ghz/releases/download/v0.37.0/ghz_0.37.0_Linux_x86_64.tar.gz"
    ```
  - untar to the same location as go, e.g.

    ```shell
    sudo tar -C /usr/local -xzf ghz_0.37.0_Linux_x86_64.tar.gz
    ```
- Install the usl tool to calculate max throughput:

  ```shell
  go get -u github.com/codahale/usl
  ```

  More info about the usl command can be found at https://godoc.org/github.com/codahale/usl/cmd/usl
To start using ghz you'll need to get the protos for your service onto the machine. It may be simplest to copy the `.proto` file contents into a new file on the instance.

Once you have all of this you can start to hit your service with `ghz`. You'll also need to grab the public IP and port of the service; these will vary by installation. If you're using a service mesh, you'll also need the domain of the service you plan on hitting, which goes in the `--authority` flag.
For instance, when deploying the knative grpc example here I did the following:

```shell
ghz -c N --connections N -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80
```

The output is verbose, but what you want is the requests per second. You may also want to note down the average latency if you're putting this in a spreadsheet of various configurations, which I recommend.
Summary:
Count: 200
Total: 181.57 ms
Slowest: 69.60 ms
Fastest: 26.09 ms
Average: 32.01 ms
Requests/sec: 1101.53
Response time histogram:
26.093 [1] |∎
30.444 [52] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
34.794 [78] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
39.145 [40] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
43.495 [1] |∎
47.846 [0] |
52.196 [2] |∎
56.547 [5] |∎∎∎
60.897 [3] |∎∎
65.248 [2] |∎
69.598 [2] |∎
Latency distribution:
10% in 28.48 ms
25% in 30.08 ms
50% in 33.23 ms
75% in 35.43 ms
90% in 38.89 ms
95% in 55.45 ms
99% in 69.60 ms
Status code distribution:
[Unavailable] 3 responses
[PermissionDenied] 3 responses
[OK] 186 responses
[Internal] 8 responses
Error distribution:
[8] rpc error: code = Internal desc = Internal error.
[3] rpc error: code = PermissionDenied desc = Permission denied.
[3] rpc error: code = Unavailable desc = Service unavailable.
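Note that 14 of the 200 responses in this sample run were not OK; the requests/sec figure counts those too, so check the error rate before recording a data point. Using the status counts from the output above (plain shell arithmetic, nothing ghz-specific):

```shell
# Non-OK responses from the status code distribution above:
# 3 Unavailable + 3 PermissionDenied + 8 Internal, out of 200 total.
errors=$((3 + 3 + 8))
total=200
echo "error rate: $((errors * 100 / total))%"
```

This prints `error rate: 7%`; a run with a high error rate will overstate the useful throughput for that N.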
You'll want to keep a `.csv` file, with no header, in the format `concurrency (N),requests per second`:

```
1,955.16
2,1878.91
3,2688.01
4,3548.68
5,4315.54
6,5130.43
7,5931.37
8,6531.08
```
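Filling in those rows by hand means one ghz run per concurrency level, so it's worth scripting the sweep. A dry-run sketch: it only prints the ghz command for each N (pipe the lines to `sh`, or drop the quoting and `echo`, to actually execute), with the target and flags copied from the example above.

```shell
# Build one ghz invocation per concurrency level. N drives both -c and
# --connections so the two stay equal, as required above.
cmds=$(for n in 1 2 3 4 5 6 7 8; do
  echo "ghz -c $n --connections $n -z 2m --proto ./ping.proto --insecure --call ping.PingService.Ping --authority grpc-ping.default.example.com 35.247.54.84:80"
done)
printf '%s\n' "$cmds"
```

Each run's requests/sec then becomes one `N,rps` row in `data.csv`.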
Then feed the data to usl with `usl -in data.csv`. This will output something like:

```
Model:
α: 0.008550 (constrained by contention effects)
β: 0.000030
peak: X=181, Y=217458.30
```
X is the concurrency at peak throughput and Y is the requests/sec at peak throughput. You'll want to record these, along with any notes about being constrained by contention effects.
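The peak that usl reports falls straight out of the model: with throughput X(N) = λN / (1 + α(N−1) + βN(N−1)), setting dX/dN = 0 gives the peak concurrency, and plugging in the fitted α and β from the sample output above reproduces the reported X=181.

```latex
N^{*} = \sqrt{\frac{1-\alpha}{\beta}}
      = \sqrt{\frac{1 - 0.008550}{0.000030}}
      \approx 181.8 \;\Rightarrow\; X = 181
```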