@plant99
Created June 20, 2022 13:03
load-test-additional-docs

Load test notes

Prerequisites for load-testing a new feature

  • Build and test a feature in a mattermost-server branch.
  • Skim the Markdown docs in mattermost-load-test-ng. Make sure you understand what a coordinator, an agent, a controller, and bounded and unbounded load tests are, and why this setup needs metrics collection (a Prometheus deployment).

Steps to load test the feature

Brief summary of tasks

This includes:

  • Writing new load-testing actions for mattermost-load-test-ng.
  • Testing the changes locally.
  • Testing the changes in Terraform, in order to load-test under a heavier load and a larger dataset.
  • Analysing the load-test results.
  • Getting the changes merged into the load-test repository, so the release manager can test the same changes for unexpected behavior during upcoming releases.

Detailed summary of tasks

Writing new load testing code

  • Go through coverage.md and make a list of the changes needed to load-test the new feature.
  • Optionally check out this video walkthrough by @claudio.costa
  • Make the necessary changes in loadtest/

Testing changes locally

  • Go through local_loadtest.md
  • Some additional notes on that document:
    • In the step where configs are copied from samples, simplecontroller.json doesn't have to be generated, since config.json uses a simulative controller by default.
    • Updates to config.json:
      • Change the ConnectionConfiguration section in config.json to match the local deployment of mattermost-server.
      • For InstanceConfiguration, refer to the doc. The init command uses this section later to populate the Mattermost database.
    • Increase the frequency of the new action so it's easier to debug while running locally (see sample).
    • Expect some failures in running-a-basic-loadtest. If you see more than 10-15 error logs, consult the troubleshooting guide.
    • Check ltagent.log in the mattermost-load-test-ng directory, and the server logs, for details on any errors.
    • The later sections of the document, i.e. using the load-test agent API server and using the load-test coordinator, are highly recommended reading. The Terraform deployment uses the coordinator method to execute load tests, and the setup is easier to debug once you understand these underlying principles.
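
The ConnectionConfiguration change described above can also be scripted when you switch between deployments often. The following is a minimal sketch, assuming config.json is plain JSON and that the section contains ServerURL and WebSocketURL keys. Those key names are assumptions, so check them against your own config.json; the helper names are illustrative and not part of the load-test tool.

```python
import json
from pathlib import Path


def point_config_at_server(config: dict, server_url: str) -> dict:
    """Return a copy of config whose ConnectionConfiguration targets server_url.

    ServerURL and WebSocketURL are assumed key names; verify them in your config.json.
    """
    if not server_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {server_url!r}")
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    conn = updated.setdefault("ConnectionConfiguration", {})
    conn["ServerURL"] = server_url
    # Derive the WebSocket URL by swapping the scheme: http -> ws, https -> wss.
    conn["WebSocketURL"] = "ws" + server_url[len("http"):]
    return updated


def rewrite_config_file(path: str, server_url: str) -> None:
    """Load a JSON config file, retarget it, and write it back in place."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    retargeted = point_config_at_server(config, server_url)
    config_path.write_text(json.dumps(retargeted, indent=2) + "\n")
```

For a local server this could be called as rewrite_config_file("config/config.json", "http://localhost:8065"), where both the path and the URL are placeholders for your own setup. Other keys in the section are left untouched.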

Testing changes in terraform

Even without the framework, a general load-test workflow in the cloud will be similar to the following.

  • Create a database (to be used by mattermost servers)
  • Create deployments of mattermost servers - let's call them app servers for convenience.
  • Create 'agent' machines that send requests to the app servers using controllers.
  • Populate the database, then start the app servers and the agents (i.e. the load test).
  • Collect metrics from the app and agent deployments, to either:
    • manage an unbounded load test, or
    • analyse API performance after the load test completes.

Load-test instances created with this framework achieve the same goals. The difference is that steps such as creating a deployment and running a load test are automated.
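
To make "managing an unbounded load test" with metrics more concrete: the collected metrics act as a feedback signal that decides whether more users can be added. The toy sketch below only illustrates that idea and is not the coordinator's actual algorithm. The is_healthy callback stands in for a metrics query, and step and limit are hypothetical parameters.

```python
from typing import Callable


def find_max_users(is_healthy: Callable[[int], bool], step: int, limit: int) -> int:
    """Ramp users up in fixed steps and return the largest count that stayed healthy.

    is_healthy(n) is a placeholder for "metrics look fine with n active users".
    """
    if step <= 0:
        raise ValueError("step must be positive")
    supported = 0
    users = step
    # Stop at the first unhealthy reading, or when the user limit is reached.
    while users <= limit and is_healthy(users):
        supported = users
        users += step
    return supported
```

For example, with a fake health check that degrades past 1200 users, find_max_users(lambda n: n <= 1200, step=100, limit=5000) returns 1200.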

The following steps load-test a new feature in production, after the new actions have been tested locally.

  • Go through terraform_loadtest.md
  • Some additional notes on that document:
    • Fetch AWS credentials from OneLogin and add them to the Terraform config.

    • Fetch the Enterprise license from Developers:private.

    • When performing this step, edit the MattermostLicenseFile value to the path of the license file.

      • The fields MattermostDownloadURL and LoadTestDownloadURL point to the mattermost-server and load-test packages, respectively, to be used in the load test. By default they point to the latest versions.
      • When there are unmerged changes to mattermost-server or mattermost-load-test-ng, use local builds instead:
        • Run make build-linux in the mattermost-server directory, then change MattermostDownloadURL to the path of the mattermost executable. For example, file:///somepath/mattermost-server/bin/linux_amd64/mattermost
        • Run make package in the mattermost-load-test-ng directory, then change LoadTestDownloadURL to the path of the gzipped load-test package. For example, file:///somepath/mattermost-load-test-ng/dist/v1.5.0-8-gd4f18cf/mattermost-load-test-ng-v1.5.0-8-gd4f18cf-linux-amd64.tar.gz
    • Edit SSHPublicKey in deployer.json after setting up SSH.

    • go run ./cmd/ltctl deployment create

      • Run all deployment operations from a single shell window.
      • If the deployment gets stuck, run ps -ef | grep terraform. If Terraform processes are still running, restart the computer and start again.
      • Terraform actions are idempotent, so you rarely have to destroy the deployment if something goes wrong while creating resources.
    • Once the deployment is created successfully, stdout lists the server addresses for the app, agent, coordinator, and Grafana deployments.

      • Open the Mattermost URL in a browser to check that the app is working as expected.
      • At this point, you can check the server logs by ssh-ing into the app instance and opening /opt/mattermost/logs/mattermost.log.
    • Gearing up to start the load-test

      • Use the agents' URLs and the Prometheus URL in the coordinator.json file generated here.
      • Change ConnectionConfiguration in the config.json generated here.
      • Configure InstanceConfiguration in the same config.json (which, as mentioned earlier, seeds the Mattermost server's database with the data required for the load test). Note that a heavier config, for example one with a large NumPosts value, takes a very long time to seed. Refer to the NB section below to seed the database manually from a backup and bypass data generation.
    • Start the load test with go run ./cmd/ltctl loadtest start

      • Once a loadtest is running, its status can be checked with go run ./cmd/ltctl loadtest status.

      • ssh into one of the agent machines, and cat ~/mattermost-load-test-ng/logs/ltagent.log to verify the load-test is working without errors.

      • Open the Grafana deployment with the URL from go run ./cmd/ltctl deployment info.

      • The deployment takes some time to stabilize: while the load-test tool is connecting MaxActiveUsers users to the app, there may be a large number of HTTP 4xx errors.

        A sample error count vs time graph: error-vs-time

    • "My load test is running, now what?"

      • If it's a bounded load test, it has to be stopped manually with go run ./cmd/ltctl loadtest stop after an hour.
      • If it's an unbounded load test, it finishes on its own and prints to stdout the maximum number of concurrent users the deployment supports. The status command reports the status as Done when it's complete.
    • "My load tests ran successfully, what to make of it?"

      • For unbounded load tests, once they finish, go run ./cmd/ltctl loadtest status gives the maximum number of concurrent users, a metric for comparing the performance of that version of mattermost-server against others.
      • For two bounded load tests with the same MaxConcurrentUsers count, you can generate a report comparing their performance across various server metrics.
      • For both bounded and unbounded load tests, you can create Grafana dashboards to analyze the performance of the new feature by filtering the API metrics relevant to the new handler. Here's an example: sample-dashboard-creation
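
For the unmerged-changes case above, MattermostDownloadURL and LoadTestDownloadURL take file:// URLs that point at local build artifacts. A small sketch of deriving such a URL from an absolute path, assuming a POSIX system; the helper name is illustrative, and the path in the usage note is the placeholder from the example above.

```python
from pathlib import Path


def local_artifact_url(path: str) -> str:
    """Turn an absolute path to a build artifact into a file:// URL."""
    artifact = Path(path)
    if not artifact.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    # as_uri() adds the file:// scheme and percent-encodes special characters.
    return artifact.as_uri()
```

Calling local_artifact_url("/somepath/mattermost-server/bin/linux_amd64/mattermost") yields file:///somepath/mattermost-server/bin/linux_amd64/mattermost, which can then be pasted into the deployer config. Requiring an absolute path guards against a relative path silently resolving to the wrong file.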

NB:

  • To seed the database manually:
    • Keep the InstanceConfiguration section as minimal as possible to reduce DB init time.
    • Ask in Developers:Performance for a migration file.
    • ssh into the app machine, and psql into the connected database.
    • Drop all tables and log out of psql. Then run the migration, which might take a while.
    • The app service now needs to be restarted so the server can run the necessary migrations.
    • ssh into the app instance(s) and run sudo systemctl restart mattermost && until curl -sSf http://localhost:8065 --output /dev/null; do sleep 1; done
  • If the feature is behind a feature flag, add the corresponding environment variables to the app service (link to Claudio's message).
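
The restart-and-wait step above polls http://localhost:8065 until the server answers. When the same wait is needed inside a script, it could be sketched as follows. Unlike the shell loop, this version gives up after a fixed number of attempts, which is a design choice of this sketch; the function names and default values are illustrative.

```python
import http.client
import time
import urllib.request
from typing import Callable


def server_is_up(url: str) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP 4xx/5xx all count as "not up yet".
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 120, delay: float = 1.0) -> bool:
    """Call probe until it succeeds or attempts run out, sleeping between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

A call such as wait_until_ready(lambda: server_is_up("http://localhost:8065")) returns True once the server responds and False if it never does within the attempt budget. Passing the probe as a callable keeps the retry loop testable without a running server.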