Created
September 29, 2015 14:43
XenVM testing plan
The purpose of this part of the CAR is to improve the reliability and
robustness of the XenVM component of the Thin LVHD feature. This will
be achieved by three main activities: expanding the existing
dev-tests, implementing some new features, and applying some formal
methods to prove that models of how parts of xenvmd work are correct.

Expanding dev tests:

There are already many dev tests being run on every single build and
pull request going into xenvm. Currently, these mainly cover the
functionality of xenvmd and xenvm, and only partly cover the
activities of the local allocator. In order to increase the coverage
we propose to do the following:
- Extend the mock device-mapper component such that it can be used
  between processes.
  Using the real device mapper is limiting, in that udev becomes
  involved and is a source of delays. Additionally, using the one
  system device-mapper means we can't do multi-host testing. By
  extending the mock to work between processes, we can simulate a
  pool of as many hosts as we like using only one real host. Changing
  the mock to use 'read-modify-write' with filesystem locks on each
  call, rather than keeping state in memory as it currently does, is
  a small amount of work.
- Functorize high-level logic over the lower-level modules.
  This is a neat trick that we already use today, but it can be
  extended. The idea is to make mock modules that simulate parts of
  the code. As a concrete example, we can functorize over the
  'shared-block-ring' code in order to use a more convenient on-disk
  layout of the messages, so that each message sent over the ring
  becomes a file on disk that is easily examined. This would be
  particularly useful in testing invariants over the set of all
  messages sent over the ring, as in the 'real' shared-block-ring the
  messages get overwritten. Another example is functorizing over the
  'Time' module, so that the current 5-second poll interval can be
  made much shorter, longer, or randomized in order to make the tests
  run more quickly or to explore differences in thread interleaving.
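
The 'Time' example can be sketched as below. This is only a minimal
illustration of the functor pattern, not the xenvm code: the TIME
signature, the Poller functor and the Mock_time module are hypothetical
names made up for this sketch. A production instance would wrap a real
sleep function (for example Unix.sleepf) behind the same signature.

```ocaml
(* Hypothetical signature: all the poll loop needs from a clock. *)
module type TIME = sig
  val sleep : float -> unit (* block for the given number of seconds *)
end

(* High-level logic, written once against the TIME signature. *)
module Poller (T : TIME) = struct
  let interval = 5.0 (* the 5-second poll interval *)

  (* Call [step] [n] times, sleeping [interval] seconds after each call. *)
  let run ~n step =
    for i = 1 to n do
      step i;
      T.sleep interval
    done
end

(* Mock clock for tests: records each requested sleep, returns at once. *)
module Mock_time = struct
  let slept : float list ref = ref []
  let sleep d = slept := d :: !slept
end

module Test_poller = Poller (Mock_time)

let () =
  let calls = ref 0 in
  Test_poller.run ~n:3 (fun _ -> incr calls);
  (* Three polls ran without any real waiting. *)
  assert (!calls = 3);
  assert (List.length !Mock_time.slept = 3)
```

Because the poll loop only sees the signature, a test can also supply a
clock that randomizes or shortens the delay without touching the logic
under test.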
- Implement new component tests:
  Once the above two are completed, the new functionality will be used
  to enable the following tests:
  - Stress test
    This would be a multi-host, many-LV test that would quickly
    simulate thousands of VDIs across 16 hosts, with lots of
    allocations.
  - Restart tests
    This would test restarts of xenvmd and the local allocator. Both
    are built to be 'crash-only software' (not as bad as it sounds,
    honest!); see https://en.m.wikipedia.org/wiki/Crash-only_software.
    The idea is to put fist points in to cause an exit at particularly
    important points, and to verify that operations either succeed
    or fail and never leave the system in an intermediate state.
  - Invariant post-processing
    Having written out the journals and rings into easily readable
    files, they can be post-processed to ensure that the required
    invariants do indeed hold. These are statements such as
    'For all FreeAllocation messages sent from xenvmd to all
    local allocators, the blocks allocated must be unique unless
    the messages have the same generation count'.
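
A post-processing check for the FreeAllocation invariant quoted above
might look like the sketch below. It assumes the messages have already
been parsed from the on-disk files into a list. The record type, its
field names and the violations function are hypothetical, and the check
encodes one reading of the invariant: a block may appear in more than
one message only if those messages carry the same generation count.

```ocaml
(* Hypothetical record for one FreeAllocation message read from disk. *)
type free_allocation = {
  host : string;      (* destination local allocator *)
  generation : int;   (* generation count carried by the message *)
  blocks : int list;  (* blocks handed out by this message *)
}

(* Return every violation as (block, first_generation, other_generation),
   in the order in which the messages were sent. *)
let violations (msgs : free_allocation list) =
  let seen = Hashtbl.create 64 in (* block -> generation first seen *)
  List.fold_left
    (fun acc m ->
      List.fold_left
        (fun acc b ->
          match Hashtbl.find_opt seen b with
          | Some g when g <> m.generation -> (b, g, m.generation) :: acc
          | Some _ -> acc (* same generation: a resend, allowed *)
          | None ->
            Hashtbl.add seen b m.generation;
            acc)
        acc m.blocks)
    [] msgs
  |> List.rev

let () =
  (* A resend with the same generation count is not a violation. *)
  let resend =
    [ { host = "h1"; generation = 1; blocks = [ 10; 11 ] };
      { host = "h1"; generation = 1; blocks = [ 10; 11 ] } ]
  in
  assert (violations resend = []);
  (* Block 11 handed out under two different generations is flagged. *)
  let clash =
    [ { host = "h1"; generation = 1; blocks = [ 10; 11 ] };
      { host = "h2"; generation = 2; blocks = [ 11; 12 ] } ]
  in
  assert (violations clash = [ (11, 1, 2) ])
```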

Targeted Formal Methods

We already have a Promela model of the shared-block-ring
suspend/resume protocol, and a model of a previous (broken) version of
the xenvmd -> local allocator messages. We would like to spend some
time examining some of the more critical aspects of the system to try
to find any other lurking issues. This is to be done alongside the
functorization work outlined above in order to simplify the logic
in xenvmd/local allocator such that it is more obviously performing
the same logic as the models are testing.
As a group and team, we have limited exposure to these methods so it's
hard to predict how long this will take and what the benefit will be.
However, I believe knowledge of these methods will be very beneficial
not only to Thin LVHD but to XenServer Engineering as a whole. I
suggest our approach should be to time-box the CP ticket to do this
aspect of the work.

New feature implementation

- Watchdog. Xapi has a watchdog that makes sure that it's running,
  and restarts it if it crashes. Since both xenvmd and the local
  allocator were designed to be crash-only, a watchdog is almost
  trivial to implement.
- PV resize. I'm unsure whether this should go in this CAR or whether
  it should just be done under the CA ticket. We need to implement PV
  resize in the underlying LVM library, mirage-block-volume.
Looks good!
In the New feature implementation section, can we make xenvmd and the local-allocator a service or is this too much work because we currently have one per SR?