  • Save mpereira/58c8717d97b6bf3b8b5db0dd25e2d0f2 to your computer and use it in GitHub Desktop.

RFC: The POSIX Resource Limits interface between the SDK and Mesos

The DC/OS SDK allows one to set POSIX resource limits for tasks running in particular pods.

For example:

name: "some-service"
pods:
  some-pod:
    count: 1
    rlimits:
      RLIMIT_NOFILE:
        soft: 32768
        hard: 32768
    tasks:
      server:
        goal: RUNNING
        cmd: "cmd"
        cpus: 1
        memory: 1024

The SDK parses the YAML representation into a Java object whose soft and hard fields are JVM Longs.

RLimitSpec.java#L31-L45

private final Long soft;

private final Long hard;

public RLimitSpec(
    @JsonProperty("name") String name,
    @JsonProperty("soft") Long soft,
    @JsonProperty("hard") Long hard) throws InvalidRLimitException
{
  this.name = name;
  this.soft = soft;
  this.hard = hard;

  validate();
}

From the following unit test, we get the impression that setting a service pod's resource limit values to -1L will result in unlimited POSIX resource limit values.

RLimitSpecTest.java#L17-L22

public void testRLimitCreationSucceedsWithUnlimitedLimits() throws RLimitSpec.InvalidRLimitException {
    RLimitSpec rlimit = new RLimitSpec("RLIMIT_AS", -1L, -1L);

    Assert.assertEquals(rlimit.getSoft().get(), Long.valueOf(-1));
    Assert.assertEquals(rlimit.getHard().get(), Long.valueOf(-1));
}

Likely following this hint, some services already use -1 as the default value for a resource limit. One example is our DataStax Enterprise service:

config.json#L391-L394

"rlimit_memlock": {
  "type": "integer",
  "default": -1
},

While deploying a service with -1 configured as a resource limit, I noticed that the Protobuf message sent to Mesos contained a different value: 18446744073709551615.

Excerpt from the scheduler logs where the ContainerInfo Protobuf is printed:

INFO  2019-06-04 17:39:45,595 [pool-6-thread-1] OfferAccepter:logOperations(63): type: LAUNCH_GROUP launch_group <...>

The actual rlimit_info Protobuf printed in the logs, manually "pretty-printed" for clarity:

rlimit_info {
  rlimits {
    type: RLMT_MEMLOCK
    hard: 18446744073709551615
    soft: 18446744073709551615
  }
  rlimits {
    type: RLMT_NOFILE
    hard: 100000
    soft: 100000
  }
  rlimits {
    type: RLMT_NPROC
    hard: 32768
    soft: 32768
  }
}

Looking at the rlimit_info Protobuf definition, we see that the hard and soft fields are defined as uint64 (unsigned 64-bit integers).

mesos.proto#L3301-L3302

optional uint64 hard = 2;
optional uint64 soft = 3;

The JVM has no unsigned integer primitives, which is probably why Longs were used there in the first place. protobuf-java also exposes uint64 fields as Java longs. Longs are also 64 bits wide, but they are signed. When the ContainerInfo Protobuf is encoded and sent to Mesos, the 64 bits of -1L are passed through unchanged. In two's complement, -1L has all 64 bits set, and Mesos reads that bit pattern as the unsigned value 2^64 - 1 = 18446744073709551615. So -1 is not rejected on the wire. Instead, it is silently reinterpreted as the largest value a uint64 can hold.
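The reinterpretation of -1L as 18446744073709551615 can be reproduced on the JVM with the standard library alone. Long.toUnsignedString reads a long's 64 bits as an unsigned number, which is how a reader of a uint64 field sees them. The class and method names below are illustrative only.

```java
// Illustrative demo; not part of the SDK.
class UnsignedRLimitDemo {
    // Renders a long's 64 bits the way a reader of a Protobuf uint64 field sees them.
    static String asUint64(long value) {
        return Long.toUnsignedString(value);
    }

    public static void main(String[] args) {
        long unlimited = -1L;
        // In two's complement, -1L has all 64 bits set.
        System.out.println(Long.toHexString(unlimited)); // prints ffffffffffffffff
        // The same bits read as unsigned give 2^64 - 1.
        System.out.println(asUint64(unlimited));         // prints 18446744073709551615
        // Non-negative limits such as 32768 are unaffected.
        System.out.println(asUint64(32768L));            // prints 32768
    }
}
```

Only negative values are affected. Any non-negative Long maps to the same number as a uint64.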

This leads us to the question: what should the SDK configuration interface for POSIX resource limits look like so that it supports setting rlimits to unlimited while keeping the SDK library consistent with the Protobuf data types?

Proposal: keep rlimits in the JSON configuration as integers, with -1 meaning unlimited

Since JSON Schema has no "unsigned integer" type, we can set a minimum of -1 for integer values that ultimately map to Protobuf uint64s. The Java code would then handle -1 so that it is interpreted as unlimited rather than encoded as-is.

Concretely this would look like:

config.json

"rlimit_memlock": {
  "type": "object",
  "properties": {
    "soft" : {
      "type": "integer",
      "default": 4096,
      "minimum": -1
    },
    "hard" : {
      "type": "integer",
      "default": 4096,
      "minimum": -1
    }
  }
}

svc.yml

RLIMIT_MEMLOCK:
  soft: {{RLIMIT_MEMLOCK_SOFT}}
  hard: {{RLIMIT_MEMLOCK_HARD}}

Then, in the part of the SDK library where the object representation of the YAML is encoded into a Protobuf message, we would have to handle the -1L special case. There, we would construct the RLimitInfo Protobuf so that its values represent unlimited. That code currently looks like this:

PodInfoBuilder.java#L666-L683

private static Protos.RLimitInfo getRLimitInfo(Collection<RLimitSpec> rlimits) {
    Protos.RLimitInfo.Builder rLimitInfoBuilder = Protos.RLimitInfo.newBuilder();

    for (RLimitSpec rLimit : rlimits) {
        Optional<Long> soft = rLimit.getSoft();
        Optional<Long> hard = rLimit.getHard();
        Protos.RLimitInfo.RLimit.Builder rLimitsBuilder = Protos.RLimitInfo.RLimit.newBuilder()
            .setType(rLimit.getEnum());

        // RLimit itself validates that both or neither of these are present.
        if (soft.isPresent() && hard.isPresent()) {
            rLimitsBuilder.setSoft(soft.get()).setHard(hard.get());
        }
        rLimitInfoBuilder.addRlimits(rLimitsBuilder);
    }

    return rLimitInfoBuilder.build();
}
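One way to handle the -1L special case in getRLimitInfo is to leave soft and hard unset when both are -1L. This sketch assumes that Mesos treats an RLimit with neither field set as unlimited. That is what the comment on RLimitInfo.RLimit in mesos.proto describes, but it is worth confirming against the Mesos version in use.

To stay runnable without the Mesos Protobuf classes, the sketch uses two hypothetical names. EncodedRLimit stands in for Protos.RLimitInfo.RLimit, and toEncodedRLimit is an illustrative helper rather than an existing SDK method.

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical stand-in for Protos.RLimitInfo.RLimit; an empty OptionalLong means "field not set".
final class EncodedRLimit {
    final String type;
    final OptionalLong soft;
    final OptionalLong hard;

    EncodedRLimit(String type, OptionalLong soft, OptionalLong hard) {
        this.type = type;
        this.soft = soft;
        this.hard = hard;
    }

    // Assumes, per the RLimitInfo.RLimit comment in mesos.proto, that neither field set means unlimited.
    boolean isUnlimited() {
        return !soft.isPresent() && !hard.isPresent();
    }
}

final class RLimitEncoding {
    static final long UNLIMITED = -1L;

    private RLimitEncoding() {}

    // Hypothetical helper mirroring the loop body of getRLimitInfo, except that -1L maps to
    // "fields not set" instead of being encoded (Mesos would read -1L as 2^64 - 1).
    static EncodedRLimit toEncodedRLimit(String type, Optional<Long> soft, Optional<Long> hard) {
        OptionalLong none = OptionalLong.empty();
        // As in the original code, values are only set when both are present.
        if (!soft.isPresent() || !hard.isPresent()) {
            return new EncodedRLimit(type, none, none);
        }
        long s = soft.get();
        long h = hard.get();
        if (s == UNLIMITED && h == UNLIMITED) {
            return new EncodedRLimit(type, none, none);
        }
        if (s == UNLIMITED || h == UNLIMITED) {
            // Both-set-or-both-unset means a mixed value has no direct encoding.
            throw new IllegalArgumentException(
                type + ": soft and hard must both be -1 (unlimited) or both be finite");
        }
        return new EncodedRLimit(type, OptionalLong.of(s), OptionalLong.of(h));
    }
}
```

With this helper, RLMT_MEMLOCK with -1/-1 produces an RLimit with neither field set. RLMT_NOFILE with 100000/100000 is encoded unchanged. In the real getRLimitInfo, this would mean calling setSoft and setHard only when the values are finite. Mixed values such as soft 4096 with hard -1 are rejected here. How the SDK should treat that case is left open by this RFC.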

Also, we would ideally add validation to prevent rlimit values that are < -1.
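A minimal sketch of that validation follows. RLimitValidation and its validate method are hypothetical names, not SDK APIs. The sketch rejects values below -1 and keeps the rule that soft and hard are either both set or both omitted. It also checks soft <= hard with Long.compareUnsigned. Because -1L has all bits set, -1L compares as the largest value, which matches its meaning of unlimited.

```java
// Hypothetical validator; names are illustrative, not SDK APIs.
final class RLimitValidation {
    static final long UNLIMITED = -1L;

    private RLimitValidation() {}

    static void validate(String name, Long soft, Long hard) {
        if ((soft == null) != (hard == null)) {
            throw new IllegalArgumentException(name + ": soft and hard must both be set or both be omitted");
        }
        if (soft == null) {
            return; // Neither value configured.
        }
        if (soft < UNLIMITED || hard < UNLIMITED) {
            throw new IllegalArgumentException(name + ": values must be >= -1 (-1 means unlimited)");
        }
        // Unsigned comparison makes -1 (all bits set) the largest value, i.e. unlimited.
        if (Long.compareUnsigned(soft, hard) > 0) {
            throw new IllegalArgumentException(name + ": soft limit must not exceed hard limit");
        }
    }
}
```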
