Alpa runlog, invalid memory access
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % python3 tests/test_install.py
.
compile_pipeshard_executable::trace: 0.97 s
compile_pipeshard_executable::jaxpr operations: 0.00 s
compile_pipeshard_executable::stage construction: 0.00 s
compile_pipeshard_executable::apply grad: 0.00 s
compile_pipeshard_executable::shard stages: 1.69 s
compile_pipeshard_executable::launch meshes: 0.72 s
compile_pipeshard_executable::driver executable: 29.27 s
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 5, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 6, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 7, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 8, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 1, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 3, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 2, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 4, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 17, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 19, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 18, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 20, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 14, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 16, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 13, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 15, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 5, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 4, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 2, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 1, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 6, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 8, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 3, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 7, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 16, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 13, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 14, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 15, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 20, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 18, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 17, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 19, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4
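Aside: one plausible explanation for the `-0.000 GB` readings throughout the run above is a formatting artifact rather than real negative usage — a byte counter that has drifted a few bytes below zero, divided by 1024^3 and printed with three decimals, renders as `-0.000`. A minimal sketch (the `fmt_gb` helper is hypothetical, not Alpa code):

```python
# Show how a slightly negative byte count prints as "-0.000 GB"
# under three-decimal fixed-point formatting.
def fmt_gb(num_bytes: int) -> str:
    return f"{num_bytes / 1024**3:.3f} GB"

print(fmt_gb(-1024))    # a tiny negative accounting delta -> "-0.000 GB"
print(fmt_gb(0))        # "0.000 GB"
print(fmt_gb(1024**3))  # "1.000 GB"
```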
.
----------------------------------------------------------------------
Ran 2 tests in 79.713s
OK
python3 tests/test_install.py 29.20s user 31.10s system 71% cpu 1:24.01 total
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % ls
GENVER LICENSE README.md VERSION alpa alpa.egg-info benchmark build build_jaxlib compute-cost-2022-06-22-23-22-30.npy docker docs examples format.sh playground setup.py tests third_party
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] %
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % vim alpa/global_env.py
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % ls
GENVER LICENSE README.md VERSION alpa alpa.egg-info benchmark build build_jaxlib compute-cost-2022-06-22-23-22-30.npy docker docs examples format.sh playground setup.py tests third_party
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] %
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % python3 tests/minimal_reproduce.py
2022-06-24 00:32:44.112 INFO worker - init: Connecting to existing Ray cluster at address: 172.31.33.99:6379
2022-06-24 00:32:44.864 INFO xla_bridge - backends: Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
2022-06-24 00:32:49.011 INFO xla_bridge - backends: Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
/home/cjr/miniconda3/envs/alpa-torch/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:1866: UserWarning: Explicitly requested dtype <class 'numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
lax_internal._check_user_dtype_supported(dtype, "asarray")
/home/cjr/miniconda3/envs/alpa-torch/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:1866: UserWarning: Explicitly requested dtype <class 'numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
lax_internal._check_user_dtype_supported(dtype, "asarray")
compile_pipeshard_executable::trace: 4.47 s
compile_pipeshard_executable::jaxpr operations: 0.18 s
compile_pipeshard_executable::stage construction: 0.00 s
compile_pipeshard_executable::apply grad: 0.04 s
compile_pipeshard_executable::shard stages: 2.49 s
compile_pipeshard_executable::launch meshes: 0.73 s
compile_pipeshard_executable::driver executable: 332.60 s
compile_pipeshard_executable::jaxpr operations: 0.04 s
compile_pipeshard_executable::stage construction: 0.00 s
compile_pipeshard_executable::apply grad: 0.00 s
compile_pipeshard_executable::shard stages: 12.07 s
compile_pipeshard_executable::launch meshes: 0.00 s
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 812, Info: stage 7
(MeshHostWorker pid=16300) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 810, Info: stage 5
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 811, Info: stage 6
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 0.000 GB max_memory_allocated: 1.040 GB next instruction: Opcode: RUN, Task uuid: 808, Info: stage 3
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 809, Info: stage 4
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 0.000 GB max_memory_allocated: 1.821 GB next instruction: Opcode: RUN, Task uuid: 805, Info: stage 0
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 0.000 GB max_memory_allocated: 0.458 GB next instruction: Opcode: RUN, Task uuid: 806, Info: stage 1
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 0.000 GB max_memory_allocated: 0.631 GB next instruction: Opcode: RUN, Task uuid: 807, Info: stage 2
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.053 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 31, Info: allocate zero for recv
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.106 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 43, Info: allocate zero for recv
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RECV, Task uuid: 359, Info:
(MeshHostWorker pid=16300) memory_allocated: 0.053 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 30, Info: allocate zero for recv
(MeshHostWorker pid=16300) memory_allocated: 0.106 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 39, Info: allocate zero for recv
(MeshHostWorker pid=16300) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RECV, Task uuid: 353, Info:
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.843 GB max_memory_allocated: 0.843 GB next instruction: Opcode: RUN, Task uuid: 32, Info: allocate zero for recv
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.904 GB max_memory_allocated: 0.904 GB next instruction: Opcode: RUN, Task uuid: 48, Info: allocate zero for recv
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.916 GB max_memory_allocated: 0.916 GB next instruction: Opcode: RECV, Task uuid: 365, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.188 GB max_memory_allocated: 3.188 GB next instruction: Opcode: RUN, Task uuid: 29, Info: allocate zero for recv
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.251 GB max_memory_allocated: 3.251 GB next instruction: Opcode: RUN, Task uuid: 36, Info: allocate zero for recv
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 335, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 3.907 GB max_memory_allocated: 3.907 GB next instruction: Opcode: RUN, Task uuid: 28, Info: allocate zero for recv
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 3.908 GB max_memory_allocated: 3.908 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.104 GB max_memory_allocated: 4.104 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: SEND, Task uuid: 334, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.250 GB max_memory_allocated: 4.250 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.202 GB max_memory_allocated: 4.250 GB next instruction: Opcode: SEND, Task uuid: 334, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.201 GB max_memory_allocated: 4.250 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.201 GB max_memory_allocated: 4.250 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 113, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.397 GB max_memory_allocated: 4.397 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: SEND, Task uuid: 334, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.543 GB max_memory_allocated: 4.543 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.495 GB max_memory_allocated: 4.543 GB next instruction: Opcode: SEND, Task uuid: 334, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.494 GB max_memory_allocated: 4.543 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.494 GB max_memory_allocated: 4.543 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.690 GB max_memory_allocated: 4.690 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.641 GB max_memory_allocated: 4.690 GB next instruction: Opcode: RUN, Task uuid: 57, Info: allocate zero for recv
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.913 GB max_memory_allocated: 4.913 GB next instruction: Opcode: RECV, Task uuid: 347, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.356 GB max_memory_allocated: 3.356 GB next instruction: Opcode: RUN, Task uuid: 25, Info: allocate zero for recv
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.453 GB max_memory_allocated: 3.453 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.128 GB max_memory_allocated: 3.128 GB next instruction: Opcode: RUN, Task uuid: 27, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.128 GB max_memory_allocated: 3.128 GB next instruction: Opcode: RUN, Task uuid: 33, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.130 GB max_memory_allocated: 3.130 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.587 GB max_memory_allocated: 3.587 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.538 GB max_memory_allocated: 3.587 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.671 GB max_memory_allocated: 3.671 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.622 GB max_memory_allocated: 3.671 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.755 GB max_memory_allocated: 3.755 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.706 GB max_memory_allocated: 3.755 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.840 GB max_memory_allocated: 3.840 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: SEND, Task uuid: 112, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 221, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.924 GB max_memory_allocated: 3.924 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.875 GB max_memory_allocated: 3.924 GB next instruction: Opcode: SEND, Task uuid: 112, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.873 GB max_memory_allocated: 3.924 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.873 GB max_memory_allocated: 3.924 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.006 GB max_memory_allocated: 4.006 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.957 GB max_memory_allocated: 4.006 GB next instruction: Opcode: SEND, Task uuid: 112, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.955 GB max_memory_allocated: 4.006 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.955 GB max_memory_allocated: 4.006 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.088 GB max_memory_allocated: 4.088 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.039 GB max_memory_allocated: 4.088 GB next instruction: Opcode: SEND, Task uuid: 112, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.037 GB max_memory_allocated: 4.088 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.037 GB max_memory_allocated: 4.088 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.170 GB max_memory_allocated: 4.170 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.121 GB max_memory_allocated: 4.170 GB next instruction: Opcode: SEND, Task uuid: 112, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.119 GB max_memory_allocated: 4.170 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.119 GB max_memory_allocated: 4.170 GB next instruction: Opcode: SEND, Task uuid: 1, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.001 GB max_memory_allocated: 25.001 GB next instruction: Opcode: RUN, Task uuid: 26, Info: allocate zero for recv
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.003 GB max_memory_allocated: 25.003 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.418 GB max_memory_allocated: 25.418 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.130 GB max_memory_allocated: 3.130 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.443 GB max_memory_allocated: 25.443 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.052 GB max_memory_allocated: 25.443 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.049 GB max_memory_allocated: 25.443 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.049 GB max_memory_allocated: 25.443 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.465 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.074 GB max_memory_allocated: 25.465 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.072 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.072 GB max_memory_allocated: 25.465 GB next instruction: Opcode: SEND, Task uuid: 220, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.069 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.069 GB max_memory_allocated: 25.465 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.484 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.094 GB max_memory_allocated: 25.484 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.091 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.091 GB max_memory_allocated: 25.484 GB next instruction: Opcode: SEND, Task uuid: 220, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RECV, Task uuid: 296, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.089 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.089 GB max_memory_allocated: 25.484 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.504 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.114 GB max_memory_allocated: 25.504 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.111 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.111 GB max_memory_allocated: 25.504 GB next instruction: Opcode: SEND, Task uuid: 220, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.109 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.109 GB max_memory_allocated: 25.504 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.524 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.133 GB max_memory_allocated: 25.524 GB next instruction: Opcode: SEND, Task uuid: 115, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.295 GB max_memory_allocated: 3.295 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.244 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RUN, Task uuid: 34, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.246 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.246 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.411 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RUN, Task uuid: 35, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.131 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.131 GB max_memory_allocated: 25.524 GB next instruction: Opcode: SEND, Task uuid: 220, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: SEND, Task uuid: 295, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.128 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.128 GB max_memory_allocated: 25.524 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.544 GB max_memory_allocated: 25.544 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.153 GB max_memory_allocated: 25.544 GB next instruction: Opcode: SEND, Task uuid: 220, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.151 GB max_memory_allocated: 25.544 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.151 GB max_memory_allocated: 25.544 GB next instruction: Opcode: SEND, Task uuid: 118, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: SEND, Task uuid: 298, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RECV, Task uuid: 299, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 4
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.525 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RUN, Task uuid: 37, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.476 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.476 GB max_memory_allocated: 3.525 GB next instruction: Opcode: SEND, Task uuid: 295, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: SEND, Task uuid: 298, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.471 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.471 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.636 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RUN, Task uuid: 40, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.587 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.587 GB max_memory_allocated: 3.636 GB next instruction: Opcode: SEND, Task uuid: 295, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: SEND, Task uuid: 298, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.582 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.582 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.747 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RUN, Task uuid: 44, Info: allocate zero for recv
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.698 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RECV, Task uuid: 116, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.698 GB max_memory_allocated: 3.747 GB next instruction: Opcode: SEND, Task uuid: 295, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: SEND, Task uuid: 298, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.693 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.693 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.858 GB max_memory_allocated: 3.858 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.807 GB max_memory_allocated: 3.858 GB next instruction: Opcode: SEND, Task uuid: 238, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.595 GB max_memory_allocated: 3.646 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.586 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RUN, Task uuid: 38, Info: allocate zero for recv
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 335, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 113, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 221, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 296, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 299, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: SEND, Task uuid: 352, Info:
(MeshHostWorker pid=16300) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 5
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 4
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.376395: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.381601: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.930 GB max_memory_allocated: 3.981 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.386703: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the first occurrence above; omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.391778: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the first occurrence above; omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.396789: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.401734: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the one above; omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.406692: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the one above; omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.411683: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.416699: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.421691: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the previous one; elided]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.426743: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the previous one; elided]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.432283: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.437392: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.442363: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.447344: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.452296: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.457279: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.462275: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.467347: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the one above, omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.472334: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [stack trace identical to the one above, omitted]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.477384: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.482338: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.487382: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [... stack trace identical to the one above, elided ...]
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
2022-06-24 00:58:15,521 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MeshHostWorker.run_executable() (pid=6857, ip=172.31.43.221, repr=<alpa.device_mesh.MeshHostWorker object at 0x7fb040f65eb0>)
  File "/nfs/cjr/alpa-torch-software/alpa/alpa/device_mesh.py", line 267, in run_executable
    self.executables[uuid].execute_on_worker(*args, **kwargs)
  File "/nfs/cjr/alpa-torch-software/alpa/alpa/pipeline_parallel/pipeshard_executable.py", line 465, in execute_on_worker
    self.worker.run_executable(instruction.task_uuid,
  File "/nfs/cjr/alpa-torch-software/alpa/alpa/device_mesh.py", line 267, in run_executable
    self.executables[uuid].execute_on_worker(*args, **kwargs)
  File "/nfs/cjr/alpa-torch-software/alpa/alpa/mesh_executable.py", line 1178, in execute_on_worker
    self.allocate_zero_buffers.execute_sharded_on_local_devices([]))
RuntimeError: INVALID_ARGUMENT: stream is uninitialized or in an error state: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.492387: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221) [... stack trace identical to the one above, elided ...]
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.497348: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.920 GB max_memory_allocated: 3.981 GB next instruction: Opcode: RUN, Task uuid: 41, Info: allocate zero for recv
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507374: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace ***
(MeshHostWorker pid=6857, ip=172.31.43.221)
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507811: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507879: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507951: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508028: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508112: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508187: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508266: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508336: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state
(MeshHostWorker pid=16300) memory_allocated: 0.443 GB max_memory_allocated: 0.481 GB next instruction: Opcode: FREE, Task uuid: None, Info:
(MeshHostWorker pid=16300) memory_allocated: 0.430 GB max_memory_allocated: 0.481 GB next instruction: Opcode: RUN, Task uuid: 42, Info: allocate zero for recv
(MeshHostWorker pid=16300) memory_allocated: 0.443 GB max_memory_allocated: 0.481 GB next instruction: Opcode: RECV, Task uuid: 353, Info: