Prompt:

I am trying to understand the lifecycle of the cuda graph memory pool, e.g., as interacted with in _cuda_releasePool. In particular, the implementation of _use_cuda_memory_pool_manager seems to suggest that the pool is released at the end of the context manager. But in the call sites of this function, it doesn't seem that the pool should be released (especially when we are allocating inputs for the cuda graph region). What is going on?

Response:

You're right to notice this apparent discrepancy. Let's break down what's happening with the CUDA graph memory pool lifecycle:

  1. The _use_cuda_memory_pool_manager context manager is indeed calling _cuda_releasePool at the end of its context. However, this doesn't necessarily mean the pool is being destroyed or fully released.
  2. The _cuda_releasePool function is part of the CUDA caching allocator's API, and its behavior is more nuanced than just destroying the pool:
    • It decrements the reference count for the pool.
    • If the reference c
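
A minimal sketch, using hypothetical hook names rather than PyTorch's actual private bindings, of how a pool-scoped context manager typically pairs allocation routing with a release call that only drops a reference instead of destroying the pool:

import contextlib

@contextlib.contextmanager
def use_mem_pool(device, pool_id, allocator):
    allocator.begin_allocate_to_pool(device, pool_id)  # hypothetical hook: route new allocations into the pool
    try:
        yield
    finally:
        allocator.end_allocate_to_pool(device, pool_id)  # hypothetical hook: stop routing allocations
        allocator.release_pool(device, pool_id)  # drops one reference; cached blocks are reclaimed only once nothing uses the pool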

Prompt:

This is binary data corresponding to a serialized Thrift struct:

18 03 66 6f 6f 16 c8 01 16 b0 df fd eb 0c 16 02 00

The code that generated it is

CppExampleLogEntry( userid=100, ip="foo", weight=1, time=CUR_TIME )

Explain, byte-by-byte, the meaning of each byte in the binary data.

Response:

I'll explain the binary data byte-by-byte based on the Thrift Compact protocol encoding described in the document:

  1. 18: This is a field header byte.
    • 0001 1000 in binary
    • dddd (field id delta) = 1 (first field)
    • tttt (field type) = 8 (BINARY, used for strings)
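
A small self-contained sketch (helper functions assumed here, not taken from the Thrift library) that decodes these bytes with the Compact protocol rules used above, i.e. a short-form field header packs (field id delta << 4) | type, and integers are zigzag-encoded varints:

def read_varint(buf, i):
    # little-endian base-128 varint; the high bit marks continuation
    result, shift = 0, 0
    while True:
        b = buf[i]; i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def unzigzag(n):
    return (n >> 1) ^ -(n & 1)

data = bytes.fromhex("1803666f6f16c80116b0dffdeb0c160200")
i, field_id = 0, 0
while data[i] != 0x00:  # 0x00 is the STOP byte
    header = data[i]; i += 1
    field_id += header >> 4   # short-form field id delta
    ftype = header & 0x0F     # 8 = BINARY, 6 = I64 (the only types in this example)
    if ftype == 8:
        n, i = read_varint(data, i)        # byte length of the string
        value = data[i:i + n].decode(); i += n
    else:
        n, i = read_varint(data, i)
        value = unzigzag(n)
    print(field_id, ftype, value)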
import json
import sys
import re
pattern = re.compile(r'^(\s*)if TEMPLATE[^:]*:\s*\n((?:\1\s{4}.*\n|\n)*?)\1else:\s*\n(?:\1\s{4}.*\n|\n)*?(\1(?=\S|$))', re.MULTILINE)
def replace(match):
    # group(1): indentation of the `if TEMPLATE` line; group(2): body of the true branch;
    # group(3): indentation captured after the else block
    indent = match.group(1)
    true_branch = match.group(2)
    # dedent the true branch by one level so it can stand in for the whole if/else block
    true_branch = re.sub(r'^\s{' + str(len(indent) + 4) + '}', indent, true_branch, flags=re.MULTILINE)
    return true_branch + match.group(3)
# AOT ID: ['8_inference']
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from math import inf, nan
from torch._inductor.hooks import run_intermediate_hooks
diff --git a/fbcode/caffe2/torch/_dynamo/convert_frame.py b/fbcode/caffe2/torch/_dynamo/convert_frame.py
--- a/fbcode/caffe2/torch/_dynamo/convert_frame.py
+++ b/fbcode/caffe2/torch/_dynamo/convert_frame.py
@@ -135,6 +135,8 @@
initial_global_state: Optional[GlobalStateGuard] = None
+DEAD = False
+

system: You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so. Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your us

import torch
import operator
import itertools
import sys
from typing import Tuple
from torch.fx.experimental.proxy_tensor import make_fx
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._refs import _maybe_broadcast
from torch._prims_common import is_same_shape, make_contiguous_strides_for
@ezyang
ezyang / report_compile_source_on_error.py
Created December 18, 2022 02:31
report_compile_source_on_error
import traceback
import sys
from types import TracebackType
import tempfile
import contextlib
import inspect
# This file contains utilities for ensuring dynamically compile()'d
# code fragments display their line numbers in backtraces.
#
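# A minimal sketch of one way to do this (an assumption, not necessarily the
# approach taken in the rest of this file): register the generated source with
# linecache under a synthetic filename so traceback can resolve its lines.
import linecache

def compile_with_source(src, fake_name="<generated-0>"):
    linecache.cache[fake_name] = (len(src), None, src.splitlines(True), fake_name)
    return compile(src, fake_name, "exec")

try:
    exec(compile_with_source("def f():\n    raise RuntimeError('boom')\nf()\n"), {})
except RuntimeError:
    traceback.print_exc()  # the f() frame now shows the generated source line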
import torch
from torch.utils._python_dispatch import TorchDispatchMode
from torch.utils._pytree import tree_map
import itertools
# cribbed from https://github.com/albanD/subclass_zoo/blob/main/logging_mode.py
# Lit wraps a string whose repr is the string itself, so formatted op logs
# print tensor placeholders without surrounding quotes.
class Lit:
    def __init__(self, s):
        self.s = s

    def __repr__(self):
        return self.s
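
# A minimal sketch (an assumption about how this fragment continues) of a
# TorchDispatchMode that logs each dispatched op, using Lit so tensor
# placeholders print unquoted; relies on the torch/tree_map imports above.
class LoggingMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        out = func(*args, **kwargs)
        fmt = lambda t: Lit(f"Tensor(shape={tuple(t.shape)}, dtype={t.dtype})") if isinstance(t, torch.Tensor) else t
        print(f"{func}({tree_map(fmt, args)}, {tree_map(fmt, kwargs)}) -> {tree_map(fmt, out)}")
        return out

# usage: every op dispatched inside the block is printed
# with LoggingMode():
#     torch.randn(2, 2).sum()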