Skip to content

Instantly share code, notes, and snippets.

@msyyc
Last active December 13, 2023 01:58
Show Gist options
  • Save msyyc/5d2e9359330337b682ae1e921921c587 to your computer and use it in GitHub Desktop.
Save msyyc/5d2e9359330337b682ae1e921921c587 to your computer and use it in GitHub Desktop.
"multipart/form-data" design for Python SDK (Only for DPG model)

Typespec

For the following Typespec definition about "multipart/form-data":

...
model MultiPartRequest {
  id: string;
  profileImage: bytes;
}

op basic(@header contentType: "multipart/form-data", @body body: MultiPartRequest): NoContentResponse;

Expected Python SDK API

Compared with content-type, the only difference is that we don't generate overload for IO since core can't handle IO which may contain IO, too.

    @overload
    def basic(self, body: JSON, **kwargs: Any) -> None:
      ...

    @overload
    def basic(self, body: _models.MultiPartRequest, **kwargs: Any) -> None:
      ...
  
    @distributed_trace
    def basic(self, body: Union[_models.MultiPartRequest, JSON], **kwargs: Any) -> None:
      ...

Users could use SDK with the following ways:

...
with open("image.png", "rb") as file:
    clent.basic({"id": "123", "profileImage": file})
    # or
    # clent.basic({"id": "123", "profileImage": file.read()})

Implementation

  
    def multipart_form_data_file(file: Union[IOBase, bytes]) -> Union[IOBase, Tuple[str, bytes, str]]:
        if isinstance(file, IOBase):
            return file
        return (str(time.time()), file, "application/octet-stream")
    
    def basic(self, body: Union[_models.MultiPartRequest, JSON], **kwargs: Any) -> None:
        ...
        _files = {k: multipart_form_data_file(v) for k, v in body.items() if isinstance(v, (IOBase, bytes))}
        _data = {k: v for k, v in body.items() if not isinstance(v, (IOBase, bytes))}
        _request = build_multi_part_basic_request(
            data=_data,
            files=_files,
            headers=_headers,
            params=_params,
        )
        ...

nit: I make a test locally and it passes with up logic and we don't need change for current azure-core version.

@iscai-msft
Copy link

iscai-msft commented Nov 28, 2023

My thinking is with multipart, we should have the option to pass them as splatted keyword only args, i.e.

def basic(self, *, id: string, profile_image: bytes):
  ...

The reason for this is a lot of the time with multipart form data, we're sending whole files etc. and that to me means we should elevate these properties onto the command line.

I'm not sure if this will be too much with also having a JSON and a models approach though, but I still do think it seems more pythonic to have them as keyword args

cc @johanste @lmazuel @tadelesh

Also since these requests have more of a possibility of being large, we should still have the option for people to stream in the formdata

@lmazuel
Copy link

lmazuel commented Nov 28, 2023

+1 with Izzy, multipart should be flattened on the operation. The general case is that services with multipart form data have very few fields, so flattening is a great option.

@johanste
Copy link

Nit: we should avoid assuming unique names for parts.

All, is there a reason why we would not use the same logic as we do for other operations when determining if we want to spread or not? E.g. named model, use the named model, anonymous model, spread? And if we do have the client side-car overrides that "treat named model as unnamed" and "name this unnamed model", we can override the default behavior....

@msyyc
Copy link
Author

msyyc commented Nov 30, 2023

As long as the body model is named, Python SDK will not spread; if body doesn't have named model, Python SDK will spread it now. So if we want to always spread the body for multipart/form-data when body has named model as up scenario, special logic is needed.

@tadelesh
Copy link

I prefer to have consolidated logic for spread: named model not spread, anonymous model spread.

@msyyc
Copy link
Author

msyyc commented Dec 5, 2023

After discussion in meeting on 12/05, we agree to keep consistent logic: only spread anonymous model.

@johanste
Copy link

Why use time.time() as the (file) name? There is a very high probability of name conflicts.

@msyyc
Copy link
Author

msyyc commented Dec 13, 2023

@johanste The doc is mainly to show API and rough structure of implement, so I don't make too much thought on details (e.g time.time() as file name). If you think the API and implement flow are expected, I will begin the work in code generator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment