Here is some documentation for the OpenAI API compatible endpoints:
Generates text completions for the provided prompt.
Parameters:

- `prompt` (required): The prompt to generate completions for, as a string or list of strings.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `max_tokens`: The maximum number of tokens to generate.
- `temperature`: Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling.
- `echo`: If `true`, the prompt will be included in the completion.
- `stop`: Up to 4 sequences where generation will stop if any are matched.

See `GenerationOptions` in `typing.py` for other generation parameters.
Returns:

- `id`: ID of the completion.
- `choices`: List containing the generated completions.
- `usage`: Number of prompt tokens, completion tokens, and total tokens used.
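As a rough illustration, a minimal Python client for this completions endpoint might look like the sketch below. The base URL (a local server on port 5000) and the `choices[0]["text"]` response path are assumptions based on the OpenAI completions format, not confirmed by this documentation.

```python
import json
from urllib import request

# Assumed base URL; adjust to wherever the server is actually listening.
API_BASE = "http://127.0.0.1:5000/v1"


def build_completion_payload(prompt, max_tokens=64, temperature=0.7,
                             top_p=1.0, stop=None, stream=False):
    """Assemble the JSON body for POST /v1/completions."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if stop is not None:
        payload["stop"] = stop  # up to 4 stop sequences
    return payload


def complete(prompt, **kwargs):
    """POST the payload and return the first choice's generated text."""
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/completions", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Note that `model` is deliberately omitted from the payload, since it is unused here.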
Generates chat message completions based on a provided chat history.
Parameters:

- `messages` (required): Chat history as a list of messages with `role` (user, assistant) and `content`.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `mode`: `instruct`, `chat`, or `chat-instruct`. Controls whether the assistant is in character.
- `instruction_template`: Name of the instruction template file to use.
- `character`: Name of the character file to use for the assistant.

See `ChatCompletionRequest` in `typing.py` for other parameters.
Returns:

Same as `/v1/completions`.
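The chat variant works the same way, just with a message list instead of a plain prompt. In this sketch the base URL and the `choices[0]["message"]["content"]` response path are assumptions based on the OpenAI chat format; `mode`, `instruction_template`, and `character` come from the parameter list above.

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_chat_payload(messages, mode="instruct", stream=False, **extra):
    """Assemble the JSON body for POST /v1/chat/completions.

    `extra` passes through optional fields such as
    instruction_template or character.
    """
    payload = {"messages": messages, "mode": mode, "stream": stream}
    payload.update(extra)
    return payload


def chat(messages, **kwargs):
    """POST the chat history and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(messages, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/chat/completions", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```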
Lists the currently available models.
Gets information about the specified model.
Gets usage statistics for billing purposes.
Parameters:

- `start_date`: Start date for usage stats, in YYYY-MM-DD format.
- `end_date`: End date for usage stats, in YYYY-MM-DD format.

Returns:

- `total_usage`: Total token usage during the specified period.
Transcribes an audio file using Whisper.
Parameters:

- `file` (required): The audio file to transcribe.
- `language`: Language spoken in the audio.
- `model`: Whisper model to use, `tiny` or `base`.

Returns:

- `text`: Transcription text.
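Because this endpoint takes a file upload, the request body is `multipart/form-data` rather than JSON. The helper below sketches how such a body can be built with only the standard library; the exact field names accepted by the server (`file`, `model`, `language`) follow the parameter list above, while the endpoint path itself (`/v1/audio/transcriptions` in OpenAI's API) is an assumption.

```python
import uuid


def build_transcription_request(filename, audio_bytes, language=None,
                                model="base"):
    """Build a multipart/form-data body for the transcription endpoint.

    Returns (content_type_header_value, body_bytes), suitable for a
    POST with urllib or any HTTP client.
    """
    boundary = uuid.uuid4().hex
    parts = []
    fields = {"model": model}
    if language:
        fields["language"] = language
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode()
        )
    # The audio file itself.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```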
Generates images using Stable Diffusion.
Parameters:

- `prompt` (required): The text prompt to generate images for.
- `size`: Size of images to generate, like `512x512`.
- `n`: Number of images to generate.

Returns:

- `data`: List of generated images.
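A minimal client sketch for the image endpoint, under the same assumptions as before (local server on port 5000, OpenAI-style response shape with the images under `data`):

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_image_payload(prompt, size="512x512", n=1):
    """Assemble the JSON body for POST /v1/images/generations."""
    return {"prompt": prompt, "size": size, "n": n}


def generate_images(prompt, **kwargs):
    """POST the prompt and return the list of generated images."""
    body = json.dumps(build_image_payload(prompt, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/images/generations", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["data"]
```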
Gets sentence embeddings for the provided input text.
Parameters:

- `input` (required): Input text to get embeddings for, as a string or list of strings.
- `encoding_format`: `float` or `base64`.

Returns:

- `object`: `list`
- `data`: List of embeddings, one for each input.
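When `encoding_format` is `base64`, each embedding comes back as a base64 string rather than a JSON array of floats. Assuming it follows OpenAI's convention of packed little-endian float32 values (not stated in this documentation), it can be decoded like this:

```python
import base64
import struct


def decode_base64_embedding(b64_string):
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the raw bytes are consecutive little-endian float32
    values, matching OpenAI's base64 embedding encoding.
    """
    raw = base64.b64decode(b64_string)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```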
Checks input text for harmful content.
Parameters:

- `input` (required): Input text to moderate.

Returns:

- `results`: List of moderation results, one for each input text.
Encodes text into tokens.
Parameters:

- `text` (required): Text to encode.
Decodes tokens into text.
Parameters:

- `tokens` (required): Tokens to decode.
Gets the number of tokens for text.
Parameters:

- `text` (required): Text to get the token count for.
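The three token endpoints above all take a small JSON body, so one shared request builder covers them. The base URL is assumed, and the response field names are not specified here, so this sketch stops at building and sending the request:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:5000/v1/internal"  # assumed local server address


def build_request(path, payload):
    """Build the POST request for a /v1/internal/<path> endpoint."""
    return request.Request(f"{BASE}/{path}",
                           data=json.dumps(payload).encode(),
                           headers={"Content-Type": "application/json"})


def post(path, payload):
    """Send the request and return the parsed JSON response."""
    with request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)


def encode(text):
    return post("encode", {"text": text})


def decode(tokens):
    return post("decode", {"tokens": tokens})


def token_count(text):
    return post("token-count", {"text": text})
```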
Gets information about the currently loaded model.
Loads a new model. Can be used to switch models on the fly.
Parameters:

- `model_name` (required): Name of the model to load.
- `args`: Dict of arguments to pass to model loading.
- `settings`: Dict of settings to update after loading.
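A sketch of switching models through this endpoint. The base URL is assumed as before, and the example key inside `args` (`load_in_4bit`) is purely illustrative, not a parameter documented here:

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_load_payload(model_name, args=None, settings=None):
    """Assemble the JSON body for POST /v1/internal/model/load."""
    payload = {"model_name": model_name}
    if args:
        payload["args"] = args          # loader arguments (illustrative)
    if settings:
        payload["settings"] = settings  # settings to update after loading
    return payload


def load_model(model_name, **kwargs):
    """Request a model switch and return the HTTP status code."""
    body = json.dumps(build_load_payload(model_name, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/internal/model/load", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status
```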
Stops any ongoing text generation.
So in summary, the main endpoints are /v1/completions, /v1/chat/completions, and /v1/models for generating text, chat, and listing models. The /v1/internal endpoints can be used to switch models, encode/decode, and more.
Tried again, this time feeding it the entire repo, and it came up with these:
Here are the API endpoints I identified in the code along with their associated parameters and python file/function references:
POST /v1/completions

Parameters:

- prompt (str|List[str]) - Prompt text for text completion. See `completions_common()` in `extensions/openai/completions.py`.
- stream (bool) - Whether to stream back partial responses. See `completions_common()` in `extensions/openai/completions.py`.

POST /v1/chat/completions

Parameters:

- messages (List[dict]) - Chat history. See `chat_completions_common()` in `extensions/openai/completions.py`.
- stream (bool) - Whether to stream back partial responses. See `chat_completions_common()` in `extensions/openai/completions.py`.

POST /v1/images/generations

Parameters: see `generations()` in `extensions/openai/images.py`.

POST /v1/embeddings

Parameters: see `embeddings()` in `extensions/openai/embeddings.py`.

POST /v1/moderations

Parameters: see `moderations()` in `extensions/openai/moderations.py`.

GET /v1/models

Lists models. See `list_models()` in `extensions/openai/models.py`.

GET /v1/models/{model}

Gets model info. See `model_info_dict()` in `extensions/openai/models.py`.

POST /v1/internal/model/load

Loads a model. See `_load_model()` in `extensions/openai/models.py`.

POST /v1/internal/encode

Encodes text. See `token_encode()` in `extensions/openai/tokens.py`.

POST /v1/internal/decode

Decodes tokens. See `token_decode()` in `extensions/openai/tokens.py`.

POST /v1/internal/token-count

Gets token count for text. See `token_count()` in `extensions/openai/tokens.py`.

GET /v1/internal/model/info

Gets current model info. See `get_current_model_info()` in `extensions/openai/models.py`.

Let me know if you need any clarification or have additional questions!