Here is some documentation for the OpenAI API compatible endpoints:
Generates text completions for the provided prompt.
Parameters:

- `prompt` (required): The prompt to generate completions for, as a string or list of strings.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `max_tokens`: The maximum number of tokens to generate.
- `temperature`: Sampling temperature, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling.
- `echo`: If `true`, the prompt will be included in the completion.
- `stop`: Up to 4 sequences where generation will stop if any are matched.

See `GenerationOptions` in `typing.py` for other generation parameters.
Returns:

- `id`: ID of the completion.
- `choices`: List containing the generated completions.
- `usage`: Number of prompt tokens, completion tokens, and total tokens used.
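As a rough illustration, a minimal Python client for this completions endpoint might look like the sketch below. The base URL (a local server on port 5000) and the `choices[0]["text"]` response path are assumptions based on the OpenAI completions format, not confirmed by this documentation.

```python
import json
from urllib import request

# Assumed base URL; adjust to wherever the server is actually listening.
API_BASE = "http://127.0.0.1:5000/v1"


def build_completion_payload(prompt, max_tokens=64, temperature=0.7,
                             top_p=1.0, stop=None, stream=False):
    """Assemble the JSON body for POST /v1/completions."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if stop is not None:
        payload["stop"] = stop  # up to 4 stop sequences
    return payload


def complete(prompt, **kwargs):
    """POST the payload and return the first choice's generated text."""
    body = json.dumps(build_completion_payload(prompt, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/completions", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Note that `model` is deliberately omitted from the payload, since it is unused here.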
Generates chat message completions based on a provided chat history.
Parameters:

- `messages` (required): Chat history as a list of messages with `role` (user, assistant) and `content`.
- `model`: Unused parameter. To change the model, use the `/v1/internal/model/load` endpoint.
- `stream`: If `true`, will stream back partial responses as text is generated.
- `mode`: `instruct`, `chat`, or `chat-instruct`. Controls whether the assistant is in character.
- `instruction_template`: Name of the instruction template file to use.
- `character`: Name of the character file to use for the assistant.

See `ChatCompletionRequest` in `typing.py` for other parameters.
Returns:

Same as `/v1/completions`.
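The chat variant works the same way, just with a message list instead of a plain prompt. In this sketch the base URL and the `choices[0]["message"]["content"]` response path are assumptions based on the OpenAI chat format; `mode`, `instruction_template`, and `character` come from the parameter list above.

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_chat_payload(messages, mode="instruct", stream=False, **extra):
    """Assemble the JSON body for POST /v1/chat/completions.

    `extra` passes through optional fields such as
    instruction_template or character.
    """
    payload = {"messages": messages, "mode": mode, "stream": stream}
    payload.update(extra)
    return payload


def chat(messages, **kwargs):
    """POST the chat history and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(messages, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/chat/completions", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```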
Lists the currently available models.
Gets information about the specified model.
Gets usage statistics for billing purposes.
Parameters:

- `start_date`: Start date for usage stats, in YYYY-MM-DD format.
- `end_date`: End date for usage stats, in YYYY-MM-DD format.

Returns:

- `total_usage`: Total token usage during the specified period.
Transcribes an audio file using Whisper.
Parameters:

- `file` (required): The audio file to transcribe.
- `language`: Language spoken in the audio.
- `model`: Whisper model to use, `tiny` or `base`.

Returns:

- `text`: Transcription text.
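Because this endpoint takes a file upload, the request body is `multipart/form-data` rather than JSON. The helper below sketches how such a body can be built with only the standard library; the exact field names accepted by the server (`file`, `model`, `language`) follow the parameter list above, while the endpoint path itself (`/v1/audio/transcriptions` in OpenAI's API) is an assumption.

```python
import uuid


def build_transcription_request(filename, audio_bytes, language=None,
                                model="base"):
    """Build a multipart/form-data body for the transcription endpoint.

    Returns (content_type_header_value, body_bytes), suitable for a
    POST with urllib or any HTTP client.
    """
    boundary = uuid.uuid4().hex
    parts = []
    fields = {"model": model}
    if language:
        fields["language"] = language
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode()
        )
    # The audio file itself.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```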
Generates images using Stable Diffusion.
Parameters:

- `prompt` (required): The text prompt to generate images for.
- `size`: Size of images to generate, like `512x512`.
- `n`: Number of images to generate.

Returns:

- `data`: List of generated images.
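A minimal client sketch for the image endpoint, under the same assumptions as before (local server on port 5000, OpenAI-style response shape with the images under `data`):

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_image_payload(prompt, size="512x512", n=1):
    """Assemble the JSON body for POST /v1/images/generations."""
    return {"prompt": prompt, "size": size, "n": n}


def generate_images(prompt, **kwargs):
    """POST the prompt and return the list of generated images."""
    body = json.dumps(build_image_payload(prompt, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/images/generations", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["data"]
```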
Gets sentence embeddings for the provided input text.
Parameters:

- `input` (required): Input text to get embeddings for, as a string or list of strings.
- `encoding_format`: `float` or `base64`.

Returns:

- `object`: `list`
- `data`: List of embeddings, one for each input.
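When `encoding_format` is `base64`, each embedding comes back as a base64 string rather than a JSON array of floats. Assuming it follows OpenAI's convention of packed little-endian float32 values (not stated in this documentation), it can be decoded like this:

```python
import base64
import struct


def decode_base64_embedding(b64_string):
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the raw bytes are consecutive little-endian float32
    values, matching OpenAI's base64 embedding encoding.
    """
    raw = base64.b64decode(b64_string)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```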
Checks input text for harmful content.
Parameters:

- `input` (required): Input text to moderate.

Returns:

- `results`: List of moderation results, one for each input text.
Encodes text into tokens.
Parameters:

- `text` (required): Text to encode.
Decodes tokens into text.
Parameters:

- `tokens` (required): Tokens to decode.
Gets the number of tokens for text.
Parameters:

- `text` (required): Text to get the token count for.
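The three token endpoints above all take a small JSON body, so one shared request builder covers them. The base URL is assumed, and the response field names are not specified here, so this sketch stops at building and sending the request:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:5000/v1/internal"  # assumed local server address


def build_request(path, payload):
    """Build the POST request for a /v1/internal/<path> endpoint."""
    return request.Request(f"{BASE}/{path}",
                           data=json.dumps(payload).encode(),
                           headers={"Content-Type": "application/json"})


def post(path, payload):
    """Send the request and return the parsed JSON response."""
    with request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)


def encode(text):
    return post("encode", {"text": text})


def decode(tokens):
    return post("decode", {"tokens": tokens})


def token_count(text):
    return post("token-count", {"text": text})
```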
Gets information about the currently loaded model.
Loads a new model. Can be used to switch models on the fly.
Parameters:

- `model_name` (required): Name of the model to load.
- `args`: Dict of arguments to pass to model loading.
- `settings`: Dict of settings to update after loading.
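A sketch of switching models through this endpoint. The base URL is assumed as before, and the example key inside `args` (`load_in_4bit`) is purely illustrative, not a parameter documented here:

```python
import json
from urllib import request

API_BASE = "http://127.0.0.1:5000/v1"  # assumed local server address


def build_load_payload(model_name, args=None, settings=None):
    """Assemble the JSON body for POST /v1/internal/model/load."""
    payload = {"model_name": model_name}
    if args:
        payload["args"] = args          # loader arguments (illustrative)
    if settings:
        payload["settings"] = settings  # settings to update after loading
    return payload


def load_model(model_name, **kwargs):
    """Request a model switch and return the HTTP status code."""
    body = json.dumps(build_load_payload(model_name, **kwargs)).encode()
    req = request.Request(f"{API_BASE}/internal/model/load", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status
```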
Stops any ongoing text generation.
So in summary, the main endpoints are /v1/completions, /v1/chat/completions, and /v1/models for generating text, chat, and listing models. The /v1/internal endpoints can be used to switch models, encode/decode, and more.
Tried again, this time feeding it the entire repo, and it came up with these:
Here are the API endpoints I identified in the code along with their associated parameters and python file/function references:
POST /v1/completions

Parameters:

- prompt (str|List[str]) - Prompt text for text completion. See `completions_common()` in `extensions/openai/completions.py`.
- stream (bool) - Whether to stream back partial responses. See `completions_common()` in `extensions/openai/completions.py`.

POST /v1/chat/completions

Parameters:

- messages (List[dict]) - Chat history. See `chat_completions_common()` in `extensions/openai/completions.py`.
- stream (bool) - Whether to stream back partial responses. See `chat_completions_common()` in `extensions/openai/completions.py`.

POST /v1/images/generations

Parameters: see `generations()` in `extensions/openai/images.py`.

POST /v1/embeddings

Parameters: see `embeddings()` in `extensions/openai/embeddings.py`.

POST /v1/moderations

Parameters: see `moderations()` in `extensions/openai/moderations.py`.

GET /v1/models

Lists models. See `list_models()` in `extensions/openai/models.py`.

GET /v1/models/{model}

Gets model info. See `model_info_dict()` in `extensions/openai/models.py`.

POST /v1/internal/model/load

Loads a model. See `_load_model()` in `extensions/openai/models.py`.

POST /v1/internal/encode

Encodes text. See `token_encode()` in `extensions/openai/tokens.py`.

POST /v1/internal/decode

Decodes tokens. See `token_decode()` in `extensions/openai/tokens.py`.

POST /v1/internal/token-count

Gets token count for text. See `token_count()` in `extensions/openai/tokens.py`.

GET /v1/internal/model/info

Gets current model info. See `get_current_model_info()` in `extensions/openai/models.py`.

Let me know if you need any clarification or have additional questions!