@msanjeevkumar
Created July 11, 2023 18:56
[{"id": "qdrant", "name": "Qdrant", "description": "Qdrant is a vector similarity search engine designed for storing, searching, and managing points along with their respective payloads. Built with an emphasis on extensive filtering, it is particularly beneficial for neural network matching, semantic-based matching, and faceted search. Qdrant offers various deployment options including local mode, on-premise server deployment, and Qdrant Cloud, each catering to different use-case scenarios. [Learn More](https://qdrant.tech/documentation/)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\n<a href='https://qdrant.tech/' target='_blank'>Qdrant</a> is a vector similarity search engine designed for storing, searching, and managing points along with their respective payloads. Built with an emphasis on extensive filtering, it is particularly beneficial for neural network matching, semantic-based matching, and faceted search. Qdrant offers various deployment options including local mode, on-premise server deployment, and Qdrant Cloud, each catering to different use-case scenarios. <a href='https://qdrant.tech/documentation' target='_blank'>Learn more</a> \ud83d\ude80.\n\n## \ud83d\udc47 Getting Started (Implementation)\n\nThe service can be used with Langchain or the official qdrant python client (https://github.com/qdrant/qdrant). Below you can find an example using the service with Langchain. In the code snippet, we are assuming that you are using all-miniLM-l6-v2 model for embeddings generation and the service is running locally on port 8001.\n\n```python\n\n!pip install qdrant-client\n\nimport os\n\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.docstore.document import Document\nfrom langchain.vectorstores import Qdrant\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\ndoc1 = Document(page_content=\"Prem is an easy to use open source AI platform. With Prem you can quickly build provacy preserving AI applications.\")\ndoc2 = Document(page_content=\"\"\"\nPrem App\n\nAn intuitive desktop application designed to effortlessly deploy and self-host Open-Source AI models without exposing sensitive data to third-party.\n\n\"\"\")\ndoc3 = Document(page_content=\"\"\"\nPrem Benefits\n\nEffortless Integration\nSeamlessly implement machine learning models with the user-friendly interface of OpenAI's API.\n\nReady for the Real World\nBypass the complexities of inference optimizations. Prem's got you covered.\n\nRapid Iterations, Instant Results\nDevelop, test, and deploy your models in just minutes.\n\nPrivacy Above All\nYour keys, your models. We ensure end-to-end encryption.\n\nComprehensive Documentation\nDive into our rich resources and learn how to make the most of Prem.\n\nPreserve Your Anonymity\nMake payments with Bitcoin and Cryptocurrency. 
It's a permissionless infrastructure, designed for you.\n\"\"\")\n\n# Using sentence transformers all-MiniLM-L6-v2\nembeddings = OpenAIEmbeddings(openai_api_base=\"http://localhost:8444/v1\")\n\n# Using locally running Qdrant\nurl = \"http://localhost:6333\"\n\nvectorstore = Qdrant.from_documents(\n [doc1, doc2, doc3], \n embeddings, \n url=url, \n collection_name=\"prem_collection_test\",\n)\n\nquery = \"What are Prem Benefits?\"\ndocs = vectorstore.similarity_search(query)\nprint(docs[0].page_content)\n```", "interfaces": ["vector-store"], "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/vector-store-qdrant/logo.svg", "modelInfo": {}, "volumePath": "/qdrant/storage", "dockerImages": {"cpu": {"size": 126913893, "image": "qdrant/qdrant:v1.0.3"}}, "defaultPort": 6333, "defaultExternalPort": 6333}, {"id": "pgvector", "name": "PGVector", "description": "It is a combination of PostgreSQL, an open-source relational database, and PGVector, an extension that enables vector-based operations and similarity searches. PGVector allows you to store and query vector embeddings efficiently within the PostgreSQL database.", "documentation": "\n# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\n[PGVector](https://github.com/pgvector/pgvector) is an integration of PostgreSQL with vector-based operations and similarity searches. It allows you to store and query vector embeddings efficiently within the PostgreSQL database. PGVector is a powerful tool for building AI and ML applications that require similarity search capabilities. It supports:\n- exact and approximate nearest neighbor search\n- L2 distance, inner product, and cosine distance\n\nLearn more about <a href='https://github.com/pgvector/pgvector' target='_blank'>PGVector here</a> \ud83d\ude80.\n\n## \ud83d\udc47 Getting Started (Implementation)\n\nThe service can be used with Langchain. You can check the [langchain pgvector documentation](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/pgvector) for detailed usage instructions. In the code snippet below, we assume that you have installed the required dependencies present in *Pre-requisites* step and are using `all-miniLM-l6-v2` model for embeddings generation and the service is running locally on port `8444`.\n\n\n### Pre-requisites\n\n```bash\npip install pgvector tiktoken psycopg2-binary openai langchain\n```\n\n```python\n\nimport os\n\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.docstore.document import Document\nfrom langchain.vectorstores.pgvector import PGVector\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\ndoc1 = Document(page_content=\"Prem is an easy to use open source AI platform. With Prem you can quickly build provacy preserving AI applications.\")\ndoc2 = Document(page_content=\"\"\"\nPrem App\n\nAn intuitive desktop application designed to effortlessly deploy and self-host Open-Source AI models without exposing sensitive data to third-party.\n\n\"\"\")\ndoc3 = Document(page_content=\"\"\"\nPrem Benefits\n\nEffortless Integration\nSeamlessly implement machine learning models with the user-friendly interface of OpenAI's API.\n\nReady for the Real World\nBypass the complexities of inference optimizations. Prem's got you covered.\n\nRapid Iterations, Instant Results\nDevelop, test, and deploy your models in just minutes.\n\nPrivacy Above All\nYour keys, your models. 
We ensure end-to-end encryption.\n\nComprehensive Documentation\nDive into our rich resources and learn how to make the most of Prem.\n\nPreserve Your Anonymity\nMake payments with Bitcoin and Cryptocurrency. It's a permissionless infrastructure, designed for you.\n\"\"\")\n\n# Using sentence transformers all-MiniLM-L6-v2\nembeddings = OpenAIEmbeddings(openai_api_base=\"http://localhost:8444/v1\")\n\n# Using locally running PostgreSQL connection\nCONNECTION_STRING = PGVector.connection_string_from_db_params(\n driver=\"psycopg2\",\n host=\"localhost\",\n port=\"5432\",\n database=\"postgres\",\n user=\"postgres\",\n password=\"postgres\",\n)\n\n# Create PGVector vectorstore instance\nvectorstore = PGVector.from_documents(\n documents=[doc1, doc2, doc3],\n embedding=embeddings,\n connection_string=CONNECTION_STRING,\n)\n\n# Perform similarity search\nquery = \"What are Prem Benefits?\"\ndocs = vectorstore.similarity_search(query)\nprint(docs[0].page_content)\n```\n", "interfaces": ["vector-store"], "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/vector-store-pgvector/logo.svg", "modelInfo": {}, "envVariables": ["POSTGRES_HOST=localhost", "POSTGRES_PORT=5432", "POSTGRES_DATABASE=postgres", "POSTGRES_USER=postgres", "POSTGRES_PASSWORD=postgres"], "execCommands": ["psql -U postgres -d postgres -c 'CREATE EXTENSION vector;'"], "volumePath": "/var/lib/postgresql/data", "dockerImages": {"cpu": {"size": 385485843, "image": "ankane/pgvector:v0.4.4"}}, "defaultPort": 5432, "defaultExternalPort": 5432}, {"id": "mpt-7b-chat", "name": "MPT 7B Chat", "beta": true, "description": "MPT-7B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the ShareGPT-Vicuna, HC3, Alpaca, HH-RLHF, and Evol-Instruct datasets. Its released under CC-By-NC-SA-4.0 (non-commercial use only) license, but inference can be run on various GPU configurations.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nMPT-7B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning <a href='https://huggingface.co/mosaicml/mpt-7b' target='_blank'>MPT-7b</a> on the <a href='https://huggingface.co/datasets/jeffwan/sharegpt_vicuna' target='_blank'>ShareGPT-Vicuna</a>, <a href='https://huggingface.co/datasets/Hello-SimpleAI/HC3' target='_blank'>HC3</a>, <a href='https://huggingface.co/datasets/tatsu-lab/alpaca' target='_blank'>Alpaca</a>, <a href='https://huggingface.co/datasets/Anthropic/hh-rlhf' target='_blank'>HH-RLHF</a>, <a href='https://huggingface.co/datasets/victor123/evol_instruct_70k' target='_blank'>Evol-Instruct</a> datasets. This model was trained by MosaicML and follows a modified decoder-only transformer architecture.\n\n\n## \ud83d\udcbb Hardware Requirements\n> **Memory requirements**: 13.91 GB (13269 MiB).\n\nTo run the `mpt-7b-chat` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100` or higher\n- GPU: NVIDIA V100 or higher.\n\n### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n\n- V100 GPUs: The model can be run on V100 GPUs.\n\n\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> \\n\\nThere are several reasons why you might want to run machine learning models on-premise. 
First, it gives you more control over the data and the model. You can make sure that the data is clean and that the model is well-validated before you deploy it into production. Second, it can help you meet compliance requirements. If you\u2019re working with sensitive data, you may need to keep it on-premise in order to meet certain regulatory requirements. Finally, it can help you meet performance requirements. If you\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> \\nHow do I run my models on-premise?\\nI'm interested in running my models on-premise, but I'm not sure how to get started. Can you help?\\nSure, I'd be happy to help! To run your models on-premise, you'll need to set up a server and install a machine learning framework like TensorFlow or PyTorch. You can then use the framework to train and deploy your models.\\nHere are the general steps you'll need to follow:\\n1. Choose a server: You'll need to choose\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> \\nRunning models on-premise can have several limitations, including:\\n1. Security: Running models on-premise can be risky from a security perspective, as the data and models are stored on a local server, which can be vulnerable to cyberattacks.\\n2. Scalability: Running models on-premise can be limited in terms of scalability, as the infrastructure needs to be built and maintained in-house, which can be time-consuming and expensive.\\n3. Data privacy: Running models on-premise can also be a privacy concern, as the data is stored on a local server, which can be accessed by only a limited number of people.\\n4. Maintenance: Running models on-premise can require a lot of maintenance, as the hardware and software need to be updated regularly to ensure optimal performance.\\nOverall, running models on-premise can be a good option for organizations that have a small amount of data and models, but it may not be the best option for larger organizations with complex models and large amounts of data.#!/bin/bash\\n# This script will install the latest version of TensorFlow on a Linux machine\\n# It\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\n \\nWhat are the trade-offs of deploying models on-premise I should be aware of?\\nI'm trying to decide whether to deploy a machine learning model on-premise or in the cloud. I'm aware of the benefits of on-premise deployment, such as greater control over the data and the model, and potentially lower costs if you have your own hardware. However, I'm also aware of the potential trade-offs, such as the need for specialized hardware and the potential for increased security risks.\\nWhat are some other trade-offs I should be aware of?\\nI'm also aware of the potential for increased maintenance and support costs if I deploy on-premise. Are there any other trade-offs I should be aware of?\\nAdditionally, I'm aware of the potential for decreased flexibility if I deploy on-premise. Is this a significant concern for you, or would you rather have the control over the data and model?\\nFinally, I'm aware of the potential for increased costs if I deploy on-premise. Are there any other costs I should be aware of?\\nI'm trying to make an informed decision, so\n\n</blockquote>\n\nIt's visible from the above outputs that model doesn't answer in question answer mode by default and it's really bad at knowing when to stop. 
We recommend users of `MPT 7B Chat` to develop guardrails and to take appropriate precautions for any production use as it's only behaves like a text generation model by default most of the time.\n\nAn example would be using a Chat Prompt Template as shown below:\n\nPrompt:\n```\nYou are an AI assistant in a conversational setting.\nProvide a concise and accurate conversational answer to anything User asks.\n===================\n\nUser: What are the trade-offs of deploying models on-premise I should be aware of?\nAssistant:\"\"\"\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8450/v1\", max_tokens=128)\nmessages = [HumanMessage(content=\"Why do I need to run machine learning models on-premise?\")]\nprint(chat(messages))\n\n# output:\n# \\n\\nMachine learning models are trained on large datasets, and the quality of the training is dependent on the quality of the data. If the data is stored on-premise, then the training process can take advantage of that. If the data is stored in the cloud, then the training process must make do with whatever data is available in the cloud, which may not be the best quality. Similarly, if the model is deployed on-premise, then it can take advantage of the on-premise data, but if it is deployed in the\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nYou are an AI assistant in a conversational setting.\nProvide a conversational answer to any question an User asks. Be original, concise, accurate and helpful.\n===================\n\nUser: {user_message}\nAssistant:\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Why do I need to run machine learning models on-premise?\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8450/v1\", max_tokens=128)\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n\n# output: > Finished chain.\n# Running machine learning models on-premise can help you ensure data privacy and security, as well as control over the data sources and processing pipeline. It can also provide faster access to data and reduce latency, and enable you to customize your models and infrastructure to meet your specific needs.#\n\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\n\n### \ud83d\udeab Limitations and Biases\nWe have noticed that the model doesn't fully behave like a question answering model by default, it's recommended to use this model with a Chat Prompt Template as shown above.\n\nWe recommend users of MPT-7B-Chat to develop guardrails and to take appropriate precautions while using it.\n\n\n## \ud83d\udcdc License\nMPT-7B-Chat can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B-Chat was trained on various public datasets. 
While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.\nIt is made available under a license of CC-By-NC-SA-4.0 (for non-commercial use only)", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-mpt-7b-chat/logo.svg", "modelInfo": {"memoryRequirements": 13269, "tokensPerSecond": 40}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 33040731294, "image": "ghcr.io/premai-io/chat-mpt-7b-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8450}, {"id": "mpt-7b", "name": "MPT 7B", "beta": true, "description": "MPT 7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. The model is particularly designed for commercial use (License: Apache-2.0) and its inference can be run on various GPU configurations.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\nMPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by <a href='https://www.mosaicml.com/' target='_blank'>MosaicML</a>.\n\nMPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.\n\n\n\n## \ud83d\udcbb Hardware Requirements\n> **Memory requirements**: 14.1 GB (13455 MiB).\n\nTo run the `mpt-7b` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100` or higher\n- GPU: NVIDIA V100 or higher.\n\n### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n\n- V100 GPUs: The model can be run on V100 GPUs.\n\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> \\nWhen it comes to running machine learning models, you have two options: run them on-premise or run them in the cloud.\\nOn-premise means that you are running the machine learning models on your own servers. This is typically done by an organization that wants to keep their data and models in-house, or because they don\u2019t have the resources to run the models in the cloud.\\nRunning machine learning models on-premise can be a great option if you have the resources to do so. It allows you to have complete control over\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> \\nThe on-premise option is available for those customers who want to run their models on their own hardware. The on-premise option is available for all models except the Financial Services models.\\nHow do I run my models in the cloud?\\nThe cloud option is available for those customers who want to run their models on a secure, hosted environment. The cloud option is available for all models except the Financial Services models.\\nWhat is the difference between the cloud and on-premise options?\\nThe cloud option is a hosted environment where the model is run on a\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> \\nOn-premise software can be expensive and require a lot of maintenance. 
You also have to pay for the hardware, and if you want to add new features or fix bugs, you have to wait for the vendor to release a new version.\\nWith a cloud-based model management solution, you can access your models from anywhere, anytime, and on any device. You can also update your models in real-time and add new features without waiting for a new release.\\nWith a cloud-based model management solution, you can access your models from anywhere, anytime\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\n \\nWhat are the trade-offs of deploying models on-premise I should be aware of? How can I minimize the risks of deploying models on-premise?\\nThe first thing to consider is the level of control you have over the environment in which the model is deployed. If you are deploying the model on-premise, you have complete control over the environment. You can ensure that the model is deployed in a secure environment and that the environment is properly maintained. You can also ensure that the model is deployed in an environment that meets the performance requirements of the model.\\nThe second thing to consider is the level of control you have over the data that the model is using. If you are deploying the model on-premise, you have complete control over the data that the model is using. You can ensure that the data is accurate and up-to-date. You can also ensure that the data is protected from unauthorized access.\\nThe third thing to consider is the level of control you have over the users of the model. If you are deploying the model on-premise, you have complete control over the users of the model. You can ensure that\n\n</blockquote>\n\nIt's visible from the above outputs that model doesn't answer in question answer mode by default and it's really bad at knowing when to stop. We recommend users of `MPT 7B` to develop guardrails and to take appropriate precautions for any production use as it's only behaves like a text generation model by default most of the time.\n\nAn example would be using a Chat Prompt Template as shown below but it doesn't work well all the time even with it:\n\nPrompt:\n```\nYou are an AI assistant in a conversational setting.\nProvide a concise and accurate conversational answer to anything User asks.\n===================\n\nUser: What are the trade-offs of deploying models on-premise I should be aware of?\nAssistant:\"\"\"\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8451/v1\", max_tokens=128)\nmessages = [HumanMessage(content=\"Why do I need to run machine learning models on-premise?\")]\nprint(chat(messages))\n\n# output:\n# \\n\\nMachine learning models are trained on large datasets, and the quality of the training is dependent on the quality of the data. If the data is stored on-premise, then the training process can take advantage of that. If the data is stored in the cloud, then the training process must make do with whatever data is available in the cloud, which may not be the best quality. 
Similarly, if the model is deployed on-premise, then it can take advantage of the on-premise data, but if it is deployed in the\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nYou are an AI assistant in a conversational setting.\nProvide a conversational answer to any question an User asks. Be original, concise, accurate and helpful.\n===================\n\nUser: {user_message}\nAssistant:\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Why do I need to run machine learning models on-premise?\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8451/v1\", max_tokens=128)\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n\n# output: > Finished chain.\n# Running machine learning models on-premise can help you ensure data privacy and security, as well as control over the data sources and processing pipeline. It can also provide faster access to data and reduce latency, and enable you to customize your models and infrastructure to meet your specific needs.#\n\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\n\n### \ud83d\udeab Limitations and Biases\nMPT-7B (Base) is not intended for deployment without finetuning. It should not be used for human-facing interactions without further guardrails and user consent.\n\nMPT-7B can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.\n\n\n## \ud83d\udcdc License\nIt is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-mpt-7b/logo.svg", "modelInfo": {"memoryRequirements": 13455, "tokensPerSecond": 40}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 33040731243, "image": "ghcr.io/premai-io/mpt-7b-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8451}, {"id": "xgen-7b-8k-inst", "name": "Salesforce XGen 7B Instruct", "beta": true, "description": "XGen 7B Instruct, developed by Salesforce AI Research, is a 7B parameters supervised finetuned model on public domain instructional data. Released for research purpose, its inference can be run on various GPU configurations.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nXGen-7B Instruct with 8k Context length is a 7B parameters model released by Salesforce, supervised fine tuned on public domain instructional data including databricks-dolly-15k, oasst1, Baize and GPT-related datasets. 
<a href='https://huggingface.co/Salesforce/xgen-7b-8k-inst' target='_blank'>Learn More</a> and <a href='https://blog.salesforceairesearch.com/xgen/#note-on-potential-risks' target='_blank'>Blog here</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\n> **Memory requirements**: 15.79 GB (15065 MiB).\n\nTo run the `xgen-7b-8k-inst` service, you'll need the following hardware configuration:\n\n#### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100` or higher\n- GPU: NVIDIA V100 or higher.\n\n#### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n\n- V100 GPUs: The model can be run on V100 GPUs.\n\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> \\nThe first reason is that some organizations have compliance and regulatory requirements that mandate the use of on-premise infrastructure. For example, some industries, such as healthcare and finance, have strict regulations that require data to be stored and processed within their own premises.\\n\\nThe second reason is that some organizations want to have full control over their data and infrastructure. By running machine learning models on-premise, organizations can ensure that they have full visibility and control over their data and models, and can prevent data breaches or other security incidents.\\n\\nThe third reason is\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> \\nI am interested in using on-premise solutions for my machine learning models. How do I run my models on-premise?\\nThis article has been viewed 53,872 times.\\nIf you have a big project to start, or simply need to feel more organized, there are several strategies you can use to stay on top of your tasks and achieve a sense of calm and focus. Here are some effective ways to increase your productivity and reduce distractions while working on a computer.\\nUse a headset or other noise-canceling headphones. These will reduce backgrou\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> \\n### Assistant: There are several limitations to running models on-premise, including:\\n\\n1. Limited scalability: Running models on-premise can be limited in terms of scalability. If your model needs to handle a large number of users or data, it may not be able to do so efficiently.\\n2. Limited access to resources: On-premise models may not have access to the same resources as cloud-based models, such as high-performance computing (HPC) resources or advanced data analytics tools.\\n3. Limited flexibility:\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\n \\n### Assistant: When deploying machine learning models on-premise, there are several trade-offs to consider. Some of the key factors to think about include:\\n\\n1. Data security and privacy: When data is stored on-premise, it is physically stored in a location that is under the control of the organization. This can be beneficial in terms of data security and privacy, as the organization has full control over the data and can implement additional security measures to protect it. However, it also means that the organization is responsible for the physical security of the data and any breaches could have serious consequences.\\n2. 
Scalability: On-premise deployment can be more difficult to scale than cloud-based deployment, as the organization is responsible for purchasing and maintaining the necessary hardware and infrastructure. This can be a significant investment and may limit the organization's ability to quickly scale up or down as needed.\\n3. Maintenance and support: When deploying models on-premise, the organization is responsible for maintaining and supporting the hardware and infrastructure. This can be time-consuming and expensive, as the organization may need to hire additional staff or contractors to manage the infrastructure.\\n\n\n</blockquote>\n\nIt's visible above from the outputs that model sometimes generates gibberish and tends to hallucinate if prompted without any special techniques. We recommend users of XGen models to develop guardrails and to take appropriate precautions for any production use as it's only at auto-regressive text generation by default.\n\nAn example would be using a Chat Prompt Template as shown below to improve performance:\n\nPrompt:\n```\nYou are an AI assistant in a conversational setting.\nProvide a concise and accurate conversational answer to anything User asks.\n===================\n\nUser: What are the trade-offs of deploying models on-premise I should be aware of?\nAssistant:\"\"\"\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8449/v1\", max_tokens=128)\nmessages = [HumanMessage(content=\"What are the trade-offs of deploying models on-premise I should be aware of?\")]\nprint(chat(messages))\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nYou are an AI assistant in a conversational setting.\nProvide a conversational answer to any question an User asks. Be original, concise, accurate and helpful.\n===================\n\nUser: {user_message}\nAssistant:\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Why do I need to run machine learning models on-premise?\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8449/v1\", max_tokens=128)\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\nOn standard NLP benchmarks, XGen achieves comparable or better results when compared with state-of-the-art open-source LLMs (e.g. MPT, Falcon, LLaMA, Redpajama, OpenLLaMA) of similar model size.\nOur targeted evaluation on long sequence modeling benchmarks show benefits of our 8K-seq models over 2K- and 4K-seq models.\n\nXGen-7B achieves equally strong results both in text (e.g., MMLU, QA) and code (HumanEval) tasks.\n\nWe have tested both xgen-7b-8k-inst and xgen-7b-8k-base models on similar set of questions and in results we found xgen-7b-8k-inst model to be more accurate in generating better responses. 
Results can be found [here](https://github.com/premAI-io/prem-registry/tree/main/chat-xgen-7b-8k-inst/results).\n\n### \ud83d\udeab Limitations and Biases\n\nWe have noticed that the model sometimes generates responses that are totally irrelevant and mostly gibberish like some random code, etc. Using a Chat Prompt Template as shown above can help improve performance.\nWe recommend users of XGen models to develop guardrails and to take appropriate precautions for any production use.\n\nThe creators of XGen 7B have mentioned that despite their effort in addressing the risks of bias, toxicity and hallucinations both in pre-training and fine-tuning stages, like other LLMs, XGen-7b models are not free from such limitations. We hope our open-sourced codebase will help other researchers better understand these challenges and improve on these key limitations for making AI beneficial for everyone.\n\n\n## \ud83d\udcdc License\nIt's only mentioned to use for research purposes. Refer to the repo <a href='https://github.com/salesforce/xgen#models' target='_blank'>here</a>.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-xgen-7b-8k-inst/logo.svg", "modelInfo": {"memoryRequirements": 15065, "tokensPerSecond": 23}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 47388920099, "image": "ghcr.io/premai-io/chat-xgen-7b-8k-inst-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8449}, {"id": "stable-diffusion-2-1", "name": "Stable Diffusion 2.1", "description": "Stable Diffusion v2-1 is an advanced version of the Stable Diffusion v2 model, developed by Robin Rombach and Patrick Esser. This model is designed to generate and modify images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encoder (OpenCLIP-ViT/H). The model was initially fine-tuned from the Stable Diffusion v2 model and then further trained for an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k extra steps with punsafe=0.98.", "documentation": "# Documentation\n\n## \ud83d\udccc Description\n\nStable Diffusion v2-1 is an advanced version of the Stable Diffusion v2 model, developed by Robin Rombach and Patrick Esser. This model is designed to generate and modify images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encoder (OpenCLIP-ViT/H). The model was initially fine-tuned from the Stable Diffusion v2 model and then further trained for an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k extra steps with punsafe=0.98. 
<a href='https://stability.ai/blog/stablediffusion2-1-release7-dec-2022' target='_blank'>Learn More</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\nTo run the `stable-diffusion-2-1` service on Prem, you'll need access to a GPU with at least 16GiB of RAM.\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\n\n![Xc3tlK4h](https://github.com/premAI-io/prem-registry/assets/29598954/3310b52f-aaeb-44fc-9bfa-9244ef6c0c6e)\n\n### 2\ufe0f\u20e3 Prompt: Low polygon panda 3d\n![E5MREIGA](https://github.com/premAI-io/prem-registry/assets/29598954/a11d02eb-90cc-4b3c-a7a4-8c8abc988bc6)\n\n### 3\ufe0f\u20e3 Prompt: 3d hiper-realistic rick sanchez and morty\n![zmicPjRq](https://github.com/premAI-io/prem-registry/assets/29598954/8ea64522-b255-452c-a06c-5050dfb65be6)\n\n### 4\ufe0f\u20e3 Prompt: Synthwave brad pitt wearing headphones, animated, trending on artstation, portrait\n\n![pXBajmuD](https://github.com/premAI-io/prem-registry/assets/29598954/4ddb1602-bf99-4921-aeb2-f1149556e476)\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\ude80 Getting Started with OpenAI Python client\n\nThe service exposes the same endpoints as OpenAI DALL-E does. You can directly use the official `openai` python library.\n\n```python\n\n!pip install openai\n!pip install pillow\n\nimport io\nimport base64\nimport openai\n\nfrom PIL import Image\n\nopenai.api_base = \"http://localhost:9111/v1\"\nopenai.api_key = \"random-string\"\n\nresponse = openai.Image.create(\n prompt=\"Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\",\n n=1,\n size=\"512x512\"\n)\n\nimage_string = response[\"data\"][0][\"b64_json\"]\n\nimg = Image.open(io.BytesIO(base64.decodebytes(bytes(image_string, \"utf-8\"))))\nimg.save(\"iron_man.jpeg\")\n\n```\n\n## \ud83d\udcdc License\n\nThe model is under CreativeML Open RAIL++-M License.\n", "beta": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/diffuser-stable-diffusion-2-1/logo.svg", "modelInfo": {"memoryRequirements": 14705, "secondsPerImage": 8}, "interfaces": ["diffuser"], "dockerImages": {"gpu": {"size": 26641561465, "image": "ghcr.io/premai-io/diffuser-stable-diffusion-2-1-gpu:1.0.2"}}, "defaultPort": 8000, "defaultExternalPort": 9111}, {"id": "gpt4all-lora-q4", "name": "GPT4ALL Lora Q4", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-gpt4all-lora-q4/logo.svg", "description": "GPT4All, developed by Nomic AI, is a chatbot trained over an extensive corpus of assistant interactions. By finetuning LLaMA 7B, GPT4All provides an open-source ecosystem to train and deploy efficient, assistant-style large language models locally on consumer-grade CPUs. This democratized approach to AI aims to bolster open research, reproducibility, and promote developments in AI alignment and interpretability. The current model has been 4-bit quantized using ggml framework.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nGPT4All, developed by Nomic AI, is a chatbot trained over an extensive corpus of assistant interactions. By finetuning LLaMA 7B, GPT4All provides an open-source ecosystem to train and deploy efficient, assistant-style large language models locally on consumer-grade CPUs. 
This democratized approach to AI aims to bolster open research, reproducibility, and promote developments in AI alignment and interpretability. The current model has been 4-bit quantized using ggml framework. [Learn more](https://github.com/nomic-ai/gpt4all).\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> There are several reasons why you might want to run machine learning models on-premise. One reason is that it can provide greater control over the data and model, which can be important for certain use cases. Another reason is that it can allow for faster execution times, as there may not be a need to wait for cloud infrastructure or resources. Additionally, some organizations may have regulatory requirements that require them to keep sensitive data on-premise rather than in the cloud.\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> You can run your models on-premise by installing the necessary software and hardware requirements. You will need to have a server or a cluster of servers with sufficient processing power, memory, storage space, and network bandwidth to handle the computational load of your model. You may also require specialized hardware, such as GPUs, for accelerated computing. Once you have all the required resources, you can install the necessary software packages and configure them according to your needs.\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> There are several limitations when running models on-premise, such as high upfront costs for hardware and software, limited scalability due to physical constraints, and a lack of flexibility in terms of deployment options. Additionally, maintaining and upgrading the infrastructure can be time-consuming and costly.\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n> There are several trade-offs to consider when deploying models on-premise, such as increased costs, limited scalability, and a lack of flexibility in terms of deployment options. Additionally, maintaining and upgrading the infrastructure can be time-consuming and costly.\n\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\udd27 Training / Fine Tuning Costs\n\n> After a significant effort spanning four days and a substantial expenditure of $800 for GPU rentals from providers such as Lambda Labs and Paperspace, the resulting model, named gpt4all-lora, is now up and running. This sum includes the costs incurred from several unsuccessful training attempts. In addition to the GPU costs, a further $500 was expended on the OpenAI API. \n\nHowever, the training duration has been drastically reduced with the model's release. 
Now, the entire training process can be completed in approximately eight hours using Lambda Labs' DGX A100 8x 80GB setup, at a significantly reduced total cost of $100.\n\n### \ud83d\udd22 Default Parameters\n\nFor our experiments, we have been using the following parameters:\n\n```python\ntemperature=0.2\ntop_p=0.95\nstop=[]\nmax_tokens=256\nrepeat_penalty=1.1\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\nFor more information about GPT4All performances and quality, you can visit: https://gpt4all.io/index.html.\n\n### \ud83d\ude80 Serving Details\n\nTo expose the service, we currently use FastAPI and [llama-cpp-python](https://abetlen.github.io/llama-cpp-python/) library which is compatible with all ggml models.\n\n```python\nllama-cpp-python==0.1.43\n```\n\n### \u26aa\ufe0f Embeddings\n\nThe current model supports embedding generation too. Another endpoint is exposed for this purpose. You can check out the documentation for each container to see how to use it at `http://{container_ip}:8000/docs` or at our public services [Open API documentation](https://mock.prem.ninja/docs).\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started (using LangChain)\n\n```python\n!pip install langchain\n!pip install openai\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.schema import AIMessage, HumanMessage\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8222/v1\", max_tokens=128)\n\nmessages = \n HumanMessage(content=\"Why do I need to run machine learning models on-premise?\")\n]\n\nchat(messages)\n```\n\n### Speeding Up Inference\n\nBy default, the model uses (1/2 + 1) the number of cores available on the underlying hardware. If you want to use more cores, you can pass an arbitrary number to the request parameter `n_threads`. Theoretically, this method will improve inference time.\n\n```json\n{\n \"model\": \"string\",\n \"messages\": [\n {}\n ],\n \"temperature\": 0.2,\n \"top_p\": 0.95,\n \"n\": 1,\n \"stream\": false,\n \"stop\": [],\n \"max_tokens\": 256,\n \"presence_penalty\": 0,\n \"frequence_penalty\": 0,\n \"logit_bias\": {},\n \"user\": \"\",\n \"n_threads\": 0 // this is the parameter required to change in order use more cpu cores.\n}\n```\n\n## \ud83d\udcdc License\n\nThe model is a research preview intended for **non-commercial use only**, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI.\n", "modelInfo": {"weightsName": "gpt4all-lora-q4.bin", "weightsSize": 4212859520, "maxCtxTokens": 512, "memoryRequirements": 4096, "tokensPerSecond": 6}, "interfaces": ["chat"], "dockerImages": {"cpu": {"size": 8986459988, "image": "ghcr.io/premai-io/chat-gpt4all-lora-q4-cpu:1.0.4"}}, "defaultPort": 8000, "defaultExternalPort": 8222}, {"id": "weaviate", "name": "Weaviate", "description": "Weaviate is an open-source, cloud-native vector database designed to enable machine learning (ML) and artificial intelligence (AI) capabilities for your data. It's built to handle large-scale data storage and search operations, making it a powerful tool for data scientists and developers working with big data [Learn More](https://weaviate.io/developers/weaviate)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\n<a href='https://weaviate.io/' target='_blank'>Weaviate</a> is an open-source, cloud-native vector database designed to enable machine learning (ML) and artificial intelligence (AI) capabilities for your data. 
It's built to handle large-scale data storage and search operations, making it a powerful tool for data scientists and developers working with big data <a href='https://weaviate.io/developers/weaviate' target='_blank'>Learn more</a> \ud83d\ude80.\n\n## \ud83d\udc47 Getting Started (Implementation)\n\nThe service can be used with Langchain or the official weavaite python client (https://github.com/qdrant/qdrant). Below you can find an example using the service with Langchain. In the code snippet, we are assuming that you are using all-miniLM-l6-v2 model for embeddings generation and the service is running locally on port 8001.\n\n```python\n\n!pip install weaviate-client\n\nimport os\n\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.docstore.document import Document\nfrom langchain.vectorstores import Weaviate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\ndoc1 = Document(page_content=\"Prem is an easy to use open source AI platform. With Prem you can quickly build provacy preserving AI applications.\")\ndoc2 = Document(page_content=\"\"\"\nPrem App\n\nAn intuitive desktop application designed to effortlessly deploy and self-host Open-Source AI models without exposing sensitive data to third-party.\n\n\"\"\")\ndoc3 = Document(page_content=\"\"\"\nPrem Benefits\n\nEffortless Integration\nSeamlessly implement machine learning models with the user-friendly interface of OpenAI's API.\n\nReady for the Real World\nBypass the complexities of inference optimizations. Prem's got you covered.\n\nRapid Iterations, Instant Results\nDevelop, test, and deploy your models in just minutes.\n\nPrivacy Above All\nYour keys, your models. We ensure end-to-end encryption.\n\nComprehensive Documentation\nDive into our rich resources and learn how to make the most of Prem.\n\nPreserve Your Anonymity\nMake payments with Bitcoin and Cryptocurrency. It's a permissionless infrastructure, designed for you.\n\"\"\")\n\n# Using sentence transformers all-MiniLM-L6-v2\nembeddings = OpenAIEmbeddings(openai_api_base=\"http://localhost:8444/v1\")\n\n# Using locally running Weaviate\nurl = \"http://localhost:8080\"\n\nvectorstore = Weaviate.from_documents(\n [doc1, doc2, doc3], \n embeddings, \n weaviate_url=url, \n by_text=False,\n)\n\nquery = \"What are Prem Benefits?\"\ndocs = vectorstore.similarity_search(query)\nprint(docs[0].page_content)\n```", "interfaces": ["vector-store"], "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/vector-store-weaviate/logo.svg", "modelInfo": {}, "envVariables": ["QUERY_DEFAULTS_LIMIT=25", "AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true", "PERSISTENCE_DATA_PATH=/var/lib/weaviate", "DEFAULT_VECTORIZER_MODULE=none", "CLUSTER_HOSTNAME=node1"], "volumePath": "/var/lib/weaviate", "dockerImages": {"cpu": {"size": 46718418, "image": "semitechnologies/weaviate:1.19.11"}}, "defaultPort": 8080, "defaultExternalPort": 8080}, {"id": "whisper-large-v2", "name": "Whisper Large v2", "description": "", "documentation": "# Documentation\n\n## \ud83d\udccc Description\n\nWhisper Tiny is a compact version of OpenAI's Whisper model, designed for automatic speech recognition (ASR) and speech translation. Despite its smaller size, it retains the powerful capabilities of the larger models, making it suitable for applications where computational resources or storage space are limited. 
<a href='https://huggingface.co/openai/whisper-large-v2' target='_blank'>Learn More</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\nTo run the `whisper-large-v2` service on Prem, you'll just need a GPU with at least 6GiB of RAM.\n\n## \ud83d\udcd2 Example Usage\n\nWhisper Tiny can be used for various tasks, including English to English transcription, French to French transcription, and French to English translation. It can also handle long-form transcription by using a chunking algorithm, allowing it to transcribe audio samples of arbitrary length.\n\n### \ud83c\udfb6 sample.wav. You can find the file [here](https://github.com/premAI-io/prem-registry/blob/main/audio-to-text-whisper-tiny/sample.wav)\n\n<img width=\"1449\" alt=\"image\" src=\"https://github.com/premAI-io/prem-registry/assets/29598954/8fe4ee74-e941-42ae-b3e8-5efd30581729\">\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\ude80 Getting Started with OpenAI Python client\n\nThe service exposes the same endpoints as OpenAI DALL-E does. You can directly use the official `openai` python library.\n\n```python\n\n!pip install openai\n\nimport openai\n\nopenai.api_base = \"http://184.105.5.51:10111/v1\"\nopenai.api_key = \"random-string\"\n\naudio_file = open(\"./sample.wav\", \"rb\")\ntranscript = openai.Audio.transcribe(\"whisper-1\", audio_file)\nprint(transcript)\n\n```\n\n## \ud83d\udcdc License\n\nWhisper's code and model weights are released under the MIT License.\n", "beta": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/audio-to-text-whisper-large-v2/logo.svg", "modelInfo": {}, "interfaces": ["audio-to-text"], "dockerImages": {"gpu": {"size": 23723729305, "image": "ghcr.io/premai-io/audio-to-text-whisper-large-v2-gpu:1.0.1"}}, "defaultPort": 8000, "defaultExternalPort": 10111}, {"id": "dolly-v2-12b", "name": "Dolly v2 12B", "beta": true, "description": "Dolly-v2-12b, developed by Databricks, is an instruction-following large language model trained on diverse capability domains. It exhibits remarkable instruction following behavior, surpassing the foundation model it's based on, Pythia-12b. The model is particularly designed for commercial use and its inference can be run on various GPU configurations.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\n`dolly-v2-12b`, developed by Databricks, is an instruction-following large language model trained on diverse capability domains. It exhibits remarkable instruction following behavior, surpassing the foundation model it's based on, Pythia-12b. The model is particularly designed for commercial use and its inference can be run on various GPU configurations. <a href='https://huggingface.co/databricks/dolly-v2-12b' target='_blank'>Learn More</a>\n\n## \ud83d\udcbb Hardware Requirements\n\n> **Memory requirements**: 23.91GiB GB (24484 MiB).\n\nTo run the `dolly-v2-12b` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA A100, NVIDIA V100\n - When using V100s (e.g., `p3.2xlarge`, 1 x V100 16GB) set `torch_dtype=torch.float16` in `pipeline()` instead. 
The 12B param model may not function well in 8-bit.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100-32G` or `A100` or higher\n- GPU: NVIDIA A100, NVIDIA V100\n\n### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n- A100 GPUs: A100 GPUs are preferred for training all model sizes, and are the only GPUs that can train the 12B param model in a reasonable amount of time.\n- A10 GPUs: Training the 12B param model is not recommended on A10s.\n- V100 GPUs: When using V100s (e.g., `p3.2xlarge`, 1 x V100 16GB, `NC6s_v3`), in all cases, set `torch_dtype=torch.float16` in `pipeline()` instead. The 12B param model may not function well in 8-bit.\n\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> Running machine learning models on-premise allows you to control the computation, data, and metrics involved in the model lifecycle. This is important because the data and compute used to train a model can be quite large and typically cannot be shared across multiple organizations. Hosting the model on-premise also provides the flexibility to control access to the model and ensure the correct security requirements are met.\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> To run your models on-premise, you will first need to package them into a production-grade model deployment package (MDP). The MDP can be deployed on any cloud provider-compatible platform such as OpenAI Pet Store or AutoML packaging platform. For more information, see Model Deployment.\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> Some of the limitations you might run into are related to compliance, authentication and audit requirements. Depending on the regulations you are following you might be prevented from storing the model in the public cloud. Additionally, you might not be able to migrate the model to another provider or cloud because of non-standardized technologies or missing APIs.\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\nThere are several considerations to take into account when deploying AI models in the enterprise. Depending on the deployment method, some of these considerations might not apply.\n\nIf the models are deployed as software as a service (SAAS) using an optimized cloud platform (e.g., using a dedicated hardware appliance, like Google Prediction API for Edge TPUs), the models deployment might not require any customization and the target platform should be properly dimensioned to sustain the expected workload.\n\nIf the models are deployed as a docker image, it should be properly built, optimized, and tuned for the target platform. In some cases, a cluster of machines with the right hardware and software configuration might be needed to sustain the expected workload.\n\nIf the models are deployed on-premise using a physical machine or a cluster, the models deployment might require customization, depending on the use case. 
The target platform should be properly dimensioned to sustain the expected workload.\n\nIn some cases, the models might require access to specific hardware or software components, which might not be available on the target platform.\n\nIn some other cases, the deployment method might be restricted by the agreements reached with the providers of the required hardware or software components.\n\n</blockquote>\n\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n\n```python\n!pip install langchain\n!pip install openai\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.schema import AIMessage, HumanMessage\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8333/v1\", max_tokens=128)\n\nmessages = \n HumanMessage(content=\"Why do I need to run machine learning models on-premise?\")\n\nchat(messages)\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\nAccording to the known limitations, `dolly-v2-12b` is not state of the art. It is not designed to out-perform more mordern architectures and in fact underperforms `dolly-v1-6b` in some evaluation benchmarks.\n\nCheck out the <a href='https://github.com/databrickslabs/dolly#known-limitations' target='_blank'>other limitations</a>.\n\n## \ud83d\udcdc License\n\n`dolly-v2-12b` is a 12 billion parameter causal language model created by <a href='https://databricks.com/' target='_blank'>Databricks</a> that is derived from <a href='https://www.eleuther.ai/' target='_blank'>EleutherAI</a>\u2019s <a href='https://huggingface.co/EleutherAI/pythia-12b' target='_blank'>Pythia-12b</a> and fine-tuned on a <a href='https://github.com/databrickslabs/dolly/tree/master/data' target='_blank'>~15K record instruction corpus</a> generated by Databricks employees and released under a permissive license (CC-BY-SA).", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-dolly-v2-12b/logo.svg", "modelInfo": {"memoryRequirements": 24484, "tokensPerSecond": 19}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 40689261892, "image": "ghcr.io/premai-io/chat-dolly-v2-12b-gpu:1.0.3"}}, "defaultPort": 8000, "defaultExternalPort": 8333}, {"id": "stable-diffusion-1-5", "name": "Stable Diffusion 1.5", "description": "Stable Diffusion v1.5 is a sophisticated text-to-image diffusion model capable of generating high-quality images from textual prompts. Developed by Robin Rombach and Patrick Esser, this model is a significant upgrade from its predecessor, Stable Diffusion v1.2, having been fine-tuned on 595k steps at a resolution of 512x512 on `laion-aesthetics v2 5+` with a 10% drop in text-conditioning to enhance classifier-free guidance sampling.", "documentation": "# Documentation\n\n## \ud83d\udccc Description\n\nStable Diffusion v1.5 is a sophisticated text-to-image diffusion model capable of generating high-quality images from textual prompts. Developed by Robin Rombach and Patrick Esser, this model is a significant upgrade from its predecessor, Stable Diffusion v1.2, having been fine-tuned on 595k steps at a resolution of 512x512 on `laion-aesthetics v2 5+` with a 10% drop in text-conditioning to enhance classifier-free guidance sampling. 
<a href='https://github.com/runwayml/stable-diffusion' target='_blank'>Learn More</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\nTo run the `stable-diffusion-1-5` service on Prem, you'll need access to a GPU with at least 16GiB of RAM.\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\n\n![WS_USl7I](https://github.com/premAI-io/prem-registry/assets/29598954/7c31ed10-620b-445c-a23d-c34e0fa92b43)\n\n### 2\ufe0f\u20e3 Prompt: Low polygon panda 3d\n\n![9rHScaSw](https://github.com/premAI-io/prem-registry/assets/29598954/bafa9c5e-02dd-4a76-8c69-d739e508ad2d)\n\n### 3\ufe0f\u20e3 Prompt: 3d hiper-realistic rick sanchez and morty\n\n![PKVb4jfl](https://github.com/premAI-io/prem-registry/assets/29598954/04223540-b736-4952-9aa4-87e08759cd7d)\n\n### 4\ufe0f\u20e3 Prompt: Synthwave brad pitt wearing headphones, animated, trending on artstation, portrait\n\n![35pvt7Y9](https://github.com/premAI-io/prem-registry/assets/29598954/cd49a0c4-ec50-44a4-836a-7ea4964b361e)\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\ude80 Getting Started with OpenAI Python client\n\nThe service exposes the same endpoints as OpenAI DALL-E does. You can directly use the official `openai` python library.\n\n```python\n\n!pip install openai\n!pip install pillow\n\nimport io\nimport base64\nimport openai\n\nfrom PIL import Image\n\nopenai.api_base = \"http://localhost:9111/v1\"\nopenai.api_key = \"random-string\"\n\nresponse = openai.Image.create(\n prompt=\"Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\",\n n=1,\n size=\"512x512\"\n)\n\nimage_string = response[\"data\"][0][\"b64_json\"]\n\nimg = Image.open(io.BytesIO(base64.decodebytes(bytes(image_string, \"utf-8\"))))\nimg.save(\"iron_man.jpeg\")\n\n```\n\n## \ud83d\udcdc License\n\nThe model is under CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.\n", "beta": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/diffuser-stable-diffusion-1-5/logo.svg", "modelInfo": {"memoryRequirements": 14705, "secondsPerImage": 8}, "interfaces": ["diffuser"], "dockerImages": {"gpu": {"size": 26962657735, "image": "ghcr.io/premai-io/diffuser-stable-diffusion-1-5-gpu:1.0.2"}}, "defaultPort": 8000, "defaultExternalPort": 9222}, {"id": "codet5p-220m-py", "name": "Saleforce 5P 220M Python", "description": "", "documentation": "# Documentation", "comingSoon": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/coder-codet5p-220m-py/logo.svg", "modelInfo": {}, "interfaces": ["coder"], "dockerImages": {"cpu": {"size": 5201544171, "image": "ghcr.io/premai-io/coder-codet5p-220m-py-cpu:0.0.5"}}, "defaultPort": 8000, "defaultExternalPort": 10111}, {"id": "mpt-7b-instruct", "name": "MPT 7B Instruct", "beta": true, "description": "MPT-7B-Instruct is a model for short-form instruction following. It's built by finetuning MPT-7B on a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. Its inference can be run on various GPU configurations. 
It's released under cc-by-sa-3.0 which enables commercial usage with proper credits.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\nMPT-7B-Instruct is a model for short-form instruction following. It is built by finetuning <a href='https://huggingface.co/mosaicml/mpt-7b' target='_blank'>MPT-7B</a> on a <a href='https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf' target='_blank'>dataset</a> derived from the <a href='https://huggingface.co/datasets/databricks/databricks-dolly-15k' target='_blank'>Databricks Dolly-15k</a> and the <a href='https://huggingface.co/datasets/Anthropic/hh-rlhf' target='_blank'>Anthropic Helpful and Harmless (HH-RLHF)</a> datasets.\n\n## \ud83d\udcbb Hardware Requirements\n> **Memory requirements**: 14.1 GB (13455 MiB).\n\nTo run the `mpt-7b-instruct` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100` or higher\n- GPU: NVIDIA V100 or higher.\n\n### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n\n- V100 GPUs: The model can be run on V100 GPUs.\n\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> \\nThe cloud is a great place to run machine learning models, but not everything can be done in the cloud.\\nIn this post, I\u2019ll explain why you might need to run machine learning models on-premise, and how you can do it.\\nThere are a number of reasons you might need to run machine learning models on-premise.\\nThe first is security. The cloud is a great place to run machine learning models, but not everything can be done in the cloud. For example, you might have sensitive data that you don\u2019t\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> \\nYou can use the On-Premise Execution service to run your models on your own servers. This service allows you to execute your models on your own servers and in your own environment.\\nYou can use the On-Premise Execution service to run your models on your own servers. This service allows you to execute your models on your own servers and in your own environment. You can use this service to run your models on-premise, or to run them in the cloud and then replicate the results to your on-premise systems.\\nTo use the On\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> \\nThere are a few limitations to running your models on-premise.\\nThe first limitation is the amount of data that can be processed. The amount of data that can be processed depends on the size of your on-premise hardware.\\nThe second limitation is the time it takes to process the data. The time it takes to process the data depends on the amount of data and the model\u2019s complexity.\\nThe third limitation is the speed of your internet connection. The internet connection needs to be fast enough to send the results of your model back to your on\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\n\\nDeploying a model on-premise allows you to have more control over your data and processes, but it can be more expensive and time-consuming to set up. 
If you\u2019re considering an on-premise deployment, it\u2019s important to understand the trade-offs and plan accordingly.\\nIf you\u2019re considering an on-premise deployment, it\u2019s important to understand the trade-offs and plan accordingly.The best way to learn how to use a new technology is to try it out.The best way to learn how to use a new technology is to try it out.\\nIn the world of data science, the term \u201cdata lake\u201d refers to a large repository of raw data that is stored in a single location.\\nIn the world of data science, the term \u201cdata lake\u201d refers to a large repository of raw data that is stored in a single location. Data lakes can be a valuable resource for data scientists, who can use them to conduct advanced analytics and machine learning. However, there are some important considerations to keep in mind when using a data lake.\\nOne of the most important considerations is security. Data lakes\n\n</blockquote>\n\nIt's visible from the above outputs that model doesn't answer in question answer mode by default and it's really bad at knowing when to stop. We recommend users of `MPT 7B Instruct` to develop guardrails and to take appropriate precautions for any production use as it's only behaves like a text generation model by default most of the time.\n\nAn example would be using a Chat Prompt Template as shown below but it doesn't work well all the time even with it:\n\nPrompt:\n```\nYou are an AI assistant in a conversational setting.\nProvide a concise and accurate conversational answer to anything User asks.\n===================\n\nUser: What are the trade-offs of deploying models on-premise I should be aware of?\nAssistant:\"\"\"\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8452/v1\", max_tokens=128)\nmessages = [HumanMessage(content=\"Why do I need to run machine learning models on-premise?\")]\nprint(chat(messages))\n\n# output:\n# \\nWhen it comes to machine learning, it\u2019s important to understand that there are two types of models: supervised and unsupervised.\\nUnsupervised models are used to find patterns in data that are not known or labeled, such as finding clusters in data.\\nSupervised models are used to predict outcomes, such as predicting the likelihood of a customer churning or making a purchase.\\nTo train a supervised model, you need labeled data. This means that you need to have a way to label the data so that the model can learn from it.\\nIn\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nYou are an AI assistant in a conversational setting.\nProvide a conversational answer to any question an User asks. 
Be original, concise, accurate and helpful.\n===================\n\nUser: {user_message}\nAssistant:\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Why do I need to run machine learning models on-premise?\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8452/v1\", max_tokens=128)\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n\n# output: > Finished chain.\n# Machine learning models are usually trained on large amounts of data and it can be challenging to get all the data required for training in a single location. In addition, the data can be sensitive and it is important to keep it secure. Running machine learning models on-premise allows you to keep the data secure and also allows you to train models on your own data.\\n#\n\n```\n---\n##### This model was trained on data formatted in the dolly-15k format:\n\n```python\nINSTRUCTION_KEY = \"### Instruction:\"\nRESPONSE_KEY = \"### Response:\"\nINTRO_BLURB = \"Below is an instruction that describes a task. Write a response that appropriately completes the request.\"\nPROMPT_FOR_GENERATION_FORMAT = \"\"\"{intro}\n{instruction_key}\n{instruction}\n{response_key}\n\"\"\".format(\n intro=INTRO_BLURB,\n instruction_key=INSTRUCTION_KEY,\n instruction=\"{instruction}\",\n response_key=RESPONSE_KEY,\n)\n\nexample = \"James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week? Explain before answering.\"\nfmt_ex = PROMPT_FOR_GENERATION_FORMAT.format(instruction=example)\n# fmt_ex is ready to be tokenized and sent through the model.\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\n\n### \ud83d\udeab Limitations and Biases\nMPT-7B-Instruct can produce factually incorrect output, and should not be relied on to produce factually accurate information. MPT-7B-Instruct was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.\n\nWe recommend users of MPT-7B-Instruct to develop guardrails, use it with above prompt templates and to take appropriate precautions while using it.\n\n\n## \ud83d\udcdc License\nIt's released under CC-By-SA-3.0 which enables commercial usage with proper credits given.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-mpt-7b-instruct/logo.svg", "modelInfo": {"memoryRequirements": 13455, "tokensPerSecond": 40}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 33040731339, "image": "ghcr.io/premai-io/mpt-7b-instruct-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8452}, {"id": "replit-v1-3b", "name": "Replit v1 3B", "description": "", "documentation": "# Documentation", "comingSoon": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/coder-replit-v1-3b/logo.svg", "modelInfo": {}, "interfaces": ["coder"], "dockerImages": {"gpu": {"size": 27258256302, "image": "ghcr.io/premai-io/coder-replit-code-v1-3b-gpu:0.0.2"}}, "defaultPort": 8000, "defaultExternalPort": 10111}, {"id": "vicuna-7b-q4", "name": "Vicuna 7B Q4", "description": "Vicuna is an open-source chatbot, fine-tuned on LLaMA using conversations from ShareGPT. Developed collaboratively by a team from UC Berkeley, CMU, Stanford, and UC San Diego, this auto-regressive language model aims to advance research in large language models and chatbots. 
Primarily designed for researchers and hobbyists in AI and NLP, the model undergoes preliminary evaluations using diverse questions and GPT-4 for output judgment. The current model has been 4-bit quantized using ggml framework.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nVicuna is an open-source chatbot, fine-tuned on LLaMA using conversations from ShareGPT. Developed collaboratively by a team from UC Berkeley, CMU, Stanford, and UC San Diego, this auto-regressive language model aims to advance research in large language models and chatbots. Primarily designed for researchers and hobbyists in AI and NLP, the model undergoes preliminary evaluations using diverse questions and GPT-4 for output judgment. The current model has been 4-bit quantized using ggml framework <a href='https://vicuna.lmsys.org/' target='_blank'>Learn More</a>.\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Can you explain what is a large language model?\n\n```markdown\nSure! A large language model is a class of machine learning models that are trained on very large amounts of text data. The goal of training these models is to create a representation of human language that can be used for various natural language processing tasks, such as text generation, question- Answering and sentiment analysis. These models typically consist of several layers of neural networks, with each layer representing a different aspect of language (e.g., word meaning, syntax, context). As the model gets larger, it has the ability to learn more abstract representations of language that are more representative of human language understanding.\n```\n\n### 2\ufe0f\u20e3 Prompt: Can you give some examples applications?\n\n```markdown\nLarge language models have a wide range of applications in natural language processing, here are a few examples:\n\n1. Text generation: Large language models can be used to generate text that sounds like it was written by a human, such as chatbot responses or summaries of long articles.\n2. Question-Answering: Large language models can be trained to answer questions based on the information in a large dataset of text. This is often used in virtual assistants and search engines.\n3. Sentiment analysis: Large language models can be used to understand the sentiment behind a piece of text, such as a customer review or social media post.\n4. Translation: Large language models can be trained to translate text from one language to another, using large amounts of training data from both languages.\n5. Dialogue System: Large language model can help in building dialogue system which can have human-like conversations with users.\n6. Language Modeling: Large language models are used for language modeling task where they are able to mimic the way humans produce and process natural language.\n```\n\n### 3\ufe0f\u20e3 Prompt: Are there any limitations?\n\n```markdown\nWhile large language models have many benefits, there are also some limitations that should be considered:\n\n1. Large datasets required: Training a large language model requires access to vast amounts of text data, which can be difficult or expensive to obtain.\n2. Computational resources required: Training and using large language models requires significant computational resources, which may not always be available.\n3. Privacy concerns: Training large language models on personal data can raise privacy concerns. 
It is important to ensure that the data used for training is handled responsibly and ethically.\n```\n\n### 4\ufe0f\u20e3 Prompt: Summarize the above in two sentences.\n\n```markdown\nLarge language models have many benefits, but also require access to vast amounts of text data, computational resources, and should be trained with responsible and ethical handling of personal data.\n```\n\n### 5\ufe0f\u20e3 Prompt: Write me a story about a superstar.\n\n```markdown\nOnce upon a time, there was a young man named Jack who dreamed of becoming a superstar. Despite facing numerous obstacles and setbacks, he never gave up on his goal.\n\nOne day, an opportunity came knocking at his door. A record label scout approached him with a contract, recognizing his talent and potential. Jack was overjoyed and eagerly signed the deal.\n\nAs his career began to take off, Jack found himself traveling all around the world on tour. He met many other talented musicians and artists along the way, but none as kind and generous as his\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\udd22 Default Parameters\n\nFor our experiments, we have been using the following parameters:\n\n```python\ntemperature=0.2\ntop_p=0.95\nstop=[]\nmax_tokens=256\nrepeat_penalty=1.1\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\nFor more information concerning Vicuna evaluation method you can refer to <a href='https://github.com/lm-sys/FastChat/tree/main/fastchat/eval' target='_blank'></a>.\n\n### \ud83d\ude80 Serving Details\n\nIn order to expose the service we are currently using FastAPI and llama-cpp-python library <a href='https://abetlen.github.io/llama-cpp-python/' target='_blank'></a> which is compatible with all ggml models.\n\n```python\nllama-cpp-python==0.1.43\n```\n\n### \u26aa\ufe0f Embeddings\n\nThe current model supports Embeddings generation too. Another endpoint is exposed for this purpose. You can check out the documentation for each container to see how to use it at `http://IP:PORT/docs` or at our public services Open API doc at <a href='https://mock.prem.ninja/docs' target='_blank'></a>\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started (using LangChain)\n\n```python\n!pip install langchain\n!pip install openai\n\nimport os\n\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.schema import AIMessage, HumanMessage\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8111/v1\", max_tokens=128)\n\nmessages = [\n HumanMessage(content=\"Can you explain what is a large language model?\")\n]\nchat(messages)\n```\n\n### Speeding Up Inference\n\nBy default, the model uses (1/2 + 1) the number of cores available on the underlying hardware. If you want to use more cores, you can pass an arbitrary number to the request parameter `n_threads`. Theoretically, this method will improve inference time.\n\n```json\n{\n \"model\": \"string\",\n \"messages\": [\n {}\n ],\n \"temperature\": 0.2,\n \"top_p\": 0.95,\n \"n\": 1,\n \"stream\": false,\n \"stop\": [],\n \"max_tokens\": 256,\n \"presence_penalty\": 0,\n \"frequence_penalty\": 0,\n \"logit_bias\": {},\n \"user\": \"\",\n \"n_threads\": 0 // this is the parameter required to change in order use more cpu cores.\n}\n```\n\n## \ud83d\udcdc License\n\nThe model is a research preview intended for non-commercial use only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. 
\n", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-vicuna-7b-q4/logo.svg", "modelInfo": {"weightsName": "vicuna-7b-q4.bin", "weightsSize": 4212859520, "maxCtxTokens": 512, "memoryRequirements": 4096, "tokensPerSecond": 7}, "interfaces": ["chat"], "dockerImages": {"cpu": {"size": 8986449745, "image": "ghcr.io/premai-io/chat-vicuna-7b-q4-cpu:1.0.4"}}, "defaultPort": 8000, "defaultExternalPort": 8111}, {"id": "falcon-7b-instruct", "name": "Falcon 7B Instruct", "beta": true, "description": "Falcon 7B Instruct, developed by TII, is a 7B parameters causal decoder-only model trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. The model is particularly designed for commercial use and its inference can be run on various GPU configurations.", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nFalcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. The model is particularly designed for commercial use and its inference can be run on various GPU configurations. <a href='https://huggingface.co/tiiuae/falcon-7b-instruct' target='_blank'>Learn More</a>\n\n## \ud83d\udcbb Hardware Requirements\n\n> **Memory requirements**: 15.81 GB (15085 MiB).\n\nTo run the `falcon-7b-instruct` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA A100, NVIDIA V100\n\nIf you are using Paperspace:\n\n- Instance Type: `V100` or higher\n- GPU: NVIDIA A100, NVIDIA V100\n\n### On-Premise Platforms\n\nYou'll need access to a GPU with the following options:\n- A100 GPUs: A100 GPUs are preferred for training all model sizes.\n- V100 GPUs: The model can be run on V100 GPUs.\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Why do I need to run machine learning models on-premise?\n\n> \\nRunning machine learning models on-premise allows organizations to have more control over their data and computations. It also allows for the processing of large amounts of data in a high-performance environment, which can speed up the training and inference of models. Additionally, on-premise computing allows for integration with other applications and services.The main benefit of running a machine learning model on-premise is the ability to have more control over the data and computations, as well as the ability to process large amounts of data in a high-performance environment.\n\n\n### 2\ufe0f\u20e3 Prompt: How do I run my models on-premise?\n\n> \\nTo run models on-premise, you will need to ensure that your data and models are properly stored and secured. This can be done through various means, such as storing your data on your own cloud storage solution or using a public cloud platform. Additionally, you will need to ensure that you have appropriate permissions and access to your data and models. It is recommended to consult with your IT department or a data modeling expert to determine the best approach for running your models on-premise.The main problem that many companies face is the lack of scalability in their current infrastructure.\n\n### 3\ufe0f\u20e3 Prompt: What are the limitations of running my models on-premise?\n\n> \\nThere are several limitations of running your models on-premise, such as hardware and software limitations, performance issues, network latency, security issues, and licensing costs. 
Additionally, running models on-premise may require significant infrastructure resources and maintenance, which can be costly.Powered by TradeKingThe best way to test a trading strategy is to use the demo environment provided by the trading platform.I also need to know how to set up the trading platform on my computer in order to use it for trading.\\n- Can I use the trading platform on\n\n\n### 4\ufe0f\u20e3 Prompt: What are the trade-offs of deploying models on-premise I should be aware of?\n\n<blockquote>\n\n \\nDeploying models on-premise can offer several benefits, including improved latency, security, and data privacy. However, there are also trade-offs to consider, such as increased complexity, cost, and scalability. Additionally, on-premise models may require more resources for scaling and managing data, which can increase costs over time. It is important to weigh the costs and benefits of on-premise vs. cloud-based deployment to determine the best fit for your specific needs.I hope this helps!-EI\\nWhat are some best practices for ensuring security when deploying models on-premise?\\nAs models are deployed on-premise, it is important to ensure security measures to protect sensitive data and infrastructure. Some best practices include encrypting data transmissions, using role-based access controls, and implementing firewalls and intrusion detection systems. Additionally, regular security audits and evaluations should be conducted to ensure continued security measures. -EIThis is a great resource on best practices for deploying models on-premise. -EIThe following article offers a more in-depth guide to securing your on-premise deployment: \n\n</blockquote>\n\nIt's visible above from the outputs that model sometimes generates gibberish at the end and tends to hallucinate. We recommend users of Falcon-7B-Instruct to develop guardrails and to take appropriate precautions for any production use as it's only a text generation model by default.\n\nAn example would be using a Chat Prompt Template as shown below:\n\nPrompt:\n```\nYou are an AI assistant in a conversational setting.\nProvide a concise and accurate conversational answer to anything User asks.\n===================\n\nUser: What are the trade-offs of deploying models on-premise I should be aware of?\nAssistant:\"\"\"\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8448/v1\", max_tokens=128)\nmessages = [HumanMessage(content=\"What are the trade-offs of deploying models on-premise I should be aware of?\")]\nprint(chat(messages))\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nYou are an AI assistant in a conversational setting.\nProvide a conversational answer to any question an User asks. 
Be original, concise, accurate and helpful.\n===================\n\nUser: {user_message}\nAssistant:\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Why do I need to run machine learning models on-premise?\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8448/v1\", max_tokens=128)\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n```\n\n### \ud83d\udd0e Quality Benchmarks\n\nIt outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama etc.)\n\nBase model Falcon-7B is trained on English and French data only, and will not generalize appropriately to other languages. Furthermore, as it is trained on a large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.\n\n### \ud83d\udeab Limitations and Biases\n\nWe have noticed that the model sometimes generates responses that are not relevant and mostly gibberish like letters and numbers or just repeating the same words. E.g - xjskdafhnwne$. Also we found that it generates hashtag like words like #falcon7b, #falcon7binstruct etc, which seems to be a bias coming from the finetuning data as it is trained on a large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.\n\nWe recommend users of Falcon-7B-Instruct to develop guardrails and to take appropriate precautions for any production use.\n\n\n## \ud83d\udcdc License\nFalcon-7B was trained on 1,500B tokens of <a href='https://huggingface.co/datasets/tiiuae/falcon-refinedweb' target='_blank'>RefinedWeb</a>, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated copora were inspired by <a href='https://arxiv.org/abs/2101.00027' target='_blank'>The Pile (Gao et al., 2020).</a>\nIt is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-falcon-7b-instruct/logo.svg", "modelInfo": {"memoryRequirements": 15085, "tokensPerSecond": 24}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 34178471850, "image": "ghcr.io/premai-io/chat-falcon-7b-instruct-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8448}, {"id": "bark", "name": "Bark", "description": "", "documentation": "# Documentation", "comingSoon": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/text-to-audio-bark/logo.svg", "modelInfo": {}, "interfaces": ["text-to-audio"], "dockerImages": {"cpu": {"size": 18381415182, "image": "ghcr.io/premai-io/text-to-audio-bark-cpu:0.0.5"}, "gpu": {"size": 33835863375, "image": "ghcr.io/premai-io/text-to-audio-bark-gpu:0.0.2"}}, "defaultPort": 8000, "defaultExternalPort": 10111}, {"id": "gorilla-falcon-7b", "name": "Gorilla Falcon 7B", "beta": true, "description": "Gorilla Falcon 7B is an open-source API caller trained by fine-tuning Falcon weights. The model is particularly designed for commercial use (License: Apache-2.0)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\nGorilla Falcon 7B is an open-source API caller trained by fine-tuning Falcon weights. Given a natural language query, Gorilla can write a semantically- and syntactically- correct API to invoke. 
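\n\nA minimal usage sketch is shown below, assuming the service is running locally on its default external port `8756` and exposing the same OpenAI-compatible chat endpoints as the other chat services in this registry; the model name and prompt are placeholders only:\n\n```python\nimport openai\n\n# Assumption: the gorilla-falcon-7b service exposes an OpenAI-compatible /v1 endpoint locally.\nopenai.api_base = \"http://localhost:8756/v1\"\nopenai.api_key = \"random-string\"\n\nresponse = openai.ChatCompletion.create(\n    model=\"gorilla-falcon-7b\",\n    messages=[{\"role\": \"user\", \"content\": \"Translate a text from English to French\"}],\n)\nprint(response[\"choices\"][0][\"message\"][\"content\"])\n```\n\nFor a LangChain-based example, see the Getting Started with Langchain section further below.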
\n\n\n\n## \ud83d\udcbb Hardware Requirements\n\n**Memory requirements**: 10.6 GB (10874 MiB).\n\n\nTo run the `gorilla-falcon-7b` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100-32G` or higher\n- GPU: NVIDIA V100 or higher.\n\n### On-Premise Platforms\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Translate a text from English to French\n\n\\<\\<\\<domain\\>\\>\\>: Natural Language Processing Text2Text Generation\n\n\\<\\<\\<api_call\\>\\>\\>: T5ForConditionalGeneration.from_pretrained('google/byt5-small')\n\n\\<\\<\\<api_provider\\>\\>\\>: Hugging Face Transformers\n\n1. Import the necessary classes from the transformers package, which includes T5Tokenizer and T5ForConditionalGeneration.\n2. Load the pre-trained model 'google/byt5-small' using T5ForConditionalGeneration.from_pretrained() method. This model is specifically designed for text-to-text generation tasks.\n3. Load the tokenizer associated with the model using T5Tokenizer.from_pretrained('google/byt5-small') method.\n4. Prepare the input text by adding the prefix 'translate English to French:'before the text you want to translate.\n5. Tokenize the input text using the tokenizer and generate the output text using the model.\n6. Decode the output tokens to obtain the translated French text.\n\n### 2\ufe0f\u20e3 Prompt: Generate an image from a prompt\n\n\\<\\<\\<domain\\>\\>\\>: Multimodal Text-to-Image\n\n\\<\\<\\<api_call\\>\\>\\>: StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4')\n\n\\<\\<\\<api_provider\\>\\>\\>: Hugging Face\n1. Import the StableDiffusionPipeline class from the diffusers package and Image class from PIL.\n2. Load the pretrained StableDiffusionPipeline model with the identifier 'CompVis/stable-diffusion-v1-4'.\n3. Provide the text prompt describing the image you want to generate. For example, 'a beautiful landscape with a mountain range and a river'.\n4. Use the pipeline to generate an image based on the provided prompt. The generated image will be stored in the variable 'image'. You can then save the image to a file if needed.\n\n### 3\ufe0f\u20e3 Prompt: Upscale an image\n\n\\<\\<\\<domain\\>\\>\\>: Computer Vision Image-to-Image\n\n\\<\\<\\<api_call\\>\\>\\>: pipeline('image-super-resolution', model='caidas/swin2SR-classical-sr-x4-64')\n\n\\<\\<\\<api_provider\\>\\>\\>: Hugging Face Transformers\n1. Import the necessary libraries, which include the 'pipeline' function from transformers and the 'Image' class from PIL.\n2. Use the 'pipeline' function to create an instance of the 'image-super-resolution' model with the pre-trained 'caidas/ swin2SR-classical-sr-x4-64' model. This model is designed to upscale images by a factor of 4.\n3. Load the input image into the 'Image' class and pass it to the model for upscaling.\n4. The upscaled image will be saved in the same file with the prefix 'upscaled_'.\n\n\n### 4\ufe0f\u20e3 Prompt: Our customer is a zoo and we want to help them detect movement of different animals. Write a python program in 1 to 2 lines to call API in TensorFlowHub.\n\n```python\n# Import required libraries 'AutoImageProcessor' and 'PredictionLayer' from transformers, 'Image' from PIL, and'requests'. 
Load the animal detection image from a URL.\nprocessor = AutoImageProcessor.from_pretrained('hf-tiny-model-private/tiny-random-CLIPSegModel')\nmodel = PredictionLayer.from_pretrained('hf-tiny-model-private/tiny-random-CLIPSegModel')\nimage_url = 'https://example.com/animal_image.jpg' # Replace this with the URL of the image you want to analyze.\nimage = Image.open(requests.get(image_url, stream=True).raw) # Read the image from the URL and convert it to PIL Image format.\ninputs = processor(images=image, return_tensors='pt') # Prepare inputs for the model by processing the image and converting it to the required format.\noutputs = model(**inputs) # Apply the model to the inputs and obtain the output.\n# Analyze output for animal detections and take appropriate action.\n```\n\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8000/v1\")\nmessages = [HumanMessage(content=\"Generate an image from a text\")]\nprint(chat(messages))\n\n# output:\n# <<<domain>>>: Multimodal Text-to-Image Generation\n# <<<api_call>>>: StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4', vae='AutoencoderKL.from_pretrained(stabilityai/sd-vae-ft-mse)')\n# <<<api_provider>>>: Hugging Face\n# <<<explanation>>>:1. Import the necessary libraries: AutoencoderKL from diffusers.models and StableDiffusionPipeline from diffusers.\n# 2. Load the 'CompVis/stable-diffusion-v1-4' model and the'stabilityai/sd-vae-ft-mse' VAE model. The VAE model will be used for text encoding.\n# 3. Create a StableDiffusionPipeline instance by calling the from_pretrained method with the model and VAE as arguments.\n# 4. Provide a text prompt describing the desired image, and use the pipeline to generate an image based on the text prompt. Save the generated image to a file.\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nUser: {user_message}\nAssistant: \n\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Generate an image from a text\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8000/v1\")\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n\n# output: Finished chain.\n# <<<domain>>>: Multimodal Text-to-Image Generation\n# <<<api_call>>>: StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4', vae='AutoencoderKL.from_pretrained(stabilityai/sd-vae-ft-mse)')\n# <<<api_provider>>>: Hugging Face\n# <<<explanation>>>:1. Import the necessary libraries: AutoencoderKL from diffusers.models and StableDiffusionPipeline from diffusers.\n# 2. Load the 'CompVis/stable-diffusion-v1-4' model and the'stabilityai/sd-vae-ft-mse' VAE model. The VAE model will be used for text encoding.\n# 3. Create a StableDiffusionPipeline instance by calling the from_pretrained method with the model and VAE as arguments.\n# 4. 
Provide a text prompt describing the desired image, and use the pipeline to generate an image based on the text prompt. Save the generated image to a file.\n\n```\n\n### \ud83d\udeab Limitations and Biases\n\nWe have noticed that the model sometimes generates responses with reference to some random unexisting model name on Huggingface. Furthermore also the structure of the output is not always the same (refer to above prompt examples). \nWe recommend users of Gorilla models to develop guardrails and to take appropriate precautions for any production use.\n\nThe creators of Gorilla Falcon 7B have mentioned that despite their effort in addressing the risks of hallucinations like other LLMs, Gorilla models are not free from such limitations. We hope our open-sourced codebase will help other researchers better understand these challenges and improve on these key limitations for making AI beneficial for everyone.\n\n\n## \ud83d\udcdc License\nIt is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-gorilla-falcon-7b/logo.svg", "modelInfo": {"memoryRequirements": 10874, "tokensPerSecond": 22}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 31291459579, "image": "ghcr.io/premai-io/chat-gorilla-falcon-7b-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8756}, {"id": "all-minilm-l6-v2", "name": "All MiniLM L6 v2", "description": "All-MiniLM-L6-v2 is a sentence-transformers model designed to map sentences and paragraphs to a 384-dimensional dense vector space, ideal for clustering or semantic search tasks. Developed during Hugging Face's Community week, this model is fine-tuned on a 1B sentence pairs dataset with a contrastive learning objective. It excels in encoding short texts, capturing semantic information, and is useful for information retrieval, clustering, or sentence similarity tasks. [Learn More](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nAll-MiniLM-L6-v2 is a sentence-transformers model designed to map sentences and paragraphs to a 384-dimensional dense vector space, ideal for clustering or semantic search tasks. Developed during <a href='https://huggingface.co/' target='_blank'>Hugging Face</a>'s Community week, this model is fine-tuned on a 1B sentence pairs dataset with a contrastive learning objective. It excels in encoding short texts, capturing semantic information, and is useful for information retrieval, clustering, or sentence similarity tasks. <a href='https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2' target='_blank'>Learn more</a> \ud83d\ude80.\n\n## \ud83d\udc47 Getting Started\n\nThe service is compatible with \ud83e\udd9c\ud83d\udd17<a href='https://github.com/hwchase17/langchain' target='_blank'>LangChain</a> and follows OpenAI <a href='https://platform.openai.com/docs/api-reference' target='_blank'>API request-response</a> format. 
If you haven't already, you will need to install :\n\n* `langchain` \u27a1\ufe0f <a href='https://pypi.org/project/langchain/' target='_blank'>pip install</a>.\n* `tiktoken` \u27a1\ufe0f <a href='https://pypi.org/project/tiktoken/' target='_blank'>pip install</a>.\n* `openai` \u27a1\ufe0f <a href='https://pypi.org/project/openai/' target='_blank'>pip install</a>.\n\n## \u2692\ufe0f Usage\n\n\ud83d\udc49 Find an example for using the service with \ud83e\udd9c\ud83d\udd17 LangChain below:\n\n```python\nimport os\n\nfrom langchain.embeddings import OpenAIEmbeddings\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nembeddings = OpenAIEmbeddings(openai_api_base=\"http://localhost:8444/v1\")\ntext = \"Prem is an easy to use open source AI platform.\"\nquery_result = embeddings.embed_query(text)\ndoc_result = embeddings.embed_documents([text])\n```\n\n\nAlso check the official sentence transformers <a href='https://www.sbert.net/' target='_blank'>documentation</a>. It provides extensive examples and detailed information for using the model.\n\n## \ud83d\udc40 Intended Uses\nThe model is meant to be used as an encoder for single sentences and short paragraphs. Given an input text, it outputs a vector that captures the semantic information. You can use the sentence vector generated for information retrieval, clustering,\u00a0or sentence similarity tasks.\n\nBy default, input text longer than 256-word\u00a0pieces is truncated.\n\n## \ud83d\udd0e Evaluation Results\nFor an automated evaluation of this model, see the Sentence Embeddings Benchmark <a href='https://seb.sbert.net' target='_blank'>page</a>.\n\n## \u2696\ufe0f License\n\nThe model is published under <a href='https://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License 2.0</a>.\n", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/embeddings-all-minilm-l6-v2/logo.svg", "modelInfo": {"memoryRequirements": 8192}, "dockerImages": {"cpu": {"size": 1238261448, "image": "ghcr.io/premai-io/embeddings-all-minilm-l6-v2-cpu:1.0.2"}, "gpu": {"size": 20687072077, "image": "ghcr.io/premai-io/embeddings-all-minilm-l6-v2-gpu:1.0.2"}}, "interfaces": ["embeddings"], "defaultPort": 8000, "defaultExternalPort": 8444}, {"id": "whisper-tiny", "name": "Whisper Tiny", "description": "", "documentation": "# Documentation\n\n## \ud83d\udccc Description\n\nWhisper Tiny is a compact version of OpenAI's Whisper model, designed for automatic speech recognition (ASR) and speech translation. Despite its smaller size, it retains the powerful capabilities of the larger models, making it suitable for applications where computational resources or storage space are limited. <a href='https://huggingface.co/openai/whisper-tiny' target='_blank'>Learn More</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\nTo run the `whisper-tiny` service on Prem, you'll just need a CPU with at least 4GiB of RAM.\n\n## \ud83d\udcd2 Example Usage\n\nWhisper Tiny can be used for various tasks, including English to English transcription, French to French transcription, and French to English translation. It can also handle long-form transcription by using a chunking algorithm, allowing it to transcribe audio samples of arbitrary length.\n\n### \ud83c\udfb6 sample.wav. 
You can find the file [here](https://github.com/premAI-io/prem-registry/blob/main/audio-to-text-whisper-tiny/sample.wav)\n\n<img width=\"1449\" alt=\"image\" src=\"https://github.com/premAI-io/prem-registry/assets/29598954/4b879d6b-4404-47ae-b3c9-f2f5fd38ec0e\">\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\ude80 Getting Started with OpenAI Python client\n\nThe service exposes the same endpoints as the OpenAI Audio (Whisper) API does. You can directly use the official `openai` python library.\n\n```python\n\n!pip install openai\n\nimport openai\n\nopenai.api_base = \"http://184.105.5.51:10111/v1\"\nopenai.api_key = \"random-string\"\n\naudio_file = open(\"./sample.wav\", \"rb\")\ntranscript = openai.Audio.transcribe(\"whisper-1\", audio_file)\nprint(transcript)\n\n```\n\n## \ud83d\udcdc License\n\nWhisper's code and model weights are released under the MIT License.\n", "beta": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/audio-to-text-whisper-tiny/logo.svg", "modelInfo": {}, "interfaces": ["audio-to-text"], "dockerImages": {"cpu": {"size": 5565095283, "image": "ghcr.io/premai-io/audio-to-text-whisper-tiny-cpu:1.0.1"}}, "defaultPort": 8000, "defaultExternalPort": 10111}, {"id": "gorilla-mpt-7b", "name": "Gorilla MPT 7B", "beta": true, "description": "Gorilla MPT 7B is an open-source API caller trained by fine-tuning MPT weights. The model is particularly designed for commercial use (License: Apache-2.0)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\nGorilla MPT 7B is an open-source API caller trained by fine-tuning MPT weights. Given a natural language query, Gorilla can write a semantically- and syntactically- correct API to invoke. \n\n\n\n## \ud83d\udcbb Hardware Requirements\n\n> **Memory requirements**: 10.4 GB (10609 MiB).\n\n\nTo run the `gorilla-mpt-7b` service, you'll need the following hardware configuration:\n\n### Cloud Platforms\n\nIf you are using AWS:\n\n- Instance Type: `p3.2xlarge` or higher\n- GPU: NVIDIA V100 or higher.\n\nIf you are using Paperspace:\n\n- Instance Type: `V100-32G` or higher\n- GPU: NVIDIA V100 or higher.\n\n### On-Premise Platforms\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Translate a text from English to French\n\n\\<\\<\\<domain\\>\\>\\>: Natural Language Processing Text2Text Generation\n\n\\<\\<\\<api_call\\>\\>\\>: T5ForConditionalGeneration.from_pretrained('google/t5-small')\n \n\\<\\<\\<api_provider\\>\\>\\>: Hugging Face Transformers\n\\<\\<\\<explanation\\>\\>\\>: \n1. Import the necessary classes from the transformers package, which includes T5ForConditionalGeneration for the text generation model.\n2. Use the from_pretrained method to load the pre-trained model 'google/t5-small'. This model has been trained on various NLP tasks, including translation.\n3. Provide an input text in English that needs to be translated to French. The model will generate a translation as output.\n4. Print the translated text.\n \n\\<\\<\\<code\\>\\>\\>:\n\n```python\nfrom transformers import T5ForConditionalGeneration\nt5 = T5ForConditionalGeneration.from_pretrained('google/t5-small')\ninput_text = \"Translate this text to French: I would like to learn French.\"\noutputs = t5.generate(input_text)\ntranslated_text = outputs[0]['generated_text']\nprint(translated_text)\n```\n\n### 2\ufe0f\u20e3 Prompt: Generate an image from a prompt\n\nPlease provide a prompt for the image you would like to generate. 
For example: \"a colorful landscape at sunset\" or any other description of the desired image. The model will then generate an image based on the provided prompt, which can be saved to a file if desired.\n\n\\<\\<\\<domain\\>\\>\\>: Computer Vision Image-to-Image\n\n\\<\\<\\<api_call\\>\\>\\>: ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_seg')\n\n\\<\\<\\<api_provider\\>\\>\\>: Hugging Face\n\n\\<\\<\\<explanation\\>\\>\\>: \n1. Import the necessary libraries and classes, such as ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler, HfApiFlowLoader, and the Image class from the PIL package.\n2. Load the ControlNet model checkpoint with the provided model name 'lllyasviel/control_v11p_sd15_seg' using the from_pretrained method.\n3. Initialize the image generation pipeline by providing the loaded ControlNet model as an argument.\n4. Create a prompt describing the type of image you would like to generate, such as 'a colorful landscape at sunset'.\n5. Generate the image using the pipeline and the provided prompt by calling the pipeline with the prompt and number of inference steps.\n6. Save the generated image to a file if desired.\n\n\\<\\<\\<code\\>\\>\\>: \n\n```python\nfrom diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler\nfrom diffusers.utils import load_image\nfrom huggingface_hub import HfApiFlowLoader\nfrom PIL import Image\ncheckpoint = 'lllyasviel/control_v11p_sd15_seg'\ncontrolnet = ControlNetModel.from_pretrained(checkpoint)\npipe = StableDiffusionControlNetPipeline.from_pretrained('runwayml/stable-diffusion-v1-5', controlnet=controlnet)\npipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\npipe.enable_model_cpu_offload()\ngenerator = torch.manual_seed(0)\nprompt = 'a colorful landscape at sunset'\nimage = pipe(prompt, num_inference_steps=30, generator=generator).images[0]\nimage.save('landscape_sunset.png')\n```\n\n### 3\ufe0f\u20e3 Prompt: Upscale an image\n\n1. To upscale an image, you can use the 'ImageProcessor.from_pretrained' method from the transformers package. \n2. Import the required libraries, which include 'Image' from PIL, 'ImageProcessor' from transformers, and 'torch' for processing tensors. \n3. Load the input image using the 'Image.open' method from PIL. You can open a file or get the image data from a URL. \n4. Create an 'ImageProcessor' instance using the 'ImageProcessor.from_pretrained' method with the model name 'google/upsample_sd15_b2_canny'. This model is designed for Canny edge detection and will upscale images by a factor of 2. \n5. Process the input image with the 'processor' instance and save the upscaled image to a file. \n \n\\<\\<\\<domain\\>\\>\\>: Computer Vision Image-to-Image\n\n\\<\\<\\<api_call\\>\\>\\>: \n\n```python\nfrom transformers import ImageProcessor, CLIPModel\nfrom PIL import Image\nimport torch\nprocessor = ImageProcessor.from_pretrained('google/upsample_sd15_b2_canny')\nmodel = CLIPModel.from_pretrained('openai/clip-vit-large-patch14')\ninput_image = Image.open('input_image_path.jpg')\n# replace 'input_image_path.jpg' with the path to your image\nprocessor(input_image, return_tensors='pt')\nupscaled_image = model.generate(processor.images, num_inference_steps=20)\nupscaled_image.save('output_image_path.jpg')\n```\n\n### 4\ufe0f\u20e3 Prompt: Our customer is a zoo and we want to help them detect movement of different animals. 
Write a python program in 1 to 2 lines to call API in TensorFlowHub.\n\n```python\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline\nmodel = AutoModelForSequenceClassification.from_pretrained('setu4993/xlm-roberta-base-animal-detection')\ntokenizer = AutoTokenizer.from_pretrained('setu4993/xlm-roberta-base-animal-detection')\nprocessor = pipeline('setu4993/xlm-roberta-base-animal-detection', model=model, tokenizer=tokenizer)\ninputs = processor(input_text, return_tensors='pt')\noutputs = model(**inputs)\n```\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83e\udd9c\ud83d\udd17 Getting Started with Langchain\n\n```bash\npip install langchain openai\n```\n\nIt can be run simply using the langchain library as shown below:\n\n```python\nimport os\nfrom langchain.schema import HumanMessage\nfrom langchain.chat_models import ChatOpenAI\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8000/v1\")\nmessages = [HumanMessage(content=\"Generate an image from a text\")]\nprint(chat(messages))\n\n# output:\n# <<<domain>>>: Multimodal Text-to-Image Generation\n# <<<api_call>>>: StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4', vae='AutoencoderKL.from_pretrained(stabilityai/sd-vae-ft-mse)')\n# <<<api_provider>>>: Hugging Face\n# <<<explanation>>>:1. Import the necessary libraries: AutoencoderKL from diffusers.models and StableDiffusionPipeline from diffusers.\n# 2. Load the 'CompVis/stable-diffusion-v1-4' model and the'stabilityai/sd-vae-ft-mse' VAE model. The VAE model will be used for text encoding.\n# 3. Create a StableDiffusionPipeline instance by calling the from_pretrained method with the model and VAE as arguments.\n# 4. Provide a text prompt describing the desired image, and use the pipeline to generate an image based on the text prompt. Save the generated image to a file.\n```\n\nFor using it in a chat setting we recommend using a Chat Prompt Template as shown below:\n \n```python\n\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\nchat_template = \"\"\"\nUser: {user_message}\nAssistant: \n\"\"\"\nprompt = PromptTemplate(\n input_variables=[\"user_message\"],\n template=chat_template,\n)\n\nuser_message = \"Generate an image from a text\"\n\nchat = ChatOpenAI(openai_api_base=\"http://localhost:8000/v1\")\nchain = LLMChain(llm=chat, prompt=prompt, verbose=True)\nprint(chain.run(user_message=user_message))\n\n# output: \\> Finished chain.\n# <<<domain>>>: Multimodal Text-to-Image Generation\n# <<<api_call>>>: StableDiffusionPipeline.from_pretrained('CompVis/stable-diffusion-v1-4', vae='AutoencoderKL.from_pretrained(stabilityai/sd-vae-ft-mse)')\n# <<<api_provider>>>: Hugging Face\n# <<<explanation>>>:1. Import the necessary libraries: AutoencoderKL from diffusers.models and StableDiffusionPipeline from diffusers.\n# 2. Load the 'CompVis/stable-diffusion-v1-4' model and the'stabilityai/sd-vae-ft-mse' VAE model. The VAE model will be used for text encoding.\n# 3. Create a StableDiffusionPipeline instance by calling the from_pretrained method with the model and VAE as arguments.\n# 4. Provide a text prompt describing the desired image, and use the pipeline to generate an image based on the text prompt. 
Save the generated image to a file.\n\n```\n\n### \ud83d\udeab Limitations and Biases\n\nWe have noticed that the model sometimes generates responses with references to random non-existent model names on Hugging Face (e.g. in the last prompt the model `setu4993/xlm-roberta-base-animal-detection` does not exist). Furthermore, the structure of the output is not always the same (refer to the prompt examples above).\nWe recommend users of Gorilla models to develop guardrails and to take appropriate precautions for any production use.\n\nThe creators of Gorilla MPT 7B have mentioned that despite their effort in addressing the risks of hallucinations like other LLMs, Gorilla models are not free from such limitations. We hope our open-sourced codebase will help other researchers better understand these challenges and improve on these key limitations for making AI beneficial for everyone.\n\n\n## \ud83d\udcdc License\nIt is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.", "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/chat-gorilla-mpt-7b/logo.svg", "modelInfo": {"memoryRequirements": 10609, "tokensPerSecond": 38}, "interfaces": ["chat"], "dockerImages": {"gpu": {"size": 31298452183, "image": "ghcr.io/premai-io/chat-gorilla-mpt-7b-gpu:1.0.0"}}, "defaultPort": 8000, "defaultExternalPort": 8756}, {"id": "redis-vector-db", "name": "Redis Vector DB", "description": "Redis, short for Remote Dictionary Server, serves as a multifunctional in-memory data structure store. It functions as a distributed key-value database, cache, and message broker, all operating in-memory for high-speed data access. With optional durability, Redis ensures data persistence despite potential system failures. [Learn More](https://redis.com/solutions/use-cases/vector-database/)", "documentation": "# \ud83d\udcd1 Documentation\n\n## \ud83d\udccc Description\n\nRedis, short for Remote Dictionary Server, serves as a multifunctional in-memory data structure store. It functions as a distributed key-value database, cache, and message broker, all operating in-memory for high-speed data access. With optional durability, Redis ensures data persistence despite potential system failures. <a href='https://redis.com/solutions/use-cases/vector-database/' target='_blank'>Learn more</a> \ud83d\ude80.\n\n## \ud83d\udc47 Getting Started (Implementation)\n\nThe service can be used with Langchain. You can check the <a href='https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/redis.html' target='_blank'>official documentation</a>. In the code snippet below, we assume that you are using <a href='https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2' target='_blank'>`all-miniLM-l6-v2`</a> model for embeddings generation and the service is running locally on port 8001.\n\n```python\n\n!pip install redis\n\nimport os\n\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.docstore.document import Document\nfrom langchain.vectorstores.redis import Redis\n\nos.environ[\"OPENAI_API_KEY\"] = \"random-string\"\n\ndoc1 = Document(page_content=\"Prem is an easy to use open source AI platform. 
With Prem you can quickly build privacy preserving AI applications.\")\ndoc2 = Document(page_content=\"\"\"\nPrem App\n\nAn intuitive desktop application designed to effortlessly deploy and self-host Open-Source AI models without exposing sensitive data to third-party.\n\n\"\"\")\ndoc3 = Document(page_content=\"\"\"\nPrem Benefits\n\nEffortless Integration\nSeamlessly implement machine learning models with the user-friendly interface of OpenAI's API.\n\nReady for the Real World\nBypass the complexities of inference optimizations. Prem's got you covered.\n\nRapid Iterations, Instant Results\nDevelop, test, and deploy your models in just minutes.\n\nPrivacy Above All\nYour keys, your models. We ensure end-to-end encryption.\n\nComprehensive Documentation\nDive into our rich resources and learn how to make the most of Prem.\n\nPreserve Your Anonymity\nMake payments with Bitcoin and Cryptocurrency. It's a permissionless infrastructure, designed for you.\n\"\"\")\n\n# Using sentence transformers all-MiniLM-L6-v2\nembeddings = OpenAIEmbeddings(openai_api_base=\"http://localhost:8444/v1\")\n\n# Using locally running Redis\nurl = \"redis://localhost:6379\"\n\nvectorstore = Redis.from_documents([doc1, doc2, doc3], embeddings, redis_url=url, index_name=\"prem_index_test\")\n\nquery = \"What are Prem Benefits?\"\ndocs = vectorstore.similarity_search(query)\nprint(docs[0].page_content)\n```", "interfaces": ["vector-store"], "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/vector-store-redis/logo.svg", "modelInfo": {}, "volumePath": "/data", "dockerImages": {"cpu": {"size": 261514099, "image": "redis/redis-stack-server:latest"}}, "defaultPort": 6379, "defaultExternalPort": 6379}, {"id": "stable-diffusion-2", "name": "Stable Diffusion 2", "description": "Stable Diffusion v2 is an enhanced version of the Stable Diffusion v2-base model, developed by Robin Rombach and Patrick Esser. This model is designed to generate and modify images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encoder (OpenCLIP-ViT/H). The model was initially trained from the Stable Diffusion v2-base model and then further trained for an additional 150k steps using a v-objective on the same dataset. It was then resumed for another 140k steps on 768x768 images.", "documentation": "# Documentation\n\n## \ud83d\udccc Description\n\nStable Diffusion v2 is an enhanced version of the Stable Diffusion v2-base model, developed by Robin Rombach and Patrick Esser. This model is designed to generate and modify images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encoder (OpenCLIP-ViT/H). The model was initially trained from the Stable Diffusion v2-base model and then further trained for an additional 150k steps using a v-objective on the same dataset. It was then resumed for another 140k steps on 768x768 images. 
<a href='https://stability.ai/blog/stable-diffusion-v2-release' target='_blank'>Learn More</a>.\n\n## \ud83d\udcbb Hardware Requirements\n\nTo run the `stable-diffusion-2` service on Prem, you'll need access to a GPU with at least 16GiB of RAM.\n\n## \ud83d\udcd2 Example Usage\n\n### 1\ufe0f\u20e3 Prompt: Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\n\n![k5h9_ilY](https://github.com/premAI-io/prem-registry/assets/29598954/49d162c9-a308-466c-a038-9bb54d2009fd)\n\n### 2\ufe0f\u20e3 Prompt: Low polygon panda 3d\n\n![hPCoZERY](https://github.com/premAI-io/prem-registry/assets/29598954/51537f29-f4cc-469f-88c4-ad18559cb043)\n\n### 3\ufe0f\u20e3 Prompt: 3d hiper-realistic rick sanchez and morty\n\n![NWVcWCfw](https://github.com/premAI-io/prem-registry/assets/29598954/667d08ad-7dd7-436f-8e2e-5e05d547653d)\n\n### 4\ufe0f\u20e3 Prompt: Synthwave brad pitt wearing headphones, animated, trending on artstation, portrait\n\n![q6yKHgDv](https://github.com/premAI-io/prem-registry/assets/29598954/9b88388d-08b7-4a9a-b9a3-766497b3403a)\n\n## \ud83d\udee0\ufe0f Technical Details\n\n### \ud83d\ude80 Getting Started with OpenAI Python client\n\nThe service exposes the same endpoints as OpenAI DALL-E does. You can directly use the official `openai` python library.\n\n```python\n\n!pip install openai\n!pip install pillow\n\nimport io\nimport base64\nimport openai\n\nfrom PIL import Image\n\nopenai.api_base = \"http://localhost:9111/v1\"\nopenai.api_key = \"random-string\"\n\nresponse = openai.Image.create(\n prompt=\"Iron man portrait, highly detailed, science fiction landscape, art style by klimt and nixeu and ian sprigger and wlop and krenz cushart\",\n n=1,\n size=\"512x512\"\n)\n\nimage_string = response[\"data\"][0][\"b64_json\"]\n\nimg = Image.open(io.BytesIO(base64.decodebytes(bytes(image_string, \"utf-8\"))))\nimg.save(\"iron_man.jpeg\")\n\n```\n\n## \ud83d\udcdc License\n\nThe model is under CreativeML Open RAIL++-M License.\n", "beta": true, "icon": "https://raw.githubusercontent.com/premAI-io/prem-registry/main/diffuser-stable-diffusion-2/logo.svg", "modelInfo": {"memoryRequirements": 14705, "secondsPerImage": 8}, "interfaces": ["diffuser"], "dockerImages": {"gpu": {"size": 27239249755, "image": "ghcr.io/premai-io/diffuser-stable-diffusion-2-gpu:1.0.2"}}, "defaultPort": 8000, "defaultExternalPort": 9222}]