@simonmesmith
Last active October 10, 2023 13:47
Book modernizer

Book modernization with GPT-3.5

This is a proof of concept that uses OpenAI's GPT-3.5 to modernize books.

For example, given this passage from Mary Shelley's The Last Man:

I visited Naples in the year 1818. On the 8th of December of that year, my companion and I crossed the Bay, to visit the antiquities which are scattered on the shores of Baiæ. The translucent and shining waters of the calm sea covered fragments of old Roman villas, which were interlaced by sea-weed, and received diamond tints from the chequering of the sun-beams; the blue and pellucid element was such as Galatea might have skimmed in her car of mother of pearl; or Cleopatra, more fitly than the Nile, have chosen as the path of her magic ship. Though it was winter, the atmosphere seemed more appropriate to early spring; and its genial warmth contributed to inspire those sensations of placid delight, which are the portion of every traveller, as he lingers, loath to quit the tranquil bays and radiant promontories of Baiæ.

GPT-3.5 modernizes it to:

I visited Naples in 1818. On December 8 of that year, my companion and I crossed the Bay to visit the ancient ruins scattered along the shores of Baiæ. The clear and sparkling waters of the calm sea covered remnants of old Roman villas, intertwined with seaweed and reflecting the sun's rays like diamonds. The blue and transparent element was so beautiful, it reminded me of the sea that Galatea might have skimmed in her mother-of-pearl chariot, or the path Cleopatra would have chosen for her magical ship, more fitting than the Nile. Despite it being winter, the atmosphere felt more like early spring, and its warm and pleasant temperature filled me with a serene delight as I hesitated to leave the peaceful bays and beautiful cliffs of Baiæ.

How to use it

  • Copy the code
  • Install the requirements with pip install -r requirements.txt
  • Run with streamlit run book_modernizer.py
  • Enter your OpenAI API key, upload book text, and click "Modernize"
    • Optionally, modify the system prompt, for example to request a particular flavor of output, such as "modern science fiction"

Limitations and improvements

As mentioned above, this is a proof of concept, so keep the following in mind:

  • Processing a whole book will take a long time, because the code doesn't use asynchronous processing. I also haven't tested what happens when you try to process an entire book; it could break something.
    • Improvement: Use asynchronous processing. Be sure to add rate limiting to avoid exceeding OpenAI's rate limits.
  • I haven't validated the token and cost estimation, so use it at your own risk. Consider validating it on a small part of a book before running it on an entire book.
    • Improvement: Validate the token and cost estimation.
  • It only works with GPT-3.5. I've hard-coded this in.
    • Improvement: Allow use with other models. This includes GPT-4, but also open source models, either locally or via an API. Note that this will affect token and cost estimation.
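One way to validate the token estimate is to compare the word-count heuristic against an exact tokenizer count. The sketch below reuses the script's 0.75 words-per-token ratio and, as an assumption, pulls in tiktoken (OpenAI's tokenizer library, not listed in requirements.txt), falling back gracefully when it isn't installed:

```python
import re

try:
    import tiktoken  # optional extra: pip install tiktoken
except ImportError:
    tiktoken = None


def heuristic_token_count(text: str) -> int:
    """The gist's estimate: roughly 0.75 words per token."""
    word_count = len(re.findall(r"\w+", text))
    return int(word_count / 0.75)


def exact_token_count(text: str, model: str = "gpt-3.5-turbo"):
    """Exact count via tiktoken, or None if tiktoken is unavailable."""
    if tiktoken is None:
        return None
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


sample = "I visited Naples in the year 1818."
estimate = heuristic_token_count(sample)
exact = exact_token_count(sample)
```

Running both counters over a few representative pages would show how far the heuristic drifts for a given book before committing to a full run.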
book_modernizer.py

```python
import re
from io import StringIO

import openai
import streamlit as st

st.title("Book Modernizer")

# Retrieve the OpenAI API key
api_key = st.text_input(
    "Enter your OpenAI API key (not stored):",
    type="password",
)

if api_key:
    # Initialize the OpenAI API
    openai.api_key = api_key

    # Get the book file from the user
    if book_file := st.file_uploader(
        "Upload a book to modernize:",
        type=["txt"],
        accept_multiple_files=False,
    ):
        # Retrieve the book text
        stringio = StringIO(book_file.getvalue().decode("utf-8"))
        book_text = stringio.read()

        # If we have text...
        if book_text:
            # Calculate token and cost estimates
            word_count = len(re.findall(r"\w+", book_text))
            token_count = int(word_count / 0.75)
            input_cost = (token_count / 1000) * 0.0015
            output_cost = (token_count / 1000) * 0.002
            total_cost = input_cost + output_cost

            # Show the user a confirmation screen and an option to edit
            # the system message
            st.write(f"Word count: {word_count}")
            st.write(f"Estimated token count: {token_count}")
            st.write(f"Estimated input cost: {input_cost}")
            st.write(f"Estimated output cost: {output_cost}")
            st.write(f"Estimated total cost: {total_cost}")

            system_message = (
                "You are an AI book modernizer. You receive text from books "
                "and modernize them. You do this by adhering to the following "
                "rules:\n\n"
                "1. Modernize the language of the text. Use contemporary "
                "language and spelling. Use the language and spelling you "
                "find in a contemporary bestseller.\n"
                "2. Do not change proper nouns. For example, do not change "
                "the names of people, places, or organizations.\n"
                "3. Do not change capitalization. If a chunk of text starts "
                "with a lowercase letter, keep it lowercase. If it starts "
                "with an uppercase letter, keep it uppercase. If it "
                "is all uppercase, keep it all uppercase.\n"
                "4. Do not add or remove line breaks. Return the text with "
                "the same line breaks as the input text, which should align "
                "with paragraphs."
            )
            with st.expander("Edit system message (optional):"):
                system_message = st.text_area(
                    "System message:", value=system_message
                )

            if st.button("Modernize"):
                # Split the book into 1,000-character chunks
                chunks = [
                    book_text[i : i + 1000]  # noqa: E203
                    for i in range(0, len(book_text), 1000)
                ]

                # Initialize the progress bar
                progress_bar = st.progress(0)
                num_chunks = len(chunks)

                # Loop through each chunk and modernize the text
                modernized_text = ""
                for i, chunk in enumerate(chunks):
                    response = openai.ChatCompletion.create(
                        model="gpt-3.5-turbo",
                        messages=[
                            {"role": "system", "content": system_message},
                            {
                                "role": "user",
                                "content": (
                                    f"Original:\n\n{chunk}\n\n"
                                    "Modernized:\n\n"
                                ),
                            },
                        ],
                    )
                    modernized_text += response.choices[0].message.content.strip()
                    progress_bar.progress((i + 1) / num_chunks)

                st.write("Download the modernized book:")
                st.download_button(
                    "Download", modernized_text, "modernized_book.txt"
                )
```
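One behavior worth noting in the script above: it cuts chunks every 1,000 characters regardless of paragraph boundaries, which can split a paragraph mid-sentence and work against rule 4 of the system message. A paragraph-aware splitter is one possible refinement; this is a sketch that keeps the 1,000-character budget as an assumption:

```python
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list:
    """Split text into chunks of at most max_chars, breaking at
    paragraph boundaries (blank lines) whenever possible."""
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph still has to be split hard.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 2500
chunks = chunk_by_paragraph(text)
```

Here the two short paragraphs stay together in one chunk, while the oversized run of characters is split hard at the 1,000-character limit.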
requirements.txt

```
openai
streamlit
```
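The asynchronous-processing improvement suggested in the limitations could be built around asyncio.Semaphore to bound how many requests run at once. In this sketch, modernize_chunk is a hypothetical stand-in for a real async OpenAI call (the sleep simulates network latency, and uppercasing stands in for modernization), and the limit of 5 concurrent requests is an assumption to tune against your account's rate limits:

```python
import asyncio


async def modernize_all(chunks: list, max_concurrent: int = 5) -> list:
    """Process chunks concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def modernize_chunk(chunk: str) -> str:
        # Stand-in for the real async API call; the semaphore caps
        # how many of these run simultaneously.
        async with semaphore:
            await asyncio.sleep(0.01)  # simulated network latency
            return chunk.upper()  # placeholder "modernization"

    # gather preserves input order even though chunks finish concurrently
    return await asyncio.gather(*(modernize_chunk(c) for c in chunks))


results = asyncio.run(modernize_all(["one", "two", "three"]))
```

Because asyncio.gather returns results in the order the coroutines were passed, the modernized chunks can still be concatenated in book order.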