

@monteiro
Last active February 10, 2024 15:27
Translate legal documents from any language into Portuguese of Portugal
<?php
require 'vendor/autoload.php';
// Use an HTML version of the contract exported from Microsoft Word, so we do not have to deal with Word's own formatting.
$html = file_get_contents(__DIR__ . '/contract.html');
if ($html === false) {
    exit("Could not read contract.html\n");
}
// split_html_into_chunks() (defined below) is only a naive placeholder that cuts the HTML every N bytes;
// replace it with a splitter that respects tag boundaries before using this on real contracts.
$chunks = split_html_into_chunks($html, 4000); // chunks of at most 4000 bytes (measured with strlen), not tokens
// Read the API key from the environment rather than hard-coding it in the script.
$client = OpenAI::client(getenv('OPENAI_API_KEY'));
$translatedChunks = [];
foreach ($chunks as $chunk) {
    $result = $client->chat()->create([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            // Instructions belong in the 'system' message; an 'assistant' message is read as an earlier model reply.
            ['role' => 'system', 'content' => 'You are an HTML translator. You will receive a fragment of an HTML document. Translate only the human-readable text (for example, the text inside <span> or <div> elements) and keep every HTML tag and attribute exactly as it is. Always translate into Portuguese of Portugal (pt-PT), using the most formal register possible, because the result will be used as a legal document. Return only the translated HTML.'],
            ['role' => 'user', 'content' => $chunk],
        ],
    ]);
    // openai-php/client returns a response object; the translated text is the first choice's message content.
    $translatedChunks[] = $result->choices[0]->message->content;
}
// Combine the translated chunks back into a single HTML document
$translatedHtml = implode("", $translatedChunks);
// Naive placeholder splitter, for illustration only: it cuts every $chunkSize bytes, so a boundary can
// land inside a tag or in the middle of a multibyte UTF-8 character such as an accented letter.
// A real implementation should split on tag boundaries so the HTML structure is not broken.
// You can also reduce costs by not sending the leading <head> section, which only contains Word's styling.
function split_html_into_chunks(string $html, int $chunkSize = 3000): array
{
    $chunks = [];
    $length = strlen($html);
    for ($i = 0; $i < $length; $i += $chunkSize) {
        $chunks[] = substr($html, $i, $chunkSize);
    }
    return $chunks;
}
file_put_contents(__DIR__ . '/contract-translated.html', $translatedHtml);
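The comment about costs suggests skipping the styling that Word places before the actual text. A minimal sketch, assuming the export has a single `<body>` element: the hypothetical helper `split_body()` (not part of the gist) separates the translatable body from the surrounding markup, so only the body needs to be chunked and sent to the API.

```php
<?php
/**
 * Hypothetical helper (not part of the original gist): split an HTML document
 * into [prefix, body, suffix], where prefix is everything up to and including
 * the opening <body ...> tag and suffix starts at the closing </body> tag.
 * Only the middle part needs translating; the <head> with Word's styles and
 * the closing tags can be written back unchanged.
 */
function split_body(string $html): array
{
    // Match the opening <body ...> tag; Word exports usually add attributes to it.
    if (!preg_match('/<body\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        // No <body> tag found: treat the whole input as translatable.
        return ['', $html, ''];
    }
    $bodyStart = $m[0][1] + strlen($m[0][0]);

    // Position of the last closing </body> tag; fall back to the end of the string.
    $bodyEnd = strripos($html, '</body>');
    if ($bodyEnd === false || $bodyEnd < $bodyStart) {
        $bodyEnd = strlen($html);
    }

    return [
        substr($html, 0, $bodyStart),
        substr($html, $bodyStart, $bodyEnd - $bodyStart),
        substr($html, $bodyEnd),
    ];
}

// Usage example with a tiny document.
$doc = '<html><head><style>p{margin:0}</style></head><body lang=EN-US><p>Hello</p></body></html>';
[$prefix, $body, $suffix] = split_body($doc);
echo $body, PHP_EOL; // prints <p>Hello</p>
```

In the script, you would pass only `$body` to the splitter and write `$prefix . $translatedHtml . $suffix` to the output file, so the styles reach the translated document untouched and are never billed as input tokens.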
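The placeholder splitter's comments ask for logic that does not break the HTML structure. One possible approach, sketched here under the assumption that the Word export uses ordinary block elements such as `<p>`, `<div>`, `<li>`, `<tr>` and `<table>`: cut only right after a closing block tag, then pack whole blocks into chunks up to a byte budget. The function name `split_html_on_blocks()` is hypothetical, and the budget is in bytes, not tokens.

```php
<?php
/**
 * Hypothetical alternative to split_html_into_chunks(): it only cuts right
 * after block-level closing tags, so no chunk ends inside a tag or in the
 * middle of a multibyte UTF-8 character. $maxBytes is measured with strlen().
 */
function split_html_on_blocks(string $html, int $maxBytes = 4000): array
{
    // Zero-width split just after </p>, </div>, </li>, </tr>, </table> or </h1>..</h6>,
    // so each closing tag stays at the end of the piece it closes.
    $pieces = preg_split(
        '/(?<=<\/p>|<\/div>|<\/li>|<\/tr>|<\/table>|<\/h[1-6]>)/i',
        $html,
        -1,
        PREG_SPLIT_NO_EMPTY
    );
    if ($pieces === false) {
        // Regex error: fall back to sending the input as a single chunk.
        return [$html];
    }

    $chunks = [];
    $current = '';
    foreach ($pieces as $piece) {
        // Close the current chunk if adding this piece would exceed the budget.
        if ($current !== '' && strlen($current) + strlen($piece) > $maxBytes) {
            $chunks[] = $current;
            $current = '';
        }
        // A single piece larger than $maxBytes still becomes its own (oversized) chunk.
        $current .= $piece;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

// Usage example: a 12-byte budget forces one paragraph per chunk.
$chunks = split_html_on_blocks('<p>One</p><p>Two</p><p>Three</p>', 12);
echo count($chunks), PHP_EOL; // prints 3
```

This is a regex heuristic, not an HTML parser: a long table can still be split between rows, so a chunk may open or close an element whose partner sits in a neighbouring chunk. Concatenating the chunks always reproduces the input exactly, which keeps the `implode("", ...)` step in the script valid.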
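The system prompt asks the model to keep the HTML markup, but nothing in the script checks that it did. A minimal sketch of such a check, assuming that a faithful translation keeps the same sequence of opening and closing tags: the hypothetical function `same_tag_sequence()` compares tag names only, ignoring text and attribute values, and allows Word's namespaced tags such as `<o:p>`.

```php
<?php
/**
 * Hypothetical sanity check (not part of the original gist): true when the
 * translated chunk contains the same sequence of opening and closing tag
 * names as the source chunk. Text and attribute values are ignored, so
 * translated words do not matter, but a dropped, added or reordered tag does.
 */
function same_tag_sequence(string $source, string $translated): bool
{
    // Captures an optional "/" and the tag name; ":" allows Word tags like o:p.
    $pattern = '/<(\/?)([a-zA-Z][a-zA-Z0-9:]*)/';
    preg_match_all($pattern, $source, $a, PREG_SET_ORDER);
    preg_match_all($pattern, $translated, $b, PREG_SET_ORDER);

    // Normalise each match to a lower-cased name such as "p" or "/span".
    $normalise = fn(array $matches): array => array_map(
        fn(array $m): string => strtolower($m[1] . $m[2]),
        $matches
    );

    return $normalise($a) === $normalise($b);
}

// Usage example: same markup, different words.
var_dump(same_tag_sequence(
    '<p class=MsoNormal>Hello <b>world</b></p>',
    '<p class=MsoNormal>Ola <b>mundo</b></p>'
)); // prints bool(true)
```

Inside the translation loop you could call this on `$chunk` and the returned content, then log or retry the request when it returns false; which policy to apply is left open here.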