<?php
mb_internal_encoding("UTF-8");
$desc = <<<TEXT
<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör på den) excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique.</p>
That is, a long line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
TEXT;
/**
 * Apply folding compliant with RFC 5545
 * See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
 *
 * @param string $preamble The property name, e.g. DESCRIPTION
 * @param string $value The value for the property, e.g. a very long string
 * @param bool $strip_tags Strip HTML tags from the value
 *
 * @return string Returns the folded string without the property name
 */
function ical_split($preamble, $value, $strip_tags = true)
{
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);
    if ($strip_tags) {
        $value = strip_tags($value);
    }
    $value = $preamble . ':' . $value;
    $offset = 0;
    $chunkSize = 75;
    $length = strlen($value);
    $lines = [];
    // Each chunk holds at most 74 octets, so a continuation line plus its
    // leading tab stays within 75 octets.
    while ($offset < $length) {
        $line = mb_strcut($value, $offset, $chunkSize - 1);
        if ($line === '') {
            break;
        }
        $lines[] = $line;
        // Advance by the octets actually taken, so nothing is skipped
        // between two chunks.
        $offset += strlen($line);
    }
    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}
$split = ical_split('DESCRIPTION:', $desc);
print 'DESCRIPTION:' . $split;
// Test results
$lines = preg_split('/\r\n/', 'DESCRIPTION:' . $split);
print "\n\nTests\n";
foreach ($lines as $i => $line) {
    print "Line {$i}: " . strlen($line) . " octets\n";
}
print "\nAlt desc output:\n";
$split = ical_split('X-ALT-DESC:', $desc, false);
print 'X-ALT-DESC:' . $split;
print "\n\n";

DESCRIPTION:Lines of text SHOULD NOT be longer than 75 octets, (och hör 
	på den) excluding the line break. Long content lines SHOULD be split into
	 a multiple line representations using a line "folding" technique. That is
	, a long line can be split between any two characters by inserting a CRLF 
	immediately followed by a single linear white space character (i.e., SPACE
	, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence of CRLF f
	ollowed immediately by a single linear white space character is ignored (i
	.e., removed) when processing the content type.

Tests
Line 0: 73 octets
Line 1: 75 octets
Line 2: 75 octets
Line 3: 75 octets
Line 4: 75 octets
Line 5: 75 octets
Line 6: 75 octets
Line 7: 48 octets

Alt desc output:
X-ALT-DESC:<p>Lines of text SHOULD NOT be longer than 75 octets, (och hö
	r på den) excluding the line break. Long content lines SHOULD be split in
	to a multiple line representations using a line "folding" technique.</p> T
	hat is, a long line can be split between any two characters by inserting a
	 CRLF immediately followed by a single linear white space character (i.e.,
	 SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any seque
	nce of CRLF followed immediately by a single linear white space character 
	is ignored (i.e., removed) when processing the content type.

@keizie That's what is taken into account at line #28: if the octet count (strlen) is bigger than the available space, then $mbcc (multibyte character count) is decreased by the overflow and the mb_substr is attempted again. No line that has an octet count larger than 75 should ever get appended.
Cool gist. However, I think you need to escape commas:
$value = str_replace(',', '\\,', $value);
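A minimal sketch of that escaping step, assuming the TEXT escaping rules of RFC 5545 section 3.3.11 (backslash, semicolon, comma and line break). The name `ical_escape_text` is hypothetical and not part of the gist; it would be applied to the value before folding.

```php
<?php
// Hypothetical helper (not part of the gist): escape a TEXT value
// following RFC 5545 section 3.3.11, before the value is folded.
function ical_escape_text(string $value): string
{
    // Backslashes first, so the escapes added below are not doubled.
    $value = str_replace('\\', '\\\\', $value);
    // Commas and semicolons act as separators in iCalendar values.
    $value = str_replace([',', ';'], ['\\,', '\\;'], $value);
    // Line breaks inside a value become the two characters "\n".
    return str_replace(["\r\n", "\r", "\n"], '\\n', $value);
}

echo ical_escape_text('Lunch, then a walk; bring shoes'), "\n";
// prints: Lunch\, then a walk\; bring shoes
```

Escaping the backslash first matters: done last, it would double the backslashes that the comma and semicolon escapes just introduced.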
Thank you very much for sharing that function with us. I've embedded it in a new @contao CMS extension
(even though I currently had to disable it because of validation problems).
To be RFC compliant, the line break must be a CRLF:
Lines of text SHOULD NOT be longer than 75 octets, excluding the line
break. Long content lines SHOULD be split into a multiple line
representations using a line "folding" technique. That is, a long
line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, US-ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
Taken from the Internet Calendaring and Scheduling Core Object Specification.
So change
return join($lines, "\n\t");
to
return join($lines, "\r\n\t");
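One way to check a generated property against the rule quoted above is to split it on CRLF and measure each physical line in octets. This is only a sketch; `ical_overlong_lines` is a hypothetical helper name, not something the gist provides.

```php
<?php
// Hypothetical helper: return the indexes of physical lines that exceed
// the 75-octet limit. Lines are split on CRLF only, so output folded
// with a bare "\n" is treated as one long line.
function ical_overlong_lines(string $folded, int $limit = 75): array
{
    $tooLong = [];
    foreach (explode("\r\n", $folded) as $i => $line) {
        // strlen() counts octets, not characters.
        if (strlen($line) > $limit) {
            $tooLong[] = $i;
        }
    }
    return $tooLong;
}

$crlfFolded = str_repeat('x', 50) . "\r\n\t" . str_repeat('x', 50);
$lfFolded   = str_repeat('x', 50) . "\n\t" . str_repeat('x', 50);

var_dump(ical_overlong_lines($crlfFolded)); // empty array
var_dump(ical_overlong_lines($lfFolded));   // array containing index 0
```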
I have also used this as part of my .ics creation routine in my software QWcrm. I have tried to make my output all RFC compliant. Outputting a calendar event from Microsoft Outlook as an .ics helps.
Thanks for this script.
Awesome! Thanks for this script. @keizie is right: you have to use mb_strcut, or else the code will loop and crash if you have a single very long multibyte string.
Great job!
Honestly, I never had any problems creating ics files with more than 75 columns, but I also wanted to eliminate the warnings from the validation.
My problem is that I use the "X-ALT-DESC" property to write the description text formatted with HTML, and this function strips all the HTML tags. Can you help me?
@keizie That's what is taken into account at line #28: if the octet count (strlen) is bigger than the available space, then $mbcc (multibyte character count) is decreased by the overflow and the mb_substr is attempted again. No line that has an octet count larger than 75 should ever get appended.
As far as I understand from the RFC, lines should be folded at a length of 75 octets, including the property name.
Depending on the way the ICS data is generated, your function might end up in an endless loop, particularly in the while loop on line 28, if you have a long preamble or property name.
As @keizie and @djkgamc commented, mb_strcut does exactly what we need:
If the cut position happens to be between two bytes of a multi-byte character, the cut is performed starting from the first byte of that character.
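The difference between the three cutting functions can be seen on a short string. The sample string below is made up for illustration and assumes UTF-8 input with the mbstring extension loaded.

```php
<?php
mb_internal_encoding('UTF-8');

// "ö" and "å" each take 2 octets in UTF-8, so this string is 8 octets long.
$s = 'hör på';

$bytes = substr($s, 0, 2);     // byte cut: "h" plus half of "ö" (malformed UTF-8)
$chars = mb_substr($s, 0, 2);  // character cut: "hö", which is 3 octets
$safe  = mb_strcut($s, 0, 2);  // byte cut that backs off to a character boundary

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump(strlen($chars));                     // int(3)
var_dump($safe);                              // string(1) "h"
```

Only `mb_strcut()` both respects an octet budget and returns valid UTF-8, which is why it suits a limit that is expressed in octets.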
Although I'm in favor of properly applying the folding technique on multibyte strings, it should be noted that in section 3.1 of RFC 5545 the responsibility of supporting multibyte strings is put on the implementation of the unfolding technique instead of the folding technique:
Note: It is possible for very simple implementations to generate improperly folded lines in the middle of a UTF-8 multi-octet sequence. For this reason, implementations need to unfold lines in such a way to properly restore the original sequence.
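The note implies that unfolding has to work on the raw octets, before the text is decoded as UTF-8. A minimal sketch under that assumption follows; `ical_unfold` is a hypothetical name, and the sample is deliberately folded in the middle of "ö" with `substr()`.

```php
<?php
// Hypothetical helper: remove every CRLF that is immediately followed by
// one space or tab. The pattern has no /u flag, so it operates on raw
// octets and does not care whether a fold landed inside a UTF-8 sequence.
function ical_unfold(string $data): string
{
    return preg_replace('/\r\n[ \t]/', '', $data);
}

$original = 'SUMMARY:hör på den';
// Naive byte fold after 10 octets: "SUMMARY:h" plus the first octet of "ö".
$folded = substr($original, 0, 10) . "\r\n " . substr($original, 10);

var_dump(mb_check_encoding(substr($original, 0, 10), 'UTF-8')); // bool(false)
var_dump(ical_unfold($folded) === $original);                   // bool(true)
```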
Anyway, I believe the function can be improved to handle longer properties, as well as be more compliant with the RFC as @sqren suggested, and handle HTML in the X-ALT-DESC property as @Giulo77 requested:
/**
 * Apply folding compliant with RFC 5545
 * See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
 *
 * @param string $preamble The property name, e.g. DESCRIPTION
 * @param string $value The value for the property, e.g. a very long string
 *
 * @return string Returns the folded string without the property name
 */
function ical_split($preamble, $value)
{
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);
    $value = $preamble . ':' . $value;
    $offset = 0;
    $chunkSize = 75;
    $length = strlen($value);
    $lines = [];
    // Each chunk holds at most 74 octets, so a continuation line plus its
    // leading tab stays within 75 octets.
    while ($offset < $length) {
        $line = mb_strcut($value, $offset, $chunkSize - 1);
        if ($line === '') {
            break;
        }
        $lines[] = $line;
        // Advance by the octets actually taken, so nothing is skipped.
        $offset += strlen($line);
    }
    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}
Huh, 14 years... time flies :)
Your implementation looks nice and elegant @viavario. Stripping out tags should probably have been separate from the folding, but I added an optional param to your implementation that can be used to disable tag stripping, preserving the old behaviour.
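One possible way to separate the two concerns is sketched below: tag stripping and whitespace normalisation live in one helper, and folding in another that takes a complete content line. Both function names are hypothetical and not part of the gist, and the sample HTML is made up.

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical helper: optionally strip tags, then collapse line breaks
// and runs of whitespace into single spaces.
function ical_normalize(string $value, bool $stripTags = true): string
{
    if ($stripTags) {
        $value = strip_tags($value);
    }
    return trim(preg_replace('/\s+/', ' ', $value));
}

// Hypothetical helper: fold a complete content line ("NAME:value").
// Each chunk holds at most 74 octets, so a continuation line plus its
// leading tab stays within 75 octets.
function ical_fold(string $line, int $limit = 75): string
{
    $chunks = [];
    $length = strlen($line);
    for ($offset = 0; $offset < $length; $offset += strlen($chunk)) {
        $chunk = mb_strcut($line, $offset, $limit - 1);
        if ($chunk === '') {
            break; // guard against an endless loop on malformed input
        }
        $chunks[] = $chunk;
    }
    return implode("\r\n\t", $chunks);
}

$html = '<p>' . str_repeat('hör på den ', 12) . '</p>';
echo ical_fold('DESCRIPTION:' . ical_normalize($html)), "\r\n";
echo ical_fold('X-ALT-DESC:' . ical_normalize($html, false)), "\r\n";
```

Because the fold receives the whole content line, the property name counts towards the first 75 octets without any `substr()` bookkeeping afterwards.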
mb_substr() counts a multibyte sequence as one character and malfunctions on a string full of multibyte characters. mb_strcut() works well.