@deton
Created July 21, 2024 01:46
# Lynx patch to ignore xml encoding if charset is specified in Content-Type header
## Problem
Some sites (2chcopipe.com, digital-thread.com) declare an encoding in the
`<?xml ... ?>` declaration that does not match the actual encoding of the page.
As a result, Lynx displays garbled characters.

    Content-Type: text/html; charset=euc-jp
    ...
    <?xml version="1.0" encoding="UTF-8"?>
    ...
    <meta http-equiv="Content-Type" content="text/html; charset=euc-jp" />
    ...
    (euc-jp encoded body text)

## Patch
Make Lynx ignore the encoding given in the XML declaration when a charset is specified in the Content-Type HTTP response header.
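
The precedence rule can be illustrated outside of Lynx with a minimal sketch. `choose_charset` and its parameters are hypothetical names made up for this example and are not part of Lynx. The sketch only shows the intended ordering: a charset from the HTTP header wins, and the XML declaration is consulted only when the header gives none.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Lynx: pick the charset to decode with.
 * http_charset is the charset parameter of the Content-Type response
 * header (NULL or "" if absent); xml_encoding is the value from the
 * <?xml ... ?> declaration (NULL or "" if absent).
 */
static const char *choose_charset(const char *http_charset,
                                  const char *xml_encoding)
{
    if (http_charset != NULL && http_charset[0] != '\0')
        return http_charset;    /* the HTTP header takes precedence */
    if (xml_encoding != NULL && xml_encoding[0] != '\0')
        return xml_encoding;    /* fall back to the XML declaration */
    return NULL;                /* nothing declared; caller uses its default */
}
```

For the page in the Problem section, `choose_charset("euc-jp", "UTF-8")` yields `"euc-jp"`, which matches the actual body encoding.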
## TODO
* Handle the XML declaration's encoding the same way as a META charset, instead of only checking for utf-8
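
A possible first step toward this TODO is to extract the encoding value from the XML declaration regardless of which charset it names. The following is a simplified, standalone sketch. `xml_decl_encoding` is a hypothetical function, not an existing Lynx API, and it assumes the declaration is available as a NUL-terminated string. Mapping the extracted name to a charset the way META charset is handled is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Skip spaces, tabs, and line breaks. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/*
 * Hypothetical helper, not part of Lynx: copy the value of the encoding
 * pseudo-attribute of an XML declaration into out (NUL-terminated).
 * Returns 1 on success, 0 if there is no encoding value or it does not
 * fit into out. Simplification: the first occurrence of the word
 * "encoding" is assumed to be the pseudo-attribute name.
 */
static int xml_decl_encoding(const char *decl, char *out, size_t outlen)
{
    const char *p = strstr(decl, "encoding");
    const char *end;
    char quote;
    size_t len;

    if (p == NULL)
        return 0;
    p = skip_ws(p + strlen("encoding"));
    if (*p != '=')
        return 0;
    p = skip_ws(p + 1);
    quote = *p;                 /* value may use double or single quotes */
    if (quote != '"' && quote != '\'')
        return 0;
    p++;
    end = strchr(p, quote);     /* find the matching closing quote */
    if (end == NULL)
        return 0;
    len = (size_t) (end - p);
    if (len == 0 || len >= outlen)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

With the declaration from the Problem section, the function stores `UTF-8` in `out`. A caller could then compare that name against the charset already known for the document before deciding whether to switch.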
--- ../orig/lynx2.9.2/WWW/Library/Implementation/SGML.c 2024-04-12 05:22:19.000000000 +0900
+++ WWW/Library/Implementation/SGML.c 2024-07-20 10:36:02.977162250 +0900
@@ -896,6 +896,8 @@ static void handle_processing_instructio
 	int flag = me->T.decode_utf8;
 
 	me->strict_xml = TRUE;
+	if (HTAnchor_getUCLYhndl(me->node_anchor, UCT_STAGE_MIME) >= 0)
+	    return;
 	/*
 	 * Switch to UTF-8 if the encoding is explicitly "utf-8".
 	 */