Skip to content

Instantly share code, notes, and snippets.

@falloutphil
Last active December 28, 2021 13:16
Show Gist options
  • Save falloutphil/00ee3831587ab70cb3c7d6cdac43c02c to your computer and use it in GitHub Desktop.
Save falloutphil/00ee3831587ab70cb3c7d6cdac43c02c to your computer and use it in GitHub Desktop.
Some ideas about how to efficiently read a file of numbers in scheme a la AoC

To test we create a file much larger than we’d read in with a typical AoC question - let’s mimic day 11 of 2021 (also valid for day 09 too) available here.

shuf -i 1000000000-9999999999 -n 10000000 -o big_test.txt

With a file this big, my original solution runs out of memory due to garbage collection issues. The problem here was misunderstanding eager comprehensions - the nested map should be part of the list comprehension itself not applied afterwards!

A good link about comprehensions in Scheme for reference.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (λ (p)
      (list->array 2 (map (λ (str)
                            (map (compose string->number string) (string->list str)))
                          (list-ec (:port line p read-line) line))))))

So first change is to bring the nested map inside the comprehension, in the original we just return variable line which is the result of repeatedly applying read-line to port p until an eof received. Instead we apply a scheme expression to line below, converting it to a list and then mapping over this list to covert to numbers.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (λ (p)
      (list->array 2 (list-ec (:port line p read-line)
			      (map (compose string->number string)
				   (string->list line)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt")
%     cumulative   self             
time   seconds     seconds  procedure
 30.11     13.66     13.60  string
 15.40      6.95      6.95  %read-line
 15.26      6.89      6.89  string->number
 11.82      5.33      5.33  ice-9/boot-9.scm:797:8
  8.44      3.81      3.81  string->list
  8.17    168.91      3.69  srfi/srfi-1.scm:584:5:map1
  4.25      1.92      1.92  ice-9/boot-9.scm:798:28
  2.16      1.19      0.98  ice-9/boot-9.scm:790:0:compose
  1.22     45.12      0.55  <current input>:133:4
  1.15      1.10      0.52  srfi/srfi-1.scm:580:2:map
  0.68      0.30      0.30  procedure?
  0.61      0.27      0.27  list?
  0.41      7.13      0.18  ice-9/rdelim.scm:193:0:read-line
  0.20      0.09      0.09  %after-gc-thunk
  0.07      0.03      0.03  reverse
  0.07      0.03      0.03  list->array
  0.00     45.15      0.00  ice-9/ports.scm:438:0:call-with-input-file
  0.00     15.58      0.00  anon #x1bb051c
  0.00      0.09      0.00  anon #x1bb0590
---
Sample count: 1481
Total time: 45.145951082 seconds (19.292058234 seconds in GC)
scheme@(day11)> 

SRFI-42 comes with a comprehension specifically for iterating over strings, which means we can get rid of the string->list, by nesting one comprehension inside another.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (λ (p)
      (list->array 2 (list-ec (:port line p read-line)
			      (list-ec (:string ch line)
				       ((compose string->number string) ch)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt")
%     cumulative   self             
time   seconds     seconds  procedure
 21.37      9.05      9.05  string
 16.47      9.23      6.98  ice-9/boot-9.scm:790:0:compose
 14.58      6.18      6.18  %read-line
 13.93      5.90      5.90  string->number
 11.75     42.34      4.98  <current input>:164:4
 10.98      4.65      4.65  ice-9/boot-9.scm:797:8
  5.37      2.28      2.28  ice-9/boot-9.scm:798:28
  4.37      1.85      1.85  reverse
  0.89      6.55      0.38  ice-9/rdelim.scm:193:0:read-line
  0.24      0.10      0.10  <current input>:165:6:reverse@@srfi/srfi-42
  0.06      0.03      0.03  list->array
  0.00     42.36      0.00  ice-9/ports.scm:438:0:call-with-input-file
  0.00     11.33      0.00  anon #x1bb051c
---
Sample count: 1694
Total time: 42.364101649 seconds (14.487588887 seconds in GC)
scheme@(day11)>

This is maybe slightly faster, but the string constructor is still costing us, it would be better if we could deal directly with the chars rather than converting back to strings.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (lambda (p)
      (list->array 2 (list-ec (:port line p read-line)
			      (list-ec (:string ch line)
				       (if (char-numeric? ch))
				       (- (char->integer ch) 48)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt")
%     cumulative   self             
time   seconds     seconds  procedure
 42.32      7.24      7.24  %read-line
 30.15     17.08      5.16  <current input>:269:4
 16.10      2.76      2.76  reverse
  9.18      1.57      1.57  char-numeric?
  1.12      7.44      0.19  ice-9/rdelim.scm:193:0:read-line
  0.94      0.16      0.16  <current input>:275:6:reverse@@srfi/srfi-42
  0.19      0.03      0.03  list->array
  0.00     17.12      0.00  ice-9/ports.scm:438:0:call-with-input-file
---
Sample count: 534
Total time: 17.115120396 seconds (7.343203862 seconds in GC)
scheme@(day11)> 

You might save a few seconds by removing the numeric check too if you don’t care about checking your inputs:

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (lambda (p)
      (list->array 2 (list-ec (:port line p read-line)
			      (list-ec (:string ch line)
				       (- (char->integer ch) 48)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt")
%     cumulative   self             
time   seconds     seconds  procedure
 51.00      6.45      6.45  %read-line
 28.95     12.62      3.66  <current input>:172:4
 16.93      2.14      2.14  reverse
  2.23      6.73      0.28  ice-9/rdelim.scm:193:0:read-line
  0.67      0.08      0.08  <current input>:178:6:reverse@@srfi/srfi-42
  0.22      0.03      0.03  list->array
  0.00     12.65      0.00  ice-9/ports.scm:438:0:call-with-input-file
---
Sample count: 449
Total time: 12.648481982 seconds (4.129522935 seconds in GC)
scheme@(day11)> 

Whilst the use of nested list-ec is certainly neat, it means that bad input is either incorrectly processed or at best filtered out using an if qualifer inside the comprehension. Let’s keep the char approach but produce a proper error using a lambda instead.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (lambda (p)
      (list->array 2 (list-ec (:port line p read-line)
			      (map (λ (ch) (if (char-numeric? ch)
					      (- (char->integer ch) 48)
					      (error "NaN!")))
				   (string->list line)))))))

This seems to be slightly slower than using list-ec, note that if we keep the map but remove the if check completely the result remains similar to but slightly slower than equivalent test above using list-ec without an if qualifier.

scheme@(day11)> ,pr (parse-input "big_test.txt")
%     cumulative   self             
time   seconds     seconds  procedure
 38.56      6.99      6.99  %read-line
 18.31      3.32      3.32  string->list
 16.37     35.80      2.97  srfi/srfi-1.scm:584:5:map1
  8.63      3.10      1.56  <current input>:234:35
  8.45      1.53      1.53  char-numeric?
  2.29     18.11      0.42  <current input>:229:4
  2.29      0.96      0.42  srfi/srfi-1.scm:580:2:map
  1.76      7.31      0.32  ice-9/rdelim.scm:193:0:read-line
  1.76      0.32      0.32  list?
  1.23      0.22      0.22  procedure?
  0.18      0.03      0.03  reverse
  0.18      0.03      0.03  list->array
  0.00     18.14      0.00  ice-9/ports.scm:438:0:call-with-input-file
---
Sample count: 568
Total time: 18.137448602 seconds (8.826047527 seconds in GC)
scheme@(day11)> 

Finally you’re perhaps thinking do you need both list-ec calls - the anwser is yes. Just like Python list comprehensions when can chain the :port and :string qualifiers together but we then get a flat list, for example.

(define (parse-input filename)
  "Read file of numbers into 2D array."
  (call-with-input-file filename
    (lambda (p)
      (list-ec (:port line p read-line)
	       (:string ch line)
	       (- (char->integer ch) 48)))))

Note that in this example you probably wouldn’t bother to use read-line as you could read-string to get the whole file as one string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment