Reading big files in Node.js is a little tricky. Node.js is built to handle I/O-bound tasks efficiently, not CPU-intensive computations. It is still doable, though I'd usually prefer doing such tasks in languages like Python or R. Reading, parsing, transforming and then saving large datasets (I'm talking millions of records here) can be done in a lot of ways, but only a few of them are efficient. The following snippet is able to parse millions of records while using very little CPU (roughly 15%-30% at most) and memory (roughly 40 MB-60 MB at most). It is based on Streams.
The program expects the input to be a CSV file, e.g. big-data.unpr.csv.
It saves the result as NDJSON rather than JSON, because huge datasets are much easier to work with in the NDJSON format: each line is a complete, self-contained JSON record, so the file can be read and written one record at a time instead of being parsed as a whole.
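To illustrate why NDJSON is convenient here, a minimal sketch (separate from the snippet below, and assuming a hypothetical output file named big-data.ndjson) of consuming such a file one record at a time with Node's built-in readline module:

```js
// Sketch: reading an NDJSON file line by line, so memory stays flat
// regardless of file size. The file name is an assumed example.
const fs = require('fs');
const readline = require('readline');

async function countRecords(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let count = 0;
  for await (const line of rl) {
    if (!line.trim()) continue;      // skip blank lines
    const record = JSON.parse(line); // each line is one complete JSON object
    // ...process `record` here...
    count++;
  }
  return count;
}

countRecords('big-data.ndjson').then((n) => console.log(`${n} records`));
```

With plain JSON you'd have to load and parse the whole array before touching a single record; with NDJSON the reader above never holds more than one line in memory.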