@glennzw
Created July 2, 2016 16:04
Reading a non-ASCII CSV file in Python
# e.g. file:
# 13893,Mickey,Brady,Sinn Féin,Newry and Armagh
import csv
import codecs


def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]


def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')


reader = unicode_csv_reader(codecs.open("file.csv", encoding="iso-8859-1"))
for row in reader:
    print row
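
The snippet above relies on Python 2 (`unicode`, the `print` statement). As a rough sketch of the same idea on Python 3, where the `csv` module works on `str` directly, the encode/decode round trip is not needed. The helper name `read_csv_rows` is hypothetical and not part of the gist; the default encoding simply mirrors the `iso-8859-1` used above.

```python
import csv


def read_csv_rows(path, encoding="iso-8859-1", **kwargs):
    """Yield each CSV row as a list of str, decoding the file with `encoding`.

    Hypothetical helper (not from the gist); extra keyword arguments such as
    delimiter=';' are passed straight through to csv.reader.
    """
    # newline="" leaves line-ending handling to the csv module.
    with open(path, encoding=encoding, newline="") as f:
        for row in csv.reader(f, **kwargs):
            yield row
```

Usage would look like `for row in read_csv_rows("file.csv"): print(row)`, assuming `file.csv` really is encoded as ISO-8859-1; pass a different `encoding` otherwise.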

Bklyn commented Sep 5, 2018

Useful. Thanks!

@iradofurioso

Another solution

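import codecs
import csv


def to_lines(f):
    # Yield lines one at a time until readline() returns an empty string (EOF).
    line = f.readline()
    while line:
        yield line
        line = f.readline()


# migratePath and encodingCharset must be defined by the caller.
with codecs.open(migratePath, 'r', encoding=encodingCharset) as csvfile:
    csv_reader = csv.reader(to_lines(csvfile), delimiter=';')

    line_count = 0
    for row in csv_reader:
        line_count += 1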
