Some Info

A script typically has #! at the front (usually called a hash bang or shebang, as in Perl)

On UNIX-like systems, putting a #! line at the top of the file and running chmod +x myscript.py makes the script directly executable
#!/usr/local/bin/python
UNIX Python path look-up trick: use #!/usr/bin/env python to let the system find the interpreter on your PATH

Some Special Reserved Words

In the interactive interpreter, a single underscore _ holds the last evaluated value
- a = 3, then a, then _ # gives 3 (a must be evaluated explicitly first, then _)

Python uses None where C++ would use NULL

Python Object Types

Python programs can be decomposed into modules, which contain statements, which contain expressions, which create and process objects

Use dir(obj) to list all attributes and methods that can be used on an object

Built-in Objects

Numbers, Strings and Tuples are immutable
Lists, dictionaries and sets are mutable (they can be changed in place)

Object Type	Example/creation
Numbers	1234, 3.1415, 3+4j, 0b111, Decimal(), Fraction()
Strings	'spam', "Bob's", b'a\x01c', u'sp\xc4m'
Lists	[1, [2, 'three'], 4.5], list(range(10))
Dictionaries	{'food': 'spam', 'taste': 'yum'}, dict(hours=10)
Tuples	(1, 'spam', 4, 'U'), tuple('spam'), namedtuple
Files	open('eggs.txt'), open(r'C:\ham.bin', 'wb')
Sets	set('abc'), {'a', 'b', 'c'}
Other core types	Booleans, types, None
Program unit types	functions, modules, classes
Implementation-related types	compiled code, stack tracebacks

Numbers

In older Pythons, typing 3.1415 * 2 at the prompt echoes 6.2830000000000004 (full precision), while print(3.1415 * 2) gives the user-friendly 6.283. Since Python 2.7 and 3.1, both display 6.283

for now, if something looks odd, try displaying it with print() (str format)

Python integers have unlimited precision, so very large numbers just work. No type limit! Really nice

Math module is very useful. Modules are tools to utilize

import math
math.pi would give 3.141592....
math.sqrt(85)

random module performs random-number generation and selections

import random
random.random()
random.choice([1,2,3,4]) # lists are written in square brackets

Python also incorporates more exotic numeric objects like complex, fixed-precision and rational numbers, as well as sets and Booleans

Python can also use third-party data types (including matrices and vectors)

Strings

S = 'Spam'
line = 'aaa,bbb,ccccc,dd'
line_n = 'aa,bb,cc\n'

Method	Example	Output
find	S.find('pa')	1
replace	S.replace('pa', 'XYZ')	'SXYZm' (a new string; strings are immutable, so S[0] = 'p' is an error)
split	line.split(',')	['aaa', 'bbb', 'ccccc', 'dd']
upper	S.upper()	'SPAM'
isalpha	S.isalpha()	True
isdigit	S.isdigit()	False
rstrip	line_n.rstrip()	'aa,bb,cc' (removes trailing whitespace, including the newline)
	line_n.rstrip().split(',')	['aa', 'bb', 'cc']
ord	ord('\n')	10 ('\n' is 10 in ASCII)
endswith	S.endswith('b')	False (True only if S ends with the given character or string)
startswith	S.startswith('b')	False (True only if S starts with the given character or string)

Formatting (RHS of % is a tuple)	Output
'%s,eggs, and %s' % ('spam','SPAM!')	'spam,eggs, and SPAM!'
'{},eggs,and {}'.format('spam','SPAM!')	'spam,eggs,and SPAM!'
'spam'.encode('utf8')	b'spam' (4 bytes when encoded as UTF-8, e.g. for files)
'There are %d %s birds' % (2,'black')	'There are 2 black birds'
Formatting with a dict	Output
'%(qty)d more %(food)s' % {'qty':1,'food':'spam'}	'1 more spam'

Strings are called sequences in Python: a positionally ordered collection of other objects (here, characters)

Double quotes and single quotes are interchangeable

Triple Quotes (block string)

Begin with three quotes, followed by any number of lines of text, closed with the same triple-quote sequence
Single or double quotes can be embedded in the string's text
Example:
mantra = """Always look
  on the bright
side of life."""
Another common use of triple quotes nowadays is to temporarily disable part of the code (like commenting it out):
X = 1
"""
import os
print(os.getcwd())
"""
Y = 2 # everything between the triple quotes is skipped when the code is rerun
Triple quotes also allow quoting # (normally the comment character in Python)
To sum up, triple quotes are good for multiline text in a program, so they are commonly used for documentation strings

A classic example of using triple quotes together with dictionary-based formatting

reply = """
Greetings…
Hello %(name)s!
Your age is %(age)s
"""
values = {'name':'Bob','age':40}
print(reply % values)

Formatting Method (the other flexible way to format strings besides using %)

Instead of using % to format a string, we can also use the str.format method (mostly equivalent to %)

By position

template = '{0},{1},{2}'

template.format('spam','ham','egg') # returns 'spam,ham,egg'

By Keyword

template = '{motto},{pork} and {food}'

template.format(motto='spam',pork='ham',food='eggs') # returns 'spam,ham and eggs'

By both

template = '{motto},{0},{food}'

template.format('ham',motto='spam',food='egg') # returns 'spam,ham,egg'

By relative position

template = '{},{},{}'

template.format('spam','ham','eggs') # returns 'spam,ham,eggs'

X = '{motto},{0}'.format(42,motto=3.14) # X is '3.14,42'

Adding keys, attributes and offsets

import sys

'My {1[kind]} runs {0.platform}'.format(sys,{'kind':'laptop'}) # returns e.g. 'My laptop runs win32' ({1} refers to the second argument)

'My {map[kind]} runs {sys.platform}'.format(sys=sys,map={'kind':'laptop'})

Raw strings turn off backslash escape processing:

If we were to open a file: myfile = open(r'C:\new\text.dat','w') # \n and \t are not converted to newline and tab characters

String Operations

S = 'Spam'
len(S) gives 4
S[0] gives 'S'
S[1] gives 'p'
S[-1] gives 'm' (the last character)
S[-2] gives 'a'
S[len(S)] raises IndexError; the last character is S[len(S)-1], which is the same as S[-1]
S[1:3] gives 'pa' # offsets 1 up to but not including 3, so items 1 and 2
S[0:3] gives 'Spa'
S[1:] gives 'pam'
S[:-1] gives 'Spa' (everything but the last)
S + 'xyz' gives 'Spamxyz' (string concatenation)
- this is actually polymorphism: the meaning of an operation depends on the objects being operated on
S * 8 gives 'SpamSpamSpamSpamSpamSpamSpamSpam'
- this is really useful: print('-' * 80) prints a line of 80 dashes, and it is that simple
S[i:j:k] accepts a step k (defaults to +1)
S[::2] gets every other item from the beginning to the end (the first and second limits default to 0 and the length of the sequence)
S[::-1] reverses the string
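The indexing and slicing rules above can be checked with a short sketch. The variable names are only illustrative; the point is that the end offset of a slice is excluded and that out-of-range slices are clipped while out-of-range indexes are errors.

```python
S = 'Spam'

first_three = S[0:3]     # offsets 0, 1, 2 (the end offset 3 is excluded)
all_but_last = S[:-1]    # everything except the final character
every_other = S[::2]     # step of 2 from the start to the end
reversed_s = S[::-1]     # a negative step walks backwards

# Indexing past the end raises IndexError...
try:
    S[len(S)]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# ...but slicing past the end is silently clipped to the string's length.
clipped = S[2:100]

print(first_three, all_but_last, every_other, reversed_s, clipped)
```

Because slices never raise for out-of-range limits, S[-1:] is a safe way to get "the last character, or an empty string if S is empty".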

String Slice (slicing) and index (indexing)

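(Diagram: offsets 0, 1, 2 count from the left end; offsets -2, -1 count from the right end; a slice is written [start:end:step])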

Strings are immutable. They cannot be changed after they are created.

we can't change one specific character by position

S = 'spam'
S[0] = 'z' # TypeError: strings do not support item assignment

# but instead we can build a new string
S = 'z' + S[1:] # 'zpam' hahaha smart!

Every object in Python is classified as either immutable or not.

len(S) returns the size of the string

Use dir(S) to list the attributes and methods of S; called with no argument, dir() lists the variables assigned in the caller's scope

Names with double underscores are the implementation of the object and are available to support customization (operator overloading)
S + 'NI!' # this basically calls __add__ and gives 'spamNI!'
S.__add__('NI!') # 'spamNI!'

help(S.replace) shows the method's help message

help is one of a handful of interfaces to PyDoc, a system of code that ships with Python. PyDoc is a tool that extracts documentation from objects

Pattern Matching

import re

	 match = re.match('Hello[ \t]*(.*)world', 'Hello     Python world')
	 match.group(1) # this gives 'Python '

	 match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
	 match.groups() # ('usr', 'home', 'lumberjack')

	 re.split('[/:]', '/usr/home/lumberjack')
	 # ['', 'usr', 'home', 'lumberjack']

Use 'in' to find a match or substring:

'ab' in 'cabsf' # returns True

Type cast or conversion

Python does not allow + between different types
I = 1
G = '2'
G + I # TypeError!
Use int or str to convert one side first
int(G) + I # force addition: 3
G + str(I) # force concatenation: '21'
we also have float() for floating-point conversion

Lists

length

L = [123, 'spam', 1.23]
len(L) # gives 3

Initialize a list with fixed number of elements

[None]*8
[[None]*3]*2 # looks like [[None,None,None],[None,None,None]], but this won't work as expected: both rows are references to the same inner list, so changing one row changes the other
- use [[None]*3 for _ in range(2)] to get independent rows
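A small sketch of why the nested repetition form is a trap. The names rows, cols, shared and independent are made up for this illustration.

```python
rows, cols = 2, 3

# Repeating the outer list copies a reference to ONE inner list.
shared = [[None] * cols] * rows

# The comprehension evaluates [None] * cols once per row, so each row is new.
independent = [[None] * cols for _ in range(rows)]

shared[0][0] = 'x'
independent[0][0] = 'x'

print(shared)        # the 'x' shows up in every row
print(independent)   # the 'x' shows up only in the first row
```

Repeating a flat list of immutable items such as [None] * 8 is fine; the problem only appears when the repeated item is itself mutable.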

Strings are immutable, but lists are not. Modifications to a list are made in place instead of generating a new object

S = 'abc'
L = list(S) # gives ['a','b','c'], and now each element can be changed
S = ''.join(L) # back to a string; for numbers, use S = ''.join(str(e) for e in L)

!!! Never assign the result of an in-place method back to the same name

L = L.append(1) # We lose the reference to the list: append changes the list in place and returns None, so after the assignment L points to None

Push Back and Push Front and pop

Push front

L = [1,2,3,4]
I = 0
L = [I] + L # [0,1,2,3,4] (I + L would be a TypeError; this builds a new list)
L.insert(0,I) # inserts at the front in place

Push Back

L.append(0) # appends a single object at the end; usually faster than + because it doesn't create a new list
L.extend([0]) # appends each item of an iterable to L (L.extend('str') adds 's','t','r' because a string is iterable)

extend only works on iterables (a list can be extended by a list, not by a single integer): L.extend(2) is a TypeError! Use append to add a single object such as an integer or a string

Pop: returns an element (by default the last one) and deletes it from the list

L.pop() # by default pops the last element (pop back)
L.pop(0) # pops the front. Any index can be used

append adds its argument as a single element at the end of the list; extend adds each item of its argument

L = [1,2,3]
L.append([4,5]) # [1,2,3,[4,5]]
L = [1,2,3]
L.extend([4,5]) # [1,2,3,4,5]
L.extend([4,5]) is the same as L[len(L):] = [4,5] # see next section

Replacement, Insertion and Deletion (with multiple elements)

Slice assignment replaces an entire section all at once. It can be hard to read, so prefer insert, pop and remove where they are enough

Replacement/Insertion

Note: a quick way to understand this: in L[1:2], 1 and 2 are delimiters. L[1:2] covers only offset 1, which holds the value 2 here
L = [1,2,3]

L[1:2] = [4,5] # [1,4,5,3]: replaces 2 with 4,5. By contrast, L[1] = [4,5] gives [1,[4,5],3] !!!!

Insertion (replace nothing)

Same idea as above. In L[1:1] the delimiters cover no range, so no elements are replaced and the new items are simply inserted
L = [1,2,3]

L[1:1] = [6,7] # [1,6,7,2,3]

Delete

L = [1,2,3]

L[1:2] = [] #[1,3]

del L[1:] # only [1] is left

Index, slice, concat, repeat

L[:-1] # slicing a list returns a new list; for L = [123,'spam',1.23] this gives [123,'spam'] (offset -1 is not included)
L + [4,5,6] # concatenation
L * 2 # repetition
L = ['spam','egg']
L.index('spam') # returns 0 (the index)

Create a list of constant (e.g. list of zeroes)

[0] * 10 # 10 zeroes in a list

Type-specific operations (lists are mutable)

L = [123,'spam',1.23]

Operation	Example	Output
append	L.append('NI')	L is now [123,'spam',1.23,'NI'] (L + ['NI'] also works but builds a new list)
pop (del)	L.pop(2)	1.23 (and it is removed from L)
insert	L.insert(i, x)	inserts value x at arbitrary position i
remove	L.remove('NI')	removes the first item equal to the given value
extend	L.extend([1,2,3])	adds multiple values at the end
sort	L.sort()	sorts in place
reverse	L.reverse()	reverses in place

It is not legal to index a position that does not exist in a list

L[9999] = 1 # IndexError: list assignment index out of range

List Iteration

for x in [1,2,3]: print(x, end=' ') # 1 2 3 (end= defines what is printed after each item instead of a newline)

list count method

list.count(obj) #where obj is the object to be counted in the list

aList = [1,1,2,3,4,1]

aList.count(1) #returns 3

Lists can be nested. A list can contain lists, dictionaries and any other types

M = [[1,2,3],[4,5,6],[7,8,9]] # 3x3 matrix
M[1][2] # gives 6

Return a (shallow) copy of the list: arr[:]

List comprehensions are related to the map and filter built-in functions. Really powerful for constructing and processing matrices

col2 = [row[1] for row in M] # [2,5,8] collect the items in column 2: take item 1 of each row in matrix M and put them in a new list
col3 = [row[1]+1 for row in M] # [3,6,9]
col4 = [row[1] for row in M if row[1] % 2 == 0] # [2,8] filter out odd items
col5 = [M[i][i] for i in [0,1,2]] # [1,5,9] collect the diagonal of the matrix
col6 = [c*2 for c in 'spam'] # ['ss','pp','aa','mm']
list(range(4)) # [0,1,2,3]
list(range(-6,2,2)) # [-6,-4,-2,0] (the end value 2 is excluded)
[[x**2,x**3] for x in range(4)] # [[0,0],[1,1],[4,8],[9,27]]
G = (sum(row) for row in M) # parentheses create a generator that produces results on demand
res = [c*4 for c in 'SPAM'] # ['SSSS','PPPP','AAAA','MMMM']
list(map(abs,[0,-1,-2])) # [0,1,2]: map passes each item to the function given as its first argument and collects the results

Dictionaries (dictionary)

Operation	Interpretation
D = {}	empty dict
D = {'cto':{'name':'Bob','age':40}}	nesting
D = dict(zip(keylist,valuelist))	zipping to form a dict from two lists
D.keys()	returns all keys
D.values()	returns all values
D.items()	all (key, value) tuples
D.copy()	(shallow) copy
D.clear()	remove all items
D.update(D2)	merge from another dict by keys
D.get(key,default?)	fetch by key; if absent, return default (or None)
D.pop(key,default?)	remove and return by key; if absent, return default (or raise KeyError if no default is given)
D.setdefault(key,default?)	fetch by key; if absent, set it to default (or None) and return that
D.popitem()	remove and return a (key, value) pair
len(D)	number of entries
del D[key]	delete an entry by key

Indexing a dict by key is a very fast search operation (constant time on average)

so for searching, prefer a dict (or set) over a list test like x in [1,2,3], which scans the whole list

Any IMMUTABLE object (even a tuple) can be a dictionary key, but mutable objects like lists or other dicts cannot

matrix = {}
matrix[(2,3,4)] = 88
X = 2; Y = 3; Z = 4
matrix[(X,Y,Z)] # returns 88

Created by using { } and colon : and indexed by [ ]

D = {'food':'Spam', 'quantity':4, 'color':'pink'}
D['food'] # returns 'Spam'
D['quantity'] += 1
D # gives the entire dictionary

It is rare to know all data in a dictionary at first. So:

D= {}
D['name'] = 'Bob'
D['job'] = 'dev'
print(D['name']) # gives 'Bob'

use setdefault(key,default)

if the key is found, return its value
if the key is not found, insert the key with the default value and return the default
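A common use of setdefault is grouping items without first checking whether the key exists. This is a minimal sketch; the word list and the name by_letter are made up for the example.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # made-up data

by_letter = {}
for word in words:
    # First time a letter is seen: store [] under it and return that list.
    # Later times: return the list that is already stored.
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
```

The same loop written with "if key not in D" needs two extra lines; collections.defaultdict(list) is another standard-library option for this pattern.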

Use dict and () to create dictionary AND use zip to map two lists to a dictionary (one list is key, one is value)

bob1 = dict(name='bob',job='dev',age=40) # same as {'name':'bob','job':'dev','age':40}
bob2 = dict(zip(['name','job','age'],['bob','dev',40]))
bob3 = dict([('name','Bob'),('age',40)]) # dict from (key, value) tuples

Python allows dictionary nesting: multiple data types can co-exist as values in the same dictionary (really cool!! Much cooler than Perl)

rec = {'name':{'first':'Bob','last':'Smith'}, 'jobs':['dev','mgr'], 'age':40.5}
rec['jobs'][-1] # gives 'mgr'
rec['jobs'].append('janitor')
rec['name']['first'] # gives 'Bob'

Dictionaries can't be concatenated with the + operator. Use update

pop takes a key as its argument, then returns the value and deletes that entry

Delete content and reclaim memory

rec = 0 # the dict is no longer referenced, so garbage collection automatically reclaims its memory

Although not needed, we could still initialize a dict or a list

L = [] # initialize an empty list
L[99] = 'spam' # IndexError: index out of range!
D = {} # initialize an empty dict
D[99] = 'spam' # Works! Assigning to a new key creates it

Accessing a non-existing key is a mistake

It is usually a programming error to fetch something that isn't really there. But in many cases we need to test whether it is there
The in membership expression allows us to query the existence of a key
- 'f' in D # gives False if the key does not exist
- if 'f' not in D:
      print('missing') # missing

if 'f' not in D:
    print('missing')
    print('no, really') # multiple lines of code to be executed in an if simply need the same indentation

The dictionary get method and the if/else expression
- value = D.get('x',0) # try to get; if the key does not exist, use the default value (which is 0)
- value = D['x'] if 'x' in D else 0 # same
Use try and except
- try:
      print(matrix[(2,3,5)])
  except KeyError:
      print(0)

Sorting Keys:for loops (get all keys)

D = dict(zip(['a','c','b'],[1,2,3]))
Ks = list(D.keys()) # Ks is ['a','c','b']; notice D.keys() returns a view object, not a list, so use list(D.keys())
Ks.sort() # Ks is ['a','b','c']
for key in Ks: print(key,D[key])

For loop introduction for the first time

for c in 'spam': print(c.upper()) # prints S, P, A, M on separate lines
x = 4
while x > 0:
    print('spam' * x) # prints spamspamspamspam, spamspamspam, spamspam, spam
    x -= 1

More Comprehension

squares = [x**2 for x in [1,2,3,4,5]]
OR
squares = []
for x in [1,2,3,4,5]: squares.append(x ** 2) # same
[key for (key,value) in Mydict.items() if value == V] # all keys whose value equals V
D = {k: v for (k,v) in zip(['a','b','c'],[1,2,3])} # {'a':1,'b':2,'c':3}
D = {x: x**2 for x in [1,2,3,4]} # {1:1,2:4,3:9,4:16}

In Python 3.X, D.keys(), D.values() and D.items() return view objects instead of lists

wrap them in list(), e.g. list(D.keys()), if code that expects a real list raises an error

map and filter may run faster than equivalent for loops (how much depends on the code and the Python version).... see later

use get() to test whether a key exists (Python 3.X)

branch = {'a':1,'b':2}
print(branch.get('spam','bad choice')) # bad choice, because 'spam' is not a key

Tuples

A tuple is like a list that cannot be changed. Tuples are sequences, and they are immutable like strings

Tuples are used to represent fixed collections of items

T = (1,2,3,4) # a 4-item tuple
len(T) # gives 4
T + (5,6) # gives (1,2,3,4,5,6)
T[0] # gives 1
T.index(4) # gives 3: the offset at which the value 4 appears

Can't do T[0] = 1 (TypeError)

Parentheses are optional when creating tuples. Python treats un-parenthesized items separated by commas as a tuple

a,b # same as (a,b)

Tuples support mixed types and nesting

T = 'spam',3.0,[11,22,33] # parentheses are optional
T.append(4) # AttributeError!! tuples are immutable and have no append

Why tuples? Immutability! It's like const in C: you pass a tuple around and no one can change it

Tuples support concatenation with +, repeat with *, slice, in, comprehension, index, count…

Tuple syntax peculiarities: commas and parentheses. A single-element tuple must have a trailing comma

x = (40) # THIS IS NOT A TUPLE!!! This is just integer 40. This could be changed
x = (40,) # This is tuple. Must have the comma

Tuples can be converted to a list for sorting, then converted back to a new tuple:

T = (3,5,1,6,2)
L = list(T)
L.sort()
T = tuple(L) # (1,2,3,5,6)

count(x) returns how many times x appears as an element of the tuple or list

T.count(2) # how many 2s are there in the tuple? returns the count

A tuple can't be changed in place. But a tuple may contain a mutable object as an element, which can be changed in place

T = (1,[3,4],3)
T[1][0] = 2 # this works, because the nested list is mutable: T is now (1,[2,4],3)

(contin.) A tuple is immutable only one level deep

Files

Create a text output file

f = open('data.txt','w') # 'r' (read-only) is the default if no second argument is passed
f.write('hello\n')
f.close()

Read the entire file and store it in a string

f = open('data.txt','r') # assume the file contains 'hello\nworld\n'
text = f.read() # reads the entire file into one string, NOT ONE LINE!!!!!! use readline() for a single line
text.split() # gives ['hello','world']

Read a line and a character

infile = open(r'C:\spam','r') # named infile to avoid shadowing the built-in input
aString = infile.readline() # read the next line (including \n)
aString = infile.read(N) # read up to N characters
file = open('test.txt','r')
while True:
    line = file.readline()
    if not line: break # an empty string means end of file (EOF)
    print(line.rstrip())

Write

output = open(r'dat.txt','w')
output.write(aString) # also returns the number of characters written
output.writelines(alist) # writes every string in a list (no newlines are added)

Change the file position to offset N with seek(N) for the next operation

Close a file

myfile = open(r'~/data.txt','r') # note: open() does not expand ~ by itself
try:
    for line in myfile:
        print(line, end='') # lines already end with \n
finally:
    myfile.close() # optional in practice: CPython closes the file when the object is reclaimed or the program ends. But still a good habit

Output files are always buffered. Use close or flush

by default, output files are buffered, meaning text we write may not be transferred from memory to disk immediately
outputFile.flush() forces the buffered text to be transferred
closing the file also completes the transfer
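The close and flush behavior above can be sketched with a with statement, which closes the file automatically when the block ends. This is only an illustration: the scratch path built with tempfile is an assumption so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

with open(path, 'w') as out:
    out.write('hello\n')
    out.flush()            # push the buffered text to the OS right now
    out.write('world\n')
# Leaving the with block closes the file, which also flushes the rest.

with open(path) as src:
    lines = [line.rstrip() for line in src]

print(lines, out.closed)
```

The with form does the same job as try/finally plus close(), with less code and no chance of forgetting the close.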

for line in open('data'): use line

Use a for loop to iterate through lines (the file type has a built-in iterator for accessing the lines of a file)

the file object is its own iterator
for line in open('myfile.txt'): print(line, end='') # this is like Perl's while(<>) lol
f = open('s.txt')
f.__next__() # returns the next line; next(f) is the usual spelling

Conversion

Python relies on int(), bin(), str(), hex(), list() and similar functions to convert the strings read from a file
Use rstrip() a lot to get rid of the \n at the end
- rstrip() by default removes all trailing whitespace, including \n
- if characters are passed in, it removes any trailing run of those characters instead
int() and other numeric conversion functions ignore surrounding whitespace such as \n
- lin = open('d.txt','r')
  s = lin.readline() # returns '89\n'
  i = int(s) # 89

Pickle module to store Native python objects

Many times we need to store some native python objects like dict or list to a file and then read back in when needed

This can be done with eval(), which runs a string as a Python expression

line = F.readline() # "[1,2,3]${'a':1,'b':2}\n"

parts = line.split('$') # ['[1,2,3]', "{'a':1,'b':2}\n"]
eval(parts[0]) # gives [1,2,3]
objects = [eval(P) for P in parts] # objects is then a list holding a list and a dict

This approach works. But eval is dangerous because it executes whatever expression the string contains

What if the string passing in were to delete all files??

Use pickle (convenient and fast; but only load pickles you created yourself, because unpickling untrusted data can also execute code)

Files must be written and read in binary mode. pickle converts almost any Python object to a byte string and back

D = {'a':1,'b':2}

import pickle
F = open('database','wb') # 'wb' is needed
pickle.dump(D,F) # dump D to file F (database)
F.close()

F = open('database','rb')

E = pickle.load(F) # pickle rebuilds the dict from the bytes, and it is assigned to E

Use shelve module (store pickled objects by key)

shelve translates an object to its pickled string and stores that string under a key in a dbm file

bob = Person('bob smith') # Person and Manager are user-defined classes (not shown here)

sue = Person('sue jones',job='dev',pay=1000)
tom = Manager('tom jones',50000)

import shelve
db = shelve.open('persondb')
for obj in (bob,sue,tom):
    db[obj.name] = obj
db.close()

Sets

Unordered collection of unique, immutable objects (ONLY immutable objects can be in a set)

Usage

X = set('spam')
Y = {'h','a','m'}
X,Y # a tuple of two sets: ({'m','a','p','s'}, {'m','a','h'}), unordered
X & Y # intersection of the two sets: {'m','a'}
X | Y # union: {'m','a','p','s','h'}
X - Y # difference: {'p','s'}
X > Y # superset test: False
X < Y # subset test: False ('s' and 'p' are not in Y)
'p' in set('spam'), 'p' in 'spam', 'ham' in ['eggs','spam','ham'] # returns (True, True, True). Membership tests work on sets, strings and lists alike, but a set is fast and really easy to use
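The set operations above, as a runnable sketch. Both operands are built as real sets here (a set literal for Y), since the operators &, | and - need sets on both sides.

```python
X = set('spam')
Y = {'h', 'a', 'm'}

common = X & Y        # items in both sets
either = X | Y        # items in at least one set
only_x = X - Y        # items in X that are not in Y

x_is_subset = X < Y           # False: 's' and 'p' are not in Y
am_is_subset = {'a', 'm'} < X # True: a proper subset of X

print(common, either, only_x, x_is_subset, am_is_subset)
```

Sets print in arbitrary order, so compare them with == rather than relying on how they are displayed.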

"type" tells whether an object is of a certain type (avoid relying on it!! Testing for specific types limits the types the program can work with)

check type
what we care about is what the object does, not what it is. So this is rarely used
type(L) # <class 'list'> in 3.X (<type 'list'> in 2.X)
type(type(L)) # <class 'type'>
if type(L) == type([]): print('yes')
if type(L) == list: print('yes')
if isinstance(L,list): print('yes')

bool() can be used to tell if a list or dict is empty or not

bool([]) # False
bool([1]) # True
bool({}) # False

Common Mistakes and Gotchas

Assignment creates references, not copies!!!

a = [1,2,3]
b = [0,a,4] # [0,[1,2,3],4]
a[0] = 0
b # [0,[0,2,3],4]: b sees the change because it holds a reference to a

a = [1,2,3]
b = [0,a[:],4]
a[0] = 0
b # [0,[1,2,3],4]: with both limits empty, a[:] slices the whole sequence, so it makes a copy of a and b holds that copy instead of the original list

Repetition adds one level deep

L = [1,2]
X = L * 2 # [1,2,1,2]
Y = [L] * 2 # [[1,2],[1,2]], but both items are references to the same L

Beware of cyclic data structures

Python prints [...] when it sees a cycle in an object
- L = ['g']
  L.append(L)
  L # ['g', [...]]

Rule of thumb is to avoid this....

Numeric Types

Numeric Literals

Literal	Interpretation
1234, 24	Integers (unlimited size)
1.23, 1.3e-10, 4E210	Floating-point numbers
0o117, 0x9ff, 0b10101	Octal, hex and binary
3+4j, 3.0+4.0j, 3J	Complex
set('spam'), {1,2,3,4}	Sets
Decimal('1.0'), Fraction(1,3)	Decimal and fraction extensions
bool(x), True, False	Boolean type and constants

Built-in Numerical tool

pow, abs, round, int, hex, bin…
random, math…
int(3.1415) # truncates the float to the integer 3
float(3) # converts to the float 3.0
All Python operators may be overloaded

Numeric Display Formats

b/(2.0+a) # the interactive echo in older Pythons might show something like 0.80000000000000004
print(b/(2.0+a)) # prints the rounded, friendlier 0.8
'%e' % num # with num = 1/3.0, gives '3.333333e-01' (string formatting expression)
'%4.2f' % num # '0.33'
'{0:4.2f}'.format(num) # '0.33' (string formatting method)

Chain Comparison

Python allows chained comparisons:
X < Y < Z # same as X < Y and Y < Z
X < Y > Z # same as X < Y and Y > Z
1 < 2 < 3.0 < 4 # True
1 == 2 < 3 # same as 1 == 2 and 2 < 3, which is False
Floating-point comparisons might not work as expected
1.1 + 2.2 == 3.3 # False! The sum is actually 3.3000000000000003
int(1.1 + 2.2) == int(3.3) # True: converting (or rounding) both sides first works
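A sketch of two common ways to compare floats safely, using only the standard library: math.isclose (available since Python 3.5) and rounding both sides. The variable names are only illustrative.

```python
import math

total = 1.1 + 2.2

exact_equal = (total == 3.3)              # binary floats cannot store 3.3 exactly
close_enough = math.isclose(total, 3.3)   # compares within a relative tolerance
rounded_equal = (round(total, 9) == round(3.3, 9))

print(repr(total), exact_equal, close_enough, rounded_equal)
```

For money or other values where exact decimal results matter, the decimal module's Decimal type avoids the problem instead of working around it.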

Floor Division (truncating division)

X / Y # true division in 3.X: always keeps the remainder regardless of types (in 2.X, / truncates for integers)
X // Y # floor division: always drops the fractional remainder
10 // 4 # gives 2
10 // 4.0 # gives 2.0

The math module provides floor and trunc functions (floor always rounds toward the more negative number; trunc just drops the fraction)

import math
math.floor(2.5) # gives 2
math.floor(-2.5) # gives -3
math.trunc(2.5) # gives 2
math.trunc(-2.5) # gives -2
math.pi, math.e # pi and other common constants
math.sin(2 * math.pi / 180) # sine of 2 degrees, converted to radians
math.sqrt(144) # 12.0
min(), max() # really handy built-in functions (they are not in the math module). No need to implement them yourself

Hex, Octal, Binary: Literals and Conversion

oct(64), hex(64), bin(64) convert a number into the corresponding digit string: ('0o100', '0x40', '0b1000000')
X = 99
bin(X), X.bit_length(), len(bin(X)) - 2 # gives ('0b1100011', 7, 7)

The eval function treats a string as though it were Python code

eval('64'), eval('0o100') # gives (64, 64)

The random module provides random number generation

import random
random.random()
random.randint(1,10)
random.choice(['Life of Brian','Holy','meaning'])
suits = ['heart','clubs','diamond','spades']
random.shuffle(suits) # randomly reorders the list suits in place

Dynamic Typing in Python

In Python there is no need to declare the type of a variable, because types are determined automatically at runtime

A variable is created when it is first assigned a value

BUT!! A variable never has any type information or constraints associated with it. Type always goes with objects

It is an error to reference an unassigned variable

So in Python all names are variables (like handles in Java or SystemVerilog) that refer to objects

a = 3 # creates an object that stands for 3, creates the variable a, and links the variable to the object

An object has two header fields: a type designator and a reference counter

the getrefcount function in the standard sys module returns the object's reference count
import sys
sys.getrefcount(1) # how many references point to the integer object 1 (e.g. when run in the IDLE GUI)

Shared Object

L1 = [1,2,3]
L2 = L1
L1[0] = 2
L2 # gives [2,2,3] because L1 and L2 point to the same object; L1[0] = 2 changed the object itself
L1 = [1,2,3]
L2 = L1[:] # this makes a copy, so L1[0] = 2 no longer changes L2
But this slicing technique won't work on other mutable types (e.g. dicts and sets, because they are not sequences)
To copy a dictionary or set, call its X.copy() method. The standard library module copy also does the job
- import copy # works for any object, including dictionaries and sets
  X = copy.copy(Y) # top-level (shallow) copy
  X = copy.deepcopy(Y) # copies all nested parts too
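The difference between a shallow and a deep copy only shows up with nested mutable objects. A minimal sketch, with a made-up record:

```python
import copy

original = {'name': 'Bob', 'jobs': ['dev', 'mgr']}   # made-up record

shallow = copy.copy(original)      # new dict, but the same nested list
deep = copy.deepcopy(original)     # new dict and a new nested list

original['jobs'].append('janitor') # in-place change to the nested list
original['name'] = 'Sue'           # rebinding a top-level key

print(shallow)
print(deep)
```

The shallow copy sees the appended job because it shares the list, but not the new name, because its top level is independent. The deep copy sees neither change.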

Check Equality

L = [1,2,3]
M = L
L == M # same value (this returns True)
L is M # True: the is operator tells you whether two names point to the exact same object (a stronger test, rarely used)
L = [1,2,3]
M = [1,2,3]
L == M # True
L is M # False, because the objects are different in spite of having the same value

Weak Reference

use the weakref standard library module
A weak reference does not prevent the target object from being reclaimed
Useful for caches of large objects
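A minimal weakref sketch. BigObject is a hypothetical stand-in class, and the immediate reclamation shown here relies on CPython's reference counting; other implementations may reclaim the object later.

```python
import gc
import weakref

class BigObject:
    """Hypothetical stand-in for something expensive to keep around."""
    pass

obj = BigObject()
ref = weakref.ref(obj)          # a weak reference does not add to the ref count

alive_before = ref() is obj     # calling the reference returns the object

del obj                         # drop the only strong reference
gc.collect()                    # not needed in CPython, harmless elsewhere

alive_after = ref() is not None # the weak reference now returns None

print(alive_before, alive_after)
```

For caches, weakref.WeakValueDictionary applies the same idea to a whole mapping: entries disappear once nothing else refers to their values.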

Python Statement

Statement	Role	Example
Assignment	create references	a, b = 'good', 'bad'
if/elif/else	conditions
for/else	iteration
while/else	loops
pass	empty placeholder	while True: pass
break	loop exit
continue	loop continue
def	functions and methods	def myFun(a, b, c=1, *d):
return	function result
yield	generator functions
global	namespaces
nonlocal	namespaces
try/except/finally	catching exceptions
raise	triggering exceptions
assert	debug checks	assert X > Y, 'X too small'
with/as	context managers
del	deleting references	del data[i:j]

Python lets us omit the parentheses in cases like if x < y (less typing is always good)

but if a statement spans multiple lines, wrapping it in ( ) lets it continue across lines
- X = (A + B +
       C + D)
- if (A == 1 and
      B == 2 and
      C == 3): print('spam' * 3)

input (raw_input() in 2.X) is a built-in function that takes an optional prompt string and reads a line from the console

use str.isdigit() to tell whether a string contains only digit characters

while True:
    reply = input('Enter text:') # raw_input in 2.X
    if reply == 'stop': break
    elif not reply.isdigit(): print('bad' * 8)
    else: print(int(reply) ** 2)
print('Bye')

try and except and else

Python runs the try block first, then either the except part (if an exception was raised) or the else part (if none was)
- try:
      num = int(reply)
  except: print('Bad' * 2)
  else: print(num ** 2)

Assignment Statement Forms

When there are multiple items on the LHS of '=', the assignments are positional
Sequence assignment works with any sequence of matching length, so a,b,c,d = 'spam' is allowed in both 2.X and 3.X, just like a,b,c = 'spam','s','c'. Python 3.X adds extended unpacking with a starred name: a,*b = 'spam'
Multiple-target assignment makes the names share one object, but rebinding one name later does not affect the other:
- a = b = 0
  b = 1
  (a,b) # (0,1)

Operation	Interpretation
spam,ham = 'ym','yd'	Tuple assignment
[spam,ham] = 'dy','dn'	List assignment
spam = ham = 'lunch'	Multiple-target assignment
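The assignment forms above can be tried in a short Python 3 sketch. The starred form is Python 3 only, and the names are just placeholders.

```python
# Sequence assignment: the number of targets must match the number of items.
a, b, c, d = 'spam'

# Extended unpacking (Python 3.X): the starred name collects the rest as a list.
first, *rest = 'spam'

# Tuple assignment also gives a swap without a temporary variable.
x, y = 1, 2
x, y = y, x

# A length mismatch is an error, not a silent truncation.
try:
    p, q, r = 'spam'
    mismatch_error = False
except ValueError:
    mismatch_error = True

print(a, b, c, d, first, rest, x, y, mismatch_error)
```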

Print

print is a built-in function in Python 3.X. In Python 2.X it is a statement with its own syntax

Print in Python 3.X

print([object, …][, sep=' '][, end='\n'][, file=sys.stdout][, flush=False])
sep is what is printed between each pair of objects in the first argument list
print('spam','99','eggs') # spam 99 eggs (sep=' ' by default)
print('spam','99','eggs',sep=', ') # spam, 99, eggs
print('spam',file=open('data.txt','w')) # writes spam to data.txt in the current working directory

Print in Python 2.X (note that everything in 3.X print can be expressed in 2.X)

print x,y # note: in 2.X, print(x,y) would print a tuple such as (1, 2)
print x,y, # trailing comma suppresses the newline, roughly 3.X print(x,y,end=' ')
print x+y
print >> log, x,y,z # == print(x,y,z,file=log)
print '%s…%s' % (x,y)

Use print('hello world') in Python 3.X and print 'hello world' in 2.X

print is basically the same as: import sys; sys.stdout.write('hello world\n')

print(x,y) # or print x,y in 2.X
is the same as import sys; sys.stdout.write(str(x) + ' ' + str(y) + '\n')

Support Python 3.X print in Python 2.X

add to the top of the file: from __future__ import print_function

Knowing this allows us to redirect print anywhere we like, because print just calls sys.stdout.write

import sys
temp = sys.stdout # important!! otherwise we can't restore stdout after redirection
sys.stdout = open('log.txt','a') # redirect print to a file now. It could also be a GUI window or anything else with a write method
print('spam') # now the output goes to the file instead of stdout
sys.stdout.close()
sys.stdout = temp # remember to restore print to stdout
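In Python 3.4 and later, the save/redirect/restore steps can be wrapped by contextlib.redirect_stdout, which restores sys.stdout automatically. This sketch captures the output in an in-memory io.StringIO buffer instead of a log file, which is an assumption made to keep the example self-contained.

```python
import contextlib
import io

buffer = io.StringIO()     # any object with a write method works as a target

with contextlib.redirect_stdout(buffer):
    print('spam')                     # goes to the buffer, not the console
    print('eggs', 99, sep=', ')

# Outside the with block, sys.stdout is back to what it was before.
captured = buffer.getvalue()
print(repr(captured))
```

Passing file=... to individual print calls is simpler when only a few lines need to go elsewhere; redirecting sys.stdout is for code you cannot edit.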

Boolean

All objects have an inherent Boolean value
Any nonzero number or nonempty object is true
Zero, empty objects and None are false

if/else/and/or

if/else ternary expression (A = Y if X else Z)

same as
if X: A = Y
else: A = Z

and/or operation: A = ((X and Y) or Z) # assigns Y if both X and Y are true (nonempty), otherwise Z

Notice X and Y returns Y if X is true (nonempty), otherwise it returns X
X and Y can be any objects (they are true as long as they are not zero, empty or None)
Useful as a non-empty condition: if both X and Y have elements (assuming they are lists or dicts), the result is Y. Otherwise the result is Z

X = A or B or C or None # assigns X the first nonempty object among A, B and C (or None if all are empty)

X = A or Default # is a very common use
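A short sketch of what and/or actually return, and how that differs from the ternary form. The helper name pick is made up for this example.

```python
def pick(x, y, z):
    """Return y when both x and y are true (nonempty), otherwise z."""
    return (x and y) or z

both_true = pick([1], [2], [3])   # x and y are nonempty, so y is returned
x_empty = pick([], [2], [3])      # x is empty, so z is returned
y_empty = pick([1], [], [3])      # y is empty, so z is returned as well

# The ternary form tests only the condition, so an empty Y can be the result.
ternary = [] if [1] else [3]

# or returns the first true operand, which makes a handy default.
first_nonempty = '' or [] or 'fallback'

print(both_true, x_empty, y_empty, ternary, first_nonempty)
```

The y_empty case is the classic pitfall of the and/or trick: it cannot return a false Y, which is why the ternary expression is usually preferred for selecting between two values.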

while/for/range/zip/map

while loops

while/else

With the loop else clause, the break statement can often eliminate the need for the search status flags used in other languages
while test:
    statements
else:
    statements # run only if the loop did not exit with a break
Find whether a given number y (greater than 1) is prime
x = y // 2 # // is integer (floor) division, no remainder
while x > 1:
    if y % x == 0:
        print(y, 'has factor', x)
        break
    x -= 1
else:
    print(y, 'is prime')

break/continue/pass/loop else block

break #jump out of the loop instantly
continue #go to next iteration and stop current iteration
pass #does nothing
loop else block #only executed if the loop was exited normally (without using break)

for loops

General format: for target in object

for target in object:
    statements
else:
    statements # executed if the for loop did not hit a break

for loops also have an else clause, run when no break was hit

Data types for iterating (all iterables)

for x in “lumberjack”: #string
for x in [1,2,3] #list
for x in (1,2,3) #tuples
for x in [[1,2],3,4,5] #x would be a list [1,2] for the first iteration
for (a,b) in [(1,2),(3,4),(5,6)]
D = {‘a’:1,’b’:2} #notice that when using ‘in’ on a dict, it iterates through its keys for key in D: print(key,’=>’,D[key])

range(i,j,s=1) #integers from i up to but not including j; s is the step (every s-th item). Returns a list in 2.X and a range object in 3.X

for i in range(0,10,2): #0,2,4,6,8
S = 'abcdefg'
for i in range(0,len(S),2):
    print(S[i], end=' ') #gives a c e g

zip and map

zip: takes two or more iterables and zips them together into tuples (in Python 3+, returns a zip object; wrap it in list() to see the results)

list(zip([1,2,4],[3,4,5],[8,9,7])) #gives [(1,3,8),(2,4,9),(4,5,7)]

map(callable,iterable,iterable…) #takes one item from each iterable and passes them together as the arguments of the callable (Python 3+ returns a map object)

returns a list in Python 2.6
list(map(lambda x,y,z: max(x,y,z), [1,2],[3,4],[6,7])) #[6,7]

One case of using map as zip (Python 2.X only; it fails in 3.X because None is not callable): map(None,[1,2],[3,4]) #returns [(1,3),(2,4)], so zip is a special case of map there

built-in function enumerate gives a for loop a counter "for free"

enumerate returns an enumerate object, an iterator that yields (offset, item) tuples
S = 'spam'
for (offset,item) in enumerate(S):
    print(item, 'appears at offset', offset) #s appears at offset 0 …
Another example:
elements = ('foo','bar','baz')
for count,ele in enumerate(elements):
    print(count, ele) #gives 0 foo, 1 bar, 2 baz (counting starts at 0)
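A short Python 3 sketch of the three tools above, using made-up sample data. The start argument of enumerate is not mentioned in these notes; it sets the first counter value.

```python
names = ["a", "b", "c"]
scores = [3, 1, 2]

# zip and map return lazy objects in 3.X, so list() forces the results.
pairs = list(zip(names, scores))
best = list(map(max, [1, 2], [3, 4], [6, 7]))   # max(1,3,6), max(2,4,7)

# enumerate counts from 0 unless a different start is given.
numbered = list(enumerate(names, start=1))
```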

Iterations and Comprehensions

Built-in function iter

get an iterator from an object
L = [1,2,3]
i = iter(L)
i.__next__() #returns the first element (next(i) is the portable spelling)
i.__next__() #returns the second element…
Note that a file object is its own iterator, so there is no need to ask for one:
f = open('a.txt','r')
iter(f) is f #returns True
f.__next__() #returns the first line of the file

Use iterator to iterate

I = iter(L) #get the iterator of a list L (or this could be a dictionary)
while True:
    try:
        X = next(I) #calls I.next() in 2.X and I.__next__() in 3.X
    except StopIteration:
        break
    print(X ** 2, end=' ')

Comprehensions: Detailed look

Comprehensions often run faster than building the same list with an explicit for loop (sometimes around 2x, depending on the Python version and the code)

It’s very common to use comprehensions in reading files (remove \n,replace,split,upper,lower…):

f = open('d.txt')
lines = f.readlines() #read all lines; each line is one element of the list
lines = [line.rstrip() for line in lines] #strip the trailing \n (and other trailing whitespace) from each line
lines = [line.rstrip() for line in open('d.txt')] #alternative and neater way
lines = [line.upper() for line in open('d.txt')]
lines = [line.rstrip().upper() for line in open('d.txt')]
lines = [line.split() for line in open('d.txt')]
lines = [line.replace(' ','!') for line in open('d.txt')] #replace space with !
[('sys' in line, line[:5]) for line in open('d.txt')] #for each line: whether it contains 'sys', plus its first 5 characters (offsets 0-4)

if clause can be used in comprehensions to keep only the interesting items

lines = [line.rstrip() for line in open('d.txt') if line[0] == 'p']
[line.rstrip() for line in open('d.txt') if line.rstrip()[-1].isdigit()] #raises IndexError on a blank line

Dictionary comprehensions

sq = {x: x*x for x in range(10)}

Nested for loops in comprehensions

[x+y for x in 'abc' for y in 'lmn'] #['al','am','an','bl','bm','bn','cl','cm','cn'] Notice y is the inner loop

Built-in Functions: sum,any,all,max,min

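any(iterable) #returns True if bool(x) is True for at least one element

any([1,'']) #returns True

all(iterable) #returns True if every element is true

all([1,'']) #returns False

max(open('d.txt')) #returns the line with the largest string value (lexicographic comparison, not the longest line); use max(open('d.txt'), key=len) for the longest line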

Built-in Function: Filter in Python 3.X

returns the items in an iterable for which a passed-in function returns True (a filter object in 3.X, so wrap it in list() to see the results)

list(filter(bool, ['spam','',1])) #returns ['spam',1]; '' is filtered out since bool('') is False

returns a list of numbers > 0 only:

l = list(range(-5,5))
positive = list(filter((lambda x: x>0), l)) #[1,2,3,4]

Built-in Function: reduce in Python 2.X but in functools module in 3.X

from functools import reduce

reduce((lambda x,y:x+y),[1,2,3,4]) #10

Multiple Versus Single Pass Iterator

R = range(3)
next(R) #TypeError: a range is an iterable but not its own iterator (it is a list in 2.X and a range object in 3.X)
R = range(3)
I1 = iter(R) #ask for an iterator first
next(I1) #0
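A minimal Python 3 sketch of what the heading refers to: a range supports several independent iterators (multiple pass), while a zip object is its own iterator and can be walked only once (single pass). The variable names are illustrative.

```python
R = range(3)
I1, I2 = iter(R), iter(R)   # two independent iterators over one range
a = next(I1)
b = next(I1)
c = next(I2)                # I2 keeps its own position

Z = zip("ab", "xy")
same = iter(Z) is Z         # a zip object is its own iterator
first = list(Z)
second = list(Z)            # already exhausted by the first pass
```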

Documentation Interlude

dir function grabs a list of all attributes available inside an object (returned as a list)

len(dir(sys)) #list how many attributes in sys object
len([x for x in dir(sys) if not x[0] == '_']) #how many names do not start with an underscore; startswith or endswith reads better than indexing a character
- or:

len([x for x in dir(sys) if not x.startswith('__')]) #how many names do not start with a double underscore

__doc__

The __doc__ attribute holds the documentation string (docstring) attached to a module, class or function, so it can be inspected at runtime
"""
Module doc
Words go here
"""
class…
import file
print(file.__doc__) #shows the words above
Notice we could also use "" or '' instead of triple quotes

help: extracts docstrings and associated structural information and formats them into a nicely arranged report

Functions

Unlike C, def-defined functions do not exist until Python reaches and runs the def

So it is legal to nest a def inside an if statement (all functions are determined at runtime, not compile time)

if switch:
    def myFunc(): return 1
else:
    def myFunc(): return 50
… #later on
myFunc() #depending on switch, myFunc may be defined differently

def creates a function object and assigns it to the name

so we can even call the function under another name by assigning it to a new variable
def func(): return 1
othername = func
result = othername() #calls func and returns 1

General form:

def name(arg1,arg2…):
    statements
    return value #returns None if value is omitted

`lambda` creates a function object but returns it as a result instead of assigning it to a name (so it can be used inline; functions can be created with lambda instead of def)

`yield` sends a result object back to the caller, but remembers where it left off (These functions are generators)

This memory allows it to resume its state later, so that it can produce a series of results over time

`global` declares module-level variables that are to be assigned

By default all names assigned inside a function are local.
Using global allows the function to assign module-level variables (reading them needs no declaration, because Python always looks names up through the enclosing scopes)

`nonlocal` declares enclosing function variables that are to be assigned (Python 3.X only)

allows enclosing functions to serve as a place to retain state – information remembered between function calls
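A minimal Python 3 sketch of state retention with nonlocal. The function names are made up for illustration. Each call to make_counter creates its own enclosing scope, so the counters do not interfere.

```python
def make_counter(start=0):
    count = start               # lives in the enclosing function's scope
    def increment():
        nonlocal count          # assign the enclosing name, not a new local
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter(10)
results = [counter_a(), counter_a(), counter_b()]
```

Without the nonlocal line, count += 1 would raise UnboundLocalError, because the assignment would make count a local name.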

return can return any object, so a function can return any number of objects by packing them in a tuple

def multiple(x,y):
    x = 2
    y = [3,4]
    return x,y #returned as a tuple
a,b = multiple(1,2) #a is 2, b is [3,4]

Scopes

Basics

by default, all names assigned in a def are put into the local scope unless declared otherwise
If we need to assign a variable at the top level of the module, use global
If we need to assign a name that lives in an enclosing def (Python 3.X), use nonlocal
- nonlocal works like global, except that it refers to a name in an enclosing def instead of a module-level variable
In-place changes to objects do not classify names as locals
- If L is defined outside and we are now inside a def:
L = X #this creates a new local variable L, but:
L.append(X) #does not create a new local. It is an in-place change, so Python finds L in the global scope if it is not found locally

Rule of Thumb (LEGB rule)

Name assignment creates or changes local names
Name reference searches in order: local, then enclosing functions (if any), then global, then built-in (bottom up)

X = 1
def func(Y):
    Z = X+Y #X is a global

X = 1
def func(Y):
    global X
    X = 992 #the module-level X is changed

Built-in (Python) encloses Global (Module) encloses Enclosing Function Locals encloses Local (function)
Cross File: each module (file) is a self-contained namespace

#first.py
X = 99
#second.py
import first #references a name in another file
print(first.X)

When we need to change a global variable from another file, it's best practice to go through a function for better maintainability

#first.py
X = 99
def setX(new):
    global X
    X = new
#second.py
import first
first.setX(30)

Arguments

Immutable arguments behave as if passed by value

Mutable arguments behave as if passed by pointer (in both cases Python passes a reference to the object; only in-place changes to a mutable object are visible to the caller)

Avoid mutable argument changes:

sometimes we pass a list as an argument but don't want the original to be altered
We can do this by copying the list:
L = [1,2]
changer_fun(1,L[:]) #this passes a copy
Another way is to convert the list to a tuple, which raises an error if the function tries to change it:
L = [1,2]
changer_fun(1,tuple(L)) #like const in C
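A small sketch of the behaviour described above. The function changer is a hypothetical stand-in for changer_fun, whose body these notes do not show.

```python
def changer(number, items):
    number = 99          # rebinds the local name only; the caller is unaffected
    items[0] = "spam"    # in-place change; visible to the caller

n = 1
shared = [1, 2]
changer(n, shared)               # the caller's list is modified

protected = [1, 2]
changer(n, protected[:])         # only the copy is modified

try:
    changer(n, tuple(protected)) # tuples reject item assignment
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__
```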

Argument Matching Syntax: arguments must appear in this order: positional, then keyword args, then the *name form, then the **name form (the **name form must come last)

func(value) - Caller: Matched by Position

func(name=value) - Caller: Matched by Name

def f(a,b,c): print(a+b+c)
f(c=3,b=2,a=1) #6; order does not matter when matching by name
Could also mix: f(1,c=3,b=2) #note order is important! must use positional, then keyword, then *name, then **name

func(*iterable) - Caller: Pass all objects in iterable as individual positional arguments

In a def header, the star packs any number of separate positional arguments into one tuple:
def func(*args): print(args)
func([1,2,3],4,[5,6,7]) #([1,2,3],4,[5,6,7])
In a call, a star to the left of a list unpacks it into separate arguments:
a = [1,2,3]
func(*a) #same as func(1,2,3)
collects any number of unmatched arguments in a tuple
def f(*eggs): print(eggs) #prints all passed args
Basically, Python collects all the unmatched positional arguments into a tuple and assigns it to the starred name (eggs in the example above)

func(**dict) - Caller: Pass all key/value pairs in `dict` as individual keyword arguments

In a def header, **name works only for keyword arguments
def f(**args): print(args)
f(a=0,b=1) #{'a':0,'b':1}

def func(name) - Function: Normal argument: matches any passed value by position or name

def func(name=value) - Function: Default argument value!!!!

def func(*name) - Function: Matches and collects remaining positional arguments in a tuple (tuple)

def func(**name) - Function: Matches and collects remaining keyword arguments in a dictionary (dict)

def func(*other,name) - Function: Arguments after *other must be passed by keyword only in calls (3.X)

def func(*,name=value) - Function: Arguments after the bare * must be passed by keyword only in calls (3.X)

Mix

in a def header, must follow the order: normal arguments, defaults (name=value), *name, **name
def f(a,*pargs,**kargs): print(a,pargs,kargs)
f(1,2,3,x=1,y=2) #1 (2,3) {'x':1,'y':2}
unpacking: unpack a tuple
def f(a,b,c,d): print(a,b,c,d)
args = (1,2)
args += (3,4)
f(*args) #this works!
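The header forms above can be combined in one function. This is a Python 3 sketch; report and its parameter names are invented for illustration.

```python
def report(title, *values, sep=", ", **options):
    """title: normal; values: extra positionals; sep: keyword-only; options: extra keywords."""
    body = sep.join(str(v) for v in values)
    return title, body, options

basic = report("nums", 1, 2, 3)                    # sep keeps its default
custom = report("nums", 1, 2, sep="-", unit="cm")  # unit lands in options

# The caller-side star forms unpack a tuple and a dict into separate arguments.
args = (1, 2)
kwargs = {"sep": "/"}
unpacked = report("nums", *args, **kwargs)
```

Because sep follows *values, it can only be passed by name. A third positional value would simply be collected into values.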

Be careful when dealing with default mutable objects

Default values are evaluated and saved once, when the def statement runs, so a mutable default is shared by all calls

def saver(x=[]): #default is an empty list, created once
    x.append(1)
    print(x)
saver() #[1]
saver() #[1,1]
saver() #[1,1,1]

To avoid this:

def saver(x=None):
    if x is None:
        x = [] #a new list on each call
    x.append(1)
    print(x)

Recursive

def mysum(L): #named mysum to avoid shadowing the built-in sum
    if not L:
        return 0
    else:
        return L[0] + mysum(L[1:])

Coding Alternatives in if/else

def mysum(L): return 0 if not L else L[0] + mysum(L[1:])

lambda

lambda arg1, arg2,… argN: expression using arguments

lambda is an expression, not a statement

with def, the function must be created in its own statement, away from where it is used.
as an expression, lambda returns a value that can optionally be assigned a name

lambda's body is a single expression, not a block of statements

f = lambda x,y,z: x+y+z
f(2,3,4) #9

lambda could have default values as well

x = (lambda a=1,b=2,c=3: a+b+c)
x(2) #7

handy for a list of inline functions

L = [lambda x: x**2, lambda x: x**3]
for f in L:
    print(f(1)) #1, then 1
print(L[0](2)) #4

Generators (generations and comprehensions)

Format of Comprehension

[expression for target1 in iterable1 if condition1 for target2 in iterable2 if condition2 … for targetN in iterableN if conditionN]

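map is often faster than an equivalent for loop (roughly 2x in some cases), and a comprehension is often faster than map

because map and comprehensions run their loops at C speed inside the interpreter, while a for loop statement steps through PVM bytecode
consider using map or a comprehension instead of a manual loop where performance matters (timings vary by Python version, so measure)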

Generation (Generators/yield)

Procrastination: Python supports generating results only when needed instead of all at once

Unlike normal def functions, a generator function suspends when it yields a value, and resumes from just after the last yield on the next call

state is retained (for local variables)

Generator functions are closely bound to the iteration protocol (iterator objects define a __next__ method, named next in 2.X)

which returns the next object or raises a StopIteration exception to end the iteration

To end the generation of values, functions either use a return with no value OR simply allow control to fall off the end

To use the generator

def gensquares(x):
    for i in range(x):
        yield pow(i,2)
num = gensquares(4)
next(num) #0
next(num) #1; use a for loop or try/except StopIteration to iterate to the end

Generator Expression

can use G = (c*4 for c in 'SPAM') #use parentheses for a generator expression

list(generator) #can force a generator to produce all results

G = (c for c in 'PAM')
list(G) #['P','A','M']

Notice that generators (no matter whether functions or expressions) are their own iterators: they support just one active iteration

EIBTI (Explicit is better than implicit): don't use generators in simple cases without a good reason

One situation is when we want a very long list of results. Computing them all at once might take a long time and consume memory
Use a generator to step through them, which can reduce the memory footprint
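A rough sketch of the memory point, assuming CPython. sys.getsizeof reports only the container's own size, so the numbers are indicative rather than a full memory measurement.

```python
import sys

def gensquares(limit):
    for i in range(limit):
        yield i * i                       # one value at a time

as_list = [i * i for i in range(100000)]  # all results held in memory
as_gen = gensquares(100000)               # nothing computed yet

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

total = sum(as_gen)                       # consumes the generator lazily
leftover = list(as_gen)                   # single pass: nothing is left
```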

Modules and Packages

Why Modules?

Modules provide an easy way to organize components by serving as self-contained namespaces (avoiding name collisions between pieces of code)

Better code reuse

Better system namespace partitioning

Implementing shared services or data across platforms

NOTE: "import" can only import modules (files). It can't import attributes (i.e. classes, functions, variables…). Use from…import for attributes

Cross-file module linking is not resolved until import statements are executed at runtime!!

#in a.py
def func(text): print(text)
#in b.py
import a
a.func("a")

import serves two purposes: 1. it identifies the file to load 2. it also becomes a variable assigned to the loaded module

How Imports Work: 3 steps - Find, Compile and Run

Find the Module’s file

We might like to write import ./b.py, but Python does not allow paths or file suffixes in import. Instead it uses a standard module search path and known file types to locate the file

We sometimes still need to tell Python where to look up or to find the modules (files) (python searches from 1st to last)

The Home directory of the program

Python searches this dir first. So be careful not to give a file the same name as another module or a standard library module, because the local file would hide it

PYTHONPATH directories (if set)

We can set PYTHONPATH env variable to a customized path and start to put our source lib there

Standard library directories

The contents of any .pth files (if present)

Python allows users to add dirs to the module search path by simply listing them one per line
All lines need to be in a text file whose name ends with a .pth suffix
This file needs to be placed where Python looks for .pth files, such as the site-packages dir (or the top of Python's installation dir on some platforms)

The site-packages home of third-party extensions

See the list of sys.path to know all dir included (can be used to verify what I added)

By modifying sys.path list at run-time, we can change the search path for all future imports made in a program run

many web server programs require this
a usual way: sys.path.append(dir) or sys.path.insert(0, dir)

Python imports only the first matching file it encounters along the search path

Compile it to byte code (if needed)

Once found, Python next compiles it to byte code if necessary

At this time, Python first checks the timestamps (to see if the bytecode is older than the source code) (.pyc files are bytecode)

if it is older, Python recompiles it
otherwise it skips the compilation
We can ship a Python program as .pyc bytecode only, without sending the source!

Only imported Python files get .pyc files generated on import. The TOP LEVEL FILE does not have a .pyc file!

because the top level file is not imported by other files
top level file’s bytecode is generated and discarded internally
So top level file is typically designed to be executed directly and not imported at all

If Python doesn't have permission to create or write the .pyc file, it just keeps the bytecode in memory and discards it when done

Python 3.2 and later: byte code is instead stored in a subdirectory named __pycache__

this helps reduce clutter in the source code directory

Optimized byte code files

use the python -O flag to generate optimized byte code files for modules (.pyo instead of .pyc up to Python 3.4; from 3.5 on, .opt-1.pyc files under __pycache__)

Slightly faster than normal .pyc files (but still less frequently used. PyPy system provides more substantial speedups)

Run the module’s code and build the objects it defines

All def statements in a file will be run at import time to create functions and assign attributes, so they can be called later

If the imported file has some real work (print), it will show immediately at import time. (def functions are run to create objects)

so import basically will directly run the code in the imported file!!

import fetches entire module as a whole, while from fetches (or copies) specific names out of the module

a repeated import won't rerun the module. Instead, it fetches the already loaded module from memory

#in a.py
print("hello")
spam = 1
#later in the top-level file
import a #prints "hello"
print(a.spam) #1
a.spam = 2
import a #nothing on screen because a was already imported and stays in memory
print(a.spam) #2, not 1, because the module is not re-run, so spam = 1 is not executed again

from copies specific names from one file into another scope, so we can use those names directly without going through the module

from module1 import printer
printer("hello!") #no need to write module1.printer !!!
This requires less typing
But be careful! This implicitly defines new names in the current scope
It may corrupt the current namespace! If the current scope already has a name that is also imported with from, the existing one is silently overwritten
Using import is still recommended

from module1 import * #copies all names out into the current scope… No need to go through the module

from module1 import *
printer() #no need to write module1.printer because from copies the names......

Use reload(module) to re-run all the code in a module (a built-in function in 2.X; imp.reload in 3.0-3.3 and importlib.reload from 3.4)

long-running applications (like servers) can periodically update modules if something changed
Note that Python can only dynamically reload modules written in Python, not in C or other languages..
reload runs a module file's new code and overwrites the existing namespace instead of re-creating the module object
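A self-contained Python 3 sketch of import versus reload. The module name reload_demo_mod and the temporary directory are made up for this demonstration, and bytecode writing is switched off only so that the rewritten source is always picked up.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True        # avoid stale .pyc files in this demo

# Hypothetical module, written to a temporary directory.
mod_dir = tempfile.mkdtemp()
mod_path = os.path.join(mod_dir, "reload_demo_mod.py")
with open(mod_path, "w") as f:
    f.write("spam = 1\n")

sys.path.insert(0, mod_dir)
importlib.invalidate_caches()
import reload_demo_mod

reload_demo_mod.spam = 2
import reload_demo_mod                # already loaded: the file is not rerun
after_second_import = reload_demo_mod.spam

with open(mod_path, "w") as f:
    f.write("spam = 1000\n")          # change the source on disk
same_object = importlib.reload(reload_demo_mod) is reload_demo_mod
after_reload = reload_demo_mod.spam   # the new code has run in place
```

Names copied earlier with from are not updated by reload, since they still reference the old objects.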

Changing mutables in modules (same scheme in pass argument to a function)

#in a.py
x = 1
y = [1,2]
#in top.py
from a import x,y
x = 42 #changes the local copy only
y[0] = 42 #changes the shared mutable in place, so a.y changes too

Module namespaces can be accessed via the __dict__ attribute or dir(M) #suppose the module is named M

Packages search path setting:

if I want to import a package from sub-directories of the running dir, each sub-directory needs an __init__.py file (required in 2.X and up to 3.2; from 3.3 a directory without one can still be imported as a namespace package)
This __init__.py file can contain Python code just like normal module files. The code inside it runs automatically the first time the directory is imported
import dir1.dir2.mymod
Then we need the following file structure:
dir0\
    dir1\
        __init__.py
        dir2\
            __init__.py
            mymod.py
Once imported, the directory name becomes a handle to the module object created from __init__.py, and mymod becomes the handle to the actual module
dir1 #<module 'dir1' from '.\dir1\__init__.py'>

import modulename as name OR from modulename import function1 as myFunc

same as: import modulename name = modulename del modulename
same as: from modulename import function1 myFunc = function1 del function1

OOP

Python has class object and instance object

Class object serves as the factory of instance objects

Class Objects

the class statement creates a class object and assigns it to a name

Assignments inside the class statement make class attributes (not including assignments nested inside a def)

Instance Objects

They are concrete items

Calling a class object like a function makes a new instance object

Each instance object inherits class attributes and gets its own namespace

Assignments to attributes of self in methods make per-instance attributes

In class method: def func(self,a)

self means this method would process its instance object

Operator Overloading

Methods named with double underscores (__X__) are special hooks

Python defines a fixed and unchangeable mapping from each of these operations to a specially named method
such methods are called automatically

__init__, __add__, __str__

class myClass(firstClass):
    def __init__(self,value): #not to be confused with the __init__.py file!!! Also, __init__ is operator overloading as well
        self.data = value
    def __add__(self,other):
        return myClass(self.data+other)
    def __str__(self):
        return '[ThirdClass:%s]' % self.data
    def mul(self,other): #this mul does not overload the operator * (that would be __mul__)
        self.data *= other

Attributes don't have to be defined in the class to be used in an object! (quite different from other languages)

Instances have no attributes of their own at first. They simply fetch attributes from the class object where they are stored
We can always assign unique attributes to an instance object
class rec: pass
a = rec()
a.name = 'bob'
print(a.name) #'bob'! We can always attach attributes to an instance object even if they are not defined in the class
So in some sense, a class creates an empty namespace, which can even be used like a dictionary

__dict__ is a built-in dictionary in an instance or class object that shows its namespace (attributes inherited from the class are not in the instance's __dict__!)

__class__ is an attribute linking an instance to its class object

Inheritance

class A:
    def __init__(self,name,val=0):
        self.name = name
        self.val = val
    def give_raise(self,val): #raise is a reserved word and cannot be used as a method name
        self.val += val

class B(A):
    def __init__(self,name,val):
        A.__init__(self,name,val)
    def give_raise(self,val,bonus):
        A.give_raise(self,val) #must remember to pass along the object self!
        self.val += bonus

Multiple Inheritance might cause variable name conflict

conflict example

class A:
    def math(self,value): self.X = value
class B:
    def math1(self,value): self.X = value
class C(A,B): pass #only one X can be valid!!

Pseudoprivate Attributes: use a two-underscore prefix with no two-underscore suffix (useful in large projects. Take care when using MI)

A two-underscore prefix is automatically expanded to _classname__name
class C1:
    def meth1(self, value): self.__X = value
I = C1()
I.meth1(88)
print(I.__dict__) #{'_C1__X': 88}: the class name with one leading underscore is prefixed automatically
class Tool:
    def __method(self): #becomes _Tool__method
        pass
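A minimal sketch of how name mangling avoids the multiple-inheritance clash on X shown earlier. The class and method names here are made up. Each class's __X is stored under its own mangled name, so both values survive in one instance.

```python
class A:
    def set_a(self, value):
        self.__X = value        # stored as _A__X
    def get_a(self):
        return self.__X

class B:
    def set_b(self, value):
        self.__X = value        # stored as _B__X
    def get_b(self):
        return self.__X

class C(A, B):
    pass

obj = C()
obj.set_a(1)
obj.set_b(2)
names = sorted(obj.__dict__)    # both mangled names live in one namespace
```

The names are only mangled, not hidden: obj._A__X still works from outside, which is why these attributes are called pseudoprivate.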

When Python searches methods, it chooses the first one it encounters (lowest and leftmost in classic classes) in conflict case

We may also select an attribute explicitly by referencing it through its class name

superclass.method(self) #this resolves the conflict and overrides the search's default

Department Example

class Person
class Manager(Person):
class Department:
    def __init__(self,*args):
        self.members = list(args)
    def addMember(self,person):
        self.members.append(person)
    def showAll(self):
        for person in self.members:
            print(person)

Classes have a __name__, just like modules, and a __bases__ sequence that provides access to superclasses

The object.__dict__ attribute provides a dictionary with one key/value pair for every attribute attached to a namespace object (including classes, instances and modules)

Operator Overloading

Common Operator Overloading Methods

Method	Implements	Called for
__init__	Constructor	X = class(args)
__del__	Destructor	Object reclamation of X
__add__	Operator+	X+Y, X+=Y if no __iadd__
__or__	Operator | (bitwise OR)	X | Y, X |= Y if no __ior__
__repr__, __str__	Printing, conversions	print(X), repr(X), str(X)
__call__	Function calls	X(*args, **kargs)
__getattr__	Attribute fetch	x.undefined
__setattr__	Attribute assignment	X.any = value
__delattr__	Attribute deletion	del X.any
__getattribute__	Attribute fetch	X.any
__getitem__	Indexing,slicing,iteration	X[key],X[i:j], for loops and other iteration if no __iter__
__setitem__	Index and slice assignment	X[key] = value, X[i:j] = iterable
__delitem__	index and slice deletion	del X[key], del X[i:j]
__len__	length	len(x), truth tests if no __bool__
__bool__	boolean tests	bool(x), truth tests (named __nonzero__ in 2.X)
__lt__, __gt__, __le__, __ge__, __eq__, __ne__	Comparisons	X < Y, X > Y, X <= Y, X >= Y, X == Y, X != Y
__radd__	Right-side operators	Other + X
__iadd__	In-place augmented operators	X += Y (or else __add__)
__iter__, __next__	Iteration contexts	I = iter(X), next(I), for loops, in if no __contains__, all comprehensions, map(F,X) (__next__ is named next in 2.X)
__contains__	Membership test	item in X (any iterable)
__index__	Integer value (not index slice!)	hex(X), bin(X), oct(X)
__enter__, __exit__	Context manager	with obj as var
__get__, __set__,__delete__	Descriptor attributes	X.attr, X.attr = value, del X.attr
__new__	Creation	Object creation, before __init__

__repr__ and __str__ overloading

__str__ is called when the object is passed to print() or str()

__repr__ is called by repr(), by the interactive echo and for nested displays, and in all other contexts where __str__ is missing. Its string is for developers; ideally it looks like code that eval() could turn back into the object

When I create an object: myInst = myClass()
- myInst #at the interactive prompt, shows whatever __repr__ returns
- print(myInst) #prints whatever __str__ returns if __str__ is explicitly defined besides __repr__
Notice that if we only overload __repr__ without __str__, print() and str() fall back to __repr__
__str__ is for users
__repr__ is for developers (for the Python shell)
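A short sketch of the split between the two methods. Point and ReprOnly are illustrative classes, not from these notes.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):                 # developer view, looks like code
        return "Point(%r, %r)" % (self.x, self.y)
    def __str__(self):                  # user-friendly view
        return "(%s, %s)" % (self.x, self.y)

class ReprOnly:
    def __repr__(self):
        return "ReprOnly()"

p = Point(1, 2)
for_users = str(p)                      # what print(p) would show
for_devs = repr(p)                      # what the interactive echo shows
nested = str([p])                       # containers display items with __repr__
fallback = str(ReprOnly())              # no __str__, so __repr__ is used
rebuilt = eval(repr(p))                 # works because the repr is valid code
```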

Intercepting Slices

The best way is to use the __getitem__ method (in both 3.X and 2.X)

L[2] will call __getitem__(self, index) with an integer

L[1:2] will pass a slice object. So when needed, we can add a type test inside __getitem__ to tell an index from a slice

A slice object has three attributes: start, stop, step
When L[1:2] is evaluated, a slice object is passed as the second argument to the __getitem__ method
class Indexer:
    data = [1,2,3,4]
    def __getitem__(self,index):
        if isinstance(index, int): #regular indexing
            return self.data[index]
        else: #slice
            return self.data[index.start:index.stop:index.step]

In Python 2.X, we can overload __getslice__(self,i,j) and __setslice__(self,i,j,seq)

This feature is removed in Python 3.X. So even in 2.X we should use __getitem__ and __setitem__ for compatibility with both
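A runnable Python 3 sketch of index and slice interception. It uses slice.indices(), a standard method of slice objects that fills in missing bounds for a given length; the Indexer constructor here is an illustrative variation on the class above.

```python
class Indexer:
    def __init__(self, data):
        self.data = list(data)
    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() replaces None bounds with concrete numbers
            start, stop, step = index.indices(len(self.data))
            return [self.data[i] for i in range(start, stop, step)]
        return self.data[index]         # plain integer index

X = Indexer("spam")
single = X[1]
part = X[1:3]
stepped = X[::2]
looped = [c for c in X]                 # iteration falls back to __getitem__
```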

Notice that __index__ is not indexing!!!! It returns an integer when hex(), bin() or oct() is called on the object

class C:
    def __index__(self): return 255
X = C()
hex(X) #0xff
bin(X) #0b11111111

Index Iteration in for loops: __getitem__

Besides intercepting slices and indexing, __getitem__ also provides a way to support for loops

Basically, a for statement without __iter__ indexes a sequence from 0 upward until it gets an IndexError

The same thing happens with __getitem__: the for statement passes 0, 1, 2… to __getitem__ until it raises IndexError

class StepperIndex:
    def __getitem__(self,i): return self.data[i]
X = StepperIndex()
X.data = "spam"
for item in X:
    print(item, end=' ') #s p a m

Iterable Objects: __iter__ and __next__: preferable to __getitem__ in iteration

All iteration contexts in Python try the __iter__ method first before trying __getitem__ (prefer __iter__)

Typically the __next__ method needs to be overloaded along with __iter__ if the class returns itself as its iterator
class Squares:
    def __init__(self,start,stop):
        self.value = start - 1
        self.stop = stop
    def __iter__(self): #return self because the __next__ method is part of this class itself. In more complex scenarios, it may return another class
        return self
    def __next__(self):
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return pow(self.value,2)
for x in Squares(1,5):
    print(x) #1 4 9 16 25

Multiple Iterators on One Object

Sometimes we need a separate class to model the iterator instead of returning self

class SkipObject:
    def __init__(self,wrapped):
        self.wrapped = wrapped
    def __iter__(self):
        return SkipIterator(self.wrapped)

class SkipIterator:
    def __init__(self,wrapped):
        self.wrapped = wrapped
        self.offset = 0
    def __next__(self):
        if self.offset >= len(self.wrapped):
            raise StopIteration
        else:
            item = self.wrapped[self.offset]
            self.offset += 2
            return item
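A usage sketch for the two classes above, showing why a separate iterator class matters: nested loops over the same object each get their own position. The __iter__ method on SkipIterator is an addition not in these notes; the iterator protocol expects iterators to return themselves.

```python
class SkipObject:
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __iter__(self):
        return SkipIterator(self.wrapped)   # a fresh iterator per request

class SkipIterator:
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.offset = 0
    def __iter__(self):                     # added: iterators return themselves
        return self
    def __next__(self):
        if self.offset >= len(self.wrapped):
            raise StopIteration
        item = self.wrapped[self.offset]
        self.offset += 2                    # skip every other item
        return item

skipper = SkipObject("abcdef")
pairs = [x + y for x in skipper for y in skipper]   # two active iterators
```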

__iter__ with yield

no need to overload __next__
class Squares:
    def __init__(self,start,stop):
        self.start = start
        self.stop = stop
    def __iter__(self):
        for value in range(self.start,self.stop+1):
            yield pow(value,2)

Membership: __contains__, __iter__, and __getitem__ (__contains__ is preferred over __iter__, which is preferred over __getitem__ in the "in" context)

__contains__ is preferred for membership tests: 's' in obj

__iter__ is preferred for iteration

__getitem__ is the fallback for iteration, and also handles indexing and slicing

Attribute Access: __getattr__ and __setattr__

__getattr__ is called when an attribute is not found in the instance, its class or its superclasses

class Empty:
    def __getattr__(self, attrname):
        if attrname == 'age':
            return 40
        else:
            raise AttributeError(attrname)
X = Empty()
X.age #40
X.name #AttributeError: name

__setattr__, if defined, is called for every attribute assignment, including self.attr = value inside the class (BUT! WATCH FOR INFINITE LOOP!)

Always set the new attribute through __dict__ or the superclass's __setattr__

class Control:
    def __setattr__(self,attr,value):
        if attr == 'age':
            self.__dict__[attr] = value + 10
        else:
            raise AttributeError(attr + ' not allowed')

if we do not use __dict__ to set the new attribute, an infinite recursive call happens: when X.age = 40 runs, Python calls Control's __setattr__, and a self.age = … assignment inside __setattr__ calls __setattr__ again.....

__delattr__ is similar to __setattr__ (watch for the infinite loop)

it is called for every del object.attr
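A Python 3 sketch of the superclass route mentioned above: object.__setattr__ stores the value without triggering the class's own __setattr__ again. The allowed tuple and the __getattr__ fallback text are made up for illustration.

```python
class Control:
    allowed = ("age",)                      # hypothetical whitelist
    def __setattr__(self, attr, value):
        if attr not in self.allowed:
            raise AttributeError(attr + " not allowed")
        # Delegate to the superclass; self.attr = value here would recurse.
        object.__setattr__(self, attr, value + 10)
    def __getattr__(self, attr):
        # Runs only for names that normal lookup could not find.
        return "<no %s>" % attr

X = Control()
X.age = 40
try:
    X.name = "bob"
    error = None
except AttributeError as exc:
    error = str(exc)
```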

Right-side addition (and other similar operators)

example

class adder:
    def __init__(self,val): self.val = val
    def __add__(self,val): return self.val + val
    def __radd__(self,val): return self.__add__(val) #called for val + instance
    #or: __radd__ = __add__ #cut out the middle man

In-place Addition +=

Example:

class adder:
    def __iadd__(self,val):
        self.val += val
        return self

Make calling an object possible: __call__

By overloading __call__, we can allow the outside world to actually "call" the object…

class Callee:
    def __call__(self,*arg,**argv):
        print("called",arg,argv)
c = Callee()
c(1,2,3,x=2,y=7) #called (1, 2, 3) {'x': 2, 'y': 7}; extra positionals are packed into the tuple arg, extra keywords into the dict argv
class Acallee:
    def __call__(self,*pargs,sep=0,**argv): #3.X keyword-only argument!!
        pass #do something here, like print does in Python 3.X

Comparison: __lt__, __gt__, __ne__, __eq__, __cmp__ (removed in Python 3.X)

In Python 2.X, __cmp__ is the fallback for the other comparison operators

__bool__ and __len__: USE __bool__ IN 3.X and __nonzero__ IN 2.X!!!!!!

In a Boolean context, Python first tries __bool__; if it is not defined, Python then tries __len__ (Python 3.X)

Python prefers __bool__ over __len__!!!

In Python 2.X, it uses __nonzero__ instead of __bool__. 3.X simply renamed __nonzero__ to __bool__. __len__ is still the fallback

If we define __bool__ in Python 2.X, it is silently ignored as just another method!!!! Easy to trip over
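A minimal Python 3 sketch of the lookup order in truth tests. Sized and AlwaysTrue are made-up class names.

```python
class Sized:
    def __init__(self, items):
        self.items = list(items)
    def __len__(self):
        return len(self.items)      # used for truth tests when __bool__ is absent

class AlwaysTrue(Sized):
    def __bool__(self):
        return True                 # wins over the inherited __len__

empty_sized = bool(Sized([]))       # length 0 means false
full_sized = bool(Sized([1]))
empty_true = bool(AlwaysTrue([]))   # __bool__ is tried first
plain = bool(object())              # neither method defined: true by default
```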

Bound or Unbound (first glance at static methods in a class): Python 2.X does not allow unbound calls without an instance

Unbound class method objects: no self

Python 2.X does not allow calling this method without passing an instance

Python 3.X allows calling this method through the class (fetching it from the class gives a plain function)

Example

class Selfless:
    def selfless(arg1,arg2): return arg1+arg2
X = Selfless()
Selfless.selfless(1,2) #works in 3.X ONLY! Fails in 2.X
X.selfless(1,2) #fails in both: the instance is passed as an extra first argument

Bound class method objects: with self

Python automatically passes the instance object as the first argument when a method is fetched through an instance

Class Factory (assume I am already familiar with the usage of factories like UVM)

Example

def factory(aClass,*pargs,**kargs): return aClass(*pargs,**kargs)
object1 = factory(Person, "Arthur","King")
object3 = factory(Person,name='Brain')

New Style

In Python 2.X, all instances of classic classes are the same type! But not in Python 3.X

class A: pass
class B: pass
a = A()
b = B()
type(a) == type(b) #True in 2.X (both are of type instance) and False in 3.X: in 3.X, the type of an instance is its class
In Python 2.X, compare the classes instead: a.__class__ == b.__class__

Changes

All new-style classes inherit from object in Python 2.X. In 3.X, this is added automatically above the user-defined root

Minor changes are not listed in this doc

New-style classes have new tools: __slots__, properties, descriptors, super and the __getattribute__ method

__getattribute__ is not the same as __getattr__: it is called for every attribute fetch, while __getattr__ runs only for undefined names (watch for infinite loops when coding it)

__slots__ can limit the attributes that may be added to instances - assign it at the top of the class

Python reserves just enough space in each instance to hold a value for each slot attribute - saves memory!
Best used for rare cases with large numbers of instances in a memory-critical application
class limiter(object):
    __slots__ = ['age', 'name', 'job']  # only names in the __slots__ list can be assigned as instance attributes
Slots in subs are pointless when absent in supers
Slots in supers are pointless when absent in subs
Include '__dict__' in __slots__ when instances also need arbitrary attributes
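A small sketch of the slot restriction in Python 3.X. The class mirrors the limiter example above, and the misspelled attribute name ape is deliberate.

```python
class Limiter:
    __slots__ = ['age', 'name', 'job']


x = Limiter()
x.age = 40                  # allowed: 'age' is listed in __slots__

try:
    x.ape = 1000            # not listed, so the assignment is rejected
    rejected = False
except AttributeError:
    rejected = True

has_dict = hasattr(x, '__dict__')  # no per-instance __dict__ is created
```
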

Static Methods

Python has 3 types of methods:

instance method: pass a self
static method: no instance passed
class method: gets a class, not instance
class Methods:
    def imeth(self, x):     # instance method
        pass
    def smethod(x):         # static: no instance
        print([x])
    def cmethod(cls, x):    # class: gets the class; passing the class as the first argument is done automatically!
        print([cls, x])
    smethod = staticmethod(smethod)  # make smethod a static method
    cmethod = classmethod(cmethod)   # make cmethod a class method

In Python 2.X

Fetching a method from a class produces an unbound method, which cannot be called without manually passing an instance

We must ALWAYS declare a method as static in order to call it without an instance, whether it is through a class or instance

In Python 3.X

Fetching a method from a class produces a simple function, which can be called normally with no instance present

We need NOT declare such methods as static if they will be called through a class only, but we MUST do so in order to call them through an instance
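The 3.X rules above can be checked with a short sketch. The class Spam and its method names are hypothetical.

```python
class Spam:
    def plain(x):            # no self and no staticmethod declaration
        return x * 2

    @staticmethod
    def static(x):
        return x * 2


via_class = Spam.plain(3)         # fine in 3.X: a simple function
via_instance = Spam().static(3)   # fine: declared static

try:
    Spam().plain(3)               # the instance is passed too: one argument too many
    instance_call_ok = True
except TypeError:
    instance_call_ok = False
```
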

A note about super()

One of the biggest downsides of super() in 3.X

Calling super() in a method of a subclass inspects the call stack in order to automatically locate the self argument and find the superclass

Then it pairs the two in a special proxy object that routes the later call to the superclass version of the method. So there is no need to pass self to super

e.g. self.__class__.__bases__[0] # this is a violation of a fundamental Python idiom for a single use case!

So in MI, super() will only call one superclass's method if both superclasses define it

class C(A,B): def act(self):

Limitation: Operator overloading

We could use super() to call a superclass's methods by name. But note that direct operators do not work

class D(C):
    def __getitem__(self, ix):
        C.__getitem__(self, ix)    # this works
        super().__getitem__(ix)    # this works too; no need to pass self because Python locates it automatically
        super()[ix]                # THIS WON'T WORK!

Complex use in Python 2.X

class D(C):
    def act(self):
        super(D, self).act()  # too complex to use... but this form is also compatible with 3.X

When is using super() good? Even then, using it increases the complexity of maintaining code

Runtime class Changes

Superclasses that might be changed dynamically at runtime preclude hardcoding their names in a subclass's methods

But super() will happily look up the current superclass dynamically

This is a rare case.

class X:
    def m(self): print('X.m')
class Y:
    def m(self): print('Y.m')
class C(X):
    def m(self): super().m()

i = C()
i.m()                # calls X's m method
C.__bases__ = (Y,)   # changing the superclass at runtime!
i.m()                # calls Y's m method

Cooperative Multiple Inheritance Method Dispatch (see book. Not studied)

use property to intercept attribute calls (get,set,del,doc)

property allows us to route a specific attribute’s get,set and delete operations to functions or methods we provide

attribute = property(fget,fset,fdel,doc) #each argument defaults to None if not passed

class Person:
    def __init__(self, name):
        self._name = name
    def getName(self):
        print('fetch...')
        return self._name
    def setName(self, value):
        print('change...')
        self._name = value
    def delName(self):
        print('remove...')
        del self._name
    name = property(getName, setName, delName, "name property docs")

Exceptions

try statement clauses

Clause Form	Interpretation
except:	catch all (or all other) exception types
except name:	catch a specific exception type only
except name as value:	catch a specific exception and assign its instance
except (name1,name2):	catch any of the listed exception types
except (name1,name2) as value:	catch any of the listed exception types and assign its instance
else:	run if no exceptions are raised in the try block
finally:	always perform this block on exit

use except Exception as X to print the exception instance explicitly

try:
    1/0
except Exception as X:
    print(X)  # 'division by zero' in 3.X ('integer division or modulo by zero' in 2.X)

Avoid except with no exception type

avoid this

try:
    ...  # do something
except SomeExcpt:
    ...  # do something
except:
    ...  # BAD

A bare except might also catch genuine programming mistakes that we would rather see as error messages

Even ctrl+c triggers an exception (KeyboardInterrupt). We probably don't want to catch that, as it could make the program unstoppable

In Python 3.X, we can use except Exception: to catch all possible exceptions, except exits!

try:
    action()
except Exception:  # catch all exceptions, except exits
    other_action()

The Exception class excludes SystemExit, KeyboardInterrupt and GeneratorExit
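A quick way to confirm this in Python 3.X is to inspect the class tree directly. The helper swallow is a made-up name for illustration.

```python
exits = (SystemExit, KeyboardInterrupt, GeneratorExit)

# None of the exit-related classes derive from Exception...
outside_exception = [not issubclass(e, Exception) for e in exits]
# ...but all of them derive from BaseException.
inside_base = [issubclass(e, BaseException) for e in exits]


def swallow(exc):
    """Raise exc and catch it only if it is an Exception subclass."""
    try:
        raise exc
    except Exception:
        return 'caught'
```

Calling swallow(ValueError('x')) returns 'caught', while swallow(KeyboardInterrupt()) lets the exception propagate to the caller.
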

User Defined Exceptions: inherit from the Exception class

class AlreadyGotOne(Exception): pass

def grail():
    raise AlreadyGotOne()

try:
    grail()
except AlreadyGotOne as X:
    print('got exception: AlreadyGotOne')
    print('caught: %s' % X.__class__)
else:
    print("Good without any exceptions!")

try/finally

Code in finally is always executed regardless of whether exceptions were raised in the try block

This can be used to ensure that server shutdown code runs when an exception occurs, so the program can exit safely with the server shut down

Example (a with statement gives the same guarantee for file objects)

with open('lumberjack.txt', 'w') as file:  # always closes the file on exit
    file.write("the larch")
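The server-shutdown case mentioned above could be sketched with a plain try/finally as follows. The serve function and the log list are stand-ins for real server code, not part of any library.

```python
log = []


def serve(fail):
    try:
        log.append('serving')
        if fail:
            raise RuntimeError('request failed')
        return 'ok'
    finally:
        log.append('shutdown')  # runs on normal return and on exception alike


result = serve(fail=False)

try:
    serve(fail=True)
except RuntimeError:
    log.append('error seen by caller')  # the exception still propagates after finally
```
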

raise

Propagating Exceptions with raise (raise most recent exception)

try:
    raise IndexError("spam")
except IndexError:
    print('propagating')
    raise  # re-raise the most recent exception

Python 3.X Exception Change: raise from

try:
    1/0
except Exception as E:
    raise TypeError('Bad') from E

assert statement

assert test, data #data part is optional

the same thing as

if __debug__:
    if not test:
        raise AssertionError(data)

Using exception hierarchies solves the dilemma of maintaining manual lists of exceptions

class NumErr(Exception): pass
class DivZero(NumErr): pass
class Oflow(NumErr): pass

def func():
    ...
    raise DivZero()  # note: raises an instance; raising the bare class also works, since Python instantiates it with no arguments

Some built-in exceptions

The Exception class excludes SystemExit, KeyboardInterrupt and GeneratorExit

ArithmeticError: superclass of OverflowError, ZeroDivisionError and FloatingPointError

LookupError: superclass of IndexError and KeyError, as well as some Unicode lookup errors
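Because of these superclass relationships, one except clause can cover a whole category. A small sketch follows; the classify helper is a hypothetical name.

```python
def classify(action):
    """Run action() and report which built-in category its error falls in."""
    try:
        action()
    except ArithmeticError as exc:
        return 'arithmetic: ' + type(exc).__name__
    except LookupError as exc:
        return 'lookup: ' + type(exc).__name__
    return 'no error'


results = [
    classify(lambda: 1 / 0),            # ZeroDivisionError
    classify(lambda: [][1]),            # IndexError
    classify(lambda: {}['missing']),    # KeyError
    classify(lambda: None),
]
```
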

Decorators and Metaclasses

Function decorators - specifies special operation modes by wrapping functions in an extra layer of logic implemented as another function (metafunction)

Syntax: the following is essentially the same

@staticmethod
def meth(): pass

def meth(): pass
meth = staticmethod(meth)

Function decorators allow us to change (add) behavior of a function (or callback)

def my_decorator(some_func):
    def wrapper():  # wrapper now replaces some_fun; calls to some_fun actually run this
        num = 10
        if num == 10:
            pass  # do something
        some_func()
        print("callback can be done here too")
    return wrapper  # note: return the function itself instead of calling it!

@my_decorator
def some_fun():
    pass

some_fun()  # this is equivalent to calling wrapper() above

Function Tracer example

class Tracer:
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args):
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args)

@Tracer
def spac(a, b, c):  # same as spac = Tracer(spac)
    return a + b + c

print(spac(1, 2, 3))  # spac is now a Tracer instance, so spac(1,2,3) invokes __call__: prints "call 1 to spac", then 6

Class decorators - add support for managing whole objects and their interfaces (a role that overlaps with metaclasses)

With a class-based decorator that takes arguments, the decoration process calls __init__ in the decorator class, then immediately calls __call__

When users pass arguments to the decorator (e.g. @decorator_with_arg("hello",1,2)), the function to be decorated is not passed to the constructor! It is passed to __call__ instead

class decorator_with_arguments(object):
    def __init__(self, arg1, arg2, arg3):
        print("inside __init__")  # this happens during the decorating process
        self.arg1 = arg1  # from the decorator arguments
        self.arg2 = arg2  # from the decorator arguments
        self.arg3 = arg3  # from the decorator arguments
    def __call__(self, f):
        print("inside __call__")  # this also happens during the decorating process
        def wrapper(*args):
            print("inside wrap")  # this happens after decoration, when the target function is called; args are the real call arguments
            print("decorator arguments:", self.arg1, self.arg2, self.arg3)
            f(*args)
        return wrapper

@decorator_with_arguments("hello", "world", 32)
def hello(a1, a2, a3, a4):
    print("sayHello argument:", a1, a2, a3, a4)
print("after decoration")

hello("say", "hello", "argument", "list")

Augment the classes with instance counters and any other data required

def count(aClass):
    aClass.numInstances = 0
    return aClass

@count
class Spam: ...

@count
class Sub: ...

@count
class Other(Sub): ...

with/as: Context Managers (as is optional)

Designed to work with context manager objects

The context manager's exit actions are guaranteed to run regardless of exceptions raised inside the with block

Syntax

with expression [as variable]:
    with-block
expression here is assumed to return an object that supports the context management protocol
This object may also return a value that will be assigned to the name variable if the optional as clause is present

The Context Management Protocol (customized context manager)

An object known as a context manager must have __enter__ and __exit__ methods

__enter__ is called and the value it returns is assigned to the variable in the as clause if present, or discarded otherwise

After __enter__ is executed, code in the nested with block is executed

If an exception is raised in the with block, __exit__(type,value,traceback) is called with the exception details

(type,value,traceback) are the same three values returned by sys.exc_info

If no exception is raised, __exit__(None,None,None) is called
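The protocol steps above can be traced with a minimal custom context manager. The class Tracked is a hypothetical name; it only records what gets called.

```python
class Tracked:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self                      # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append(('exit', exc_type))
        return False                     # a false value lets the exception propagate


with Tracked() as ok:
    ok.events.append('body')

failed = Tracked()
try:
    with failed:
        raise ValueError('boom')
except ValueError:
    pass
```

Returning a true value from the exit method would suppress the exception instead of propagating it.
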

Multiple Context Managers in 3.1,2.7 and Later

with open('data') as fin, open('res', 'w') as fout:
    for line in fin:
        if 'some key' in line:
            fout.write(line)

Unicode and Byte Strings

Encoding is the process of translating strings into raw bytes in a target format

Example

S = 'ni'
S.encode('utf16'), len(S.encode('utf16'))  # (b'\xff\xfen\x00i\x00', 6) on a little-endian machine

After encoding, the string (in bytes) can then be written to external files

Decoding is the process of translating raw bytes to strings in Python

types: bytes and bytearray (mutable)

Use b'xxx' to write a bytes literal

bytes is used for image/audio/other pure binary data that should not be decoded as text (bytes literals accept only ASCII characters; use escapes for other values)

Used by file I/O opened in binary modes such as wb, rb…
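A short round trip in Python 3.X, as a sketch of the encode/decode directions and of bytearray mutability. The sample text reuses the sp\xc4m string from the built-in objects table.

```python
text = 'sp\xc4m'                 # str holding a non-ASCII character
raw = text.encode('utf-8')       # str -> bytes
back = raw.decode('utf-8')       # bytes -> str

buf = bytearray(b'spam')
buf[0] = ord('S')                # bytearray supports in-place changes
```
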

Decorators: A decorator itself is a callable that returns a callable

Function decorators can be used to manage both function calls and function objects!

Decorators are free to return either the original class or an inserted wrapper object

Class decorators can be used to manage both class instances and classes themselves

Basic Usage: name rebinding and use a class decorator to wrap

exp: function decorator

@decorator
def F(arg): ...
# same as F = decorator(F); a later call F(arg) then runs decorator(F)(arg)

exp: using a class to wrap functions

class decorator:
    def __init__(self, func):
        self.func = func
    def __call__(self, *args):
        ...  # use self.func and args

@decorator
def func(arg):
    ...  # do something

class decorator: commonly coded as a factory that returns a wrapper class

exp

def decorator(cls):
    class Wrapper:
        def __init__(self, *args):
            self.wrapped = cls(*args)
        def __getattr__(self, name):  # intercepts fetches of any attribute not found on the Wrapper itself
            return getattr(self.wrapped, name)
    return Wrapper

@decorator
class C:
    def __init__(self, x, y):
        self.attr = 'spam'

x = C(6, 7)     # really calls Wrapper(6,7)
print(x.attr)   # runs Wrapper.__getattr__, prints 'spam'

we can add multiple layers of decorators on top of a function or class

exp

@A
@B
@C
def f(…): …
# same as f = A(B(C(f)))
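The nesting order can be made visible with a sketch in which each layer labels the result it receives. The tag decorator factory is a hypothetical helper for illustration.

```python
def tag(label):
    """Return a decorator that wraps the function's result in label(...)."""
    def decorator(func):
        def wrapper(*args):
            return '%s(%s)' % (label, func(*args))
        return wrapper
    return decorator


@tag('A')
@tag('B')
@tag('C')
def f(x):
    return x


def g(x):
    return x


stacked = f('f')
manual = tag('A')(tag('B')(tag('C')(g)))('f')  # the equivalent manual rebinding
```

The decorator nearest the def is applied first, so it ends up innermost.
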

Some Usage Examples

Tracing Calls

code #here the decorator does not need to return a new function: the instance's __call__ intercepts calls and runs the original for us

class tracer:
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args):
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args)

@tracer
def spam(a, b, c):
    print(a + b + c)

@property decorator (very important, so a separate item)

Must inherit from object in Python 2.7

We don't want callers to directly access a data attribute inside a class.

Could be error prone

Can't check boundaries or type

Use @property to convert a method into a property. The @property decorator also creates a companion decorator, @score.setter, for the setter

exp

class Student(object):

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        if not isinstance(value, int):
            raise ValueError("score must be an integer")
        if value < 0 or value > 100:
            raise ValueError("score must be between 0-100")
        self._score = value  # store in _score; assigning self.score here would call the setter again and recurse forever

@property enhances code stability and maintenance. Good for encapsulation

launch shell command/subprocess

subprocess.call(‘echo $HOME’, shell=True) #shell == true makes subprocess to

A quick way to launch a shell cmd and get its output string

import subprocess
proc = subprocess.Popen(["cat", "/tmp/bax"], stdout=subprocess.PIPE)
(out, err) = proc.communicate()  # err is None here because stderr was not piped

itertools - module that implements a number of iterator building blocks to improve memory efficiency and speed up execution

In Python 2, functions like zip and map return lists; use itertools.izip/imap to get iterators instead. In Python 3, the built-in zip/map return iterators by default

itertools.accumulate(iterable[,func])

make an iterator that returns accumulated sums or results of other binary functions (in func if specified)

if func is supplied, it should be a function of two arguments. elements of the input iterable maybe any type

Roughly equivalent to

def accumulate(iterable,func=operator.add):
   it = iter(iterable)
   try:
      total = next(it)
   except StopIteration:
      return
   yield total
   for element in it:
      total = func(total,element)
      yield total

e.g: input=[1,2,3,4,5] yields 1,3,6,10,15 if no func is specified

itertools.chain(*iterables)

make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all are exhausted

Notice the input argument is *iterables instead of a single list or tuple!

So each iterable is passed as its own argument - itertools.chain('abc','def') -> a,b,c,d,e,f

itertools.chain(['abc','def']) does not flatten: it yields 'abc','def'. Use itertools.chain.from_iterable for a single iterable of iterables

e.g: chain.from_iterable(('ABC','DEF')) -> A B C D E F

an iterator over iterators kind of thing
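The difference between the two call forms, as a small sketch:

```python
from itertools import chain

separate = list(chain('abc', 'def'))                    # two iterables, flattened
single_list = list(chain(['abc', 'def']))               # one iterable: its items come back as-is
flattened = list(chain.from_iterable(['abc', 'def']))   # one iterable of iterables, flattened
```
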

itertools.combinations(iterable,r) - return r length subsequences of elements from the input iterable

e.g: combinations(‘ABCD’,2) -> AB AC AD BC BD CD

itertools.combinations_with_replacement(iterable,r)

return r length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once

itertools.permutations(iterable,r=None)

Return successive r length permutations of elements in the iterable

permutations(‘ABCD’,2) ->AB AC AD BA BC BD CA CB CD DA DB DC

itertools.compress(data,selectors)

Make an iterator that filters elements from data, returning only those that have a corresponding element in selectors that evaluates to True.

Stops when either data or selectors iterables has been exhausted

compress(‘ABCDEF’,[1,0,1,0,1,1]) -> A C E F

Roughly equivalent to

def compress(data,selectors):
   #compress('ABCDEF',[1,0,1,0,1,1]) -> A C E F
   return (d for d,s in zip(data,selectors) if s)

itertools.count(start=0,step=1)

Make an iterator that returns evenly spaced values starting with number “start”

Often used as an argument to map() to generate consecutive data points.

Also used in zip() to add sequence numbers.

count(10) -> 10,11,12,13…

count(2.5,0.5) -> 2.5,3.0,3.5 …

itertools.cycle(iterable)

Make an iterator returning elements from the iterable and saving a copy of each.

When the iterable is exhausted, return elements from the saved copy. Repeats indefinitely.

cycle(‘ABCD’) -> A B C D A B C D…

itertools.dropwhile(predicate,iterable) - produces no output until the predicate first becomes false

Make an iterator that drops elements from the iterable as long as the predicate is true

Afterwards, returns every element.

Note: the iterator does not produce any output until predicate first becomes false. So this might have a lengthy start-up time

dropwhile(lambda x:x<5,[1,4,6,4,1]) -> 6,4,1

itertools.filterfalse(predicate,iterable)

Make an iterator that filters elements from iterable, returning only those for which the predicate is false.

if predicate is None, return the items that are false.

filterfalse(lambda x: x%2, range(10)) -> 0 2 4 6 8

itertools.groupby(iterable,key=None)

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element

if not specified or is None, key defaults to an identity function and returns the element unchanged.

groupby is similar to Unix "uniq": a new group starts every time the key value changes, so sort the input by the same key first if all equal items should land in one group.

[k for k,g in groupby('AAAABBBCCDAABBB')] -> A B C D A B

[list(g) for k,g in groupby('AAAABBBCCD')] -> AAAA BBB CC D

returns an iterator (a groupby object, not a list) of tuples. each tuple's first element is the key and the second element is an iterator over the run of matching items

>>> a = it.groupby('AAAABBBCCD')
>>> list(a)
[('A', <itertools._grouper object at 0x10b5a7b70>), ('B', <itertools._grouper object at 0x10b5a7ba8>), ('C', <itertools._grouper object at 0x10b5a7be0>), ('D', <itertools._grouper object at 0x10b5a7c18>)]
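Since each group is itself an iterator, it has to be consumed (for example with list) while iterating. A sketch with run lengths and with sorted input follows; the words list is made-up sample data.

```python
from itertools import groupby

# Run-length view of consecutive equal characters.
runs = [(key, len(list(group))) for key, group in groupby('AAAABBBCCD')]

# Grouping by first letter; the input is sorted so equal keys are adjacent.
words = ['cherry', 'apple', 'banana', 'avocado', 'blueberry']
by_letter = {k: list(g) for k, g in groupby(sorted(words), key=lambda w: w[0])}
```
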

itertools.islice(iterable,stop) & itertools.islice(iterable,start,stop[,step])

returns selected elements from the iterable

islice(‘ABCDEFG’,2) -> A,B

islice(‘ABCDEFG’,2,4) -> C,D

islice(‘ABCDEFG’,2,None) -> C,D,E,F,G

islice('ABCDEFG',0,None,2) -> A,C,E,G

itertools.product(*iterables,repeat=1)

Cartesian product of input iterables

roughly equivalent to (x,y) for x in A for y in B

product(‘ABCD’,’xy’) ->Ax Ay Bx By Cx Cy Dx Dy

product(range(2),repeat=3) -> 000 001 010 011 100 101 110 111

itertools.repeat(object[,times])

Make an iterator that returns object over and over again (runs indefinitely unless the times argument is specified).

itertools.starmap(function,iterable)

Make an iterator that computes the function using arguments obtained from the iterable

Used instead of map() when argument parameters are already grouped in tuples from a single iterable

Parallels map(): the distinction is like function(a,b) versus function(*c)

starmap(pow,[(2,5),(3,2),(10,3)]) -> 32 9 1000

itertools.takewhile(predicate,iterable)

Make an iterator that returns elements from the iterable as long as the predicate is true.

Once predicate becomes false, we stop

takewhile(lambda x: x<5,[1,4,6,4,1]) -> 1 4

itertools.tee(iterable,n=2)

Return n independent iterators (in a tuple) from a single iterable

c,d = itertools.tee([1,2,3],2) - then c and d can each iterate through the list independently
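A sketch showing that the two iterators advance independently:

```python
from itertools import tee

source = iter([1, 2, 3])
c, d = tee(source, 2)

first_c = next(c)    # advances c only
all_d = list(d)      # d still sees every element
rest_c = list(c)     # c continues where it left off
```

After calling tee, the original source iterator should not be used directly, since advancing it would skip items for c and d.
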

itertools.zip_longest(*iterables,fillvalue=None)

Make an iterator that aggregates elements from each of the iterables. if the iterables are of uneven length, missing values are filled with fillvalue

Iteration continues until the longest iterable is exhausted

zip_longest(‘abcd’,’xy’,fillvalue=’-‘) -> ax by c- d-

collections - module that implements high-performance container alternatives to Python's built-in dict, list, set and tuple

namedtuple(typename,field_names[,verbose=False][,rename=False])

returns a new tuple subclass named typename.

The new subclass is used to create tuple-like objects that have fields accessible by attribute lookup AND are indexable and iterable

if rename is true, invalid field names are automatically replaced with positional names.

Named tuples are especially useful for assigning field names to result tuples returned by the csv or sqlite3 modules

Point = namedtuple('Point',['x','y']) #verbose=True (removed in Python 3.7) would also print the class definition

p = Point(11,y=22)

p[0] + p[1] -> 33

x,y = p -> unpacked

p.x+p.y -> 33

deque([iterable[,maxlen]])

If iterable is not specified, the deque is empty

Thread-safe, memory-efficient appends and pops from either side with approximately O(1) cost

methods

Methods	Description
append(x)	add x to the right side
appendleft(x)	add x to the left side
clear()	remove all elements
count(x)	count the number of deque elements equal to x
extend(iterable)	extend the right side by appending elements from iterable
extendleft(iterable)	extend the left side by appending elements from iterable (the resulting order is reversed)
pop()	remove and return an element from the right
popleft()	remove and return an element from the left
remove(value)	remove the first occurrence of value. if not found, raise ValueError
reverse()	reverse the elements of the deque in place and return None
rotate(n=1)	rotate the deque n steps to the right. if n is negative, to the left
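A few of these methods in action, as a sketch. The comments show the state after each step.

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)           # deque([0, 1, 2, 3])
d.extendleft([-1, -2])    # deque([-2, -1, 0, 1, 2, 3]): items are added one by one on the left
d.rotate(1)               # deque([3, -2, -1, 0, 1, 2])
snapshot = list(d)

recent = deque(maxlen=3)  # bounded deque: old items fall off the opposite end
for n in range(5):
    recent.append(n)
```
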

Counter

a tool is provided to support convenient and rapid tallies

cnt = Counter()

e.g

cnt = Counter()
for word in ['red','blue','red','green','blue','blue']:
   cnt[word]+=1
cnt # Counter({'blue':3,'red':2,'green':1})

methods

Methods	Description
elements()	return an iterator over elements, repeating each as many times as its count; e.g. a a a b b c c for Counter(a=3,b=2,c=2,d=0)
most_common([n])	return a list of the n most common elements and their counts (all elements if n is omitted)
subtract([iterable-or-mapping])	elements are subtracted from an iterable or from another mapping
update([iterable-or-mapping])	elements are counted from an iterable or added in from another mapping
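A sketch exercising the four methods on the color tally from above:

```python
from collections import Counter

cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
top_two = cnt.most_common(2)

cnt.subtract(['blue'])       # counts from an iterable are subtracted
cnt.update({'green': 2})     # counts from a mapping are added in

# Elements with a count of zero (d here) are skipped by elements().
expanded = sorted(Counter(a=3, b=2, d=0).elements())
```
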

OrderedDict([items])

Remembers the order in which keys were added. Overwriting an existing key leaves its original position unchanged.

OrderedDict.popitem(last=True) - removes and returns a (key, value) pair: the most recently added one if last is true, the oldest one otherwise
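A sketch of the ordering and popitem behavior:

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['a'] = 10                       # overwrite: 'a' keeps its original position

order = list(od)
newest = od.popitem()              # last=True: most recently added pair
oldest = od.popitem(last=False)    # oldest pair
```
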

defaultdict

dict subclass that overrides one method (__missing__) and adds one writable instance variable (default_factory).

if an entry has not been created yet, default_factory creates a default value that is ready to be used directly

d[5] #won't raise an error if the key does not exist. Will create an empty list for d[5] if default_factory is list

s = [('yellow',1),('blue',2),('yellow',3),('blue',4),('red',1)] #we want each element of the dict to be a list of all numbers shown indexed by color
d = defaultdict(list)
for k,v in s:
   d[k].append(v)
sorted(d.items()) #[('blue',[2,4]),('red',[1]),('yellow',[1,3])]
Python for Data Analysis - at page 164

ipython - an enhanced Python interpreter, which allows exploring results interactively when a script is done executing

Tab completion (objects, functions, etc)

Introspection - a question mark (?) after a variable displays some general information about the object (and even the docstring for functions)

b?
#Type: list

%run command - run any file as a Python program inside the env of IPython session

%run ipython_script_test.py

Use %run -i instead of plain %run to give a script access to variables already defined in the interactive IPython namespace

Magic commands - any special commands prefixed by symbol %

%timeit - check the execution time of any Python statement, such as a matrix multiplication

In [20]: a = np.random.randn(100,100)
In [21]: %timeit np.dot(a,a)
100000 loops, best of 3: 20.9 us per loop

%time statement - report execution time of a single statement

%debug (use %debug? to view its doc string) - Activate the interactive debugger (two modes)

mode1 - activate debugger before executing code. This way, we can set a breakpoint to step through code from the point

mode2 - activate debugger in post-mortem mode (can run without argument)

if an exception occurs, this lets you inspect its stack frames interactively

%pdb - inspect stack frames automatically when exception occurs

%pwd - view current path

%paste - takes whatever text is in the clipboard and executes it as a single block in the shell

%cpaste - opens a special prompt for pasting code, so we have the freedom to paste as much code as we like before executing it.

%quickref - display the IPython quick reference card

%magic - display detailed doc for all available magic commands

%hist - print command input history

%reset - delete all variables/names defined in the interactive namespace

%page OBJECT - pretty-print the object and display it through a pager

%prun statement - execute statement with cProfile and report the profiler output

%who, %who_ls, %whos - display variables defined in the interactive namespace, with varying levels of information/verbosity

%xdel variable - delete a variable and attempt to clear any references to the object in the IPython internals

Matplotlib Integration - IPython is also good due to the nice integrations with data visualization

%matplotlib magic function configures its integration with the IPython shell or Jupyter notebook

Jupyter Notebook

Browser version of interactive Python interpreter.

%load - loads a script's source into a code cell (use %run to execute a script, as in IPython)

SciPy - a collection of packages addressing a number of different standard problem domains in scientific computing

Packages included

scipy.integrate - numerical integration routines and differential equation solvers

scipy.linalg - linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg

scipy.optimize - function optimizers (minimizers) and root finding algorithms

scipy.signal - signal processing tools

scipy.sparse - sparse matrices and sparse linear system solvers

scipy.special - wrapper around SPECFUN, a Fortran library implementing many common mathematical functions such as the gamma function

scipy.stats - standard continuous and discrete probability distributions (density functions, samplers, continuous distribution function), stat tests

scikit-learn - premier general purpose machine learning toolkit

Classification: SVM, nearest neighbors, random forest, logistic regression, etc

Regression: Lasso, ridge regression,etc

Clustering: k-means, spectral clustering, etc

Dimensionality reduction: PCA, feature selection, matrix factorization, etc

Model Selection: Grid search, cross-validation, metrics

Preprocessing: Feature extraction, normalization

statsmodels - statistical analysis package that was seeded by work from Stanford University

Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models, etc

Analysis of variance (ANOVA)

Time series analysis

Nonparametric methods: kernel density estimation, kernel regression

visualization of statistical model results

NumPy Basics - arrays and vectorized computation, array-oriented computing

NumPy's library of algorithms is written in C and stores data in a contiguous block of memory. It is good for large arrays (often 10-100 times faster than pure Python)

Functions

Function	Description
array	Convert input data (list, tuple, array, or other sequence type) to an ndarray
asarray	Convert input to ndarray, but do not copy if the input is already an ndarray
arange	like the built-in range but returns an ndarray instead of a list
ones	produce an array of all 1s with the given shape and dtype
ones_like	take another array and produce a ones array of the same shape and dtype
zeros	like ones but producing zeros
zeros_like	like ones_like but producing zeros
empty	create a new array by allocating new memory, without populating it with any values like ones and zeros do
full	produce an array of given shape and dtype with all values set to the indicated "fill value"
eye,identity	create a square NxN identity matrix (1s on the diagonal and 0s elsewhere)

N-dimensional array object, or ndarray (fast, flexible container for large datasets in Python)

Example

data = numpy.random.randn(2,3)   #generates a 2x3 array of random values
data * 10                        #each element will be multiplied by 10 quickly

Creating ndarrays

Use the array function

data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)

np.array also supports multidimensional arrays

data1 = [[1,2,3],[4,5,6]]
arr2 = np.array(data1)

Use ndim and shape to confirm the dimensions, and dtype to confirm the data type of the elements

arr2.ndim #gives 2
arr2.shape #(2,3)
arr2.dtype #dtype('int64') on most platforms, since all inputs are integers (arr1 above is float64)

np.zeros(<shape>) and np.ones(<shape>) create arrays of 0s or 1s

np.empty creates an array without initializing its values to any particular value

Data Types for ndarrays

data type or dtype is a special object containing info (or metadata) the ndarray needs to interpret a chunk of memory as a particular type of data

Type	Description
int8, uint8	signed and unsigned 8-bit integer types
int16,uint16	signed and unsigned 16-bit integer types
int32,uint32	signed and unsigned 32-bit integer types
int64,uint64	signed and unsigned 64-bit integer types
float16	half-precision floating point
float32	standard single-precision floating point; compatible with C float
float64	standard double-precision floating point; compatible with C double and python float object
float128	extended-precision floating point
complex64,complex128, complex256	complex numbers represented by two 32, 64 or 128-bit floats, respectively
bool	boolean type storing True or False values
object	python object type; a value can be any Python object
string_	Fixed-length ASCII string type (1 byte per character)
unicode_	Fixed-length Unicode type

We can use ndarray’s astype method to explicitly convert or cast an array from one dtype to another

arr = np.array([1,2,3,4])
arr.dtype      #dtype('int64')
new_arr_in_float = arr.astype(np.float64)  #this creates a new array of type float64 and pointed by new_arr_in_float

Arithmetic with ndarrays

All of +,-,*,/,** are supported on an element-by-element basis.

>,<,>=,<=,== are supported too; they return an array of booleans

Notice the Python keywords and/or do not work with boolean arrays! Use & and | instead!
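A sketch of combining comparisons, assuming NumPy is installed and imported as np. The parentheses are needed because & and | bind more tightly than the comparison operators.

```python
import numpy as np

arr = np.array([1, 5, 8, 12])

in_range = (arr > 2) & (arr < 10)         # element-wise AND
out_of_range = (arr <= 2) | (arr >= 10)   # element-wise OR

try:
    (arr > 2) and (arr < 10)              # keyword form asks for one truth value
    keyword_ok = True
except ValueError:
    keyword_ok = False
```
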

Setting values with boolean

data[data<0] = 0 # data<0 returns an array of boolean (same dimension since original array is used), then the boolean is used to assign 0 to all elements smaller than 0

Basic Indexing and Slicing

One difference from Python lists: assigning a scalar to a slice broadcasts it to every element in the range, and a slice is a view, so changes to it show up in the original array

arr = np.arange(10)  #[0,1,2,3,4,5,6,7,8,9]
arr[5:8] = 12        #[0,1,2,3,4,12,12,12,8,9]
new_arr = arr[5:8]   #[12,12,12]
new_arr[0] = 123456  #this would also be reflected in the original arr[5]

Indexing a multidimensional array allows a single bracket with commas separating the indices

arr2d[0][2] can be replaced with arr2d[0,2]

Select/slice a range of rows or col in a high dimensional array

arr2d        #array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[:2]    #array([[1,2,3],[4,5,6]])  Select first two rows of the array
arr2d[:2,1:] #array([[2,3],[5,6]])  Select the first two rows; within them, select from the second element to the end

Boolean indexing

names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data = np.random.randn(7,4) #7 rows of 4 cols of random numbers
names == 'Bob'  #array([True,False,False,True,False,False,False], dtype=bool)
data[names=='Bob']  #passing an array of bool selects only the rows whose corresponding entry is True!

Fancy Indexing - using an array of integers to select whatever rows you want in whatever order (use negative numbers to select from the end)

arr = np.empty((8,4)) #8X4 array
arr[[4,3,0,6]] #returns rows 4, 3, 0 and 6 of the 2D array, in that order

Fancy Indexing - using multiple index arrays is a bit different: it selects a one-dimensional array of elements, one for each tuple of indices

arr = np.arange(32).reshape((8,4))  #8x4 array
arr[[1,5,7,2],[0,3,1,2]]  #returns a 1D array of the elements at indices (1,0), (5,3), (7,1) and (2,2)

Fancy indexing always copies the data into a new array, unlike slicing
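The copy-versus-view difference can be observed directly. This is a sketch assuming NumPy is installed and imported as np.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5, 7, 2], [0, 3, 1, 2]]  # elements at (1,0), (5,3), (7,1), (2,2)

rows = arr[[4, 3]]                        # fancy indexing: a copy
rows[0, 0] = -1
after_copy_write = int(arr[4, 0])         # original is unchanged

view = arr[4:6]                           # slicing: a view
view[0, 0] = -99
after_view_write = int(arr[4, 0])         # original sees the change
```
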

Transposing Arrays and Swapping Axes

transpose is supported through the T attribute or the transpose method

arr = np.arange(15).reshape((3,5))
arr.T  #transposing

find the inner matrix product using the dot product of two matrices

arr_innerP = np.dot(arr.T,arr)

for higher-dimensional arrays, the transpose method takes a tuple of axis numbers to permute the axes

swapaxes method takes a pair of axis numbers and switches the indicated axes to rearrange data

arr.swapaxes(1,2) swaps axes 1 and 2 in an array with at least 3 dimensions
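A minimal sketch of transpose and swapaxes on a 3D array. The 2x3x4 shape is just an assumption picked to make the axis permutation visible in the resulting shapes.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

swapped = arr.swapaxes(1, 2)         # axes 1 and 2 exchanged: shape (2, 4, 3)
permuted = arr.transpose((1, 0, 2))  # axes reordered to (1, 0, 2): shape (3, 2, 4)

print(swapped.shape, permuted.shape)
print(swapped[0, 3, 2] == arr[0, 2, 3])   # same element, indices swapped
print(permuted[2, 1, 3] == arr[1, 2, 3])
```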

Universal Functions - Fast element-Wise Array Operations

Unary ufuncs like np.sqrt(arr) and np.exp(arr) are fast vectorized functions that return element-wise results

Function	Description
abs, fabs	Compute the absolute value element-wise for integer, floating-point, or complex values
sqrt	Compute the square root of each element
square	Compute the square of each element
exp	Compute the exponent e^x of each element
log, log10, log2, log1p	Natural log, log base 10, log base 2, and log(1+x), respectively
sign	Compute the sign of each element: 1 (positive), 0 (zero), -1 (negative)
ceil	Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number)
floor	Compute the floor of each element (i.e., the largest integer less than or equal to that number)
rint	Round the elements to the nearest integer, preserving the dtype
modf	Return fractional and integral parts of array as separate arrays (returns two arrays)
isnan	Return boolean array indicating whether each value is NaN (Not a Number)
isfinite, isinf	Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite
cos, cosh, sin, sinh, tan, tanh	Regular and hyperbolic trigonometric functions
arccos, arccosh, arcsin, arcsinh, arctan, arctanh	Inverse trigonometric functions
logical_not	Compute truth value of not x element-wise (equivalent to ~arr)

binary ufuncs like np.maximum(arr1,arr2) take two arrays and return a single array, here the element-wise maximum of the two

Function	Description
add	Add corresponding elements in arrays
subtract	Subtract elements in second array from first array
multiply	Multiply array elements
divide, floor_divide	Divide or floor divide
power	Raise elements in first array to powers indicated in second array
maximum, fmax	Element-wise maximum; fmax ignores NaN
minimum, fmin	Element-wise minimum; fmin ignores NaN
mod	Element-wise modulus (remainder of division)
copysign	Copy sign of values in second argument to values in first argument
greater, greater_equal, less, less_equal, equal, not_equal	Perform element-wise comparison, yielding boolean array (equivalent to infix operators >, >=, <, <=, ==, !=)
logical_and, logical_or, logical_xor	Compute element-wise truth value of logical operation (equivalent to infix operators &, |, ^)

Ufuncs accept an optional “out” argument that allows them to operate in-place on arrays

np.sqrt(arr)      #this returns an element-wise sqrt of the original arr (returns a new array)
np.sqrt(arr,arr)  #second argument is out, so the result is written back into arr (arr must have a floating-point dtype for this to work)

numpy.where is a vectorized version of the ternary expression x if condition else y (fast on large arrays, works on multidimensional arrays)

xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.1,2.2,2.3,2.4,2.5])
cond = np.array([True,False,True,True,False])
result = [(x if c else y) for (x,c,y) in zip(xarr,cond,yarr)] #this could be slow and not supported for higher dimension of array
result = np.where(cond,xarr,yarr)  #this does the job nicely. 2nd and 3rd arguments don't need to be arrays. A typical use of producing an array using another array

arr = np.random.randn(4,4) #4x4 random numbers
np.where(arr>0,2,-2)  #based on the 4x4 random 2D array: where the number at a position is >0, give 2, else -2

Mathematical and Statistical Methods

Method	Description
sum	Sum of all elements in the array or along an axis:zero-length arrays have sum 0
mean	Arithmetic mean; zero-length arrays have NaN mean
std, var	Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n)
min,max	Min and Max
argmin, argmax	Indices of minimum and maximum elements, respectively
cumsum	Cumulative sum of elements starting from 0
cumprod	Cumulative product of elements starting from 1

Aggregation (reduction) methods like sum, mean, and std (standard deviation) can be used either by calling the array instance method or by using the top-level NumPy function

arr = np.random.randn(5,4)  #generate a 5x4 array of random numbers
arr.mean() #returns a single number  - mean
np.mean(arr) #same (won't change the original array)
arr.sum() #returns the sum

Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension

arr.mean(axis=1)  #compute mean across the columns (one value per row)
arr.mean(axis=0)  #compute mean down the rows (one value per column)
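A small sketch to make the axis argument concrete. It uses a deterministic 3x4 array instead of random numbers, purely so the results are easy to check by hand.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))   # 3 rows, 4 columns

row_means = arr.mean(axis=1)   # collapse the columns: one mean per row
col_sums = arr.sum(axis=0)     # collapse the rows: one sum per column

print(row_means)  # [1.5 5.5 9.5]
print(col_sums)   # [12 15 18 21]
```

The axis you pass is the one that disappears from the result's shape.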

Methods for boolean arrays

sum is often used as a means of counting True values in a boolean array

arr = np.random.randn(100)
(arr>0).sum() #returns like 42, which is the number of positive values

There are two additional methods: any tests whether at least one value is True, and all tests whether every value is True; both return True or False

Sorting

NumPy arrays can be sorted in-place with sort method

arr = np.random.randn(6) #1D array of 6 elements
arr.sort()

We can sort each one-dimensional section of values in a multidimensional array in-place along an axis by passing the axis number to sort

arr = np.random.randn(5,3)
arr.sort(1)  #sort each row

The top-level np.sort returns a sorted copy instead of modifying the array in-place.

A sorting example that returns the 5% quantile (a quick way to get it from sorted data)

arr = np.random.randn(1000)
arr.sort()
arr[int(0.05*len(arr))]

Unique and Other Set Logic

Method	Description
unique(x)	Compute the sorted, unique elements in x
intersect1d(x,y)	Compute the sorted, common elements in x and y
union1d(x,y)	Compute the sorted union of elements
in1d(x,y)	Compute a boolean array indicating whether each element of x is contained in y
setdiff1d(x,y)	Set difference, elements in x that are not in y
setxor1d(x,y)	Set symmetric differences; elements that are in either of the arrays, but not both

File Input and Output with Arrays

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk

arr = np.arange(10)
np.save('some_array',arr)  #will save in uncompressed raw binary format with file extension .npy (automatically appended if not specified)

arr = np.load('some_array.npy')

We save multiple arrays in an uncompressed archive using np.savez and passing the arrays as keyword arguments

np.savez('array_archive.npz',a=arr,b=arr)
arch = np.load('array_archive.npz')
arch['a']
arch['b']

We can use compressed format if data compresses well

np.savez_compressed('arrays-compressed.npz',a=arr,b=arr2)

Linear Algebra

Commonly used linear algebra functions (diag, dot and trace live in the top-level numpy namespace, the rest in numpy.linalg)

Function	Description
diag	Return the diagonal (or off-diagonal) elements of a square matrix as 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal
dot	Matrix dot product
trace	Compute the sum of the diagonal elements
det	Compute the matrix determinant
eig	Compute the eigenvalues and eigenvectors of a square matrix
pinv	Compute the Moore-Penrose pseudo-inverse of a Matrix
qr	Compute the QR decomposition
svd	Compute the singular value decomposition (SVD)
solve	Solve the linear system Ax = b for x, where A is a square matrix
lstsq	Compute the least-squares solution to Ax=b
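A minimal sketch of solve and lstsq from the table above. The 2x2 system is made up for illustration; for a square, non-singular A both calls give the same answer.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # exact solve, A must be square and non-singular
x_ls, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)  # least squares

print(x)
print(np.allclose(A @ x, b), np.allclose(x, x_ls))
```

lstsq is the one to reach for when A has more rows than columns (an over-determined system).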

Pseudorandom Number Generation

Function	Description
seed	Seed the random number generator
permutation	Return a random permutation of a sequence, or return a permuted range
shuffle	Randomly permute a sequence in-place
rand	Draw samples from a uniform distribution
randint	Draw random integers from a given low-to-high range
randn	Draw samples from a normal distribution with mean 0 and standard deviation 1
binomial	Draw samples from a binomial distribution
normal	Draw samples from a normal (Gaussian) distribution
beta	Draw samples from a beta distribution
chisquare	Draw samples from a chi-square distribution
gamma	Draw samples from a gamma distribution
uniform	Draw samples from a uniform [0,1) distribution

numpy.random supplements the built-in python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions

set the seed with np.random.seed(N), where N is an integer seed; the same seed reproduces the same sequence of random numbers
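A quick sketch of why seeding matters: with the same seed, the same sequence comes back. The seed value 1234 is arbitrary, and np.random.RandomState is shown as one way to keep a generator that is isolated from the global state.

```python
import numpy as np

np.random.seed(1234)
first = np.random.randn(3)

np.random.seed(1234)          # re-seeding restarts the same sequence
second = np.random.randn(3)

# A separate generator object keeps its own state, apart from the global one
rng = np.random.RandomState(1234)
third = rng.randn(3)

print(np.array_equal(first, second))
print(third.shape)
```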

pandas - a major tool that contains data structures and data manipulation tools for working with tabular or heterogeneous data, making data cleaning and processing fast

Series - 1D array like object containing a sequence of values with index

obj = pd.Series([4,7,-5,3])
# out:
# 0  4
# 1  7
# 2 -5
# 3 3
#dtype: int64

obj.values returns an array of the values, obj.index returns a RangeIndex object

Series takes an optional index argument that specifies a label for each value

obj2 = pd.Series([5,-4,2,6], index = ['d','b','a','c'])

Using NumPy functions, we can also manipulate pd.Series objects (the index-value link is preserved)

obj = pd.Series([4,7,-4,8])
np.exp(obj)
obj*2

we can create a pd.Series using a Python dict

We can override the index by passing an array as an extra argument when passing dict

states = ['ca','oh','oregon','texas']
obj4 = pd.Series(sdata,index=states)  #where sdata is a dict of some data, let's say; labels missing from sdata (here 'ca') become NaN
obj4
#Out:
#ca     NaN
#oh     350000
#oregon 16000
#texas  71000

pd.isnull(obj) and pd.notnull(obj) can be used to detect missing data

pd.isnull(obj4)
#ca     True
#oh     False
#oregon False
#texas  False

Both the Series object itself and its index have a “name” attribute, which integrates with other key areas of pandas functionality

obj4.name = 'population'
obj4.index.name = 'state'

A Series’s index can be altered in place by obj.index = some_array

DataFrame - a rectangular table of data and contains an ordered collection of columns with different value types

like a dict of Series with the same index

data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
        'year': [2000,2001,2002,2001,2002,2003],
        'pop':  [1.5,1.7,3.6,2.4,2.9,3.2]}
frame = pd.DataFrame(data)
frame
Out:
  pop     state   year
0 1.5     Ohio    2000
1 1.7     Ohio    2001
2 3.6     Ohio    2002
3 2.4     Nevada  2001
4 2.9     Nevada  2002
5 3.2     Nevada  2003

For large tables, frame.head() will display only the first 5 rows

If you specify the sequence of columns in an array and pass as an extra argument, DataFrame’s columns will be arranged in that order

if you pass a column that isn’t contained in the dict, then that column’s elements will all be NaN

pd.DataFrame(data,columns=['year','state','pop'])

A column can be retrieved as a Series either by dict-like notation or by attribute

frame['state']
frame.year

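A row can be retrieved by label through the “loc” attribute

frame2 = pd.DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five','six'])  #debt is not in data, so it is all NaN
frame2.loc['three']  #gives a Series indexed by the original DataFrame column names, holding the values of that row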
#Out:
#year 2002
#state Ohio
#pop   3.6
#debt  NaN

Columns can be modified by assignment (whole column)

frame2['debt'] = 16.5  #set the entire debt col to 16.5
frame2['debt'] = np.arange(6)  #col becomes 0,1,2,3,4,5
val = pd.Series([-1.2,-1.5,-1.7],index=['two','four','five'])
frame2['debt'] = val  #values are aligned by index label: rows two, four and five get the values, every other row becomes NaN
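A small sketch of the label alignment described above. The three-row frame and its values are made up for illustration; note how the label 'four' has no matching row and is simply dropped.

```python
import pandas as pd

frame2 = pd.DataFrame({'state': ['Ohio', 'Nevada', 'Nevada'],
                       'pop': [1.5, 2.4, 2.9]},
                      index=['one', 'two', 'three'])

val = pd.Series([-1.2, -1.7], index=['two', 'four'])
frame2['debt'] = val      # matched by label, not by position

print(frame2)
```

Only row 'two' receives a value; rows 'one' and 'three' hold NaN.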

Assigning a column that doesn’t exist will create a new column. The del keyword deletes columns, as with a dict

frame2['eastern'] = frame2.state == 'Ohio'
#Out:
#    year    state   pop   debt   eastern
#one  2000   Ohio    1.5   NaN     True
#two  2001   Ohio    1.7   -1.2    True
#three 2002  Ohio    3.6   NaN     True
#four 2001  Nevada   2.4   -1.5    False
#five 2002  Nevada   2.9   -1.7    False
#six  2003  Nevada   3.2   NaN     False

Index Objects - holding the axis labels and other metadata (like axis name)

Method	Description
append	Concatenate with additional index objects, producing a new index
difference	Compute set difference as an index
intersection	Compute set intersection
union	Compute set union
isin	Compute boolean array indicating whether each value is contained in the passed collection
delete	Compute new index with element at index i deleted
drop	Compute new index by deleting passed values
insert	Compute new index by inserting element at index i
is_monotonic	Returns True if each element is greater than or equal to the previous element
unique	Compute the array of unique values in the index

Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index Object

obj = pd.Series(range(3),index=['a','b','c'])
index = obj.index
index
#Out: Index(['a','b','c'],dtype='object')

Index objects are immutable and thus can’t be modified by the user once created

index[1] = 'd' #raises TypeError
Immutability makes it safer to share Index objects among data structures

Essential Functionality - some key stuff with pandas data structure (i.e. Series, DataFrame)

Reindexing - rearrange the data for pandas data structures according to the new index, return a new structure

obj2 = obj.reindex(['a','b','c','d','e'])  #this returns a new Series or DataFrame with new index

pass method='ffill' (forward-fill) to reindex to fill each new, missing label with the last valid value before it
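A minimal sketch of reindex with and without forward-fill. The colour values and the integer index are only sample data.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

plain = obj3.reindex(range(6))                    # new labels 1, 3, 5 are NaN
filled = obj3.reindex(range(6), method='ffill')   # carry the last value forward

print(plain.isnull().sum())   # 3
print(filled.tolist())        # ['blue', 'blue', 'purple', 'purple', 'yellow', 'yellow']
```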

Dropping Entries from an Axis

obj = pd.Series(np.arange(5.), index=['a','b','c','d','e'])
new_obj = obj.drop('c')
new_obj = obj.drop(['d','e'])

# Out:
# a 0.0
# b 1.0
# c 2.0

data = pd.DataFrame(np.arange(4).reshape((2,2)),index=['ohio','Colorado'],columns=['one','two'])
#  Out:
#             one   two
#  ohio        0     1
#  Colorado    2     3
data.drop('two',axis=1,inplace=True)

Tricks Learned

Remove duplicates in a list (order is not maintained by using this method): Use set

A set is an unordered collection of unique elements
L = [1,2,3,3,1,2,5]
s = list(set(L))
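If the original order matters, one alternative (assuming Python 3.7+, where dicts keep insertion order) is dict.fromkeys. A short sketch comparing the two:

```python
L = [1, 2, 3, 3, 1, 2, 5]

unordered = list(set(L))           # unique values, order not guaranteed
ordered = list(dict.fromkeys(L))   # unique values in first-seen order

print(ordered)  # [1, 2, 3, 5]
```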

Using Regular Expression

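(?P<name>…): the substring matched by the group is accessible via the symbolic group name

Ways to refer to a group named quote, i.e. (?P<quote>…):
(?P=quote) or \1 inside the same pattern (backreference)
m.group('quote') or m.end('quote') on the match object m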

(?=…) matches if … matches next but doesn’t consume any of the string (lookahead assertion)

Isaac(?=Asimov) will match ‘Isaac’ only if it’s followed by ‘Asimov’

(?!…) matches if … doesn’t match next (negative lookahead assertion)

Isaac(?!Asimov) will match ‘Isaac’ only if it is not followed by ‘Asimov’
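A quick sketch of both lookahead forms. The patterns here include a space after Isaac (an assumption, so they fit the sample strings); the lookahead text is checked but never becomes part of the match.

```python
import re

followed = re.search(r'Isaac (?=Asimov)', 'Isaac Asimov')
not_followed = re.search(r'Isaac (?!Asimov)', 'Isaac Newton')
blocked = re.search(r'Isaac (?!Asimov)', 'Isaac Asimov')

print(repr(followed.group(0)))      # 'Isaac ' (Asimov is checked, not consumed)
print(repr(not_followed.group(0)))  # 'Isaac '
print(blocked)                      # None
```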

re.compile(pattern,flags=0) compiles a regular expression pattern into a re object, which can be used in match() and search() methods

prog = re.compile(pattern) #more efficient when the same pattern is used several times
result = prog.match(string)
#OR, as a one-off:
result = re.match(pattern,string)

re.search(pattern,string,flags=0) scans through string looking for the first location where the regular expression matches and returns a match object

returns None if not found

re.match(pattern,string,flags=0) if 0 or more chars at the beginning of string match, returns the match object

only match the beginning (even in MULTILINE mode)
So if the match might be anywhere in the string, use search

re.fullmatch(pattern,string) only the whole string, return a match object. else return None

re.split(pattern,string,maxsplit=0,flags=0) splits string by RE defined in pattern

re.findall(pattern,string,flags=0) return all non-overlapping matches of pattern in a string in list

re.purge() clear regular expression cache
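A short sketch putting the functions above side by side. The sample text and the phone-number-like pattern are made up for illustration.

```python
import re

text = 'tel: 555-1234, fax: 555-9876'
prog = re.compile(r'\d{3}-\d{4}')   # compile once, reuse many times

print(prog.match(text))             # None: the text does not start with digits
print(prog.search(text).group(0))   # 555-1234 (first match anywhere)
print(prog.findall(text))           # ['555-1234', '555-9876']
print(re.split(r',\s*', text))      # ['tel: 555-1234', 'fax: 555-9876']
print(re.fullmatch(r'\d{3}-\d{4}', '555-1234') is not None)  # True
```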

Match Objects

Always have a boolean value of True if there is a match (since None is returned when no match is found)

match.group([group1,…]) returns one or more subgroups of the match

m = re.match(r"(\w+)\s(\w+)", "Isaac Newton, Physicist")
m.group(0) #"Isaac Newton" The entire match
m.group(1) #"Isaac" The first parenthesized subgroup
m.group(2) #"Newton" The second parenthesized subgroup
m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)','Guanduo Li')
m.group('first_name') #or m.group(1)
m.group('last_name') #or m.group(2); notice m.group(0) returns the entire match, not a parenthesized subgroup
m.groups() returns all subgroups of the match in a tuple
m.groupdict() returns a dictionary containing all named subgroups of the match (groups MUST BE NAMED)
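A runnable sketch of named groups and the (?P=name) backreference. The quoted-string pattern is an illustrative assumption, not something taken from these notes.

```python
import re

m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Isaac Newton, Physicist')

print(m.group(0))             # Isaac Newton
print(m.group('first_name'))  # Isaac
print(m.groups())             # ('Isaac', 'Newton')
print(m.groupdict())          # {'first_name': 'Isaac', 'last_name': 'Newton'}

# (?P=quote) must match the same text the group named quote matched
quoted = re.search(r'(?P<quote>["\']).*?(?P=quote)', 'say "hi" now')
print(quoted.group(0))        # "hi"
print(quoted.end('quote'))    # 5
```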

Find if a variable is declared

Using globals()

a = 3
'a' in globals() #True; the name must be quoted as a string

Using try/except

try:
    a
except NameError:
    print("not defined")

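Copy a list to avoid changing the caller’s mutable object inside a function (a slice is a shallow copy: nested objects are still shared)

L = L[:] #new list object with the same elements
M.extend(L) #appends the elements of L onto another list M, leaving L itself untouched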
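A sketch of the difference between a slice copy and a real deep copy. The function add_item is a hypothetical example; copy.deepcopy is the standard-library call for copying nested objects too.

```python
import copy

def add_item(items):
    items = items[:]      # work on a shallow copy, the caller's list stays intact
    items.append('new')
    return items

original = [[1, 2], [3, 4]]
result = add_item(original)

shallow = original[:]            # new outer list, same inner lists
deep = copy.deepcopy(original)   # inner lists are copied too
original[0].append(99)

print(len(original), len(result))  # 2 3
print(shallow[0])                  # [1, 2, 99]
print(deep[0])                     # [1, 2]
```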

Convert a string to a number (in hex or binary or dec)

a = '0xf' #string
d = int(a,16) #parse as base 16, gives 15

Enable 3.X print function in 2.X

from __future__ import print_function

exit python script

import sys
sys.exit()

Retrieve command-line arguments (argv)

import sys
len(sys.argv) #sys.argv is a list; sys.argv[0] stores the file name of the currently running script

extract the file name from glob

os.path.basename(path)

get current dir (path)

os.getcwd()

Find whether a file is a link or dir, sort through modified time

os.path.islink #find if it's a link
os.path.isdir #find if it's a dir
dirs = list(filter(lambda x:os.path.isdir(x),glob.glob(path+"*"))) #get all dirs
dirs.sort(key=lambda x:os.path.getmtime(x)) #sort the list by modified time

File test

import os.path
os.path.exists(path) #True if the path exists

__name__ and “__main__”

__main__ is the namespace at the top. If a module is run directly (top), __name__ will be set to “__main__”, otherwise, it stores the module’s name
__name__ stores the current namespace
__name__ == “__main__” if at top

It’s convenient to have code at the bottom of a module for testing run only, not when the module is imported:

if __name__ == '__main__': #testing code for the current module goes here; it won’t run when the module is imported
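A minimal sketch of the pattern above. The names square and _self_test are hypothetical; the guarded block runs only when the file is executed directly.

```python
def square(x):
    """Function meant to be imported by other modules."""
    return x * x

def _self_test():
    # Quick checks that only make sense when the file is run directly
    assert square(3) == 9
    print('self test passed')

if __name__ == '__main__':
    _self_test()
```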

dir() does more than iterating through dict !!

dir() knows how to grab all attributes of an object through __dict__. It also grabs all inherited attributes of this object! (all availables)
__dict__ only contains “local” sets of attributes

Use str1.find(str2) to find if str2 is a substring of str1. Return the index of first match or -1 if no match

Flatten a list of lists: smart way:

a = [[1,2],[3,4],[5,5]]
a_flat = sum(a,[]) #uses the overload of + for lists
The second argument of sum is the initial value used as the first operand before the first +. Treat it as []+[1,2]+[3,4]+[5,5], which gives a flattened list: [1,2,3,4,5,5]
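The sum trick builds a new list at every +, so it gets slow for many sublists. A sketch of two common alternatives that give the same result:

```python
from itertools import chain

a = [[1, 2], [3, 4], [5, 5]]

flat_sum = sum(a, [])                        # builds a new list at every +
flat_chain = list(chain.from_iterable(a))    # single pass over the sublists
flat_comp = [x for sub in a for x in sub]    # same result with a comprehension

print(flat_chain)  # [1, 2, 3, 4, 5, 5]
```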

Some Useful libraries

argparse #used to parse arguments (augmented/accumulated) passed to this python script. Powerful

Once all of the arguments are defined, you can parse the command line by passing a sequence of argument strings to parse_args().

By default, the arguments are taken from sys.argv[1:], but you can also pass your own list.

The options are processed using the GNU/POSIX syntax, so option and argument values can be mixed in the sequence.

To create an argparser

parser = argparse.ArgumentParser(description='Short sample app')

Use the add_argument method to specify each argument; its action can be one of:

Action	Description
store	Save the value. Can optionally convert type if type is defined
store_const	Save a value defined as part of the argument specification, rather than a value that comes from the arguments being parsed
store_true	boolean True
store_false	boolean False
append	save the value to the list. Multiple values are saved if the argument is repeated
append_const	Save a value defined in the argument specification to a list
version	Prints version details about the program and then exits

Example

import argparse

parser = argparse.ArgumentParser()

parser.add_argument('-s', action='store', dest='simple_value',
                    help='Store a simple value')

parser.add_argument('-c', action='store_const', dest='constant_value',
                    const='value-to-store',
                    help='Store a constant value')

parser.add_argument('-t', action='store_true', default=False,
                    dest='boolean_switch',
                    help='Set a switch to true')
parser.add_argument('-f', action='store_false', default=False,
                    dest='boolean_switch',
                    help='Set a switch to false')

parser.add_argument('-a', action='append', dest='collection',
                    default=[],
                    help='Add repeated values to a list',
                    )

parser.add_argument('-A', action='append_const', dest='const_collection',
                    const='value-1-to-append',
                    default=[],
                    help='Add different values to list')
parser.add_argument('-B', action='append_const', dest='const_collection',
                    const='value-2-to-append',
                    help='Add different values to list')

parser.add_argument('--version', action='version', version='%(prog)s 1.0')

results = parser.parse_args()
print('simple_value     =', results.simple_value)
print('constant_value   =', results.constant_value)
print('boolean_switch   =', results.boolean_switch)
print('collection       =', results.collection)
print('const_collection =', results.const_collection)

$ python argparse_action.py -h

usage: argparse_action.py [-h] [-s SIMPLE_VALUE] [-c] [-t] [-f] [-a COLLECTION] [-A] [-B] [--version]

optional arguments:
  -h, --help       show this help message and exit
  -s SIMPLE_VALUE  Store a simple value
  -c               Store a constant value
  -t               Set a switch to true
  -f               Set a switch to false
  -a COLLECTION    Add repeated values to a list
  -A               Add different values to list
  -B               Add different values to list
  --version        show program's version number and exit

$ python argparse_action.py -s value

simple_value     = value
constant_value   = None
boolean_switch   = False
collection       = []
const_collection = []

$ python argparse_action.py -c

simple_value     = None
constant_value   = value-to-store
boolean_switch   = False
collection       = []
const_collection = []

$ python argparse_action.py -t

simple_value     = None
constant_value   = None
boolean_switch   = True
collection       = []
const_collection = []

$ python argparse_action.py -f

simple_value     = None
constant_value   = None
boolean_switch   = False
collection       = []
const_collection = []

$ python argparse_action.py -a one -a two -a three

simple_value     = None
constant_value   = None
boolean_switch   = False
collection       = ['one', 'two', 'three']
const_collection = []

$ python argparse_action.py -B -A

simple_value     = None
constant_value   = None
boolean_switch   = False
collection       = []
const_collection = ['value-2-to-append', 'value-1-to-append']

$ python argparse_action.py --version

argparse_action.py 1.0
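As noted earlier, parse_args also accepts your own list instead of sys.argv[1:], which is handy for trying options without a shell. A cut-down sketch of the parser above (type=int is added here only to show the optional conversion done by store):

```python
import argparse

parser = argparse.ArgumentParser(description='Short sample app')
parser.add_argument('-s', action='store', dest='simple_value', type=int)
parser.add_argument('-t', action='store_true', default=False, dest='boolean_switch')
parser.add_argument('-a', action='append', dest='collection', default=[])

# Pass an explicit argument list instead of reading sys.argv[1:]
results = parser.parse_args(['-s', '3', '-t', '-a', 'one', '-a', 'two'])

print(results.simple_value, results.boolean_switch, results.collection)
```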

tabulate #used to print table in a fancy and read friendly way

shutil #used to copy/move files and copy their permissions

marshal #used to serialize/de-serialize data to and from character strings, so they can be sent over a network. Use simple dump/load calls

defaultdict (in collections) #one more feature on top of a regular dict: if a key doesn’t exist, a default factory function is called to create its value
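A small sketch of defaultdict with two common factories, int for counting and list for grouping. The sample words and names are made up.

```python
from collections import defaultdict

counts = defaultdict(int)      # int() gives 0 for missing keys
for word in ['spam', 'eggs', 'spam']:
    counts[word] += 1

groups = defaultdict(list)     # list() gives [] for missing keys
for name in ['Bob', 'Joe', 'Bill']:
    groups[name[0]].append(name)

print(dict(counts))   # {'spam': 2, 'eggs': 1}
print(dict(groups))   # {'B': ['Bob', 'Bill'], 'J': ['Joe']}
```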

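bisect provides binary search on a sorted list

bisect.bisect(a,val,lo=0,hi=len(a)) returns an insertion point i such that every element of a[0:i] is <= val and every element of a[i:] is > val. NOTE: this is greedy (it is the same as bisect.bisect_right and behaves like C++ upper_bound)! C++ lower_bound is not greedy, i.e. it returns the first element >= val; the Python equivalent is bisect.bisect_left

If all elements are smaller than (or equal to) val, the length of the list is returned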

a = [1,2,3,4,5]
i = bisect.bisect(a,3) #gives i == 3 -> a[0:3] = [1,2,3], but a[3] == 4
#how to understand this? same as for loop. The first element is included, but last element is not included
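A sketch contrasting the greedy bisect (bisect_right) with bisect_left on a list that contains duplicates, which is where the two differ.

```python
import bisect

a = [1, 2, 3, 3, 3, 4, 5]

right = bisect.bisect_right(a, 3)   # same as bisect.bisect: index after the last 3
left = bisect.bisect_left(a, 3)     # index of the first 3 (like C++ lower_bound)

print(left, right)              # 2 5
print(bisect.bisect(a, 10))     # 7: every element is smaller, so len(a)

bisect.insort(a, 2)             # insert while keeping the list sorted
print(a)                        # [1, 2, 2, 3, 3, 3, 4, 5]
```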

Use sorted with iterables and sorting function

Prototype

sorted(iterable,*,key=None,reverse=False)

key specifies a function of one argument that is used to extract a comparison key from each list element.

Example

def takeSecond(ele):
   return ele[1]
random = [(2,4),(1,5)]
#sorted will pass each item (a tuple in this case) to the takeSecond function, sort by the returned key, then reverse into descending order
sorted_list = sorted(random,key=takeSecond,reverse=True)

Note this is different from C++ std::sort, which takes a comparison function that returns bool (operator< by default).

So in Python it’s even simpler since the user only needs to extract the key and pass the function as the key argument
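A sketch of a few equivalent ways to supply the key. operator.itemgetter is a standard-library shortcut for the takeSecond style of function, and a tuple key gives multi-level ordering.

```python
from operator import itemgetter

pairs = [(2, 4), (1, 5), (3, 4)]

by_second = sorted(pairs, key=itemgetter(1))
by_second_desc = sorted(pairs, key=lambda p: p[1], reverse=True)
by_second_then_first_desc = sorted(pairs, key=lambda p: (p[1], -p[0]))

print(by_second)                  # [(2, 4), (3, 4), (1, 5)]
print(by_second_desc)             # [(1, 5), (2, 4), (3, 4)]
print(by_second_then_first_desc)  # [(3, 4), (2, 4), (1, 5)]
```

The sort is stable, so items with equal keys keep their original relative order, even with reverse=True.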

Tell two variables pointing to the same object: using “is”

if a is b:
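A short sketch of identity versus equality: is compares object identity, == compares values.

```python
a = [1, 2, 3]
b = a            # second name for the same object
c = a[:]         # equal contents, different object

print(a is b, a == b)   # True True
print(a is c, a == c)   # False True
print(a is not None)    # True (the idiomatic None check)
```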

Installing or Updating Python packages

conda install package_name

pip install package_name

conda update package_name

pip install --upgrade package_name

subprocess - spawn new processes, connect to their input/output/error pipes, and obtain return code

Convenience Function - subprocess.call(args,*,stdin=None,stdout=None,stderr=None,shell=False)

Run the cmd, wait for it to finish, then return the returncode attribute

subprocess.call(['ls','-l'])
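A sketch of call next to subprocess.run (Python 3.5+), which can also capture output. The child command is an assumption: it re-invokes the current interpreter through sys.executable so the example does not depend on ls being available.

```python
import subprocess
import sys

# sys.executable is used instead of 'ls' so the sketch also runs on Windows
cmd = [sys.executable, '-c', 'print("hello from child")']

returncode = subprocess.call(cmd)    # waits, then returns the exit status

# run() returns a CompletedProcess; stdout is captured here as text
completed = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)

print(returncode, completed.returncode, completed.stdout.strip())
```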

Strings

Strings are sequences in Python: positionally ordered collections of other objects

Double quotes and single quotes are interchangeable

Triple Quotes (block string)

A classic example of using triple quotes with dictionary-based formatting

Formatting Method (the other flexible way to format strings besides using %)

Instead of using % to format a string, we could also use the str.format method (mostly the same as %)

By position

By Keyword

By both

By relative position

X = '{motto},{0}'.format(42,motto=3.14) #returns 3.14,42
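A sketch of the four substitution styles listed above (by position, by keyword, by both, by relative position). The words used are placeholder values.

```python
by_position = '{0}, {1} and {2}'.format('spam', 'ham', 'eggs')
by_keyword = '{motto}, {pork} and {food}'.format(motto='spam', pork='ham', food='eggs')
by_both = '{motto}, {0} and {food}'.format('ham', motto='spam', food='eggs')
by_relative = '{}, {} and {}'.format('spam', 'ham', 'eggs')

print(by_position)   # spam, ham and eggs
print(by_position == by_keyword == by_both == by_relative)  # True
```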

Adding keys Attributes and Offsets

Raw Strings are used to turn off backslash converting:

String Operations

String Slice (slicing) and index (indexing)

Strings are immutable. They cannot be changed after created.

len(s) returns size of the string

use dir(S) to list the attributes and methods of S; dir() with no argument lists the variables assigned in the caller’s scope

help(s.replace) would give the method’s help message

Pattern Matching

Use ‘in’ to find a match or substring:

Type cast or conversion

Lists

length

Initialize a list with fixed number of elements

Strings are immutable. But Lists are not. Any modifications are done in-place instead of generating a new obj

!!! Never assign the result of an in-place method back to the mutable object: methods like append and sort change the list in place and return None, so L = L.append(x) loses the list

Push Back and Push Front and pop

Push front

Push Back

Pop: returns the element (by default the last one) and delete this one from list

append adds its argument to the end of the list as a single element; extend adds each element of its argument separately
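A tiny sketch of the append/extend difference, including the None returned by in-place methods.

```python
a = [1, 2]
a.append([3, 4])    # one new element, which is itself a list

b = [1, 2]
b.extend([3, 4])    # each element is added separately

returned = b.append(5)   # in-place methods return None

print(a)         # [1, 2, [3, 4]]
print(b)         # [1, 2, 3, 4, 5]
print(returned)  # None
```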

Replacement, Insertion and Deletion (with multiple elements)

Slice replaces the entire section all at once. But use insert, pop and remove more please.... this is strange

Replacement/Insertion

Insertion (replace nothing)

Delete

Index, slice, concat, repeat

Create a list of constant (e.g. list of zeroes)

Type-specific operations (lists are mutable)

It is not legal to index a non-existent position in a list (raises IndexError)

List Iteration

list count method

list.count(obj) #where obj is the object to be counted in the list

aList = [1,1,2,3,4,1]

Lists can be nested. A list can contain lists, dictionaries and any types

Return a copy of the list: arr[:]

List comprehensions work like the map or filter built-in functions. Really powerful for constructing complex matrices

Dictionaries (dictionary)

Indexing a dict in Python is a very fast lookup operation (constant time on average)

Any IMMUTABLE objects (even tuples) can be keys for dictionary. But not mutable objects like lists or other dict

Created by using { } and colon : and indexed by [ ]

It is rare to know all data in a dictionary at first. So:

use setdefault(key,default)

Use dict and () to create dictionary AND use zip to map two lists to a dictionary (one list is key, one is value)

Python allows dictionary nesting: multiple data types can co-exist as values in the same dictionary (really cool!! Much cooler than Perl)

Dictionary can’t use operator + to concatenate. Use update

Pop takes a key as argument and return and delete that value

Delete content and reclaim memory

Although not needed, we could still initialize a dict or a list

Accessing a non-existing key is a mistake (raises KeyError)

Sorting Keys:for loops (get all keys)

For loop introduction for the first time

More Comprehension