- On UNIX-like systems, running chmod +x myscript.py and putting a #! (shebang) line at the top of the file makes the script directly executable
- #!/usr/local/bin/python
- UNIX Python path look-up trick: you can also use #!/usr/bin/env python to let the system find the Python interpreter on your PATH for you
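- In the interactive interpreter, the single underscore _ holds the last evaluated (echoed) value
- a = 3, then a (echoes 3), then _ gives 3. Notice you must evaluate a explicitly first: an assignment statement does not update _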
Python programs can be decomposed into modules, which contain statements, which contain expressions, which create and process objects
- Numbers, strings and tuples are immutable
- Lists, dictionaries and sets are mutable (they can be changed in place)
Object type | Example/creation |
---|---|
Numbers | 1234, 3.1415, 3+4j, 0b111, Decimal(), Fraction() |
Strings | 'spam', "Bob's", b'a\x01c', u'sp\xc4m' |
Lists | [1, [2, 'three'], 4.5], list(range(10)) |
Dictionaries | {'food': 'spam', 'taste': 'yum'}, dict(hours=10) |
Tuples | (1, 'spam', 4, 'U'), tuple('spam'), namedtuple |
Files | open('eggs.txt'), open(r'C:\ham.bin', 'wb') |
Sets | set('abc'), {'a', 'b', 'c'} |
Other core types | Booleans, types, None |
Program unit types | functions, modules, classes |
Implementation-related types | compiled code, stack tracebacks |
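In older Pythons (before 2.7/3.1), typing 3.1415 * 2 at the interactive prompt echoed the full-precision repr 6.2830000000000004, while print(3.1415 * 2) gave the user-friendly 6.283; modern versions echo the shortest repr, 6.283
- for now, if a result looks odd, try print() on it, which uses the user-friendly str format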
- import math
- math.pi would give 3.141592....
- math.sqrt(85)
- import random
- random.random()
- random.choice([1,2,3,4]) # a list is coded in square brackets
Python also incorporates more exotic numeric objects such as complex, fixed-precision (Decimal) and rational (Fraction) numbers, as well as sets and Booleans
- Python can also use third-party data types (such as matrices and vectors)
S = 'Spam'; line = 'aaa,bbb,ccccc,dd'; line_n = 'aa,bb,cc\n'
Method | Example | Output |
---|---|---|
find | S.find('pa') | 1 |
replace | S.replace('pa', 'XYZ') | 'SXYZm' (a new string: strings are immutable, so S[0] = 'p' fails) |
split | line.split(',') | ['aaa', 'bbb', 'ccccc', 'dd'] |
upper | S.upper() | 'SPAM' |
isalpha | S.isalpha() | True |
isdigit | S.isdigit() | False |
rstrip | line_n.rstrip() | 'aa,bb,cc' (removes trailing whitespace, including the \n) |
rstrip + split | line_n.rstrip().split(',') | ['aa', 'bb', 'cc'] |
ord | ord('\n') | 10 ('\n' is 10 in ASCII) |
endswith | S.endswith('b') | False (True or False: does S end with the given character or string?) |
startswith | S.startswith('b') | False (True or False: does S start with 'b'?) |
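The table's examples can be checked in one go. This is a small sketch that runs the same calls on the sample strings defined above; the results dictionary is only a hypothetical container for printing them side by side.

```python
S = 'Spam'
line = 'aaa,bbb,ccccc,dd'
line_n = 'aa,bb,cc\n'

results = {
    'find': S.find('pa'),               # offset of the first match, -1 if absent
    'replace': S.replace('pa', 'XYZ'),  # returns a new string; S itself is unchanged
    'split': line.split(','),
    'upper': S.upper(),
    'isalpha': S.isalpha(),
    'isdigit': S.isdigit(),
    'rstrip': line_n.rstrip(),
    'rstrip_split': line_n.rstrip().split(','),
    'ord_newline': ord('\n'),
    'endswith_b': S.endswith('b'),
    'endswith_m': S.endswith('m'),
    'startswith_b': S.startswith('b'),
}

for name, value in results.items():
    print(f'{name:13} -> {value!r}')
```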
Formatting (RHS of % is a tuple) | Output |
---|---|
'%s,eggs, and %s' % ('spam', 'SPAM!') | 'spam,eggs, and SPAM!' |
'{},eggs, and {}'.format('spam', 'SPAM!') | 'spam,eggs, and SPAM!' |
'spam'.encode('utf8') | b'spam' (encoded to 4 bytes in UTF-8, e.g. for files) |
'There are %d %s birds' % (2, 'black') | 'There are 2 black birds' |

Formatting with a dict | Output |
---|---|
'%(qty)d more %(food)s' % {'qty': 1, 'food': 'spam'} | '1 more spam' |
- Triple-quoted strings begin with three quotes (single or double), followed by any number of lines of text, and are closed with the same triple-quote sequence
- Single or double quotes can be embedded in the string's text
- example: mantra = """Always look ... on the bright ... side of life.""" (the ... are the interactive prompt's continuation markers; the string keeps the line breaks)
- Another common use of triple quotes is to temporarily disable part of the code (like commenting it out): X = 1 """ import os print(os.getcwd()) """ Y = 2 # the code between the triple quotes is now just an unused string, so it is skipped when rerun
- Triple-quoted strings can also contain # characters (normally # starts a comment)
- To sum up, triple quotes are good for multiline text in programs, which is why they are commonly used for documentation strings
- reply = """Greetings... Hello %(name)s! Your age is %(age)s"""
values = {'name': 'Bob', 'age': 40}
print(reply % values)
- template = '{0},{1},{2}'
template.format('spam','ham','eggs') # returns 'spam,ham,eggs'
- template = '{motto},{pork} and {food}'
template.format(motto='spam',pork='ham',food='eggs') # returns 'spam,ham and eggs'
- template = '{motto},{0},{food}'
template.format('ham',motto='spam',food='eggs') # returns 'spam,ham,eggs'
- template = '{},{},{}'
template.format('spam','ham','eggs') # returns 'spam,ham,eggs' (automatic relative numbering)
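- import sys
'My {1[kind]} runs {0.platform}'.format(sys, {'kind': 'laptop'}) # returns 'My laptop runs win32' on Windows ({1[kind]} indexes the 2nd argument, {0.platform} reads an attribute of the 1st)
- 'My {map[kind]} runs {sys.platform}'.format(sys=sys, map={'kind': 'laptop'}) # same, using keyword arguments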
- If we were to open a file: myfile = open(r'C:\new\text.dat', 'w') # the r prefix makes a raw string, so \n and \t stay as literal backslash characters instead of becoming a newline and a tab
- S = ‘Spam’
- len(S) gives 4
- S[0] gives S
- S[1] gives p
- S[-1] gives m
- S[-2] gives a
- S[len(S)] raises IndexError (offsets run from 0 to len(S)-1); S[len(S)-1] gives the last character m
- S[1:3] gives pa # a slice runs from the first offset up to but NOT including the second one: offsets 1 and 2
- S[0:3] gives Spa
- S[1:] gives ‘pam’
- S[:-1] gives 'Spa' (everything but the last)
- S + ‘xyz’ string concatenation
- this is actually polymorphism. An operation depends on the objects being operated on
- S * 8 gives SpamSpamSpamSpamSpamSpamSpamSpam # repetition is really useful: to print a line of 80 dashes, instead of typing them all inside print('...'), just use print('-' * 80). It is that simple
- S[i:j:k] accepts a step k (defaults to +1)
- S[::2] gets every other item from the beginning to the end (the first and second limits default to the start and the end of the sequence): 'Sa'
- S[::-1] means to reverse the string
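A quick way to confirm the offset and slice rules above, including the corrected S[len(S)] case, is to run them on the same 'Spam' string. This sketch only uses built-in string indexing.

```python
S = 'Spam'

print(S[0], S[-1], S[len(S) - 1])  # S m m: the last valid offset is len(S) - 1
print(S[1:3])    # pa: offsets 1 and 2 (the stop offset 3 is excluded)
print(S[:-1])    # Spa: everything but the last character
print(S[::2])    # Sa: every other character
print(S[::-1])   # mapS: a negative step walks backwards

# Indexing one past the end raises an error; slicing past the end does not
try:
    S[len(S)]
except IndexError:
    print('S[len(S)] raised IndexError')
print(repr(S[2:100]))   # 'am': slice limits are clipped to the string
```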
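Figure (offsets and slices): positive offsets 0, 1, 2 count from the left end, negative offsets -2, -1 count from the right end, and [:] with both limits omitted spans the whole sequence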
- we can’t change one specific character by using position
S = 'spam'; S[0] = 'z' # error!!! (TypeError: strings can't be changed in place). Instead we could do S = 'z' + S[1:] # 'zpam', hahaha smart!
- Every object in Python is classified as either immutable or not.
- Names with double underscores (such as __add__) implement the object's built-in behavior and are available to support customization (operator overloading)
- S + 'NI!' # 'spamNI!'; this basically calls S.__add__
- S.__add__('NI!') # 'spamNI!'
- help is one of the handiest interfaces to a system of code that ships with Python known as PyDoc; PyDoc is a tool that extracts documentation from objects
- import re
match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
match.group(1) # this gives 'Python '
import re
match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
match.groups() # ('usr','home','lumberjack'); groups is a method, so it needs the ()
re.split('[/:]','/usr/home/lumberjack')
['','usr','home','lumberjack']
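Both regex snippets above can be run together as a sketch; the only additions are the explicit groups() call and a failed match, which shows that re.match returns None when the pattern does not match at the start of the string.

```python
import re

match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
print(repr(match.group(1)))    # 'Python ' (note the trailing space)

path = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
print(path.groups())           # ('usr', 'home', 'lumberjack')

parts = re.split('[/:]', '/usr/home/lumberjack')
print(parts)                   # ['', 'usr', 'home', 'lumberjack']

# re.match anchors at the start of the string, so this finds nothing
print(re.match('world', 'Hello world'))   # None
```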
- ‘ab’ in ‘cabsf’ #returns True
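- Python does not allow + between different types such as str and int: I = 1; G = '2'; G + I # TypeError!
- use int or str to convert explicitly: int(G) + I # force addition: 3; G + str(I) # force concatenation: '21'
- float() works the same way for floating-point numbers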
- L = [123, ‘spam’, 1.23] len(L) //gives 3
- [for x
- [None]*8
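- [[None]*3]*2 # [[None,None,None],[None,None,None]] # This doesn't work as expected: the outer * repeats a reference to the SAME inner list, so changing one row changes both
- use [[None]*3 for _ in range(2)] instead, which builds a new inner list for each row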
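The shared-row pitfall is easiest to see by changing one cell and checking identity with is. A minimal sketch of both forms:

```python
shared = [[None] * 3] * 2           # both rows are the SAME list object
shared[0][0] = 'x'
print(shared)                       # [['x', None, None], ['x', None, None]]
print(shared[0] is shared[1])       # True

separate = [[None] * 3 for _ in range(2)]   # a new inner list per row
separate[0][0] = 'x'
print(separate)                     # [['x', None, None], [None, None, None]]
print(separate[0] is separate[1])   # False
```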
Strings are immutable, but lists are not: list modifications are done in place instead of generating a new object
- S = 'abc'; L = list(S); L # gives ['a','b','c'], and now we can change each element; S = ''.join(L) # back to a string; if the elements are numbers, use S = ''.join(str(e) for e in L)
!!! Never assign the result of an in-place method call (append, sort, etc.) back to the same name
- L = L.append(1) #We lost the reference of the list for L. Because append changes the list in place and it doesn’t return the object itself! if you assign the return value to L, then L will be pointing to None
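- L = [1,2,3,4]; I = 0; L = [I] + L # [0,1,2,3,4] (I + L raises TypeError: + needs two lists)
- L.insert(0,I) # alternative with the same result, but it changes L in place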
- L.append(0) #append the data structure in the () #notice append is usually faster than + because it doesn’t generate new obj
- L.extend(iterable) # appends each item of an iterable to L (if L is empty, L.extend('str') gives ['s','t','r'] because a string is iterable)
But extend only works on iterables (a list can be extended by a list, not by a single integer): L.extend(2) is an error! Use append to add a single object such as an integer or a string
- L.pop() #by default pops the last element in the end (pop back)
- L.pop(0) #pops front. We could of course use any index to pop.
- L = [1,2,3] L.append([4,5]) #[1,2,3,[4,5]]
- L.extend([4,5]) #[1,2,3,4,5]
- extend is the same as L[len(L):] = [4,5] #see next section
Slice assignment replaces an entire section all at once. It can be confusing, so prefer insert, pop and remove when they do the job
- Note: a quick way to understand this: in L[1:2], 1 and 2 are delimiters between items. L[1:2] covers the range between them, which (for L = [1,2,3]) only includes the item 2
- L = [1,2,3]
L[1:2] = [4,5] # [1,4,5,3]: replaces 2 with 4,5 (whereas L[1] = [4,5] gives [1,[4,5],3] !!!!)
- Same idea as above: in L[1:1], both delimiters are at the same place, so the range covers no elements. Thus the assignment inserts and replaces nothing
- L = [1,2,3]
L[1:1] = [6,7] #[1,6,7,2,3]
- L = [1,2,3]
L[1:2] = [] #[1,3]
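- del L[1:] # deletes everything after the first item: only [1] is left
- L[:-1] # slicing a list returns a new list; with L = [123,'spam',1.23] this gives [123,'spam'] (the slice stops before offset -1, so the last item is excluded)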
- L + [4,5,6] //concat
- L * 2 //repeat
- L = [‘spam’,’egg’] L.index(‘spam’) #returns 0 (the index)
- [0] * 10 # 10 zeroes in a list
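The slice-assignment cases above can be compared side by side. This sketch uses a hypothetical helper, replaced, that starts from a fresh [1, 2, 3] each time so the results don't interfere with each other.

```python
def replaced(start, stop, new):
    """Return a fresh [1, 2, 3] after assigning `new` to the slice [start:stop]."""
    L = [1, 2, 3]
    L[start:stop] = new
    return L

print(replaced(1, 2, [4, 5]))   # [1, 4, 5, 3]: offset 1 replaced by two items
print(replaced(1, 1, [6, 7]))   # [1, 6, 7, 2, 3]: empty slice, pure insertion
print(replaced(1, 2, []))       # [1, 3]: same effect as del L[1]
print(replaced(3, 3, [4, 5]))   # [1, 2, 3, 4, 5]: L[len(L):] = ... acts like extend

L = [1, 2, 3]
L[1] = [4, 5]                   # index assignment stores ONE object (a nested list)
print(L)                        # [1, [4, 5], 3]
```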
L = [123, 'spam', 1.23]

Method | Example | Result |
---|---|---|
append | L.append('NI') | [123, 'spam', 1.23, 'NI'] (L + ['NI'] also works, but makes a new list) |
pop (del) | L.pop(2) | returns 1.23 and removes it from L |
insert | L.insert(1, 'x') | inserts a value at an arbitrary position |
remove | L.remove('NI') | removes the first item equal to 'NI' |
extend | L.extend([1, 2, 3]) | adds multiple values at the end |
sort | L.sort() | sorts in place and returns None (the items must be comparable with each other) |
reverse | L.reverse() | reverses in place |
L[9999] = 1 # IndexError: list assignment index out of range
- for x in [1,2,3]: print(x,end=’ ‘) #1 2 3 #end= defines what to print in the end of each element print
aList.count(1) # returns how many times 1 appears in aList (3 if aList contains three 1s)
- M = [[1,2,3],[4,5,6],[7,8,9]] //matrix 3x3 M[1][2] //gives 6
List comprehensions work like the map and filter built-in functions and are really powerful for building and processing matrices
- col2 = [row[1] for row in M] # [2,5,8]: collects the items in column 2, i.e. the item at offset 1 of each row of matrix M, into a new list
- col3 = [row[1]+1 for row in M] //[3,6,9]
- col4 = [row[1] for row in M if row[1] % 2 ==0] // [2,8] filter out odd items
- col5 = [M[i][i] for i in [0,1,2]] //[1,5,9] collect diagonal from matrix
- col6 = [c*2 for c in 'spam'] # ['ss','pp','aa','mm']
- list(range(4)) //[0,1,2,3]
- list(range(-6,2,2)) # [-6,-4,-2,0]: the stop value 2 is excluded
- [[x**2,x**3] for x in range(4)] //[[0,0],[1,1],[4,8],[9,27]]
- G = (sum(row) for row in M) //parentheses can be used to create generators that produce results on demand
- res = [c*4 for c in 'SPAM'] # ['SSSS','PPPP','AAAA','MMMM']
- list(map(abs,[0,-1,-2])) # [0,1,2]: map calls the function (its first argument) on each item of the iterable and collects the results; in 3.X wrap it in list() to see them
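The matrix comprehensions above can be run as one sketch on the same 3x3 M; the variable names are just illustrative, and range(len(M)) replaces the hard-coded [0,1,2] for the diagonal.

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

col2 = [row[1] for row in M]                      # [2, 5, 8]
plus_one = [row[1] + 1 for row in M]              # [3, 6, 9]
evens = [row[1] for row in M if row[1] % 2 == 0]  # [2, 8]
diag = [M[i][i] for i in range(len(M))]           # [1, 5, 9]
doubled = [c * 2 for c in 'spam']                 # ['ss', 'pp', 'aa', 'mm']
row_sums = list(sum(row) for row in M)            # generator drained into a list: [6, 15, 24]
abs_vals = list(map(abs, [0, -1, -2]))            # [0, 1, 2]

print(col2, plus_one, evens, diag, doubled, row_sums, abs_vals)
```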
Operation | Interpretation |
---|---|
D = {} | empty dict |
D = {'cto': {'name': 'Bob', 'age': 40}} | nesting |
D = dict(zip(keylist, valuelist)) | zipping two lists into a dict |
D.keys() | all keys (a view object in 3.X) |
D.values() | all values |
D.items() | all (key, value) tuples |
D.copy() | top-level copy |
D.clear() | remove all items |
D.update(D2) | merge from another dict by keys |
D.get(key, default?) | fetch by key; if absent, return default (or None) |
D.pop(key, default?) | remove and return by key; if absent, return default (or raise KeyError) |
D.setdefault(key, default?) | fetch by key; if absent, set key to default and return it |
D.popitem() | remove and return an arbitrary (key, value) pair (the last inserted one in 3.7+) |
len(D) | number of entries |
del D[key] | delete an entry by key |
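- Key lookup in a dict is hashed, so use a dict (or set) for frequent searching instead of a list test like x in [1,2,3], which scans every item
Any immutable (hashable) object can be a dictionary key, even a tuple, as long as everything inside it is immutable too. Mutable objects like lists or other dicts can't be keys
- matrix = {}; matrix[(2,3,4)] = 88; X = 2; Y = 3; Z = 4; matrix[(X,Y,Z)] # returns 88 (a tuple key works like a sparse-matrix coordinate)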
- D = {'food':'Spam', 'quantity':4, 'color':'pink'}
- D['food'] # returns 'Spam'
- D['quantity'] += 1
- D // gives the entire dictionary
- D= {}
- D[‘name’] = ‘Bob’
- D[‘job’] = ‘dev’
- print(D['name']) # gives Bob
- D.setdefault(key, default): if key is found, return its value
- if key is not found, insert the key with the default value (and return the default)
Use dict() with keyword arguments to create a dictionary, AND use zip to map two lists into a dictionary (one list holds the keys, the other the values)
- bob1 = dict(name=’bob’,job=’dev’,age=40) // same as {‘name’:’bob’,’job’:’dev’,’age’:40}
- bob2 = dict(zip([‘name’,’job’,’age’],[‘bob’,’dev’,40]))
- bob3 = dict([(‘name’,’Bob’),(‘age’,40)]) #dict key/value tuple form
Python allows dictionary nesting: multiple data types can co-exist as values in the same dictionary (really cool!! Much cooler than Perl)
- rec = {'name':{'first':'Bob','last':'Smith'}, 'jobs':['dev','mgr'], 'age':40.5}
- rec[‘jobs’][-1] //give mgr
- rec[‘jobs’].append(‘janitor’)
- rec[‘name’][‘first’] //gives Bob
- rec = 0 # the last reference to the record is dropped, so garbage collection automatically reclaims its memory
- L = [] # initialize an empty list; L[99] = 'spam' # IndexError: index out of range!
- L = {} # initialize an empty dict; L[99] = 'spam' # works: assigning to a new key creates it
- It is usually a programming error to fetch something that isn’t really there. But in many cases we need to test whether it is there
- The dictionary in membership expression lets us query the existence of a key: 'f' in D # gives False if the key doesn't exist
- if not 'f' in D: (or, more idiomatically, if 'f' not in D:)
    print('missing') # missing
- if not 'f' in D:
    print('missing')
    print('no, really') # if we have multiple lines of code to be executed in an if, we simply indent them all
- The dictionary get method: value = D.get('x', 0) # try to fetch; if 'x' doesn't exist, return the default value (0)
- value = D[‘x’] if ‘x’ in D else 0 //same
- Use try and except
- try:
    print(matrix[(2,3,5)])
except KeyError:
    print(0)
- D = dict(zip(['a','c','b'],[1,2,3])) # a {} literal can't hold a zip object; call dict() on it
- Ks = list(D.keys()) //Ks is [‘a’,’c’,’b’] #notice D.keys() return a view object not list. So use list(D.keys())
- Ks.sort() # Ks is ['a','b','c'] (or simply use sorted(D))
- for key in Ks: print(key,D[key])
- for c in ‘spam’: print(c.upper()) //gives S P A M
- x = 4
while x > 0: print('spam' * x); x -= 1 # gives spamspamspamspam spamspamspam spamspam spam
- squares = [x**2 for x in [1,2,3,4,5]]
- OR
- squares = []
- for x in [1,2,3,4,5]: squares.append(x **2) //same
- [key for (key,value) in Mydict.items() if value == V]
- D = {k: v for (k,v) in zip(['a','b','c'],[1,2,3])} # {'a':1,'b':2,'c':3} (3.7+ dicts keep insertion order)
- D = {x:x**2 for x in [1,2,3,4]} #{1:1,2:4,3:9,4:16}
- in 3.X, D.keys() returns a view, not a list; use list(D.keys()) if code that expects a list raises an error
- branch = {'a':1,'b':2}; print(branch.get('spam','bad choice')) # bad choice, because 'spam' is not a key
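The three ways of handling a missing key described above (in, get, try/except KeyError) all give the same answer. This sketch puts them next to each other, plus setdefault, on a small sample dict.

```python
D = {'food': 'Spam', 'quantity': 4}

# 1. Test membership first
value1 = D['x'] if 'x' in D else 0

# 2. Let get() supply a default for a missing key
value2 = D.get('x', 0)

# 3. Try the lookup and catch the KeyError
try:
    value3 = D['x']
except KeyError:
    value3 = 0

# setdefault returns the value AND stores the default when the key is missing
color = D.setdefault('color', 'pink')
print(value1, value2, value3, color)   # 0 0 0 pink

for key in sorted(D):                  # sorted() accepts the dict and sorts its keys
    print(key, '=>', D[key])
```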
Tuple object is a list that cannot be changed. Tuples are sequences and they are immutable like strings
- T = (1,2,3,4) //a 4 item tuple
- len(T) //gives 4
- T + (5,6) # gives (1,2,3,4,5,6)
- T[0] //gives 1
- T.index(4) // gives 3. This is used to find the value 4’s index
- Can’t do T[0] = 1
Parentheses are optional when creating tuples: Python treats un-parenthesized values separated by commas ',' as tuples
- a,b #same as (a,b)
- T = ‘spam’,3.0,[11,22,33] #parenthesis is optional
- T.append(4) //Error!! immutable
Tuple syntax peculiarities, commas and parentheses: a tuple with a single item must have a trailing comma
- x = (40) # THIS IS NOT A TUPLE!!! It is just the integer 40 in parentheses (an expression)
- x = (40,) # This is tuple. Must have the comma
- T = (3,5,1,6,2); L = list(T); L.sort(); T = tuple(L) # (1,2,3,5,6); sorted(T) also returns a new sorted list
- T.count(2) #how many 2s are there in the tuple? return the count
Tuple can’t be changed in place. But a tuple may contain mutable object as one element, which could be changed in place
- T = (1,[3,4],3) T[1][0]=2 #this works! Because a list is mutable and could be changed
- f = open('data.txt','w') # 'r' (read-only) is the default if no second argument is passed
- f.write('hello\n'); f.write('world\n')
- f.close()
- f = open('data.txt','r') # the file now contains hello\nworld\n
- text = f.read() # reads the ENTIRE file into one string, NOT ONE LINE!!! use readline() to read a single line
- text.split() # gives ['hello','world']
- input = open(r’C:\spam’,’r’) aString = input.readline() #read next line (including \n)
- aString = input.read(N) #read N characters
- file = open('test.txt','r')
while True:
    line = file.readline()
    if not line: break # an empty string means end of file (EOF)
    print(line.rstrip())
- output = open(r'dat.txt','w'); output.write(aString) # returns the number of characters written; output.writelines(aList) # writes all the strings in a list
- myfile = open(os.path.expanduser('~/data.txt'),'r') # open() does not expand ~ by itself (needs import os)
try:
    for line in myfile: print(line, end='')
finally:
    myfile.close() # optional here: Python closes a file when the file object is reclaimed, but explicit closing (or a with statement) is a good habit
- by default, output files are always buffered, meaning text we write may not be transferred from memory to disk immediately
- outputFile.flush() forces the buffered text to be transferred
- closing the file also flushes the buffer
Use a for loop to iterate through lines (file objects have a built-in iterator for accessing the lines of a file)
- the file object is its own iterator
- for line in open('myfile.txt'): print(line, end='') # this is like Perl's while(<>) lol
- f = open('s.txt'); f.__next__() # returns the next line (next(f) does the same)
- Data read from a text file is always a string; Python relies on conversion tools such as int(), float(), list() and eval() to turn it into objects, and str() to turn objects back into text
- Use rstrip() a lot to get rid of the \n at the end
- rstrip() by default removes all trailing whitespace, including \n
- if characters are passed in, it removes those characters from the right end instead
- int() and float() ignore surrounding whitespace such as \n
- lin = open('d.txt','r')
s = lin.readline() # returns '89\n'; i = int(s) # 89
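The reading tools above (readline, read, iterating the file object, rstrip and int conversion) can be tried on a throwaway file. This sketch creates a temporary file with tempfile.mkstemp instead of the d.txt used in the notes, and uses with blocks so each file is closed automatically.

```python
import os
import tempfile

# Create a temporary file just for this demo
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w') as f:
    f.write('89\n')
    f.write('hello world\n')

with open(path) as f:
    first = f.readline()     # '89\n' (the newline is kept)
    number = int(first)      # int() tolerates surrounding whitespace -> 89
    rest = f.read()          # everything that is left, as ONE string

with open(path) as f:
    stripped = [line.rstrip() for line in f]   # the file object is its own iterator

os.remove(path)
print(repr(first), number, repr(rest), stripped)
```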
Many times we need to store some native python objects like dict or list to a file and then read back in when needed
- line = F.readline() # "[1,2,3]${'a':1,'b':2}\n"
parts = line.split('$') # ['[1,2,3]', "{'a':1,'b':2}\n"]; eval(parts[0]) # gives [1,2,3]; objects = [eval(P) for P in parts] # objects is a list holding a list and a dict
The approach above works, but eval can be dangerous: it runs any Python expression the string contains, including code that, for example, deletes files
- D = {‘a’:1,’b’:2}
F = open('database','wb') # binary mode 'wb' is required
import pickle
pickle.dump(D,F) # dump D into file F (database)
F.close()
- F = open('database','rb')
E = pickle.load(F) # pickle rebuilds the dict from the file's bytes and assigns it to E
- bob = Person('bob smith') # Person and Manager are user-defined classes, not shown here
sue = Person('sue jones', job='dev', pay=1000); tom = Manager('tom jones', 50000)
import shelve
db = shelve.open('persondb')
for obj in (bob, sue, tom): db[obj.name] = obj
db.close()
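A pickle round trip can be checked end to end. This sketch writes to a temporary file instead of the 'database' file above and uses with blocks; note that the Python docs warn that unpickling can run arbitrary code, so only load pickles you trust.

```python
import os
import pickle
import tempfile

D = {'a': 1, 'b': 2}

fd, path = tempfile.mkstemp(suffix='.pkl')
os.close(fd)
with open(path, 'wb') as F:       # binary mode is required for pickle
    pickle.dump(D, F)
with open(path, 'rb') as F:
    E = pickle.load(F)            # a new dict rebuilt from the bytes
os.remove(path)

print(E, E == D, E is D)          # {'a': 1, 'b': 2} True False
```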
- X = set(‘spam’)
- Y = {'h','a','m'} # a set literal (('h','a','m') would be a tuple)
- X, Y # a tuple of two sets: ({'m','a','p','s'}, {'m','a','h'}) # unordered
- X & Y # intersection: {'m','a'}
- X | Y # union
- X - Y # difference: {'p','s'}
- X > Y # superset: False
- X < Y # subset: False (X has 'p' and 's', which are not in Y)
- 'p' in set('spam'), 'p' in 'spam', 'ham' in ['eggs','spam','ham'] # returns (True, True, True); membership tests work on sets, strings and lists alike, and sets are really easy to use
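Running the set operations confirms the corrected results, in particular that X < Y is False because X contains items that are not in Y. Printed set order may vary, so the sketch sorts where it prints.

```python
X = set('spam')
Y = {'h', 'a', 'm'}

print(sorted(X & Y))    # ['a', 'm']: intersection
print(sorted(X | Y))    # ['a', 'h', 'm', 'p', 's']: union
print(sorted(X - Y))    # ['p', 's']: difference
print(X > Y, X < Y)     # False False: neither is a superset/subset of the other
print({'a', 'm'} < X)   # True: a proper subset
print('p' in X, 'p' in 'spam', 'ham' in ['eggs', 'spam', 'ham'])  # True True True
```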
"type" tells whether an object is of a certain type (rarely needed!! explicit type tests limit the types a program can work with)
- check type
- what we care about is what an object does, not what it is, so type testing is rarely used
- type(L) # <class 'list'> in 3.X (<type 'list'> in 2.X)
- type(type(L)) // <class ‘type’>
- if type(L) == type([]): print(‘yes’)
- if type(L) == list: print(‘yes’)
- if isinstance(L,list): print(‘yes’)
- bool([]) #false
- bool([1]) #true
- bool({}) #false
- a = [1,2,3]; b = [0,a,4] # [0,[1,2,3],4]; a[0] = 0; b # [0,[0,2,3],4]: b holds a reference to the same list
- a = [1,2,3]; b = [0,a[:],4]; a[0] = 0; b # [0,[1,2,3],4]: with both limits omitted, a[:] slices from start to end, so it returns a (top-level) copy instead of the original list
- L = [1,2] X = L * 2 #[1,2,1,2] Y = [L] * 2 #list context: [[1,2],[1,2]]
- Python prints [...] when it sees a cycle in an object
- L = [‘g’]
L.append(L) L #[‘g’,[…]]
- Rule of thumb is to avoid this....
Literal | Interpretation |
---|---|
1234, 24 | Integers (unlimited size) |
1.23, 1.3e-10, 4E210 | Floating-point numbers |
0o117, 0x9ff, 0b10101 | Octal, hex and binary |
3+4j, 3.0+4.0j, 3J | Complex |
set('spam'), {1, 2, 3, 4} | Sets |
Decimal('1.0'), Fraction(1, 3) | Decimal and fraction extensions |
bool(x), True, False | Boolean type and constants |
- pow, abs, round. int ,hex, bin…
- random, math…
- int(3.1415) # truncates the float to the integer 3; float(3) # converts to the float 3.0
- All Python operators may be overloaded
- b/(2.0+a) # might echo a long full-precision value (0.80000000000000004 in versions before 3.1)
- print(b/(2.0+a)) # prints the rounded-off 0.8
- '%e' % num # gives '3.333333e-01' assuming num = 1/3 (string formatting expression)
- '%4.2f' % num # '0.33'
- '{0:4.2f}'.format(num) # '0.33' (string formatting method)
- Python allows chained comparisons; with X, Y, Z = 2, 4, 6: X < Y < Z # True; X < Y > Z # False; 1 < 2 < 3.0 < 4 # True; 1 == 2 < 3 # False, same as 1 == 2 and 2 < 3
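- Floating-point chain comparisons (and equality tests) might not work as expected
1.1 + 2.2 == 3.3 # False! 1.1 + 2.2 is really 3.3000000000000003
int(1.1 + 2.2) == 3.3 # also False: int() truncates to 3; compare with round(1.1 + 2.2, 1) == 3.3 or math.isclose() instead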
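The float comparison pitfall is easy to reproduce. This sketch compares the naive test, the truncating int() test, and two tolerance-based alternatives (math.isclose, available since Python 3.5, and rounding before comparing).

```python
import math

total = 1.1 + 2.2
print(total)                          # 3.3000000000000003
print(total == 3.3)                   # False: binary floats can't represent these exactly
print(int(total) == 3.3)              # False: int() truncates to 3
print(math.isclose(total, 3.3))       # True: compare with a relative tolerance instead
print(round(total, 10) == 3.3)        # True: round first, then compare
```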
- X / Y # true division in 3.X (classic division in 2.X): always keeps the fractional remainder, regardless of types (10 / 4 gives 2.5)
- X // Y # floor division: always truncates the fractional remainder (rounds down): 10 // 4 gives 2, 10 // 4.0 gives 2.0
The math module provides floor and trunc functions (floor always rounds toward a more negative number, trunc just drops the fractional part, i.e. rounds toward zero)
- import math; math.floor(2.5) # gives 2; math.floor(-2.5) # gives -3; math.trunc(2.5) # gives 2; math.trunc(-2.5) # gives -2
- math.pi, math.e provides pi and other common constant math.sin(2*math.pi/180)
- math.sqrt(144)
- min(), max() # really handy built-in min and max functions (they are built-ins, not in math), no need to implement them yourself
- oct(64), hex(64), bin(64) convert a number into the corresponding digit string
- X = 99 bin(X),X.bit_length(),len(bin(X)) - 2 # (gives 0b1100011, 7, 7)
- eval(‘64’), eval(‘0o100’) # gives 64, 64
- import random random.random()
- random.randint(1,10)
- random.choice(['Life of Brian','Holy','meaning'])
- suits = [‘heart’,’clubs’,’diamond’,’spades’] random.shuffle(suits) #changes the order randomly of list suits
In python, no need to declare the type of a variable b/c types are determined automatically at runtime
BUT!! A variable never has any type information or constraints associated with it. Type always goes with objects
- a = 3 # created a variable a and created an object that stands for 3 and then link them
- the getrefcount function in the standard sys module returns an object's reference count: import sys; sys.getrefcount(1) # how many references point to the integer object 1 (the number depends on the environment, e.g. the IDLE GUI)
- L1 = [1,2,3]; L2 = L1; L1[0] = 2; L2 # gives [2,2,3], because L1 and L2 point to the same object and L1[0] = 2 changed the object itself
- L1 = [1,2,3]; L2 = L1[:] # this makes a copy, so a later L1[0] = 2 won't change L2
- But this slicing technique won't work on other mutable types such as dicts and sets, because they are not sequences
- To copy a dictionary or set, call its X.copy() method
Or use the standard library module copy
- import copy # works for dictionaries, sets and any other object
X = copy.copy(Y) # top-level copy; X = copy.deepcopy(Y) # also copies all nested parts
- L = [1,2,3]; M = L; L == M # same value: True; L is M # True: the is operator tests whether the two names point to the exact same object (a stronger test than ==)
- L = [1,2,3]; M = [1,2,3]; L == M # True; L is M # False, because the objects are different in spite of the same value
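Shared references, slice copies, shallow copies and deep copies can be told apart by mutating the original afterwards. A small sketch using only the standard copy module:

```python
import copy

L1 = [1, 2, 3]
L2 = L1            # shared reference
L3 = L1[:]         # top-level copy of a list
L1[0] = 99
print(L2, L3)      # [99, 2, 3] [1, 2, 3]

nested = {'nums': [1, 2]}
shallow = copy.copy(nested)      # same as nested.copy() for a dict
deep = copy.deepcopy(nested)
nested['nums'].append(3)         # mutate the inner list
print(shallow['nums'], deep['nums'])   # [1, 2, 3] [1, 2]

print(L3 == [1, 2, 3], L3 is L1)  # True False: equal value, different objects
```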
- use the weakref standard library module
- A weak reference does NOT prevent its target object from being reclaimed: once only weak references remain, the object can be garbage-collected
- Useful when we keep caches of large objects
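A cache built on weakref.WeakValueDictionary shows the behavior: an entry lives only as long as some normal (strong) reference to its value exists. BigObject is a hypothetical class used because many built-in types such as int, str, list and dict can't be weakly referenced directly.

```python
import gc
import weakref

class BigObject:
    """Hypothetical stand-in for an expensive object worth caching."""
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()

obj = BigObject('report')
cache['report'] = obj          # the cache holds only a weak reference
print('report' in cache)       # True while a strong reference (obj) exists

del obj                        # drop the last strong reference
gc.collect()                   # not needed in CPython; makes the demo portable
print('report' in cache)       # False: the entry vanished with the object
```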
Statement | Role | Example |
---|---|---|
Assignment | create reference | a,b = ‘good’,’bad’ |
if/elif/else | condition | |
for/else | iteration | |
while/else | loop | |
pass | Empty placeholder | while True: pass |
break | loop exit | |
continue | loop continue | |
def | functions and methods | def myFun(a,b,c=1,*d): |
return | function result | |
yield | Generator functions | |
global | Namespaces | |
nonlocal | Namespaces | |
try/except/finally | catching exceptions | |
raise | trigger exceptions | |
assert | debug checks | assert X>Y, ‘X too small’ |
with/as | context managers | |
del | delete references | del data[i:j] |
- a statement can span multiple lines if it is enclosed in ( ), [ ] or { }, so wrap a long expression in parentheses to continue it on the next lines
- X = (A + B +
C + D)
- if (A==1 and B==2 and C==3): print(‘spam’*3)
input() (raw_input() in 2.X) is a built-in function that takes an optional prompt string as its argument and reads a line of text from the console
- while True:
    reply = input('Enter text:')
    if reply == 'stop': break
    elif not reply.isdigit(): print('bad' * 8)
    else: print(int(reply) ** 2)
print('Bye')
- Python runs the try block first, then runs either the except part (if an exception was raised) or the else part (if no exception was triggered)
- try: num = int(reply)
except ValueError: print('Bad' * 2)
else: print(num ** 2)
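- When there are multiple items on the LHS of '=', the assignments are positional
- Sequence assignment works with any iterable of the same length: a,b,c,d = 'spam' assigns 's','p','a','m' (in 2.X too), while a,b,c = 'spam' raises ValueError (too many values). Python 3.X adds extended unpacking: a,*b = 'spam' gives a = 's', b = ['p','a','m']
- Multiple-target assignments are not linked afterwards:
- a = b = 0; b = 1; (a,b) # (0,1): rebinding b doesn't affect a (but both names initially share one object, which matters for mutables)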
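Sequence assignment, 3.X extended unpacking and multiple-target assignment can be compared in one short sketch; the last part shows why sharing one mutable object between targets matters.

```python
a, b, c, d = 'spam'         # any iterable with exactly four items
first, *rest = 'spam'       # extended unpacking (Python 3.X): rest is a list
*init, last = [1, 2, 3, 4]

try:
    a3, b3, c3 = 'spam'     # four items, three targets
except ValueError:
    print('ValueError: too many values to unpack')

x = y = 0                   # both names reference the same object 0
y = 1                       # rebinding y does not touch x
print(a, d, first, rest, init, last, (x, y))
# s m s ['p', 'a', 'm'] [1, 2, 3] 4 (0, 1)

m = n = []                  # careful: one shared mutable object
m.append(42)
print(n)                    # [42]
```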
Operation | Interpretation |
---|---|
spam,ham=’ym’,’yd’ | Tuple assign |
[spam,ham]=’dy’,’dn’ | List assign |
spam = ham = ‘lunch’ | multiple target |
- print([object, …][, sep=’ ‘][, end=’\n’][, file=sys.stdout][, flush=False])
- sep is the string printed between each pair of objects
- print(‘spam’,’99’,’eggs’) #spam 99 eggs #sep=’ ’ space by default
- print(‘spam’,’99’,’eggs’,sep=’, ‘) #spam, 99, eggs
- print('spam', file=open('data.txt','w')) # writes spam to data.txt in the current working directory
Print in Python 2.X (note that every 3.X print call has a 2.X print statement equivalent that does the same thing)
- print x,y # note: in 2.X, print(x,y) prints the tuple (1, 2) when x, y = 1, 2
- print x,y, # same as Python 3.X print(x,y,end=’ ‘)
- print x+y
- print >> log, x,y,z # == print(x,y,z,file=log)
- print ‘%s…%s’ % (x,y)
- print(x,y) # or print x,y in 2.X
- is the same as import sys; sys.stdout.write(str(x) + ' ' + str(y) + '\n')
- in 2.6/2.7, add at the top of the file: from __future__ import print_function # to use the 3.X print function
Knowing this would allow us to redirect print to any arbitrary ways because print just called sys.stdout.write
- import sys
temp = sys.stdout # important!! otherwise we can't restore stdout after redirecting
sys.stdout = open('log.txt','a') # redirect print to a file; it could also be a GUI window or any object with a write method
print('spam') # the print now goes to the file instead of stdout
sys.stdout.close()
sys.stdout = temp # remember to restore print to stdout
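The standard library can do the save-and-restore steps above automatically: contextlib.redirect_stdout (Python 3.4+) swaps sys.stdout inside a with block and puts it back on exit. This sketch swaps in that helper instead of the manual approach and uses an in-memory io.StringIO as the target rather than log.txt.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()          # any object with a write() method works
with redirect_stdout(buffer):   # sys.stdout is swapped inside the block...
    print('spam')
    print('eggs', 42, sep=', ')
# ...and restored automatically on exit, even if an exception is raised

captured = buffer.getvalue()
print(repr(captured))           # 'spam\neggs, 42\n'
```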
- All objects have an inherent boolean value
- Any nonzero number or nonempty object is true
- Zero numbers, empty objects and None are false
- A = Y if X else Z # ternary expression, same as if X: A = Y else: A = Z
- Notice X and Y returns Y if X is true (otherwise it returns X); X or Y returns X if X is true (otherwise Y)
- X and Y can be any objects (an object is true as long as it is not empty, zero or None)
- Useful as a nonempty condition: (X and Y) or Z is true if both X and Y have elements (assuming they are lists or dicts), or if Z has element(s)
- With the loop else clause, the break statement can often eliminate the need for the search-status flags used in other languages
- while test: statements
else: statements # runs if the while loop didn't exit with a break statement
- Find whether a given number y (> 1) is prime:
x = y // 2 # // is integer (floor) division; no factor can be larger than y // 2
while x > 1:
    if y % x == 0:
        print(y, 'has factor', x)
        break
    x -= 1
else:
    print(y, 'is prime')
- break #jump out of the loop instantly
- continue #go to next iteration and stop current iteration
- pass #does nothing
- loop else block #only executed if the loop was exited normally (without using break)
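The prime test above can be wrapped in a function to check the break/else behavior on a few values. describe is a hypothetical name; the loop keeps the same while/break/else structure and assumes y > 1.

```python
def describe(y):
    """Report a factor of y, or that y is prime (assumes y > 1)."""
    x = y // 2                  # no factor can be larger than y // 2
    while x > 1:
        if y % x == 0:
            result = f'{y} has factor {x}'
            break               # leaving via break skips the else clause
        x -= 1
    else:                       # runs only if the loop ended without break
        result = f'{y} is prime'
    return result

for y in (12, 13, 2):
    print(describe(y))          # 12 has factor 6 / 13 is prime / 2 is prime
```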
- for target in object: statement else: #executed if for loop didn’t hit break statement
- for x in “lumberjack”: #string
- for x in [1,2,3] #list
- for x in (1,2,3) #tuples
- for x in [[1,2],3,4,5] #x would be a list [1,2] for the first iteration
- for (a,b) in [(1,2),(3,4),(5,6)]
- D = {'a':1,'b':2}
for key in D: print(key, '=>', D[key]) # notice that using 'in' on a dict iterates through its keys
- for i in range(0,10,2): #[0,2,4,6,8]
- S = 'abcdefg'
for i in range(0, len(S), 2): print(S[i], end=' ') # gives a c e g (S[::2] is simpler)
zip: takes two or more iterables and zips them together into tuples (in Python 3.X it returns a zip object; wrap it in list() to see the tuples)
- list(zip([1,2,4],[3,4,5],[8,9,7])) # gives [(1,3,8),(2,4,9),(4,5,7)]
map(callable, iterable, iterable…) # passes the i-th item of each iterable as the arguments of one call to callable (Python 3.X returns a map object)
- returns a list in Python 2.X
- list(map(lambda x,y,z: max(x,y,z), [1,2],[3,4],[6,7])) # [6,7]
In Python 2.X only, map can act like zip: map(None,[1,2],[3,4]) returns [(1,3),(2,4)] (zip is a special case of map!); 3.X map does not accept None
- the enumerate function returns an enumerate object, an iterator that produces (offset, item) pairs
- S = 'spam'
for (offset, item) in enumerate(S): print(item, 'appears at offset', offset) # s appears at offset 0 …
- Another example: elements = ('foo','bar','baz')
for count, ele in enumerate(elements): print(count, ele) # gives 0 foo, 1 bar, 2 baz (use enumerate(elements, 1) to start counting at 1)
- get an iterator from an object: L = [1,2,3]; i = iter(L); i.__next__() # returns the first element; i.__next__() # returns the second element…
- Note a file object is its own iterator, so there's no need to call iter() on a file: f = open('a.txt','r'); iter(f) is f # returns True; f.__next__() # returns the first line of the file
- I = iter(L) # get the iterator of a list L (this could also be a dictionary)
while True:
    try:
        X = next(I) # I.__next__() in 3.X, I.next() in 2.X
    except StopIteration:
        break
    print(X ** 2, end=' ')
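The manual iteration loop above and the zip, map and enumerate notes can be checked together. This sketch collects results into lists instead of printing inside the loop, and wraps the 3.X iterator objects in list() to see their contents.

```python
L = [1, 2, 3]
I = iter(L)                   # ask the iterable for an iterator
squares = []
while True:
    try:
        X = next(I)           # same as I.__next__() in 3.X
    except StopIteration:     # raised when the iterator is exhausted
        break
    squares.append(X ** 2)
print(squares)                # [1, 4, 9]

zipped = list(zip([1, 2, 4], [3, 4, 5], [8, 9, 7]))
maxes = list(map(lambda x, y, z: max(x, y, z), [1, 2], [3, 4], [6, 7]))
pairs = list(enumerate(('foo', 'bar', 'baz')))
pairs_from_1 = list(enumerate(('foo', 'bar', 'baz'), start=1))
print(zipped)        # [(1, 3, 8), (2, 4, 9), (4, 5, 7)]
print(maxes)         # [6, 7]
print(pairs)         # [(0, 'foo'), (1, 'bar'), (2, 'baz')]
print(pairs_from_1)  # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```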
- f = open(‘d.txt’) lines = f.readlines() #read all lines and each line is an element in list lines lines = [line.rstrip() for line in lines] #running each line and remove \n character for each line
- lines = [line.rstrip() for line in open('d.txt')] # alternative and more concise way
- lines = [line.upper() for line in open(‘d.txt’)]
- lines = [line.rstrip().upper() for line in open('d.txt')]
- lines = [line.split() for line in open(‘d.txt’)]
- lines = [line.replace(’ ‘,’!’) for line in open(‘d.txt’)] #replace space with !
- [('sys' in line, line[:5]) for line in open('d.txt')] # for each line, a tuple: does it contain 'sys'? and its first five characters (offsets 0-4)
- lines = [line.rstrip() for line in open(‘d.txt’) if line[0] == ‘p’]
- [line.rstrip() for line in open('d.txt') if line.rstrip()[-1].isdigit()] # lines ending in a digit (an empty line would raise IndexError here)
- sq = {x: x*x for x in range(10)}
[x+y for x in ‘abc’ for y in ‘lmn’] #[‘al’,’am’,’an’,’bl’,’bm’,’bn’,’cl’,’cm’,’cn’] Notice y is the inner loop
- any(iterable) # returns True if any element x has bool(x) True
- any([1,'']) # returns True
- all(iterable) # returns True if all elements are True in a Boolean sense
- all([1,'']) # returns False
- max(open('d.txt')) # returns the line that sorts highest alphabetically (strings compare character by character); max(open('d.txt'), key=len) returns the longest line
- filter returns the items in an iterable for which a passed-in function returns True
list(filter(bool, ['spam','',1])) # returns ['spam',1]; '' got filtered out since bool('') is False
- returns a list of numbers > 0 only:
l = list(range(-5,5)); positive = list(filter((lambda x: x > 0), l)) # [1,2,3,4]; in 3.X filter returns an iterator, hence list()
- from functools import reduce
reduce((lambda x,y:x+y),[1,2,3,4]) #10
- R = range(3); next(R) # won't work: a range object is iterable, but it is not itself an iterator (TypeError)
- R = range(3) I1 = iter(R) next(I1)
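The built-ins above can be tried on small in-memory data. This sketch uses a hypothetical list of lines in place of open('d.txt') to show that plain max() picks the alphabetically largest line while key=len picks the longest one, and that range needs iter() before next().

```python
from functools import reduce

R = range(3)
I1 = iter(R)                     # range is iterable, but not itself an iterator
print(next(I1), next(I1))        # 0 1
# next(R) would raise TypeError

lines = ['short\n', 'a much longer line\n', 'zebra\n']   # stand-in for a file's lines
print(repr(max(lines)))          # 'zebra\n': largest in alphabetical order
print(repr(max(lines, key=len))) # 'a much longer line\n': the longest line

print(any([1, '']), all([1, '']))                 # True False
print(list(filter(bool, ['spam', '', 1])))         # ['spam', 1]
print(list(filter(lambda x: x > 0, range(-5, 5)))) # [1, 2, 3, 4]
print(reduce(lambda x, y: x + y, [1, 2, 3, 4]))    # 10
```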
- len(dir(sys)) #list how many attributes in sys object
- len([x for x in dir(sys) if not x[0] == '_']) # how many names don't start with an underscore; for strings, startswith or endswith is clearer than indexing a character
- or, to skip only the double-underscore names:
len([x for x in dir(sys) if not x.startswith('__')])
- __doc__ is an attribute that automatically holds the documentation string (docstring) attached to an object, available for inspection
- """Module doc: words go here""" class… # a string at the top of a module (or def/class) becomes its docstring
- import file; print(file.__doc__) # shows the words above (file is the module's name)
- Notice we could also use "" or '' rather than triple quotes
help: extracts docstrings and associated structural information and formats them into a nicely arranged report
So it is legal to nest a def inside an if statement (def runs at runtime, not compile time)
- if switch:
    def myFunc(): return 1
else:
    def myFunc(): return 50
…
myFunc() # later on: depending on switch, myFunc was defined differently
- we can even call a function through another name by assigning it: def func(): return 1
othername = func
result = othername() # calls func and returns 1
- def name(arg1, arg2…): statements; return value # returns None if no return value is given
- lambda creates a function object but returns it as a result instead of assigning it to a name, so it can be used inline (functions can be created with lambda as well as def)
- yield sends a result object back to the caller, but remembers where it left off (functions that use it are generators)
- This remembering allows a generator to resume its state later, so it can produce a series of results over time
- By default all variables are local inside a function.
- Using global lets a function assign to module-level variables (names are always looked up outward through the enclosing scopes)
- nonlocal (3.X) allows enclosing functions to serve as a place to retain state, information remembered between function calls
return can return any object from a function, so we can return any number of objects by returning a tuple
- def multiple(x,y): x = 2; y = [3,4]; return x,y # returns a tuple
- a,b = multiple(1,2)
- by default, all variables declared in def are put into local scope unless specifically defined in other ways
- If we need to assign a variable at the top (module) level of the hierarchy, use global
- If we need to assign a name that lives in an enclosing def, (Python 3.X), use nonlocal
- nonlocal has the same meaning as global. Except it is meant to reference nested def above instead of module’s variable
- In-place change to objects do not classify names as locals
- If L is defined outside and we are inside a def: L = X #this creates a new local variable L, but: L.append(X) #doesn't create a new local; it is an in-place change, and L is found in the enclosing or global scope when it is not local
- Name assignment creates or changes local names
- Name reference searches in order: local, then enclosing functions(if any), global, then built-in (bottom up)
- X = 1
def func(Y): Z = X+Y #X is a global
- X = 1
def func(Y): global X X = 992 #X is changed
- Built-in (Python) encloses Global (Module) encloses Enclosing Function Locals encloses Local (function)
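A minimal sketch of the scope rules above, assuming Python 3.X (nonlocal does not exist in 2.X); the names counter, bump_global and make_counter are hypothetical. global rebinds a module-level name, while nonlocal rebinds a name in the enclosing def, which is how a closure retains state between calls.

```python
counter = 0                  # module-level (global) name

def bump_global():
    global counter           # assign to the module-level name instead of creating a local
    counter += 1
    return counter

def make_counter():
    count = 0                # lives in the enclosing function's scope
    def bump():
        nonlocal count       # rebind the enclosing function's name (Python 3.X only)
        count += 1
        return count
    return bump              # the nested function keeps count alive between calls

bump_global()
bump_global()
print(counter)               # 2
c = make_counter()
c()
c()
print(c())                   # 3
```

Each call to make_counter() creates a fresh enclosing scope, so separate counters do not share state.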
- Cross File : each module (file) is a self-contained namespace
- #first.py
X = 99 #second.py import first #references a name in another file print(first.X)
- When we need to change a global variable from another file, it's best practice to do it through a function, for better maintainability
#first.py X = 99 def setX(new): global X X = new #second.py import first first.setX(30)
- sometimes we pass a list as an argument but we don't want to alter the original
- We can do this by copying the list: L = [1,2] changer_fun(1,L[:]) #L[:] creates a copy
- Another way is to convert the list to a tuple, so any attempt to change it raises an error: L = [1,2] changer_fun(1,tuple(L)) #like const in C
Argument matching syntax: in a call, arguments must appear in this order: positional, then keyword args, then the *name form, then the **name form (the **name form must be last)
- def f(a,b,c): print(a+b+c)
- f(c=3,b=2,a=1) #6: keyword arguments are matched by name, so their order doesn't matter
- Could also mix them: f(1,c=3,b=2) #note the order is important! positional first, then keyword, then *name, then **name
- In a def header, "*" packs multiple separate arguments into one tuple: def func(*args): print(args) func([1,2,3],4,[5,6,7]) #([1, 2, 3], 4, [5, 6, 7])
- In a call, a star to the left of a list unpacks the list into separate arguments: a = [1,2,3] func(*a) #same as func(1,2,3)
- *name collects any number of unmatched positional arguments in a tuple
- def f(*eggs): print(eggs) #print all passed args
- Basically, Python collects the extra positional arguments into a tuple and assigns it to the starred name (eggs in the example above)
- **name works only for keyword arguments
- def f(**args): print(args) f(a=0,b=1) #{'a': 0, 'b': 1}
def func(**name) - Function: matches and collects remaining keyword arguments in a dictionary (dict)
- must follow the order of positional, named, *arg, **dict forms
- def f(a,*pargs,**kargs): print(a,pargs,kargs) f(1,2,3,x=1,y=2) #1 (2, 3) {'x': 1, 'y': 2}
- unpacking: unpack a tuple def f(a,b,c,d) : print(a,b,c,d) args = (1,2) args += (3,4) f(*args) #this works!
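A small runnable sketch of the matching rules above, assuming Python 3; f and g are hypothetical helper names. It shows *pargs/**kargs collecting extras in a def header, and * / ** unpacking a tuple and a dict in a call.

```python
def f(a, *pargs, **kargs):
    # a: first positional; pargs: tuple of extra positionals; kargs: dict of extra keywords
    return a, pargs, kargs

print(f(1, 2, 3, x=1, y=2))              # (1, (2, 3), {'x': 1, 'y': 2})

def g(a, b, c, d):
    return a + b + c + d

args = (1, 2)
args += (3, 4)
print(g(*args))                          # 10: * unpacks the tuple into four positional arguments
print(g(*[1, 2], **{'c': 3, 'd': 4}))    # 10: ** unpacks a dict into keyword arguments
```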
- def saver(x=[]): x.append(1) print(x) #the default list is created only once, when def runs
- saver() #[1] saver() #[1, 1] saver() #[1, 1, 1]
- def saver(x=None): if x is None: x = [] x.append(1) print(x) #a new list on every call
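A runnable version of the mutable-default pitfall above, assuming Python 3; the functions return the list (instead of only printing it) so the result can be checked, and saver_bad is a hypothetical name for the buggy variant.

```python
def saver_bad(x=[]):        # the default list is created once, when def runs
    x.append(1)
    return x

def saver(x=None):          # use None as a sentinel instead
    if x is None:
        x = []              # a fresh list on every call
    x.append(1)
    return x

print(saver_bad())          # [1]
print(saver_bad())          # [1, 1]: the same default list is reused
print(saver())              # [1]
print(saver())              # [1]
```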
- def sum(L): if not L: return 0 else: return L[0] + sum(L[1:])
def mysum(L): return 0 if not L else L[0] + mysum(L[1:])
- with def, functions must be defined elsewhere, away from the code that calls them (lambda can appear inline)
- as an expression, lambda returns a value that can optionally be assigned a name
- f = lambda x,y,z: x+y+z f(2,3,4)
- x = (lambda a=1,b=2,c=3: a+b+c) x(2) #7
- L = [lambda x: x**2, lambda x: x**3] for f in L: print(f(1)) print(L[0](2)) #1 1 4
- [expression for target1 in iterable1 if condition1 for target2 in iterable2 if condition2 … for targetN in iterableN if conditionN]
- map and comprehensions are often faster than equivalent for loops, because they run at C speed inside the interpreter while for loops run as PVM bytecode
- consider using map and comprehension in loops for performance
Unlike normal def functions, generator functions suspend when they produce a value (yield) and resume from that last yield on the next call
- local variable state is retained between values
Generator functions are closely tied to the iteration protocol (iterator objects define a __next__ method)
- returns next object or raise StopIteration exception to end the iteration
To end the generation of values, functions either use a return with no value OR simply allow control to fall off the end
- def gensquares(x): for i in range(x): yield pow(i,2)
- num = gensquares(4) next(num) next(num) #calling next() past the last value raises StopIteration, so catch it with try/except
- G = (c for c in 'PAM') list(G) #['P', 'A', 'M']
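A short sketch of the generator behavior described above, assuming Python 3: each next() resumes after the last yield, exhaustion raises StopIteration, and a generator supports only one pass.

```python
def gensquares(n):
    for i in range(n):
        yield i ** 2        # suspend here; resume on the next next() call

gen = gensquares(3)
print(next(gen), next(gen), next(gen))   # 0 1 4
try:
    next(gen)               # exhausted: raises StopIteration
except StopIteration:
    print("done")

G = (c for c in 'PAM')      # generator expression
print(list(G))              # ['P', 'A', 'M']
print(list(G))              # []: a generator is its own iterator, so a second pass is empty
```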
Notice that generators (whether functions or expressions) are their own iterators: they support just one active iteration
EIBTI (Explicit is better than implicit): don't use generators in simple cases without a good reason
- One situation is when we want a very long list of results: computing them all at once might take a long time and consume a lot of memory
- Use generator to step through, which can reduce the memory footprint
Modules provide an easy way to organize components by serving as self-contained namespaces (avoiding name collisions among code)
NOTE: “import” can only import modules (the file). It can’t import attributes (i.e. class,function,variables…) Use from…import for attributes
- #in a.py def func(text): print(text)
- #in b.py import a a.func("a")
import serves two purposes: 1.identify the filename 2. it also becomes a variable assigned to the loaded module
We don't write something like import ./b.py; instead, Python locates modules using a standard module search path and known file types
We sometimes still need to tell Python where to look to find the modules (files) (Python searches the path from first to last)
- Python searches the program's home directory first, so be careful not to reuse the names of other modules or standard library modules
- We can set PYTHONPATH env variable to a customized path and start to put our source lib there
- Python allows users to add dir to the module search path by simply listing them one per line
- All lines need to be in a text file whose name ends with a .pth suffix
- This file needs to be placed at top of Python’s installation dir
By modifying sys.path list at run-time, we can change the search path for all future imports made in a program run
- many web server programs require this
- the usual way: sys.path.append(dir) or sys.path.insert(0, dir)
At import time, Python first checks the timestamps (to see whether the bytecode is older than the source code) (.pyc files are bytecode)
- if it is older, Python recompiles it
- otherwise it skips the compilation
- We could ship the Python program by only shipping the .pyc bytecode without sending the source!
Only imported python file will have .pyc files generated after import. TOP LEVEL FILE does not have a .pyc file!
- because the top level file is not imported by other files
- top level file’s bytecode is generated and discarded internally
- So top level file is typically designed to be executed directly and not imported at all
If Python doesn’t have permission to create or write to .pyc file, it would just put it into memory and discard when done
- this helps reduce clutter in the source code directory
Slightly faster than normal .pyc files (but still less frequently used. PyPy system provides more substantial speedups)
All def statements in a file will be run at import time to create functions and assign attributes, so they can be called later
If the imported file has some real work (print), it will show immediately at import time. (def functions are run to create objects)
- so import basically will directly run the code in the imported file!!
import fetches entire module as a whole, while from fetches (or copies) specific names out of the module
- #in a.py: print("hello") spam = 1
- #later, in the top-level file:
import a #prints "hello"
print(a.spam) #1
a.spam = 2
import a #nothing on screen: a was already imported and stays in memory
print(a.spam) #2, not 1: the module code is not re-run, so our assignment is still in effect
from copies specific names from one module into another scope, so we can use those names directly without going through the module
- from module1 import printer printer("hello!") #no need to write module1.printer
- This requires less typing
- But be careful! This implicitly defines new names in the current scope
- It can silently overwrite names in the current scope that have the same names as those in the target module
- Using import is still recommended
from module1 import * # copy out all variables into current scope… No need to call through the module
- from module1 import * printer() #no need to write module1.printer, because from * copies all top-level names into the current scope
- long running applications (like server) can periodically update modules if something changed
- Note that Python can only dynamically reload modules written in Python, not those written in C or other languages
- reload runs a module file’s new code and overwrites existing namespace instead of re-creating the object
- #in a.py x=1 y = [1,2]
- #in top.py from a import x,y x = 42 #change local copy only y[0] = 42 #changes shared mutable in place.
- if I were to import a package in sub-directories of the running dir, I’d need __init__.py file in sub-directories
- This __init__.py file can contain Python code just like a normal module file. Its code runs automatically the first time the directory is imported
- import dir1.dir2.mymod
- This requires the following file structure (dir0 must be on the module search path): dir0\dir1\__init__.py, dir0\dir1\dir2\__init__.py, dir0\dir1\dir2\mymod.py
- Once imported, the directory name becomes a variable bound to the module object created from its __init__.py, and mymod refers to the actual module: dir1 #<module 'dir1' from '.\dir1\__init__.py'>
- import modulename as name is the same as: import modulename name = modulename del modulename
- from modulename import function1 as myFunc is the same as: from modulename import function1 myFunc = function1 del function1
Assignments inside the class statement make class attributes (not including assignments inside a def nested in the class, which make local names)
- self means this method would process its instance object
- Python defines a fixed and unchangeable mapping from each of these operations to a specially named method
- such methods are called automatically
- class myClass(firstClass):
    def __init__(self,value): #not to be confused with the __init__.py file!!! __init__ is also operator overloading
        self.data = value
    def __add__(self,other):
        return myClass(self.data+other)
    def __str__(self):
        return '[myClass:%s]' % self.data
    def mul(self,other): #this mul does not overload the * operator
        self.data *= other
Attributes don't have to be defined in the class to be used in an object! (quite different from other languages)
- Instances have no attributes of their own at first. They simply fetch the attribute from the class object where it is stored
- We can always assign unique attributes to an instance object
- class rec: pass
- a = rec() a.name = 'bob' print(a.name) #bob! We can always attach attributes to an instance object even if they are not defined in the class
- So in some sense, class would create an empty namespace, which could even be used as a dictionary
__dict__ is a built-in dictionary on an instance or class object that holds its namespace (an instance's __dict__ does not include attributes inherited from its class!)
class A: def __init__(self,name,val=0): self.name = name self.val = val def giveRaise(self,val): self.val += val #raise is a reserved word, so it can't be used as a method name
class B(A): def __init__(self,name,val): A.__init__(self,name,val) def giveRaise(self,val,bonus): A.giveRaise(self,val) #must remember to pass along the object self! self.val += bonus
- class A: def math(self,value): self.X = value class B: def math1(self,value): self.X = value class C(A,B): pass #both methods assign self.X, so an instance of C has only one X!!
Pseudoprivate attributes: use a two-underscore prefix without two trailing underscores (useful in large projects, especially with multiple inheritance)
- Use two underscores prefix will automatically convert the name to _classname__name
- class C1: def meth1(self, value): self.__X = value
- I = C1() I.meth1(88) print(I.__dict__) #{'_C1__X': 88}: the name is prefixed with _C1 automatically
- class Tool: def __method(self): #becomes _Tool__method pass
When Python searches methods, it chooses the first one it encounters (lowest and leftmost in classic classes) in conflict case
- superclass.method(self) # calling the superclass explicitly resolves the conflict and overrides the inheritance search's default
- class Person
- class Manager(Person):
- class Department: def __init__(self,*args): self.members = list(args) def addMember(self,person): self.members.append(person) def showAll(self): for person in self.members: print(person)
Classes have a __name__, just like modules, and a __bases__ sequence that provides access to superclasses
object.__dict__ attribute provides a dictionary with one key/value pair for every attribute attached to a namespace object (including classes, instances, and modules)
Method | Implements | Called for |
---|---|---|
__init__ | Constructor | X = class(args) |
__del__ | Destructor | Object reclamation of X |
__add__ | Operator+ | X+Y, X+=Y if no __iadd__ |
__or__ | Operator \| (bitwise OR) | X \| Y, X \|= Y if no __ior__ |
__repr__, __str__ | Printing, conversions | print(X), repr(X), str(X) |
__call__ | Function calls | X(*args,**kargs) |
__getattr__ | Attribute fetch | x.undefined |
__setattr__ | Attribute assignment | X.any = value |
__delattr__ | Attribute deletion | del X.any |
__getattribute__ | Attribute fetch | X.any |
__getitem__ | Indexing,slicing,iteration | X[key],X[i:j], for loops and other iteration if no __iter__ |
__setitem__ | Index and slice assignment | X[key] = value, X[i:j] = iterable |
__delitem__ | index and slice deletion | del X[key], del X[i:j] |
__len__ | length | len(x), truth tests if no __bool__ |
__bool__ | boolean tests | bool(x), truth tests (named __nonzero__ in 2.X) |
__lt__, __gt__, __le__, __ge__, __eq__, __ne__ | Comparisons | X < Y, X > Y, X <= Y, X >= Y, X == Y, X != Y |
__radd__ | Right-side Operators | Other + X |
__iadd__ | In-place augmented operators | X += Y (or else __add__) |
__iter__,__next__ | Iteration contexts | I = iter(X), next(I), for loops, in if no __contains__, all comprehensions, map(F,X), __next__ in 2.X |
__contains__ | Membership test | item in X (any iterables) |
__index__ | integer value (not index slice!) | hex(X),bin(X),oct(X) |
__enter__, __exit__ | Context manager | with obj as var |
__get__, __set__,__delete__ | Descriptor attributes | X.attr, X.attr = value, del X.attr |
__new__ | Creation | Object creation, before __init__ |
__repr__ is used by repr(), by interactive echoes, and by print()/str() when no __str__ is defined. The returned string is meant for developers and, ideally, looks like code that eval() could use to re-create the object
- When I create an object: myInst = myClass()
- myInst #will print whatever __repr__ returns
- print(myInst) # will print whatever __str__ returns if __str__ is explicitly defined besides __repr__
- Notice that if we only overload __repr__ without __str__, __str__ would be the same as __repr__ unless we define one explicitly
- __str__ is for Users
- __repr__ is for developers (for Python shell)
- if __str__ is not defined, __repr__ is used in print() and str()
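A minimal sketch of the __str__/__repr__ split, assuming Python 3.6+ (f-strings); Point is a hypothetical class. print() and str() use __str__, while repr(), interactive echoes, and the display of objects nested inside containers use __repr__.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):     # developer view: looks like code that re-creates the object
        return f"Point({self.x!r}, {self.y!r})"
    def __str__(self):      # user-friendly view, used by print() and str()
        return f"({self.x}, {self.y})"

p = Point(1, 2)
print(p)          # (1, 2)
print(repr(p))    # Point(1, 2)
print([p])        # [Point(1, 2)]: containers show their items with repr
```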
X[1:2] passes a slice object to __getitem__, so when needed we can add a type test inside __getitem__ to tell an index from a slice
- A slice object has three attributes: start, stop, step
- When L[1:2] is called, a slice object will be passed as the second argument to __getitem__ method
- class Indexer:
data = [1,2,3,4] def __getitem__(self,index): if isinstance(index, int): #regular indexing return self.data[index] else: #slice return self.data[index.start:index.stop:index.step]
- The older __getslice__ and __setslice__ methods were removed in Python 3.X, so even in 2.X we should use __getitem__ and __setitem__ for slices, for compatibility with both
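A runnable sketch of a __getitem__ that handles both indexes and slices, assuming Python 3; this Indexer takes its data in the constructor (a hypothetical variation on the class above). Because a list accepts a slice object directly, the slice branch can pass it straight through. The same method also serves as the iteration fallback, since the list raises IndexError past the end.

```python
class Indexer:
    def __init__(self, data):
        self.data = list(data)
    def __getitem__(self, index):
        if isinstance(index, int):          # X[i]
            return self.data[index]
        # X[i:j:k] arrives as a slice object with .start, .stop, .step
        print('slice:', index.start, index.stop, index.step)
        return self.data[index]             # a list accepts the slice object directly

X = Indexer([5, 6, 7, 8, 9])
print(X[0])       # 5
print(X[1:4])     # prints "slice: 1 4 None", then [6, 7, 8]
print(list(X))    # [5, 6, 7, 8, 9]: iteration falls back to __getitem__(0), (1), ...
```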
Notice that __index__ is not indexing!!!! It returns an integer value for X when X is used in hex(), bin() and oct()
- class C: def __index__(self): return 255
- X = C() hex(X) #0xff bin(X) #0b11111111
__getitem__ is also the iteration fallback: in a for statement, Python passes 0, 1, 2, ... to __getitem__ until an IndexError is raised
- class StepperIndex: def __getitem__(self,i): return self.data[i]
- X = StepperIndex() X.data = 'spam' for item in X: print(item,end=' ') # s p a m
All iteration contexts in Python try the __iter__ method first before trying __getitem__ (__iter__ is preferred)
- Typically the __next__ method needs to be overloaded along with __iter__ if __iter__ returns self as the iterator
- class Squares: def __init__(self,start,stop): self.value = start- 1 self.stop = stop def __iter__(self): #return self because the __next__method is part of this class itself. In more complex scenarios, it may return other class return self def __next__(self): if(self.value == self.stop): raise StopIteration self.value += 1 return pow(self.value,2)
- for x in Squares(1,5): print(x) # 1 4 9 16 25
- class SkipObject:
def __init__(self,wrapped): self.wrapped = wrapped def __iter__(self): return SkipIterator(self.wrapped)
- class SkipIterator:
def __init__(self,wrapped): self.wrapped = wrapped self.offset = 0 def __next__(self): if self.offset >= len(self.wrapped): raise StopIteration else: item = self.wrapped[self.offset] self.offset+=2 return item
- if __iter__ is coded as a generator function (with yield), there is no need to code __next__
- class Squares: def __init__(self,start,stop): self.start = start self.stop = stop def __iter__(self): for value in range(self.start,self.stop+1): yield pow(value,2)
Membership: __contains__, __iter__, and __getitem__ (in an "in" test, __contains__ is preferred over __iter__, which is preferred over __getitem__)
- class Empty: def __getattr__(self, attrname): if attrname == 'age': return 40 else: raise AttributeError(attrname)
- X = Empty() X.age #40 X.name #AttributeError: name
If defined, __setattr__ intercepts every attribute assignment, including self.attr = value inside the class's own methods (BUT! WATCH FOR INFINITE LOOPS!)
- class Control:
def __setattr__(self,attr,value): if attr == 'age': self.__dict__[attr] = value + 10 else: raise AttributeError(attr + " not allowed")
- if we don't use __dict__ to set the attribute, infinite recursion happens: X.age = 40 calls Control's __setattr__, and an assignment like self.age = ... inside it
calls __setattr__ again, which assigns again.....
- __delattr__ is called on del object.attr (inside it, delete through __dict__ as well, to avoid the same loop)
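A runnable version of the Control class above, assuming Python 3, with a __delattr__ added for illustration (that method and its print are assumptions, not part of the original notes). Both methods go through the instance __dict__, so they never trigger themselves recursively.

```python
class Control:
    def __setattr__(self, attr, value):
        if attr == 'age':
            # self.age = value here would call __setattr__ again (infinite recursion),
            # so store through the instance __dict__ instead
            self.__dict__[attr] = value + 10
        else:
            raise AttributeError(attr + ' not allowed')
    def __delattr__(self, attr):
        print('deleting', attr)
        del self.__dict__[attr]        # again, avoid "del self.attr" inside __delattr__

X = Control()
X.age = 40
print(X.age)          # 50
try:
    X.name = 'bob'
except AttributeError as e:
    print(e)          # name not allowed
del X.age             # deleting age
```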
- class adder:
    def __init__(self,val):
        self.val = val
    def __add__(self,val):
        return self.val+val
    def __radd__(self,val):
        return self.__add__(val)
    #or: __radd__ = __add__ #cut out the middle man
- class adder: def __iadd__(self,val): self.val += val return self
- class Callee: def __call__(self,*arg,**argv): print("called",arg,argv)
- c = Callee() c(1,2,3,x=2,y=7) #called (1, 2, 3) {'x': 2, 'y': 7} #notice the extra positional arguments are collected into a tuple for *arg, and the keywords into a dict for **argv
- class Acallee: def __call__(self,*parag,sep=0,**argv): #3.x keyword only!! pass #do something here like print in python 3.0
In a bool context, Python first tries __bool__; if it is not defined, Python then tries __len__ (Python 3.X)
In Python 2.X, it uses __nonzero__ instead of __bool__. 3.X simply renamed __nonzero__ to __bool__. __len__ is still the fallback
Bound or unbound (a first glance at static methods in a class): Python 2.X doesn't allow calling a method through the class without passing an instance
- class Selfless: def selfless(arg1,arg2): return arg1+arg2
- X = Selfless() Selfless.selfless(1,2) #works in 3.X ONLY! Failed at 2.X
Python automatically pairs the instance object with the first argument of a bound method fetched through a class instance
- def factory(aClass,*pargs,**kargs): return aClass(*pargs,**kargs)
- object1 = factory(Person, "Arthur","King") object3 = factory(Person,name='Brain')
- class A: pass class B: pass
- a = A() b = B() type(a) == type(b) # True in 2.X (classic classes) and False in 3.X: in 3.X, type() returns the instance's actual class
- In Python 2.X classic classes, compare the classes instead: a.__class__ == b.__class__
All new-style classes inherit from object in Python 2.X. In 3.X, this is added automatically above the user-defined root
__getattribute__ is not the same as __getattr__, as it is called for every attribute call (watch for infinite loop when modifying this)
- __slots__: Python reserves just enough space in each instance to hold a value for each slot attribute, which saves memory!
- Best used for rare cases where large numbers of instances are created in a memory-critical application
- class limiter(object): __slots__ = ['age','name','job'] #only names in the __slots__ list can be assigned as instance attributes
- Slots in subs are pointless when absent in supers
- Slots in supers are pointless when absent in subs
- Instances of a class with slots have no __dict__ by default; to also allow arbitrary new attributes, include '__dict__' in __slots__
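A minimal sketch of __slots__ in Python 3 (where every class is new-style); Limiter is a hypothetical name. Assigning a name outside the slots list fails, and the instance has no per-instance __dict__.

```python
class Limiter:
    __slots__ = ['age', 'name', 'job']   # only these instance attributes are allowed

x = Limiter()
x.age = 40
try:
    x.ape = 1000                # not listed in __slots__
except AttributeError as e:
    print('AttributeError:', e)
print(hasattr(x, '__dict__'))   # False: no per-instance dict is created
```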
- instance method: pass a self
- static method: no instance passed
- class method: gets a class, not instance
- class Methods:
    def imeth(self,x): #instance method
        pass
    def smethod(x): #static: no instance
        print([x])
    def cmethod(cls,x): #class: gets the class; passing the class as the first argument is done automatically!
        print([cls,x])
    smethod = staticmethod(smethod) #make smethod a static method
    cmethod = classmethod(cmethod) #make cmethod a class method
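The same three method kinds can also be declared with the @staticmethod and @classmethod decorators instead of rebinding the names after the def. A minimal sketch, assuming Python 3; the methods return values instead of printing so they can be checked.

```python
class Methods:
    def imeth(self, x):          # instance method: receives the instance
        return (self, x)
    @staticmethod
    def smeth(x):                # static method: no instance or class passed
        return [x]
    @classmethod
    def cmeth(cls, x):           # class method: receives the class
        return [cls, x]

obj = Methods()
print(Methods.smeth(3), obj.smeth(3))   # [3] [3]: callable through the class or an instance
print(obj.cmeth(3)[0] is Methods)       # True: the class, not the instance, is passed
```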
In 2.X: fetching a method from a class produces an unbound method, which cannot be called without manually passing an instance
In 2.X: we must ALWAYS declare a method as static in order to call it without an instance, whether it is through a class or an instance
In 3.X: fetching a method from a class produces a simple function, which can be called normally with no instance present
In 3.X: we need NOT declare such methods as static if they will be called through a class only, but we MUST do so in order to call them through an instance
Calling super() in a method of a subclass will inspect the call stack in order to automatically locate self argument and find the superclass
Then it pairs the two in a special proxy object that routes the later call to the superclass version of the method. So no need to pass self in super
e.g. self.__class__.__bases__[0] # this is a violation of a fundamental Python idiom for a single use case!
- class C(A,B): def act(self):
We can use super() to call a superclass's __x__ methods by name, but direct operator syntax on super() does not work
- class D(C): def __getitem__(self,ix): C.__getitem__(self,ix) #This works super().__getitem__(ix) #This works too, and no self is needed because Python locates it automatically super()[ix] #THIS WON'T WORK!
- class D(C): def act(self): super(D,self).act() #More verbose, but this explicit form works in both 2.X and 3.X
Superclasses that might be changed dynamically at runtime preclude hardcoding their names in a subclass's method
- class X:
    def m(self): print('X.m')
class Y:
    def m(self): print('Y.m')
class C(X):
    def m(self): super().m()
- i = C()
i.m() #calls X's m method
C.__bases__ = (Y,) #changing the superclass at runtime!!
i.m() #calls Y's m method
property allows us to route a specific attribute’s get,set and delete operations to functions or methods we provide
- class Person:
    def __init__(self, name): self._name = name
    def getName(self):
        print('fetch…')
        return self._name
    def setName(self,value):
        print('change…')
        self._name = value
    def delName(self):
        print('remove…')
        del self._name
    name = property(getName,setName,delName,"name property docs")
Clause Form | Interpretation |
---|---|
except: | catch all (or all other) exception types |
except name: | catch a specific exception type only |
except name as value: | catch a specific exception and assign its instance |
except (name1,name2): | catch any of the listed exception types |
except (name1,name2) as value: | catch any of the listed exception types and assign its instance |
else: | run if no exceptions are raised in the try block |
finally: | always perform this block on exit |
- try: 1/0 except Exception as X: print(repr(X)) #ZeroDivisionError('division by zero') in 3.X
- try: #do something except SomeExcpt: #do something except: #BAD
A bare except: catches everything, including genuine programming mistakes that we would rather see as error messages
Even Ctrl+C raises an exception (KeyboardInterrupt). We probably don't want to catch that, as it could make the program unstoppable
- try: action() except Exception: #catch all exceptions except exit-related ones (SystemExit, KeyboardInterrupt) other_action()
- class AlreadyGotOne(Exception): pass
def grail(): raise AlreadyGotOne()
try: grail() except AlreadyGotOne as X: print('got exception: AlreadyGotOne') print('caught: %s' % X.__class__) else: print("Good without any exceptions!")
This could be used to ensure that server shutdown code is run when an exception occurs and program can exit safely with server shut down
- with open('lumberjack.txt','w') as file: #always closes the file on exit file.write("the larch")
try: raise IndexError(“spam”) except IndexError: print (‘propagating’) raise #raise most recent exception
- try: 1/0 except Exception as E: raise TypeError(‘Bad’) from E
assert test, data is roughly equivalent to: if __debug__: if not test: raise AssertionError(data)
- class NumErr(Exception): pass
class DivZero(NumErr): pass
class Oflow(NumErr): pass
def func(): … raise DivZero() #note: needs to create an instance object! don’t just raise the class
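A runnable sketch of why an exception hierarchy helps, assuming Python 3: catching the NumErr superclass also catches every subclass instance. The func here takes a hypothetical kind argument just to choose which subclass to raise.

```python
class NumErr(Exception): pass
class DivZero(NumErr): pass
class Oflow(NumErr): pass

def func(kind):
    # raise an instance of the requested subclass (kind is a hypothetical trigger)
    if kind == 'div':
        raise DivZero('divide by zero')
    raise Oflow('overflow')

caught = []
for kind in ('div', 'oflow'):
    try:
        func(kind)
    except NumErr as e:          # the superclass catches every subclass
        caught.append(type(e).__name__)
print(caught)                    # ['DivZero', 'Oflow']
```

Adding a new subclass later does not require changing handlers that already catch NumErr.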
Function decorators - specifies special operation modes by wrapping functions in an extra layer of logic implemented as another function (metafunction)
- @staticmethod def meth(): pass
- def meth(): pass meth = staticmethod(meth)
def my_decorator(some_func):
    def wrapper(): #wrapper replaces the decorated function; calling the decorated name really calls this one
        num = 10
        if num == 10:
            #do something
            some_func()
        print("callback can be done here too")
    return wrapper #note here we need to return the function instead of calling it!
@my_decorator
def some_fun():
    pass
some_fun() #this is equivalent to calling wrapper() above
- class Tracer:
    def __init__(self,func):
        self.calls = 0
        self.func = func
    def __call__(self,*args):
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args)
@Tracer
def spac(a,b,c): #spac = Tracer(spac)
    return a+b+c
print(spac(1,2,3)) #call 1 to spac, then 6. Since spac is now an instance object of Tracer, spac(1,2,3) invokes __call__
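The same tracer can be written as a nested-function decorator. This is a sketch under the assumption of Python 3 (it uses nonlocal for the call count); trace_calls and add3 are hypothetical names. functools.wraps copies the original function's __name__ and docstring onto the wrapper, which the class-based version above does not do.

```python
import functools

def trace_calls(func):
    calls = 0
    @functools.wraps(func)            # copy __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        nonlocal calls                # per-function state kept in the enclosing scope
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)  # pass the result back to the caller
    return wrapper

@trace_calls
def add3(a, b, c):                    # add3 = trace_calls(add3)
    return a + b + c

print(add3(1, 2, 3))    # prints "call 1 to add3", then 6
print(add3.__name__)    # add3 (not wrapper), thanks to functools.wraps
```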
Class decorators - add management support for whole classes and their instances (their roles overlap with metaclasses)
When users pass arguments to the decorator (e.g. @decorator_with_arg("hello",1,2)), the function to be decorated is not passed to the constructor! The constructor receives the decorator arguments, and the function is passed to __call__ instead
class decorator_with_arguments(object):
    def __init__(self,arg1,arg2,arg3):
        print("inside __init__") # this happens during the decorating process
        self.arg1 = arg1 #this is from the decorator argument
        self.arg2 = arg2 #this is from the decorator argument
        self.arg3 = arg3 #this is from the decorator argument
    def __call__(self,f):
        print("inside __call__") #this happens during the decorating process
        def wrapper(*args):
            print("inside wrap") #this happens after decoration, each time the target function is called, with the real arguments
            print("decorator arguments:",self.arg1,self.arg2,self.arg3)
            return f(*args)
        return wrapper
@decorator_with_arguments("hello","world",32)
def hello(a1,a2,a3,a4):
    print("sayHello argument:",a1,a2,a3,a4)
print("after decoration")
hello("say","hello","argument","list")
- def count(aClass):
    aClass.numInstances = 0
    return aClass
@count class Spam: …
@count class Sub: …
@count class Other(Sub): …
- with expression [as variable]: with-block
- expression here is assumed to return an object that supports the context management protocol
- This object may also return a value that will be assigned to the name variable if the optional as clause is present
__enter__ is called and the value it returns is assigned to the variable in the as clause if present, or discarded otherwise
If an exception is raised in with block, __exit__(type,value,traceback) method is called with the exception details
- (type,value,traceback) are the same three values returned by sys.exc_info
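A minimal sketch of the context management protocol described above, assuming Python 3; Tracked is a hypothetical class that records each protocol call in a list. __enter__'s return value is bound by as, and __exit__ receives None values on a normal exit or the exception details otherwise; returning a false value lets the exception propagate.

```python
class Tracked:
    """Hypothetical context manager that records the protocol calls in self.log."""
    def __init__(self):
        self.log = []
    def __enter__(self):
        self.log.append('enter')
        return self                      # this value is bound to the 'as' variable
    def __exit__(self, exc_type, exc_value, tb):
        # exc_type/exc_value/tb are all None when the block finished normally
        self.log.append('exit %s' % (exc_type.__name__ if exc_type else None))
        return False                     # a false result lets any exception propagate

t = Tracked()
with t as obj:
    obj.log.append('body')
try:
    with t:
        raise ValueError('boom')
except ValueError:
    t.log.append('propagated')
print(t.log)
# ['enter', 'body', 'exit None', 'enter', 'exit ValueError', 'propagated']
```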
- with open('data') as fin, open('res','w') as fout: for line in fin: if 'some key' in line: fout.write(line)
- S = 'ni' S.encode('utf16'),len(S.encode('utf16')) #(b'\xff\xfen\x00i\x00',6)
- @decorator
def F(arg):
    …
#same as F = decorator(F); a later call F(arg) then runs decorator(F)(arg)
- class decorator:
    def __init__(self,func):
        self.func = func
    def __call__(self,*args):
        return self.func(*args) #use self.func and args here
@decorator
def func(arg): #same as func = decorator(func)
    pass #do something
- def decorator(cls):
    class Wrapper:
        def __init__(self,*args):
            self.wrapped = cls(*args)
        def __getattr__(self,name): #intercepts fetches of attributes not found on the Wrapper instance itself
            return getattr(self.wrapped,name)
    return Wrapper
@decorator
class C:
    def __init__(self,x,y):
        self.attr = 'spam'
x = C(6,7) #really calls Wrapper(6,7)
print(x.attr) #runs Wrapper.__getattr__, prints 'spam'
- @A @B @C def f(…): … (each decorator on its own line above the def)
- is the same as: def f(…): … f = A(B(C(f)))
- with a class-based decorator, there is no need to return a separate callable, since the instance's __call__ intercepts later calls and runs the original function for you
class tracer:
    def __init__(self,func):
        self.calls = 0
        self.func = func
    def __call__(self,*args):
        self.calls += 1
        print('call %s to %s' % (self.calls,self.func.__name__))
        return self.func(*args)
@tracer
def spam(a,b,c):
    print(a+b+c)
Use @property to turn a getter method into a property; the @score.setter decorator then adds a setter to that property
class Student(object):
    @property
    def score(self):
        return self._score
    @score.setter
    def score(self,value):
        if not isinstance(value,int):
            raise ValueError("score must be an integer")
        if value < 0 or value > 100:
            raise ValueError("score must be between 0-100")
        self._score = value #not self.score, which would call this setter again
import subprocess
proc = subprocess.Popen(["cat","/tmp/bax"],stdout=subprocess.PIPE)
(out,err) = proc.communicate()
itertools - module that implements a number of iterator building blocks to improve memory efficiency and speedup execution time
In Python 2, functions like zip and map return lists, and itertools (izip, imap) must be used to get iterators instead. In Python 3, zip and map return iterators by default (so izip and imap were removed from itertools)
itertools.accumulate(iterable[, func]) - make an iterator that returns accumulated sums, or accumulated results of another binary function (if func is specified)
if func is supplied, it should be a function of two arguments. Elements of the input iterable may be any type that can be accepted as arguments to func
import operator
def accumulate(iterable,func=operator.add): #roughly equivalent to itertools.accumulate
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total,element)
        yield total
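A quick usage check of the real itertools.accumulate, assuming Python 3.3+ (where the func argument was added): the default is a running sum, and any two-argument function, such as operator.mul or the built-in max, can be supplied instead.

```python
import itertools
import operator

data = [3, 4, 6, 2, 1]
print(list(itertools.accumulate(data)))                # [3, 7, 13, 15, 16]: running sums
print(list(itertools.accumulate(data, operator.mul)))  # [3, 12, 72, 144, 144]: running products
print(list(itertools.accumulate(data, max)))           # [3, 4, 6, 6, 6]: running maximum
```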
itertools.chain(*iterables) - make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all are exhausted
Useful for treating consecutive iterables as a single sequence - itertools.chain('abc','def') -> a,b,c,d,e,f
itertools.combinations(iterable,r) - return r length subsequences of elements from the input iterable
itertools.combinations_with_replacement(iterable,r) - return r length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once
itertools.compress(data,selectors) - make an iterator that filters elements from data, returning only those that have a corresponding element in selectors that evaluates to True
def compress(data,selectors):
    #compress('ABCDEF',[1,0,1,0,1,1]) -> A C E F
    return (d for d,s in zip(data,selectors) if s)
Note: the iterator does not produce any output until predicate first becomes false. So this might have a lengthy start-up time
itertools.filterfalse(predicate,iterable) - make an iterator that filters elements from iterable, returning only those for which the predicate is false
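A quick tour of the itertools functions described above, assuming Python 3 (where filterfalse has that name; it was ifilterfalse in 2.X). Each call is wrapped in list() because the functions return lazy iterators.

```python
import itertools

print(list(itertools.chain('abc', 'def')))                    # ['a', 'b', 'c', 'd', 'e', 'f']
print(list(itertools.compress('ABCDEF', [1, 0, 1, 0, 1, 1]))) # ['A', 'C', 'E', 'F']
print(list(itertools.dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1])))  # [6, 4, 1]
print(list(itertools.filterfalse(lambda x: x % 2, range(10))))      # [0, 2, 4, 6, 8]
print(list(itertools.combinations('ABC', 2)))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(list(itertools.combinations_with_replacement('AB', 2)))
# [('A', 'A'), ('A', 'B'), ('B', 'B')]
```

Note that dropwhile keeps the later 4 and 1: once the predicate first fails, every remaining element is returned.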
itertools.groupby(iterable,key=None) - make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element
if not specified or is None, key defaults to an identity function and returns the element unchanged.
returns an iterator (a groupby object) of (key, group) pairs: the first element is the key (here, each unique char) and the second is an iterator over that consecutive run of equal elements
>>> a = it.groupby(‘AAAABBBCCD’) >>> list(a) [(‘A’, <itertools._grouper object at 0x10b5a7b70>), (‘B’, <itertools._grouper object at 0x10b5a7ba8>), (‘C’, <itertools._grouper object at 0x10b5a7be0>), (‘D’, <itertools._grouper object at 0x10b5a7c18>)]
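To see what is inside each group, consume it while iterating, as in this sketch (Python 3 assumed; words and by_first are hypothetical names). groupby only merges consecutive equal keys, so sort by the same key first when you want one group per key.

```python
import itertools

print([(k, len(list(g))) for k, g in itertools.groupby('AAAABBBCCD')])
# [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print([k for k, g in itertools.groupby('AABAA')])   # ['A', 'B', 'A']: runs, not unique keys

words = ['apple', 'bob', 'avocado', 'banana']
by_first = {k: list(g) for k, g in itertools.groupby(sorted(words), key=lambda w: w[0])}
print(by_first)   # {'a': ['apple', 'avocado'], 'b': ['banana', 'bob']}
```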
itertools.repeat(object[, times]) - make an iterator that returns object over and over again (runs indefinitely unless the times argument is specified)
Make an iterator that aggregates elements from each of the iterables. if the iterables are of uneven length, missing values are filled with fillvalue
collections - module that implements high-perf containers alternatives to Python built-in dict,list,set,tuple
The new subclass is used to create tuple-like objects that have fields accessible by attribute lookup AND being indexable and iterable
Named tuples are especially useful for assigning field names to result tuples returned by csv or sqlite3 modules
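A minimal collections.namedtuple sketch, assuming Python 3; Point is a hypothetical type name. The generated class supports attribute access, indexing and unpacking, and _replace returns a modified copy because tuples are immutable.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])   # creates a new tuple subclass
p = Point(11, y=22)
print(p.x + p.y)         # 33: field access by attribute name
print(p[0], p[1])        # 11 22: still indexable
x, y = p                 # and unpackable like a plain tuple
print(p)                 # Point(x=11, y=22)
print(p._replace(x=33))  # Point(x=33, y=22): a new tuple; p is unchanged
```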
Methods | Description |
---|---|
append(x) | add x to the right side |
appendleft(x) | add x to the left side |
clear() | remove all elements |
count(x) | count the number of deque elements equal to x |
extend(iterable) | extend the right side by appending elements from iterable |
extendleft(iterable) | extend the left side by appending elements from iterable (they end up in reverse order) |
pop() | remove and return an element from the right side; raise IndexError if empty |
popleft() | remove and return an element from the left side; raise IndexError if empty |
remove(value) | remove the first occurrence of value; if not found, raise ValueError |
reverse() | reverse the elements of the deque in-place and return None |
rotate(n=1) | rotate the deque n steps to the right; if n is negative, rotate to the left |
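A short walk through the deque methods in the table, starting from an arbitrary deque; the comments track the contents after each step:

```python
from collections import deque

d = deque('ghi')          # deque(['g', 'h', 'i'])
d.append('j')             # right end: g h i j
d.appendleft('f')         # left end:  f g h i j
d.extendleft('abc')       # each item is added on the left in turn: c b a f g h i j
d.rotate(1)               # last item moves to the front: j c b a f g h i
right = d.pop()           # 'i'
left = d.popleft()        # 'j'; d is now c b a f g h
```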
from collections import Counter
cnt = Counter()
for word in ['red','blue','red','green','blue','blue']:
    cnt[word]+=1
cnt # Counter({'blue': 3, 'red': 2, 'green': 1})
Method | Description | Example |
---|---|---|
elements() | return an iterator over elements, repeating each as many times as its count; elements with a count below 1 are ignored | Counter(a=3,b=2,c=2,d=0) gives ['a','a','a','b','b','c','c'] |
most_common([n]) | return a list of the n most common elements and their counts, from most to least common (all elements if n is omitted) | |
subtract([iterable-or-mapping]) | elements from an iterable or another mapping (or Counter) are subtracted; counts can become zero or negative | |
update([iterable-or-mapping]) | elements are counted from an iterable or added in from another mapping (or Counter) | |
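A sketch of the Counter methods above; all inputs are arbitrary sample values:

```python
from collections import Counter

c = Counter('abracadabra')           # a:5, b:2, r:2, c:1, d:1
top2 = c.most_common(2)              # [('a', 5), ('b', 2)]
missing = c['z']                     # 0: missing keys count as zero, no KeyError

# elements() repeats each key by its count and skips counts below 1
elems = sorted(Counter(a=3, b=2, c=2, d=0).elements())
# ['a', 'a', 'a', 'b', 'b', 'c', 'c']

c2 = Counter(a=4, b=2, c=0, d=-2)
c2.subtract(Counter(a=1, b=2, c=3, d=4))   # counts may drop to zero or below
# Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
c2.update(['a', 'a'])                       # update() adds counts: 'a' becomes 5
```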
OrderedDict.popitem(last=True) - removes and returns a (key, value) pair: the most recently inserted pair (LIFO) if last is true, the oldest pair (FIFO) if last is false
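A small sketch of popitem in both directions; move_to_end is another OrderedDict method, included only to show reordering:

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abcde')   # keys a, b, c, d, e (all values None)
last = od.popitem()                  # ('e', None): LIFO by default
first = od.popitem(last=False)       # ('a', None): FIFO when last=False
od.move_to_end('b')                  # move an existing key to the end
keys = list(od)                      # ['c', 'd', 'b']
```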
defaultdict - if a key does not exist yet, default_factory is called to create a default value for it, so the entry can be used directly
from collections import defaultdict
s = [('yellow',1),('blue',2),('yellow',3),('blue',4),('red',1)] #we want a dict mapping each color to the list of all its numbers
d = defaultdict(list)
for k,v in s:
    d[k].append(v)
sorted(d.items()) #[('blue',[2,4]),('red',[1]),('yellow',[1,3])]
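The same default_factory idea works with any zero-argument callable. A sketch using int for counting and set for collecting distinct values (sample data is made up):

```python
from collections import defaultdict

# int() returns 0, so every missing key starts at 0
counts = defaultdict(int)
for ch in 'mississippi':
    counts[ch] += 1
# sorted(counts.items()) -> [('i', 4), ('m', 1), ('p', 2), ('s', 4)]

# set() as the factory collects distinct values per key
s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1)]
d = defaultdict(set)
for k, v in s:
    d[k].add(v)
# d['red'] -> {1, 3}, d['blue'] -> {2, 4}
```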
* Python for Data Analysis - at page 164
** ipython - an enhanced Python interpreter, which lets you explore the results interactively when a script is done executing
*** Tab completion (objects, functions, etc)
*** Introspection - using a question mark (?) after a variable displays some general information about the object (it even shows the docstring for functions)
b?
#Type: list
Use %run -i instead of plain %run to give a script access to variables already defined in the interactive IPython namespace
In [20]: a = np.random.randn(100,100)
In [21]: %timeit np.dot(a,a)
100000 loops, best of 3: 20.9 us per loop
Mode 1 - activate the debugger before executing code, so we can set a breakpoint and step through the code from that point
- if an exception occurs, this lets you inspect its stack frames interactively
%cpaste - opens a special prompt for pasting code, so we have the freedom to paste as much code as we like before executing it (end the block with -- alone on a line)
%who, %who_ls, %whos - display variables defined in the interactive namespace, with varying levels of information/verbosity
%xdel variable - delete a variable and attempt to clear any references to the object in the IPython internals
SciPy - a collection of packages addressing a number of different standard problem domains in scientific computing
scipy.linalg - linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg
scipy.special - wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
scipy.stats - standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions) and various statistical tests
Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models, etc
NumPy's library of algorithms is written in C and stores data in a contiguous block of memory. It is well suited to large arrays (NumPy-based algorithms are often 10-100 times faster than pure Python)
Function | Description |
---|---|
array | Convert input data (list, tuple, array, or other sequence type) to an ndarray; copies the input data by default |
asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
arange | Like the built-in range but returns an ndarray instead of a list |
ones | Produce an array of all 1s with the given shape and dtype |
ones_like | Take another array and produce a ones array of the same shape and dtype |
zeros | Like ones but producing 0s |
zeros_like | Like ones_like but producing 0s |
empty | Create new arrays by allocating new memory, but unlike ones and zeros, do not populate them with any values |
full | Produce an array of the given shape and dtype with all values set to the indicated "fill value" |
eye, identity | Create a square NxN identity matrix (1s on the diagonal and 0s elsewhere) |
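A quick tour of the creation functions in the table; the shapes and values are arbitrary:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])      # from a nested list
b = np.asarray(a)                   # already an ndarray: no copy, b is a
c = np.array(a)                     # np.array copies by default
z = np.zeros((2, 3))                # float64 zeros, shape (2, 3)
o = np.ones_like(a)                 # same shape and dtype as a
f = np.full((2, 2), 7)              # every element is 7
i3 = np.eye(3)                      # 3x3 identity matrix
r = np.arange(0, 10, 3)             # array([0, 3, 6, 9])
e = np.empty((2, 2))                # uninitialized: contents are arbitrary
```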
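data = np.random.randn(2,3) #generates a 2x3 array of random numbers from the standard normal distribution
data * 10 #vectorized: every element is multiplied by 10 without a Python loop
data1 = [6,7.5,8,0,1]
arr1 = np.array(data1) #dtype('float64') since the list contains a float
data2 = [[1,2,3],[4,5,6]]
arr2 = np.array(data2) #nested lists of equal length become a multidimensional array
arr2.ndim #gives 2
arr2.shape #(2,3)
arr2.dtype #dtype('int64') on most platforms, since all elements are integers
data type or dtype is a special object containing the information (metadata) the ndarray needs to interpret a chunk of memory as a particular type of data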
Type | Description |
---|---|
int8, uint8 | signed and unsigned 8-bit integer types |
int16,uint16 | signed and unsigned 16-bit integer types |
int32,uint32 | signed and unsigned 32-bit integer types |
int64,uint64 | signed and unsigned 64-bit integer types |
float16 | half-precision floating point |
float32 | standard single-precision floating point; compatible with C float |
float64 | standard double-precision floating point; compatible with C double and python float object |
float128 | extended-precision floating point |
complex64, complex128, complex256 | complex numbers represented by two 32-, 64-, or 128-bit floats, respectively |
bool | boolean type storing True or False values |
object | python object type; a value can be any Python object |
string_ | Fixed-length ASCII string type (1 byte per character); np.bytes_ in NumPy 2.x |
unicode_ | Fixed-length Unicode type; np.str_ in NumPy 2.x |
arr = np.array([1,2,3,4])
arr.dtype #dtype('int64')
new_arr_in_float = arr.astype(np.float64) #astype always creates a new array (a copy), here of dtype float64
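A short sketch of astype conversions on made-up values; note that float-to-int conversion truncates toward zero rather than rounding:

```python
import numpy as np

arr = np.array([3.7, -1.2, 0.5])
as_int = arr.astype(np.int32)                # truncates toward zero: [3, -1, 0]
numeric_strings = np.array(['1.25', '-9.6', '42'])
as_float = numeric_strings.astype(float)     # parses the strings: [1.25, -9.6, 42.0]
same = arr.astype(arr.dtype)                 # same dtype, but still a new copy
```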
One difference from Python lists: if a scalar is assigned to a slice, the value is broadcast to all elements in the slice
arr = np.arange(10) #[0,1,2,3,4,5,6,7,8,9]
arr[5:8] = 12 #[0,1,2,3,4,12,12,12,8,9]
new_arr = arr[5:8] #[12,12,12]; a slice is a view on the original array, not a copy
new_arr[0] = 123456 #so this change is also reflected in the original arr[5]
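A minimal sketch of the view-versus-copy behavior described above, with an explicit .copy() for when an independent slice is wanted:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # a view: no data is copied
view[:] = 64             # writes through to arr[5:8]
copy = arr[5:8].copy()   # an explicit copy is independent
copy[0] = -1             # arr is not affected by this
```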
- arr2d[0][2] can be replaced with arr2d[0,2] (row index first, then column index)
arr2d #array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[:2] #array([[1,2,3],[4,5,6]]) Select first two rows of the array
arr2d[:2,1:] #array([[2,3],[5,6]]) Select the first two rows, and in those rows, the columns from index 1 to the end
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data = np.random.randn(7,4) #7 rows of 4 cols of random numbers
names == 'Bob' #array([True,False,False,True,False,False,False], dtype=bool)
data[names=='Bob'] #passing a boolean array selects only the rows where it is True (rows 0 and 3 here)
data[data<0] = 0 #data<0 returns a boolean array with the same shape as data, which is then used to set every negative element to 0
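The same boolean-indexing steps on deterministic data (np.arange stands in for np.random.randn so the result is predictable). Conditions are combined with & and |, since Python's and/or do not work element-wise on arrays:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4) - 10   # deterministic stand-in for randn(7, 4)
bob_rows = data[names == 'Bob']           # rows 0 and 3 (boolean indexing copies)
bob_or_will = data[(names == 'Bob') | (names == 'Will')]   # rows 0, 2, 3, 4
not_bob = data[~(names == 'Bob')]         # the other 5 rows
data[data < 0] = 0                        # set negatives to 0 in place
```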
Fancy Indexing - using an array of integers to select the rows you want, in whatever order (use negative numbers to select from the end)
arr = np.empty((8,4)) #8X4 array (values are uninitialized)
arr[[4,3,0,6]] #return the rows at index 4, 3, 0 and 6, in that order
Fancy Indexing - passing multiple index arrays works a bit differently: it selects a one-dimensional array of elements corresponding to each tuple of indices
arr = np.arange(32).reshape((8,4)) #8x4 array
arr[[1,5,7,2],[0,3,1,2]] #array([4,23,29,10]): the elements at indices (1,0), (5,3), (7,1) and (2,2)
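A sketch of both fancy-indexing forms on the same 8x4 array. Selecting a rectangular sub-block takes two steps, because passing two index arrays at once picks individual elements:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = arr[[4, 3, 0, 6]]                     # rows 4, 3, 0, 6 in that order
last_two = arr[[-1, -2]]                     # negative indices count from the end
picked = arr[[1, 5, 7, 2], [0, 3, 1, 2]]     # elements (1,0), (5,3), (7,1), (2,2)
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]   # rows 1,5,7,2 with columns reordered
```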
arr = np.arange(15).reshape((3,5))
arr.T #transposing
arr_innerP = np.dot(arr.T,arr)
For higher-dimensional arrays, the transpose method takes a tuple of axis numbers to permute the axes
arr.swapaxes(1,2) swaps axes 1 and 2 in an array with at least 3 dimensions (returns a view)
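A sketch of transpose with an explicit axis order, swapaxes, and the X^T X product from above; the array contents are arbitrary:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
t = arr.transpose((1, 0, 2))   # new axis order is old axes 1, 0, 2: t[i, j, k] == arr[j, i, k]
s = arr.swapaxes(1, 2)         # shape (2, 4, 2): s[i, j, k] == arr[i, k, j]
m = np.arange(6).reshape((2, 3))
gram = np.dot(m.T, m)          # (3, 2) times (2, 3) gives a symmetric (3, 3) matrix
```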
ufuncs (universal functions) perform fast vectorized element-wise operations; unary ufuncs like np.sqrt(arr) and np.exp(arr) take one array and return a new array of results
Function | Description |
---|---|
abs, fabs | Compute the absolute value element-wise for integer, floating-point, or complex values |
sqrt | Compute the square root of each element |
square | Compute the square of each element |
exp | Compute the exponent e^x of each element |
log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
sign | Compute the sign of each element: 1 (positive), 0 (zero), -1 (negative) |
ceil | Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number) |
floor | Compute the floor of each element (i.e., the largest integer less than or equal to that number) |
rint | Round the elements to the nearest integer, preserving the dtype |
modf | Return the fractional and integral parts of the array as separate arrays (returns two arrays) |
isnan | Return a boolean array indicating whether each value is NaN (Not a Number) |
isfinite, isinf | Return a boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
logical_not | Compute the truth value of not x element-wise (equivalent to ~arr for boolean arrays) |
Binary ufuncs like np.maximum(arr1,arr2) take two arrays and return the element-wise maximum of their elements
Function | Description |
---|---|
add | Add corresponding elements in arrays |
subtract | Subtract elements in the second array from the first array |
multiply | Multiply array elements |
divide, floor_divide | Divide or floor divide |
power | Raise elements in the first array to the powers indicated in the second array |
maximum, fmax | Element-wise maximum; fmax ignores NaN |
minimum, fmin | Element-wise minimum; fmin ignores NaN |
mod | Element-wise modulus (remainder of division) |
copysign | Copy the sign of the values in the second argument to the values in the first argument |
greater, greater_equal, less, less_equal, equal, not_equal | Perform element-wise comparison, yielding a boolean array (equivalent to the infix operators >, >=, <, <=, ==, !=) |
logical_and, logical_or, logical_xor | Compute the element-wise truth value of the logical operation (equivalent to the infix operators &, \|, ^ on boolean arrays) |
np.sqrt(arr) #returns a new array with the element-wise square root; arr is unchanged
np.sqrt(arr,arr) #the second argument is out: the result is written into arr itself (in place)
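A few unary and binary ufuncs from the tables, including the out argument and the NaN difference between maximum and fmax; the inputs are made up:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])
roots = np.sqrt(x)                              # unary: [1., 2., 3.]
biggest = np.maximum(x, y)                      # binary: [2., 4., 10.]
frac, whole = np.modf(np.array([2.5, -1.25]))   # [0.5, -0.25] and [2., -1.]

out = np.empty_like(x)
np.add(x, y, out=out)                           # result written into out: [3., 7., 19.]
np.sqrt(x, x)                                   # 2nd positional arg is out: x becomes [1., 2., 3.]

with_nan = np.array([np.nan, 5.0])
max_nan = np.maximum(with_nan, 1.0)             # [nan, 5.]: NaN propagates
fmax_nan = np.fmax(with_nan, 1.0)               # [1., 5.]: NaN is ignored
```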
numpy.where is a vectorized version of the ternary expression x if condition else y: fast on large arrays, and it works on multidimensional arrays
xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.1,2.2,2.3,2.4,2.5])
cond = np.array([True,False,True,True,False])
result = [(x if c else y) for (x,c,y) in zip(xarr,cond,yarr)] #this could be slow and not supported for higher dimension of array
result = np.where(cond,xarr,yarr) #does the job nicely. The 2nd and 3rd arguments don't need to be arrays (they can be scalars). A typical use is producing a new array based on another array
arr = np.random.randn(4,4) #4x4 random numbers
np.where(arr>0,2,-2) #for each position in the 4x4 array: 2 if the element is >0, else -2
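The np.where calls above, with fixed numbers in place of random ones so the result is predictable:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
result = np.where(cond, xarr, yarr)        # [1.1, 2.2, 1.3, 1.4, 2.5]

arr = np.array([[0.5, -1.0], [-0.2, 3.0]])  # fixed stand-in for randn(2, 2)
signs = np.where(arr > 0, 2, -2)           # both branches scalars: [[2, -2], [-2, 2]]
clipped = np.where(arr > 0, 2, arr)        # replace only the positives with 2
```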
Method | Description |
---|---|
sum | Sum of all elements in the array or along an axis:zero-length arrays have sum 0 |
mean | Arithmetic mean; zero-length arrays have NaN mean |
std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n) |
min,max | Min and Max |
argmin, argmax | Indices of the minimum and maximum elements, respectively |
cumsum | Cumulative sum of elements starting from 0 |
cumprod | Cumulative product of elements starting from 1 |
Aggregations (reductions) like sum, mean, and std (standard deviation) can be computed either by calling the array instance method or by using the top-level NumPy function
arr = np.random.randn(5,4) #generate a 5x4 array of random numbers
arr.mean() #returns a single number - mean
np.mean(arr) #same result (doesn't change the original array)
arr.sum() #returns the sum
Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension
arr.mean(axis=1) #mean of each row (computed across the columns)
arr.mean(axis=0) #mean of each column (computed down the rows)
arr = np.random.randn(100)
(arr>0).sum() #number of positive values (e.g. 42): True counts as 1 and False as 0
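A sketch of the axis argument on a small deterministic array, where the results are easy to check by hand:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))   # rows: [0..3], [4..7], [8..11]
total = arr.sum()                     # 66
col_means = arr.mean(axis=0)          # one value per column: [4., 5., 6., 7.]
row_sums = arr.sum(axis=1)            # one value per row: [6, 22, 38]
running = arr.cumsum(axis=1)          # cumulative sum along each row
n_big = (arr > 5).sum()               # True counts as 1: 6 elements are > 5
```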
arr = np.random.randn(6) #1-D array of 6 elements
arr.sort() #sorts in place
We can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort
arr = np.random.randn(5,3)
arr.sort(1) #sort each row
arr = np.random.randn(1000)
arr.sort()
arr[int(0.05*len(arr))] #5% quantile (by position in the sorted array)
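A short sketch contrasting the in-place arr.sort() with np.sort (which returns a sorted copy), plus the positional 5% quantile on deterministic data:

```python
import numpy as np

arr = np.array([[9, 1, 8], [3, 7, 2]])
row_sorted = np.sort(arr, axis=1)   # sorted copy: [[1, 8, 9], [2, 3, 7]]
arr.sort(axis=0)                    # in place, down each column: [[3, 1, 2], [9, 7, 8]]

vals = np.arange(1000, dtype=float)[::-1].copy()   # 999.0 ... 0.0, stand-in for randn(1000)
vals.sort()
q05 = vals[int(0.05 * len(vals))]   # positional 5% quantile: 50.0
```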
Method | Description |
---|---|
unique(x) | Compute the sorted, unique elements in x |
intersect1d(x,y) | Compute the sorted, common elements in x and y |
union1d(x,y) | Compute the sorted union of elements |
in1d(x,y) | Compute a boolean array indicating whether each element of x is contained in y (np.isin in newer code) |
setdiff1d(x,y) | Set difference, elements in x that are not in y |
setxor1d(x,y) | Set symmetric differences; elements that are in either of the arrays, but not both |
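The set routines on two small made-up arrays; np.isin is the function NumPy recommends in place of in1d for new code:

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 5])
u = np.unique(x)                # [1, 2, 3]
common = np.intersect1d(x, y)   # [2, 3]
both = np.union1d(x, y)         # [1, 2, 3, 5]
member = np.isin(x, y)          # [True, False, True, True, False]
only_x = np.setdiff1d(x, y)     # [1]
either = np.setxor1d(x, y)      # [1, 5]
```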
np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk
arr = np.arange(10)
np.save('some_array',arr) #will save in uncompressed raw binary format with file extension .npy (automatically appended if not specified)
arr = np.load('some_array.npy')
We can save multiple arrays in an uncompressed archive using np.savez, passing the arrays as keyword arguments
np.savez('array_archive.npz',a=arr,b=arr)
arch = np.load('array_archive.npz')
arch['a']
arch['b']
np.savez_compressed('arrays-compressed.npz',a=arr,b=arr2)
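A round-trip sketch that writes into a temporary directory (so nothing is left behind) and loads the arrays back; the file names are arbitrary:

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'some_array')
    np.save(path, arr)                     # .npy is appended: some_array.npy
    loaded = np.load(path + '.npy')
    archive = os.path.join(tmp, 'arrays.npz')
    np.savez_compressed(archive, a=arr, b=arr * 2)
    with np.load(archive) as arch:         # dict-like NpzFile, closed on exit
        a, b = arch['a'], arch['b']
```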
Function | Description |
---|---|
diag | Return the diagonal (or off-diagonal) elements of a square matrix as 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
dot | Matrix multiplication (dot product) |
trace | Compute the sum of the diagonal elements |
det | Compute the matrix determinant |
eig | Compute the eigenvalues and eigenvectors of a square matrix |
pinv | Compute the Moore-Penrose pseudo-inverse of a Matrix |
qr | Compute the QR decomposition |
svd | Compute the singular value decomposition (SVD) |
solve | Solve the linear system Ax = b for x, where A is a square matrix |
lstsq | Compute the least-squares solution to Ax=b |
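A sketch using a few of these routines on a 2x2 system chosen so the answer is easy to verify by hand; np.linalg.inv is not in the table but is used here as a check:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)      # solves A @ x = b: [2., 3.]
det = np.linalg.det(A)         # 3*2 - 1*1 = 5.0 (up to rounding)
tr = np.trace(A)               # 3 + 2 = 5.0
d = np.diag(A)                 # diagonal as a 1-D array: [3., 2.]
q, r = np.linalg.qr(A)         # q orthogonal, r upper triangular, q @ r == A
inv = np.linalg.inv(A)         # A @ inv is (numerically) the identity
```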
Function | Description |
---|---|
seed | Seed the random number generator |
permutation | Return a random permutation of a sequence, or return a permuted range |
shuffle | Randomly permute a sequence in-place |
rand | Draw samples from a uniform distribution |
randint | Draw random integers from a given low-to-high range |
randn | Draw samples from a normal distribution with mean 0 and standard deviation 1 |
binomial | Draw samples from a binomial distribution |
normal | Draw samples from a normal (Gaussian) distribution |
beta | Draw samples from a beta distribution |
chisquare | Draw samples from a chi-square distribution |
gamma | Draw samples from a gamma distribution |
uniform | Draw samples from a uniform [0,1) distribution |
numpy.random supplements the built-in python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions
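A sketch of the seeding and permutation functions from the table. The seed values are arbitrary, and np.random.default_rng is NumPy's newer Generator API, shown only for comparison:

```python
import numpy as np

np.random.seed(1234)                    # legacy global seed: makes draws reproducible
first = np.random.randn(3)
np.random.seed(1234)
second = np.random.randn(3)             # same numbers as first
perm = np.random.permutation(5)         # a shuffled copy of range(5)
deck = np.arange(5)
np.random.shuffle(deck)                 # shuffles in place
ints = np.random.randint(0, 10, size=4) # integers in [0, 10)
rng = np.random.default_rng(42)         # independent Generator object
draws = rng.normal(loc=0.0, scale=1.0, size=3)
```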
pandas - a major tool that contains data structures and data manipulation tools for working with tabular or heterogeneous data, designed to make data cleaning and analysis fast and easy
obj = pd.Series([4,7,-5,3])
# out:
# 0 4
# 1 7
# 2 -5
# 3 3
#dtype: int64
obj2 = pd.Series([5,-4,2,6], index = ['d','b','a','c'])
obj = pd.Series([4,7,-4,8])
np.exp(obj)
obj*2
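sdata = {'oh':350000,'oregon':16000,'texas':71000} #hypothetical data keyed by state
states = ['ca','oh','oregon','texas']
obj4 = pd.Series(sdata,index=states) #labels in states that are not keys of sdata ('ca') get NaN
obj4
#Out:
#ca NaN
#oh 350000.0
#oregon 16000.0
#texas 71000.0
pd.isnull(obj4)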
#ca True
#oh False
#oregon False
#texas False
Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality
obj4.name = 'population'
obj4.index.name = 'state'
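A sketch of building obj4 from a dict; the population numbers are made up. Arithmetic between two Series aligns on index labels:

```python
import pandas as pd

sdata = {'oh': 350000, 'oregon': 16000, 'texas': 71000, 'utah': 5000}  # made-up numbers
obj3 = pd.Series(sdata)
states = ['ca', 'oh', 'oregon', 'texas']
obj4 = pd.Series(sdata, index=states)   # 'ca' is not in sdata -> NaN; 'utah' is left out
missing = obj4.isnull()                 # same as pd.isnull(obj4)
total = obj3 + obj4                     # aligned by label; a label missing on either side -> NaN
obj4.name = 'population'
obj4.index.name = 'state'
```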
DataFrame - a rectangular table of data that contains an ordered collection of columns, each of which can be a different value type
data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
        'year': [2000,2001,2002,2001,2002,2003],
        'pop': [1.5,1.7,3.6,2.4,2.9,3.2]}
frame = pd.DataFrame(data)
frame
Out: (in recent pandas the columns follow the dict's key order)
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
If you pass a sequence of column names as the columns argument, the DataFrame's columns will be arranged in that order
- if you pass a column that isn't contained in the dict, that column's values will be NaN
pd.DataFrame(data,columns=['year','state','pop'])
frame2 = pd.DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three','four','five','six']) #'debt' is not in data, so it is all NaN
frame['state']
frame.year
frame2.loc['three'] #returns the row as a Series indexed by the DataFrame's column names
#Out:
#year 2002
#state Ohio
#pop 3.6
#debt NaN
frame2['debt'] = 16.5 #set the entire debt col to 16.5
frame2['debt'] = np.arange(6) #col becomes 0,1,2,3,4,5
val = pd.Series([-1.2,-1.5,-1.7],index=['two','four','five'])
frame2['debt'] = val #the Series is aligned on the index: rows two, four and five get its values, and all other rows become NaN
Assigning a column that doesn't exist will create a new column. The del keyword deletes columns as with a dict
frame2['eastern'] = frame2.state == 'Ohio'
#Out:
# year state pop debt eastern
#one 2000 Ohio 1.5 NaN True
#two 2001 Ohio 1.7 -1.2 True
#three 2002 Ohio 3.6 NaN True
#four 2001 Nevada 2.4 -1.5 False
#five 2002 Nevada 2.9 -1.7 False
#six 2003 Nevada 3.2 NaN False
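A sketch that rebuilds frame2 from the data dict and repeats the column operations above; the column and index names follow the notes:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val                            # aligned on the index; other rows get NaN
frame2['eastern'] = frame2['state'] == 'Ohio'   # new columns need [] syntax, not frame2.eastern
row = frame2.loc['three']                       # one row, returned as a Series
del frame2['eastern']                           # delete a column like a dict key
```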
Method | Description |
---|---|
append | Concatenate with additional index objects, producing a new index |
difference | Compute set difference as an index |
intersection | Compute set intersection |
union | Compute set union |
isin | Compute boolean array indicating whether each value is contained in the passed collection |
delete | Compute new index with element at index i deleted |
drop | Compute new index by deleting passed values |
insert | Compute new index by inserting element at index i |
is_monotonic_increasing | Returns True if each element is greater than or equal to the previous element (called is_monotonic in older pandas) |
unique | Compute the array of unique values in the index |
Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index Object
obj = pd.Series(range(3),index=['a','b','c'])
index = obj.index
index
#Out: Index(['a','b','c'],dtype='object')
- index[1] = 'd' #TypeError: Index objects are immutable
- Immutability makes it safer to share Index objects among data structures
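A minimal sketch of Index immutability: in-place assignment raises TypeError, while methods such as insert return a new Index:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index
try:
    index[1] = 'd'                 # Index objects cannot be modified in place
    mutated = True
except TypeError:
    mutated = False
new_index = index.insert(1, 'd')   # "modifying" methods return a new Index instead
```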
Reindexing - rearrange the data of a pandas data structure according to a new index, returning a new object
obj2 = obj.reindex(['a','b','c','d','e']) #returns a new Series (or DataFrame) with the new index; labels that were not in the old index get NaN
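A sketch of reindex with a new label, fill_value, and forward filling; the values are illustrative:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])                 # 'e' is new -> NaN
obj3 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)   # new labels get 0 instead
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
filled = colors.reindex(range(6), method='ffill')   # forward-fill (needs a sorted index)
```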
obj = pd.Series(np.arange(5.), index=['a','b','c','d','e'])
new_obj = obj.drop('c')
new_obj = obj.drop(['d','e'])
# Out (new_obj after dropping 'd' and 'e'):
# a 0.0
# b 1.0
# c 2.0
data = pd.DataFrame(np.arange(4).reshape((2,2)),index=['ohio','Colorado'],columns=['one','two'])
# Out:
# one two
# ohio 0 1
# Colorado 2 3
data.drop('two',axis=1,inplace=True) #drop column 'two' (axis=1 or axis='columns'); inplace=True modifies data and returns None
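A sketch of dropping rows and columns on a slightly larger made-up frame, contrasting the returning form with inplace=True:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
no_rows = data.drop(['Colorado', 'Ohio'])              # rows (axis=0 is the default)
no_cols = data.drop(['two', 'four'], axis='columns')   # same as axis=1
also = data.drop(columns=['two'])                      # keyword form
data.drop('two', axis=1, inplace=True)                 # modifies data itself; returns None
```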
- A set is an unordered collection of unique elements
- L = [1,2,3,3,1,2,5]; s = list(set(L)) #removes duplicates, but the original order is not preserved
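If the original order matters, dict.fromkeys keeps the first occurrence of each element (dicts preserve insertion order in Python 3.7+). A minimal sketch:

```python
L = [1, 2, 3, 3, 1, 2, 5]
unordered = list(set(L))           # duplicates removed; order not guaranteed
ordered = list(dict.fromkeys(L))   # [1, 2, 3, 5], first-seen order kept
```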
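- (?P=quote) is a backreference to the named group quote, just as \1 refers to group 1
- m.group('quote'), m.end('quote') access the named group by its name
Isaac(?=Asimov) will match 'Isaac' only if it's followed by 'Asimov' (lookahead assertion)
- Isaac(?!Asimov) will match 'Isaac' only if it is not followed by 'Asimov' (negative lookahead assertion)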
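A sketch of a named backreference and the two lookahead assertions; the sample strings are made up:

```python
import re

# (?P<quote>...) names a group; (?P=quote) must repeat exactly what it matched
m = re.search(r"(?P<quote>['\"]).*?(?P=quote)", 'He said "hi" and left')
quoted = m.group(0)          # '"hi"'
open_end = m.end('quote')    # 9: index just past the opening quote
same = re.search(r"(['\"]).*?\1", 'He said "hi" and left').group(0)   # \1 does it by number

# lookahead assertions check what follows without consuming it
pos = re.search(r'Isaac(?=Asimov)', 'IsaacAsimov')                # matches 'Isaac' only
neg = re.search(r'Isaac(?!Asimov)', 'IsaacAsimov IsaacNewton')    # skips the first Isaac
```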
re.compile(pattern,flags=0) compiles a regular expression pattern into a regular expression object, whose match() and search() methods can then be used
- prog = re.compile(pattern); result = prog.match(string) #more efficient when the same pattern is used several times, OR
- result = re.match(pattern,string)
re.search(pattern,string,flags=0) scans through string looking for the first location where the regular expression matches and returns a match object
- returns None if not found
re.match(pattern,string,flags=0) if 0 or more chars at the beginning of string match, returns the match object
- only match the beginning (even in MULTILINE mode)
- So if the match might be anywhere in the string, use search
re.findall(pattern,string,flags=0) returns all non-overlapping matches of pattern in string as a list of strings (a list of tuples if the pattern has more than one group)
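A sketch comparing search, match, and findall with one compiled pattern; the phone-number pattern and text are illustrative only:

```python
import re

text = 'call 555-1234 or 555-9876'
prog = re.compile(r'\d{3}-\d{4}')      # compile once, reuse many times
first = prog.search(text)              # scans the whole string: '555-1234'
at_start = prog.match(text)            # None: text does not start with a number
everything = prog.findall(text)        # ['555-1234', '555-9876']
pairs = re.findall(r'(\d{3})-(\d{4})', text)   # two groups: a list of tuples
```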
Match objects always have a boolean value of True, so a plain if m: test works (None is returned when no match is found)
- m = re.match(r"(\w+)\s(\w+)", "Isaac Newton, Physicist")
  m.group(0) #"Isaac Newton" The entire match
  m.group(1) #"Isaac" The first parenthesized subgroup
  m.group(2) #"Newton" The second parenthesized subgroup
- m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Guanduo Li')
  m.group('first_name') #or m.group(1)
  m.group('last_name') #or m.group(2)
  notice m.group(0) returns the entire match, not a parenthesized subgroup
- m.groups() returns all subgroups in a tuple
- m.groupdict() returns a dictionary containing all named subgroups of the match, keyed by subgroup name (the subgroups MUST BE NAMED)
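The group accessors on the Isaac Newton example, using named groups:

```python
import re

m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Isaac Newton, Physicist')
whole = m.group(0)              # 'Isaac Newton'
first = m.group('first_name')   # 'Isaac' (same as m.group(1))
both = m.groups()               # ('Isaac', 'Newton')
named = m.groupdict()           # {'first_name': 'Isaac', 'last_name': 'Newton'}
truthy = bool(m)                # True: a match object is always true
```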
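- a = 3; 'a' in globals() #True; the variable name must be passed as a string
- try:
      a
  except NameError:
      print("not defined")
- L = L[:] #L[:] makes a shallow copy; L now refers to the new list
- M.extend(L) #append every element of L to the end of M
- a = '0xf' #string
  d = int(a,16) #parse a hexadecimal string into an int: 15
- from __future__ import print_function #Python 2: use the Python 3 print() function
- import sys; sys.exit() #exit the program
- import sys; len(sys.argv) #sys.argv is a list of command-line arguments; sys.argv[0] stores the file name of the currently running script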
os.path.basename(path)
os.getcwd()
os.path.islink #find if it's a link
os.path.isdir #find if it's a dir
dirs = list(filter(os.path.isdir,glob.glob(path+"*"))) #get all dirs
dirs.sort(key=os.path.getmtime) #sort the list by modification time
os.path.exists(path) #find if the path exists
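A self-contained version of the directory listing above, run inside a temporary directory with hypothetical folder and file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ('a_dir', 'b_dir'):               # hypothetical names
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, 'c_file.txt'), 'w').close()
    entries = glob.glob(os.path.join(root, '*'))
    dirs = [p for p in entries if os.path.isdir(p)]    # keep directories only
    dirs.sort(key=os.path.getmtime)                     # oldest modification time first
    names = sorted(os.path.basename(p) for p in dirs)   # ['a_dir', 'b_dir']
    has_file = os.path.exists(os.path.join(root, 'c_file.txt'))
```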
- __main__ is the namespace at the top. If a module is run directly (top), __name__ will be set to “__main__”, otherwise, it stores the module’s name
- __name__ stores the current namespace
- __name__ == "__main__" if at top
It's convenient to have code at the bottom of a module that runs only for testing, not when the module is imported:
- if __name__ == '__main__': #testing code for the current module; this code won't run when the module is imported
- dir() lists all attributes of an object, including inherited ones (e.g. from its class and base classes), not just those in its __dict__
- __dict__ only contains “local” sets of attributes
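A tiny sketch of the difference, with hypothetical Base and Child classes:

```python
class Base:                     # hypothetical example classes
    def greet(self):
        return 'hi'

class Child(Base):
    def __init__(self):
        self.x = 1

c = Child()
local_attrs = c.__dict__        # {'x': 1}: only attributes stored on the instance
all_attrs = dir(c)              # also lists greet, __init__ and other inherited names
```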
Use str1.find(str2) to find if str2 is a substring of str1. Returns the index of the first match, or -1 if there is no match (for a plain True/False test, str2 in str1 also works)
- a = [[1,2],[3,4],[5,5]] a_flat = sum(a,[]) #use overload of +
- The second argument of sum is the start value used as the first operand before the first +, so this is evaluated as []+[1,2]+[3,4]+[5,5], which gives a flattened list: [1,2,3,4,5,5]
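sum(a, []) builds a new list at every +, so it becomes slow (quadratic) for many sublists. A sketch of two linear-time alternatives that give the same result:

```python
from itertools import chain

a = [[1, 2], [3, 4], [5, 5]]
flat_sum = sum(a, [])                       # [1, 2, 3, 4, 5, 5]
flat_chain = list(chain.from_iterable(a))   # same result, linear time
flat_comp = [x for sub in a for x in sub]   # same, as a comprehension
```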
Once all of the arguments are defined, you can parse the command line by passing a sequence of argument strings to parse_args().
The options are processed using the GNU/POSIX syntax, so option and argument values can be mixed in the sequence.
parser = argparse.ArgumentParser(description='Short sample app')
Action | Description |
---|---|
store | Save the value. Can optionally convert type if type is defined |
store_const | Save a value defined as part of the argument specification, rather than a value that comes from the arguments being parsed |
store_true | Save the boolean True (a special case of store_const) |
store_false | Save the boolean False (a special case of store_const) |
append | save the value to the list. Multiple values are saved if the argument is repeated |
append_const | Save a value defined in the argument specification to a list |
version | Prints version details about the program and then exits |
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-s', action='store', dest='simple_value',
help='Store a simple value')
parser.add_argument('-c', action='store_const', dest='constant_value',
const='value-to-store',
help='Store a constant value')
parser.add_argument('-t', action='store_true', default=False,
dest='boolean_switch',
help='Set a switch to true')
parser.add_argument('-f', action='store_false', default=False,
dest='boolean_switch',
help='Set a switch to false')
parser.add_argument('-a', action='append', dest='collection',
default=[],
help='Add repeated values to a list',
)
parser.add_argument('-A', action='append_const', dest='const_collection',
const='value-1-to-append',
default=[],
help='Add different values to list')
parser.add_argument('-B', action='append_const', dest='const_collection',
const='value-2-to-append',
help='Add different values to list')
parser.add_argument('--version', action='version', version='%(prog)s 1.0')
results = parser.parse_args()
print('simple_value =', results.simple_value)
print('constant_value =', results.constant_value)
print('boolean_switch =', results.boolean_switch)
print('collection =', results.collection)
print('const_collection =', results.const_collection)
$ python argparse_action.py -h
usage: argparse_action.py [-h] [-s SIMPLE_VALUE] [-c] [-t] [-f] [-a COLLECTION] [-A] [-B] [--version]
optional arguments:
  -h, --help       show this help message and exit
  -s SIMPLE_VALUE  Store a simple value
  -c               Store a constant value
  -t               Set a switch to true
  -f               Set a switch to false
  -a COLLECTION    Add repeated values to a list
  -A               Add different values to list
  -B               Add different values to list
  --version        show program's version number and exit
$ python argparse_action.py -s value
simple_value = value constant_value = None boolean_switch = False collection = [] const_collection = []
$ python argparse_action.py -c
simple_value = None constant_value = value-to-store boolean_switch = False collection = [] const_collection = []
$ python argparse_action.py -t
simple_value = None constant_value = None boolean_switch = True collection = [] const_collection = []
$ python argparse_action.py -f
simple_value = None constant_value = None boolean_switch = False collection = [] const_collection = []
$ python argparse_action.py -a one -a two -a three
simple_value = None constant_value = None boolean_switch = False collection = ['one', 'two', 'three'] const_collection = []
$ python argparse_action.py -B -A
simple_value = None constant_value = None boolean_switch = False collection = [] const_collection = ['value-2-to-append', 'value-1-to-append']
$ python argparse_action.py --version
argparse_action.py 1.0
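parse_args() also accepts an explicit list of strings, which is handy for testing without a shell. A trimmed-down sketch with three of the options above, showing options and values mixed in one sequence:

```python
import argparse

parser = argparse.ArgumentParser(prog='argparse_action.py')
parser.add_argument('-s', action='store', dest='simple_value')
parser.add_argument('-t', action='store_true', default=False, dest='boolean_switch')
parser.add_argument('-a', action='append', dest='collection', default=[])
# the list stands in for sys.argv[1:]
results = parser.parse_args(['-s', 'value', '-a', 'one', '-t', '-a', 'two'])
```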
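marshal #serializes/deserializes Python values to and from bytes (dumps/loads; dump/load for files). It mainly exists for .pyc files and its format can change between Python versions, so pickle or json is usually the better choice for sending data over a network
defaultdict #one more feature on top of a regular dict: if a key doesn't exist, default_factory (a zero-argument callable such as list or int) is called to create its value
bisect.bisect(a,val,lo=0,hi=len(a)) returns the insertion point i such that every element of a[0:i] is <= val and every element of a[i:] is > val (same as bisect_right). NOTE: this corresponds to C++ upper_bound; C++ lower_bound (the first element >= val) corresponds to bisect.bisect_left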
a = [1,2,3,4,5]
i = bisect.bisect(a,3) #gives i == 3 -> a[0:3] = [1,2,3], but a[3] == 4
#how to understand this? same as slicing or range(0,i): the start index is included, the end index i is not
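A sketch of bisect versus bisect_left on a list with repeated values:

```python
import bisect

a = [1, 2, 3, 3, 3, 4, 5]
right = bisect.bisect(a, 3)        # 5: after the last 3 (bisect_right), like C++ upper_bound
left = bisect.bisect_left(a, 3)    # 2: before the first 3, like C++ lower_bound
count_3 = right - left             # 3 occurrences of 3
bisect.insort(a, 3)                # insert while keeping a sorted
```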
sorted(iterable,*,key=None,reverse=False)
key specifies a function of one argument that is used to extract a comparison key from each element.
def takeSecond(ele):
    return ele[1]
pairs = [(2,4),(1,5)] #avoid naming this random: that would shadow the random module
#sorted passes each item (a tuple here) to takeSecond and sorts by the returned key; reverse=True gives descending order
sorted_list = sorted(pairs,key=takeSecond,reverse=True) #[(1,5),(2,4)]
Note this is different from C++ std::sort, whose comparator takes two elements and returns a bool (like operator<). Python's key takes one element; functools.cmp_to_key converts a two-argument comparison function into a key.
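A sketch of key-based sorting and of cmp_to_key for reusing a C++-style comparator; compare is a hypothetical example function:

```python
from functools import cmp_to_key

pairs = [(2, 4), (1, 5), (3, 4)]
by_second_desc = sorted(pairs, key=lambda p: p[1], reverse=True)
# [(1, 5), (2, 4), (3, 4)]: the sort is stable, so equal keys keep their order

def compare(p, q):            # hypothetical comparator: negative, zero, or positive
    return (p[1] > q[1]) - (p[1] < q[1])

by_cmp = sorted(pairs, key=cmp_to_key(compare))   # ascending by second item
```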
subprocess.call(['ls','-l']) #after import subprocess: runs the command, waits for it to finish, and returns its exit code
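subprocess.run is the higher-level interface in Python 3.5+. A sketch that runs the current Python interpreter instead of ls, so it works on any OS:

```python
import subprocess
import sys

completed = subprocess.run([sys.executable, '-c', 'print("hello")'],
                           capture_output=True, text=True, check=True)
exit_code = completed.returncode     # 0; check=True would raise on non-zero
output = completed.stdout.strip()    # 'hello'
# call() only returns the exit code of the child process
code = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])   # 3
```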