- Kavya Joshi, "A tale of concurrency through creativity in Python": https://www.youtube.com/watch?v=GunMToxbE0E
- Raymond Hettinger, "Keynote on Concurrency", PyBay 2017: https://www.youtube.com/watch?v=9zinZmE3Ogk
- Larry Hastings, "Python's Infamous GIL": https://www.youtube.com/watch?v=P3AyI_u66Bw
Options for concurrency:
- multiprocessing
- threading
- event-based programming (e.g. Twisted)
- gevent
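```python
import multiprocessing as mp

def downloader(users):
    pool = []
    for user in users:
        # one process per user; download_photo is the per-user download function (not shown)
        p = mp.Process(target=download_photo, args=(user,))
        pool.append(p)
        p.start()
    for p in pool:
        p.join()  # wait for every download to finish
```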
- each network request still blocks, but it only blocks the process handling that user
- gives parallelism (uses multiple cores) as well as concurrency
- processes have high overhead
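```python
import threading

def downloader(users):
    pool = []
    for user in users:
        # one thread per user; download_photo is the per-user download function (not shown)
        t = threading.Thread(target=download_photo, args=(user,))
        pool.append(t)
        t.start()
    for t in pool:
        t.join()  # wait for every download to finish
```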
- lighter weight than processes
- gives concurrency
- multi-threaded programming is hard: writing correct code is difficult and troubleshooting is even more troublesome
- another problem: CPython's GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time, so threads give no parallelism for CPU-bound Python code
  - GIL talk by Larry Hastings: https://www.youtube.com/watch?v=P3AyI_u66Bw
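A minimal runnable sketch of the threaded downloader, assuming a hypothetical `download_photo` that simulates the network wait with `time.sleep` and stores its result in a shared dict. Blocking calls like `time.sleep` (and socket reads) release the GIL, so the five waits overlap; CPU-bound work would not overlap this way.

```python
import threading
import time

def download_photo(user, results):
    """Hypothetical stand-in for a network download: sleeps instead of doing I/O."""
    time.sleep(0.2)  # releases the GIL, like a blocking socket read would
    results[user] = f"photo-of-{user}"

def downloader(users):
    results = {}
    pool = []
    for user in users:
        t = threading.Thread(target=download_photo, args=(user, results))
        pool.append(t)
        t.start()
    for t in pool:
        t.join()
    return results

start = time.perf_counter()
photos = downloader(["ana", "bo", "cy", "di", "ed"])
elapsed = time.perf_counter() - start
print(sorted(photos), f"{elapsed:.2f}s")
```

The total time should come out close to one wait rather than five; the exact number depends on the machine.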
- Twisted (event-based programming):
```python
import twisted
```
- code becomes very complicated:
  - you write your code around an event loop
  - you wire callbacks into the loop, etc.
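The callback style can be sketched in plain Python without Twisted. Here `call_soon` and `run_loop` are hypothetical helpers standing in for a real reactor (they are not Twisted's API), and `fetch_photo`/`save_photo` only pretend to do I/O. Note how "fetch, then save", two sequential lines in blocking code, turns into a chain of nested callbacks.

```python
from collections import deque

# Hypothetical mini event loop: a queue of ready callbacks (no real I/O).
ready = deque()

def call_soon(fn, *args):
    ready.append((fn, args))

def run_loop():
    while ready:
        fn, args = ready.popleft()
        fn(*args)

log = []

# Each "blocking" step becomes a function that takes a callback.
def fetch_photo(user, callback):
    call_soon(callback, f"photo-of-{user}")  # pretend the download completed

def save_photo(photo, callback):
    call_soon(callback, f"saved:{photo}")  # pretend the save completed

def download(user):
    # The sequential logic (fetch, then save, then log) is split across callbacks.
    def on_fetched(photo):
        save_photo(photo, on_saved)

    def on_saved(result):
        log.append(result)

    fetch_photo(user, on_fetched)

for user in ["ana", "bo"]:
    download(user)
run_loop()
print(log)  # ['saved:photo-of-ana', 'saved:photo-of-bo']
```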
- gevent: great for I/O-bound apps that need to be highly concurrent
- greenlets (gevent's coroutines) live in user space: the OS does not create or schedule them
- they are cooperatively scheduled
- they are extremely lightweight compared to threads
- 20-30K concurrent connections are possible; threads cannot give you this because of their memory overhead
- used at "web scale" at:
  - Pinterest, Facebook, PayPal, Disqus, ...
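Greenlets are not generators, but plain Python generators give a stdlib-only feel for cooperative, user-space scheduling: a task runs until it explicitly gives up control, and an ordinary Python function, not the OS, picks who runs next. A minimal sketch with a hypothetical round-robin `run` scheduler. One difference: a generator can only pause in its own frame, whereas a greenlet can switch from anywhere in its call stack.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: it runs until it chooses to yield control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # switch point: nothing preempts the task between yields

def run(tasks):
    """Hypothetical round-robin scheduler, entirely in user space (no OS threads)."""
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)  # resume the task exactly where it last paused
        except StopIteration:
            continue  # task finished; drop it
        queue.append(t)

log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']

# Generator objects are small, so tens of thousands of tasks are cheap to create.
shared = []
run([task("t", 1, shared) for _ in range(20_000)])
print(len(shared))  # 20000
```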
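```python
from gevent import monkey
monkey.patch_all()  # patch the standard library before anything else imports it

import gevent

def downloader(users):
    pool = []
    for user in users:
        # download_photo is the per-user download function (not shown)
        g = gevent.Greenlet(download_photo, user)
        g.start()
        pool.append(g)
    gevent.joinall(pool)  # wait until all greenlets have finished
```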
- the API is exposed as a synchronous API: code reads like ordinary blocking code while the I/O is handled asynchronously underneath
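```python
from greenlet import greenlet

def print_red():
    print('red')
    gr2.switch()  # switches to the print_blue function
    print('red done')

def print_blue():
    print('blue')
    gr1.switch()  # switches to print_red, which does not re-run but "resumes", so 'red done' is printed
    print('blue done')  # never reached: gr1 finishes and control returns to its parent (main)

# the greenlets must be created after the functions they wrap are defined
gr1 = greenlet(print_red)
gr2 = greenlet(print_blue)
gr1.switch()  # prints: red, blue, red done
```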
- .switch() pauses the current greenlet and yields control to the target greenlet:
  - the next time something switches back to it (e.g. next.switch()), the paused greenlet resumes where it left off instead of starting over
- greenlet is written in C
- every greenlet has a parent; when a greenlet's function finishes, control returns to its parent
- gevent uses greenlets as its coroutines; greenlet implements them via assembly-based stack slicing, which gives cooperative execution units
- gevent uses libev, a high-performance event loop written in C:
  - libev gives you an API to register event-handler callbacks
  - libev's event loop watches for events
  - when an event occurs, libev calls the registered callbacks
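libev itself is C, but the register / watch / dispatch pattern described above can be sketched with Python's standard `selectors` module. This illustrates the pattern only, not how gevent talks to libev; the socket pair stands in for a network connection and `on_readable` is a hypothetical handler.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []

def on_readable(sock):
    """Callback registered for the 'socket is readable' event."""
    received.append(sock.recv(1024))
    sel.unregister(sock)  # done with this socket

# A connected socket pair stands in for a network connection.
a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ, on_readable)  # register the event handler

b.sendall(b"hello")  # make an event happen

# The loop: wait for events, then call whatever callback was registered for them.
while sel.get_map():
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)

a.close()
b.close()
sel.close()
print(received)  # [b'hello']
```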
```python
g = gevent.Greenlet(download_photo, user)
```
- gevent instantiates the Greenlet class
- initialization creates a small greenlet and sets its parent to the 'Hub' greenlet
- the Hub is where the looping happens
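```python
# simplified excerpt; "..." marks elided parameters
class Greenlet(greenlet):
    def __init__(self, run=None, ...):
        greenlet.__init__(self, None, get_hub())
```
where get_hub() returns the Hub, which is passed as the parent, so g.parent = Hub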
- the Hub is the greenlet that runs the event loop; there is one Hub per thread
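"One Hub per thread" can be pictured with `threading.local`, which gives each thread its own attribute namespace. This is a simplified, hypothetical `get_hub`/`Hub` pair for illustration, not gevent's actual code.

```python
import threading

_local = threading.local()

class Hub:
    """Hypothetical stand-in for gevent's Hub (the greenlet that runs the loop)."""

def get_hub():
    """Return this thread's Hub, creating it on first use (one Hub per thread)."""
    hub = getattr(_local, "hub", None)
    if hub is None:
        hub = _local.hub = Hub()
    return hub

main_hub = get_hub()
assert get_hub() is main_hub  # same thread -> same Hub

other = []
t = threading.Thread(target=lambda: other.append(get_hub()))
t.start()
t.join()
print(other[0] is main_hub)  # False: the worker thread got its own Hub
```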
`Greenlet()` does two things:
- creates a greenlet for our function (download_photo)
- sets its .parent to the event-loop (i.e. Hub) greenlet
`g.start()` registers the greenlet's switch function with the event loop:
```python
self.parent.loop.run_callback(self.switch)
# since self.parent is the Hub, this becomes
Hub.loop.run_callback(self.switch)
# this is registered as a pre_block_watcher (run it before the loop blocks)
```
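Because `g.start()` only registers a callback, the greenlet's function does not run yet; nothing happens until something switches to the Hub and the loop drains its callbacks. A minimal sketch with hypothetical `Loop` and `Task` classes, using plain function calls where gevent would switch greenlets:

```python
from collections import deque

class Loop:
    """Hypothetical loop: callbacks queued by run_callback run on the next
    iteration, before the loop would block waiting for I/O."""

    def __init__(self):
        self.callbacks = deque()

    def run_callback(self, fn):
        self.callbacks.append(fn)

    def run(self):
        while self.callbacks:
            self.callbacks.popleft()()
        # a real loop would now block waiting for I/O events

class Task:
    """Hypothetical stand-in for gevent.Greenlet."""

    def __init__(self, loop, fn, *args):
        self.loop, self.fn, self.args = loop, fn, args

    def switch(self):
        self.fn(*self.args)

    def start(self):
        # like Hub.loop.run_callback(self.switch): schedule it, don't run it yet
        self.loop.run_callback(self.switch)

log = []
loop = Loop()
for user in ["ana", "bo"]:
    Task(loop, log.append, user).start()
print(log)  # []: start() only registered the callbacks
loop.run()  # plays the role of joinall() switching to the Hub
print(log)  # ['ana', 'bo']
```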
`gevent.joinall(pool)` runs the loop: it switches to the Hub loop. `gevent.join()` is the short version.
```python
from gevent import monkey
monkey.patch_all()
```
- the code above replaces standard-library modules on the fly, e.g. `socket` with `gevent.socket`
- monkey patching makes libraries cooperative and non-blocking
- when a blocking call (e.g. on a `socket`) is made, gevent registers it with the loop (the Hub) and runs the loop
- gevent gives us non-blocking I/O
- gevent does not give you parallelism: the greenlets in a thread run one at a time on that single OS thread
- non-cooperative code will block the entire process:
  - C extensions (e.g. database drivers)
    - fix: use pure-Python libraries (they can take advantage of greenlets)
  - compute-bound greenlets (can hog the CPU)
    - fix: call gevent.sleep(0) to yield
    - fix: use greenlet blocking detection
- monkey patching may have confusing implications:
  - the order of imports matters!
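Why import order matters can be shown with the standard library alone: monkey patching replaces an attribute on a module object, so code that grabbed a direct reference before the patch (`from time import sleep`) keeps the original function. This sketch patches `time.sleep` by hand with a hypothetical `fake_sleep`; `monkey.patch_all()` does the real work for many modules at once, which is why it should run as early as possible, before other imports.

```python
import time
from time import sleep as early_sleep  # name bound BEFORE the patch

calls = []

def fake_sleep(seconds):
    """Hypothetical cooperative replacement: records the call instead of blocking."""
    calls.append(seconds)

original_sleep = time.sleep
time.sleep = fake_sleep  # the monkey patch: swap the attribute on the module object

from time import sleep as late_sleep  # name bound AFTER the patch

time.sleep(1)    # looked up on the module at call time -> patched
late_sleep(2)    # bound after the patch -> patched
early_sleep(0)   # still the original function: this call really blocks
time.sleep = original_sleep  # undo the patch

print(calls)  # [1, 2]
```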