We've covered gunicorn's SyncWorker in part 1; now let's see how the other worker classes work.
If you visit the official site of eventlet:
Eventlet is a concurrent networking library for Python that allows you to change how you run your code, not how you write it.
- It uses epoll or kqueue or libevent for highly scalable non-blocking I/O.
- Coroutines ensure that the developer uses a blocking style of programming that is similar to threading, but provide the benefits of non-blocking I/O.
- The event dispatch is implicit, which means you can easily use Eventlet from the Python interpreter, or as a small part of a larger application.
EventletWorker inherits from AsyncWorker; it overrides the init_process and run methods:
def patch(self):
    hubs.use_hub()
    eventlet.monkey_patch()
    patch_sendfile()

def init_process(self):
    self.patch()
    super().init_process()
After forking from the master process, init_process calls eventlet.monkey_patch(), which by default replaces the following modules with the corresponding eventlet support modules:
for name, modules_function in [
    ('os', _green_os_modules),
    ('select', _green_select_modules),
    ('socket', _green_socket_modules),
    ('thread', _green_thread_modules),
    ('time', _green_time_modules),
    ('MySQLdb', _green_MySQLdb),
    ('builtins', _green_builtins),
    ('subprocess', _green_subprocess_modules),
]
Eventlet replaces the default I/O modules with its "green" modules: when you call a socket function, you are actually calling into _green_socket_modules, which implements non-blocking I/O.
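To make the mechanism concrete, here is a minimal sketch of what "monkey patching" means: swapping a module-level function at runtime. It uses time.sleep as the patched target, and the green_sleep stand-in is hypothetical (it just records calls; eventlet's real green sleep yields control to its hub instead of blocking the thread).

```python
import time

calls = []
_original_sleep = time.sleep

def green_sleep(seconds):
    # eventlet's green version would yield to the hub here instead of
    # blocking the whole thread; this stand-in just records the call
    calls.append(seconds)

time.sleep = green_sleep    # what eventlet.monkey_patch() does, module by module
time.sleep(5)               # any code calling time.sleep now hits our version
time.sleep = _original_sleep  # restore the original
```

Because the patch happens at the module level, code that imported `time` never needs to change — which is exactly why gunicorn can green an unmodified Django app.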
On every socket read/write, or time.sleep, eventlet saves the current context, adds the current green thread to the polling list, and then polls for the next ready I/O event.
It's similar to the async keyword in Python 3, but far less invasive to your code.
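The save-context-and-switch idea above can be sketched with plain generators, no eventlet required. Each `yield` plays the role of a blocking I/O call, and the toy `hub` function (a hypothetical name, loosely modeled on eventlet's hub) resumes whichever green task is runnable next:

```python
from collections import deque

def green_task(name, results):
    # each "yield" is where a real green thread would block on I/O;
    # control returns to the hub, which can run other green threads
    for i in range(2):
        results.append((name, i))
        yield  # cooperative switch, like a green socket read

def hub(tasks):
    # a toy "hub": round-robin over runnable green tasks
    runnable = deque(tasks)
    while runnable:
        task = runnable.popleft()
        try:
            next(task)             # resume the green task
            runnable.append(task)  # it yielded; schedule it again
        except StopIteration:
            pass                   # green task finished

results = []
hub([green_task("a", results), green_task("b", results)])
# the two tasks interleave: a, b, a, b
```

A real hub waits on epoll/kqueue instead of round-robin polling, but the control flow — suspend at an I/O point, resume when ready — is the same.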
If you run your app in eventlet mode:
gunicorn --workers 2 --worker-class eventlet mysite.wsgi
EventletWorker spawns a new green thread that is in charge of accepting connections from the listening socket; after accepting a new connection, the green thread passes the Django handler function to a GreenPool, and uses the GreenPool to run it.
Thanks to eventlet, we can turn our blocking-I/O Django application into a non-blocking one simply by changing --worker-class. Compared with defining async functions directly, the same code can run in both blocking and non-blocking mode, and it is easier to debug.
Defining functions with the async keyword directly, on the other hand, requires you to design your code in an async style from the top down, but gives you finer-grained control over concurrency. For example, eventlet with Django parallelizes two different requests, while async functions can parallelize different I/O operations within the same request.
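That last distinction is worth a concrete sketch. With explicit async code, a single request handler can fan out to several I/O calls at once. This is a hypothetical handler using stdlib asyncio, with asyncio.sleep standing in for real I/O such as a database or HTTP call:

```python
import asyncio

async def fetch(source, delay):
    # stand-in for an I/O call (e.g. a database query or an HTTP request)
    await asyncio.sleep(delay)
    return f"data from {source}"

async def handle_request():
    # both "I/O calls" run concurrently within this single request,
    # so the total time is roughly max(delay), not the sum
    db, cache = await asyncio.gather(
        fetch("db", 0.05),
        fetch("cache", 0.05),
    )
    return [db, cache]

result = asyncio.run(handle_request())
```

Under eventlet, the same handler written in blocking style would issue the two calls sequentially; the concurrency there lives between requests, not inside one.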
If you visit the official site of gevent:
gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libev or libuv event loop.
gevent is inspired by eventlet but features a more consistent API, simpler implementation and better performance.
The differences
- gevent is built on top of libevent(since 1.0, gevent uses libev and c-ares.)
- Signal handling is integrated with the event loop.
- Other libevent-based libraries can integrate with your app through single event loop.
- DNS requests are resolved asynchronously rather than via a threadpool of blocking calls.
- WSGI server is based on the libevent’s built-in HTTP server, making it super fast.
- gevent’s interface follows the conventions set by the standard library
- gevent does not have all the features that Eventlet has.
If you have another library (written in C) that uses libevent's event loop and you want to integrate them in a single process, gevent supports this while eventlet does not.
Let's go back to gunicorn. GeventWorker also inherits from AsyncWorker and overrides the init_process and run methods:
def patch(self):
    monkey.patch_all()

def init_process(self):
    self.patch()
    hub.reinit()
    super().init_process()
After forking from the master process, init_process calls gevent.monkey.patch_all(), which replaces the following modules with the corresponding gevent support modules:
def patch_all(socket=True, dns=True, time=True, select=True, thread=True, os=True, ssl=True,
              subprocess=True, sys=False, aggressive=True, Event=True,
              builtins=True, signal=True,
              queue=True, contextvars=True,
              **kwargs):
    # ...
The pattern is similar to eventlet's; only the interface differs, so the actual code called in run is slightly different:
# gunicorn/workers/ggevent.py
from gevent.pool import Pool
from gevent.server import StreamServer

def run(self):
    # ...
    pool = Pool(self.worker_connections)
    # ...
    server = StreamServer(s, handle=hfun, spawn=pool, **ssl_args)
    # ...
    server.start()
If you run:
gunicorn --workers 2 --worker-class gevent mysite.wsgi
The pros and cons of using gevent are the same as eventlet's, so we won't repeat them.
If you care more about performance, or you have a C library that uses libevent's (or libev's) event loop that you want to integrate into Python in a single process, consider using gevent. If you rely on eventlet-specific features such as eventlet.db_pool or eventlet.processes, you should probably keep using eventlet.
By default, gunicorn uses the sync mode: it preforks --workers processes, and each worker handles one request at a time.
ThreadWorker inherits from Worker; it also overrides the init_process and run methods:
def init_process(self):
    self.tpool = self.get_thread_pool()
    self.poller = selectors.DefaultSelector()
    self._lock = RLock()
    super().init_process()

def enqueue_req(self, conn):
    conn.init()
    # submit the connection to a worker
    fs = self.tpool.submit(self.handle, conn)
    self._wrap_future(fs, conn)

def accept(self, server, listener):
    try:
        sock, client = listener.accept()
        # initialize the connection object
        conn = TConn(self.cfg, sock, client, server)
        self.nr_conns += 1
        # enqueue the job
        self.enqueue_req(conn)
    except EnvironmentError as e:
        if e.errno not in (errno.EAGAIN, errno.ECONNABORTED,
                           errno.EWOULDBLOCK):
            raise

def run(self):
    # ....
We can see that init_process creates a thread pool, and accept simply pushes the established connection onto the queue inside the thread pool.
- If there is a concern about the application memory footprint, using threads and its corresponding gthread worker class in favor of workers yields better performance, because the application is loaded once per worker and every thread running on the worker shares some memory; this comes at the expense of some additional CPU consumption.
Let's see an example
gunicorn --workers 1 --worker-class gthread --threads 2 mysite.wsgi
The --threads option only affects the gthread worker class; other worker classes ignore it.
Each worker initializes a thread pool of --threads threads. Whenever the main thread accepts a socket, the connection is pushed onto the queue, and a worker thread in the pool pops it off and delegates the actual request to the Django application.
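The accept-then-submit flow can be sketched with the stdlib alone. This is a hypothetical miniature, not gunicorn's code: ThreadPoolExecutor plays the role of the gthread worker's tpool (its internal work queue is the "queue" above), and handle stands in for dispatching to the Django WSGI application:

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(conn):
    # stand-in for passing the request to the Django application
    data = conn.recv(1024)
    conn.sendall(b"handled:" + data)
    conn.close()

def serve(listener, pool, n_requests):
    # the main thread only accepts; threads in the pool do the handling
    for _ in range(n_requests):
        conn, _addr = listener.accept()
        pool.submit(handle, conn)  # pushed onto the pool's internal queue

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

pool = ThreadPoolExecutor(max_workers=2)  # like --threads 2
server_thread = threading.Thread(target=serve, args=(listener, pool, 1))
server_thread.start()

# act as a client for one request
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"GET /")
reply = client.recv(1024)
client.close()

server_thread.join()
pool.shutdown()
listener.close()
```

With max_workers=2 and one accepting thread, two requests can be in flight at once while later connections wait on the pool's queue — the same shape as one gthread worker with --threads 2.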
The last worker class is tornado; the code is pretty simple:
# gunicorn/workers/gtornado.py
def init_process(self):
    # IOLoop cannot survive a fork or be shared across processes
    # in any way. When multiple processes are being used, each process
    # should create its own IOLoop. We should clear current IOLoop
    # if exists before os.fork.
    IOLoop.clear_current()
    super().init_process()

def run(self):
    # ...
The run method initializes gunicorn's monitoring utilities, starts a tornado server instance, binds the listening sockets to it, and finally runs the IOLoop.
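The overall shape — bind a socket, hand it to an event-loop-driven server, then run the loop — can be sketched with stdlib asyncio as a stand-in for tornado's server and IOLoop (the handler and names here are hypothetical, for illustration only):

```python
import asyncio

async def handle(reader, writer):
    # stand-in for tornado's request handler
    data = await reader.readline()
    writer.write(b"echo:" + data)
    await writer.drain()
    writer.close()

async def main():
    # bind a listening socket and attach it to the event loop's server,
    # roughly what gtornado does with the sockets gunicorn hands it
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # act as a client for one round trip
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hi\n")
    data = await reader.readline()
    writer.close()

    server.close()
    await server.wait_closed()
    return data

result = asyncio.run(main())
```

In the real worker, gunicorn (not the worker) owns the listening sockets, and IOLoop.start() runs until the master signals the worker to exit.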
- what are you using gevent for?
- Comparing gevent to eventlet
- Better performance by optimizing Gunicorn config