Cloud.js
(Note: not completely up-to-date, but more-or-less okay.)
Cloud.js aims to be a compute-cloud implementation where users can submit jobs to be executed, possibly in parallel, on the server. The concept is similar to that of PiCloud.
The user first registers and obtains an API key and the client library.
From here on, a request from the client library on behalf of the user, or directly from the user, will be assumed to be qualified with the mentioned API key, unless noted otherwise.
The client first submits a "job" using the HTTP(S) POST method on the RESTful interface exposed by the server. This can be done directly or using the client library.
A job consists of:
The routine/function describing the job. This is an ordinary JavaScript function, and must be of the form
function(){
    // body
}
Since JavaScript has no bytecode format, the raw function source will be transmitted over HTTPS.
The argument(s) to be passed to the function.
The "global context" of the function. This one may be tricky. A function in JS that accesses variables from the enclosing scope is not uncommon. Sometimes, it is desired to modify variables from the enclosing scope. To provide for this, a "sandbox" of variables can be submitted along with the job function.
For example, consider this code:
var x = 0, str = "hello";
(function(name){
    ++x;
    return str + " " + name;
})("Lama");
Every time that function is called, it will increment x and return a greeting for the name passed in as its argument. This works locally, as the function's enclosing scope has those variables defined. But if we send this function over the network, x and str are no longer defined. To achieve what we want, we can send a context along with the function, which is just a JS object mapping variable names to their values:
{"x": 0,
"str": "hello"}
Now, whenever the function references a variable not in its own scope, it is pulled from the context object passed. So this code will work as intended. Also, the modified context object is returned to the caller (client library/user) as a JSON object, reflecting any modifications the function might have performed. So, after calling this function once remotely, this should be returned:
{"x": 1,
"str": "hello"}
Notice that x was incremented in the function and the change is reflected in the context.
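To make this concrete, here is a minimal sketch of how the server side might evaluate a job function against a supplied context using Node's built-in vm module. The helper name runJob and the wrapping scheme are illustrative, not part of the actual implementation:

var vm = require('vm');

// Illustrative helper: run a job function (as source text) against a context.
// Free variables like x and str resolve against the sandbox, and assignments
// to them are visible in the returned context.
function runJob(funcSource, ctx, argList) {
    var sandbox = vm.createContext(ctx);  // ctx's keys become globals
    var fn = vm.runInContext('(' + funcSource + ')', sandbox);
    var ret = fn.apply(null, argList);
    return { ret: ret, ctx: ctx };        // ctx now reflects any modifications
}

var out = runJob('function(name){ ++x; return str + " " + name; }',
                 { x: 0, str: "hello" }, ["Lama"]);
// out.ret === "hello Lama", out.ctx.x === 1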
The job object should look like:
{ "func": "function(){...}",
"ctx": {...},
"args": [arg0, arg1, arg2, ...] }
The args property MUST be a list (array) of lists.
Passing the args
As mentioned, the args property should be a list of lists. Here's why:
Each inner list in the args list (i.e., args[0], args[1], ..., args[args.length-1]) is treated as an argument list for the function, and where possible, the function is called in parallel with these argument lists. Suppose we pass an args property that looks like:
[ [1,2,3], ["one", "two", "three"] ]
and suppose the function is f(); the server will make the calls f(1,2,3) and f("one", "two", "three"), possibly in parallel. In effect, for each inner list i of the args property, the function f() is invoked with its "arguments" object set to i; in other words, the elements of i are passed one by one as positional args to f(). A sketch of this fan-out follows.
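A quick sketch of the fan-out (done sequentially here; the server may parallelize), where f is a stand-in for the submitted job function:

// Stand-in for the submitted job function.
function f(a, b, c) { return [a, b, c].join("-"); }

var args = [ [1, 2, 3], ["one", "two", "three"] ];

// Each inner list becomes one invocation, via Function.prototype.apply.
var results = args.map(function (argList) {
    return f.apply(null, argList);  // f(1,2,3), then f("one","two","three")
});
// results: [ "1-2-3", "one-two-three" ]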
RESTful endpoints
The server will provide (at the minimum) one RESTful endpoint for submission of jobs, one for retrieval of results and one for tracking the statuses of submitted jobs.
The result
The result will be a JSON response, that looks like:
{"ret": <the return value of the the function>,
"ctx": {//The (possibly modified) context}
}
Encoding
The job will be submitted via HTTP POST, so we need to be URL-safe. Also, with Node.js (async) in the backend, we'll be framing messages with the delimiters "[BEGIN]" and "[END]". Since these delimiter strings may also appear in the job JSON text, the following scheme is used to encode the job object:
job_str = JSON.stringify(job)  // serialize the job object
job_b64_encoded = base64_encode(job_str)
job_b64_url_encoded = url_encode(job_b64_encoded) // since the "=" char is valid in base64
                                                  // encoded strings but not in a URL as
                                                  // a part of a URI component.
Then a POST can be made to the submission endpoint with the POST param "job" set to a string encoded in the above manner.
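A sketch of this encoding on the client side, assuming 2012-era Node.js (the Buffer constructor for base64, and encodeURIComponent for the URL encoding):

var job = { func: "function(){ return x++; }", ctx: { x: 1 }, args: [[]] };

var jobStr = JSON.stringify(job);
var jobB64 = new Buffer(jobStr).toString('base64');  // base64-encode
var jobParam = encodeURIComponent(jobB64);           // escape "=", "+", "/"
// POST body: "job=" + jobParam + "&api_key=" + apiKey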
Client server flow
==================
The client-server interaction MUST use TLS, as the API key would otherwise travel
in cleartext.
1) The client sends the job encoded in the above manner to the server using an HTTP POST
request, with the following mandatory parameters (without the quotes):
"job" - the urlencoded base64 encoded version of the job object,
"api_key" - the API key obtained during registration.
2) The server passes the job on to the cloud for processing. Note that the job results are
NOT returned as a response to the original job submission request. The response to the original
request is a JSON object containing a "job_id" key, which identifies the submitted job.
3) When the client wants the job results, it can send a POST request with a mandatory parameter
"api_key" to /result/<job_id>/
If the job is still being processed, or an error occurred, a JSON response with an "error" key
and a value describing the error will be returned.
If the job is done processing, a JSON response of the following form is returned:
{
    "ret": <the return value of the function>,
    "ctx": <the (possibly modified) context>
}
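To make the flow concrete, here is a rough client sketch using Node's https module. The host and the /submit/ path are placeholders (only /result/<job_id>/ is named above), and jobParam is the encoded job from the Encoding section:

var https = require('https'),
    querystring = require('querystring');

var API_KEY = 'my-api-key';  // placeholder

function post(path, params, callback) {
    var body = querystring.stringify(params);
    var req = https.request({
        host: 'cloudjs.example.com',  // placeholder host
        path: path,
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded',
                   'Content-Length': Buffer.byteLength(body) }
    }, function (res) {
        var data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () { callback(JSON.parse(data)); });
    });
    req.end(body);
}

// Submit the job, then ask for its result later.
post('/submit/', { job: jobParam, api_key: API_KEY }, function (resp) {
    post('/result/' + resp.job_id + '/', { api_key: API_KEY }, function (r) {
        if (r.error) console.log('pending or failed:', r.error);
        else console.log('ret:', r.ret, 'ctx:', r.ctx);
    });
});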
Server-side architecture
========================
1) The server receives a job
2) It records the job in the main database as a document having the following structure.
Before storing, the controller node to assign the job to is chosen:
{
    "job_id": string,
    "job": document,
    "status": integer,
    "owner": string,
    "submitted_at": datetime,
    "finished_at": null,
    "result": { },                  // empty doc
    "assigned_to": string,
    "remaining": job.args.length
}
3) On the Redis server, perform an LPUSH onto a list named after the assigned controller,
with the string representation of this JSON object:
{
    "job_id": <the job id>,
    "job": <the job object>
}
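A sketch of step 3, assuming the node_redis client (the dispatch helper name is illustrative):

var redis = require('redis'),
    client = redis.createClient();

// Push the job onto the queue named after its assigned controller.
// `controllerName` matches the "assigned_to" field of the job document.
function dispatch(controllerName, jobId, job) {
    var payload = JSON.stringify({ job_id: jobId, job: job });
    client.lpush(controllerName, payload, function (err) {
        if (err) console.error('dispatch failed:', err);
    });
}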
Controller
----------
1) At startup, the "name" of a controller must be added to the controllers database of the
server.
2) While up, the controller does the following:
a) Call BRPOP on the Redis list representing this controller.
b) On getting a job, prepare a Redis list on the LOCAL_REDIS_SERVER named after the job id.
c) Whenever a POST from a worker is received, it will contain
{job_id, fragment_id, result(encoded)}. The controller updates the "result" property
of the job document in the job DB by inserting into it the (key, value) pair
(fragment_id, result(decoded)) AND decrements the "remaining" property.
Then, since we have a free worker, it calls BRPOP again on the job queue and repeats from (a).
{"job": {"func": "function(){return x++;}", "ctx": {"x": 1}, "args": [[]]},"job_id": "1234"}
Deployment
==========
This turned out to be the most challenging part :\
(For the server) We use:
Gunicorn as the server.
Supervisor to manage the Gunicorn process.
Nginx as a reverse proxy to handle the actual connections on port 80.
Cloned the cloud.js repo into /opt/cloud.js.run/ as a working copy, then created a
virtualenv and activated it:
(cloudjs)$ cd /opt/cloud.js.run/cloud-server/
(cloudjs)$ pip install -r requirements.txt
Setup: created a script deploy.sh:
#!/bin/bash
set -e
LOGFILE=/home/yati/log/gunicorn/cloudjs.log
LOGDIR=$(dirname $LOGFILE)
NUM_WORKERS=1
USER=yati
GROUP=yati
cd /opt/cloud.js.run/cloud-server
source /home/yati/Virtualenvs/cloudjs/bin/activate
test -d $LOGDIR || mkdir -p $LOGDIR
gunicorn_django -w $NUM_WORKERS \
--user=$USER --group=$GROUP --log-level=debug \
--log-file=$LOGFILE 2>>$LOGFILE
This will be run as my user, my group. NUM_WORKERS is the number of subprocesses
Gunicorn should spawn to serve requests.
Supervisor should already be installed, as it is in the requirements file.
In the etc/ directory under the virtualenv (in my case, ~/Virtualenvs/cloudjs/etc/),
add a file supervisord.conf:
[program:cloudjs]
directory = /opt/cloud.js.run/cloud-server
user = yati
command = /opt/cloud.js.run/cloud-server/deploy.sh
stdout_logfile = /home/yati/log/gunicorn/cloudjs.log
stderr_logfile = /home/yati/log/gunicorn/cloudjs.log
[unix_http_server]
file=/tmp/supervisor.sock ; (the path to the socket file)
;chmod=0700 ; socket file mode (default 0700)
;chown=nobody:nogroup ; socket file uid:gid owner
;username=user ; (default is no username (open server))
;password=123 ; (default is no password (open server))
[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
This tells Supervisor about our program "cloudjs". Now we can run supervisord as root:
(cloudjs)# supervisord
This will start the supervisor daemon process.
Now, as root, we can start/stop/restart our Gunicorn project like so:
(cloudjs)# supervisorctl start cloudjs
(cloudjs)# supervisorctl stop cloudjs
Gunicorn uses port 8000 by default, and that is cool. We want Nginx on port 80 now:
# yum install nginx
Then, edited /etc/nginx/conf.d/default.conf to make it look like this:
#
# The default server
#
server {
listen 80;
# server_name _;
# start mine
server_name localhost;
root /opt/cloud.js.run/cloud-server;
access_log /home/yati/log/nginx_access.log;
error_log /home/yati/log/nginx_error.log;
location /static/ {
autoindex on;
root /opt/cloud.js.run/cloud-server/server;
}
# end mine
#charset koi8-r;
#access_log logs/host.access.log main;
location / {
# begin for cloud.js
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_connect_timeout 10;
proxy_read_timeout 10;
proxy_pass http://localhost:8000/;
# end cloud.js
# root /usr/share/nginx/html;
# index index.html index.htm;
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/html;
}
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
# proxy the PHP scripts to Apache listening on 127.0.0.1:80
#
#location ~ \.php$ {
# proxy_pass http://127.0.0.1;
#}
# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
#
#location ~ \.php$ {
# root html;
# fastcgi_pass 127.0.0.1:9000;
# fastcgi_index index.php;
# fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
# include fastcgi_params;
#}
# deny access to .htaccess files, if Apache's document root
# concurs with nginx's one
#
#location ~ /\.ht {
# deny all;
#}
}
That is it. Starting nginx now as root and pointing a browser at http://localhost/ renders
the lovely homepage :)
Later, to require TLS as the client-server flow demands, the config was changed to redirect
plain HTTP to HTTPS and terminate SSL:
#
# The default server
#
server {
listen 80;
return 301 https://172.16.58.63/;
}
server {
listen 443 ssl;
# server_name _;
# start mine
ssl on;
ssl_certificate /opt/cloud.js.run/cert/cloudjs-cert.pem;
ssl_certificate_key /opt/cloud.js.run/cert/cloudjs-key.pem;
ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers HIGH:!aNULL:!MD5;
server_name localhost;
root /opt/cloud.js.run/cloud-server;
access_log /home/yati/log/nginx_access.log;
error_log /home/yati/log/nginx_error.log;
location /static/ {
autoindex on;
root /opt/cloud.js.run/cloud-server/server;
}
# end mine
#charset koi8-r;
#access_log logs/host.access.log main;
location / {
# begin for cloud.js
#rewrite (.*) https://127.0.0.1/ permanent;
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_connect_timeout 10;
proxy_read_timeout 10;
proxy_pass http://localhost:8000/;
# end cloud.js
# root /usr/share/nginx/html;
# index index.html index.htm;
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/html;
}
# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
# proxy the PHP scripts to Apache listening on 127.0.0.1:80
#
#location ~ \.php$ {
# proxy_pass http://127.0.0.1;
#}
# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
#
#location ~ \.php$ {
# root html;
# fastcgi_pass 127.0.0.1:9000;
# fastcgi_index index.php;
# fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
# include fastcgi_params;
#}
# deny access to .htaccess files, if Apache's document root
# concurs with nginx's one
#
#location ~ /\.ht {
# deny all;
#}
}