Build the following and make it run as fast as you possibly can using Python 3 (vanilla). The faster it runs, the more you will impress us!
Your code should:
- Download this 2.2GB file: https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv
- Count the lines in the file
- Calculate the average value of the tip_amount field.
All of that in the most efficient way you can come up with.
That's it. Make it fly!
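As a reference point, here is a plain standard-library baseline for the two computations, built on the `csv` module. It is a minimal sketch and the function name `count_and_average_tip` is hypothetical. It treats "lines" as data rows after the header, which is an assumption because the task does not say whether the header counts. It also looks up the `tip_amount` column by name instead of hard-coding a position.

```python
import csv

def count_and_average_tip(path):
    """Return (data_rows, mean_tip) for a CSV whose header has a tip_amount column."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                 # the first line holds the column names
        tip_idx = header.index("tip_amount")  # locate the column instead of hard-coding it
        count = 0
        total = 0.0
        for row in reader:
            if not row:                       # csv.reader yields [] for blank lines
                continue
            count += 1
            total += float(row[tip_idx])
    return count, (total / count if count else 0.0)
```

Usage: `n, avg = count_and_average_tip("yellow_tripdata_2016-01.csv"); print(n, avg)`. The `csv` module handles quoting correctly, but that correctness costs per-row overhead in a hot loop over millions of rows.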
import urllib.request

# urllib.URLopener only exists in Python 2; in Python 3 use urllib.request
urllib.request.urlretrieve(
    "https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv",
    "yellow_tripdata_2016-01.csv")
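The Python docs place `urllib.request.urlretrieve` in the "legacy interface" that may be deprecated in the future. A sketch of a Python 3 download that streams the response to disk in fixed-size pieces uses `urllib.request.urlopen` together with `shutil.copyfileobj`, so the 2.2GB body never has to fit in memory. The helper name `download` and the 1 MiB buffer size are illustrative assumptions.

```python
import shutil
import urllib.request

def download(url, dest, chunk_size=1 << 20):
    """Stream url to dest, copying chunk_size bytes at a time (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        # copyfileobj reads from the response and writes to disk in a loop
        shutil.copyfileobj(resp, out, chunk_size)
    return dest
```

Usage: `download("https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv", "yellow_tripdata_2016-01.csv")`.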
(NOTE: The download step is excluded from the benchmark because its duration depends on network speed.)
root@ubuntu-1gb-fra1-01:~# time python3 main.py
10906858 1.7506631158122512
real 0m40.243s
user 0m37.004s
sys 0m2.140s
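In this timing, user time (37.0 s) is far larger than sys time (2.1 s). That suggests the run spends most of its time executing Python code rather than waiting on the kernel for I/O, so cutting per-row work is likely the main lever. `main.py` itself is not shown. The following is therefore a hypothetical sketch, not the code behind the numbers above. It reads in binary mode, skips text decoding and the `csv` module, and calls `bytes.split` with a `maxsplit` so each line is only split as far as the tip column. It assumes that no field contains a quoted comma, which is an assumption about this file, not something verified here.

```python
def count_and_average_tip_fast(path):
    """Count data rows and average tip_amount without the csv module.

    Assumes one record per line and no quoted fields containing commas.
    """
    with open(path, "rb") as f:
        header = f.readline().rstrip(b"\r\n").split(b",")
        idx = header.index(b"tip_amount")
        count = 0
        total = 0.0
        for line in f:
            # stop splitting once the tip column has been isolated
            fields = line.split(b",", idx + 1)
            if len(fields) <= idx:
                continue  # line too short to hold the tip column (e.g. a trailing blank line)
            total += float(fields[idx])  # float() accepts bytes and ignores surrounding whitespace
            count += 1
    return count, (total / count if count else 0.0)
```

The result is printed the same way as in the run above, `print(*count_and_average_tip_fast("yellow_tripdata_2016-01.csv"))`. Any real speedup on this VM would have to be measured.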
root@ubuntu-1gb-fra1-01:~# uname -a
Linux ubuntu-1gb-fra1-01 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu-1gb-fra1-01:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping : 1
microcode : 0x1
cpu MHz : 2199.998
cache size : 30720 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch vnmi ept fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt arat
bugs :
bogomips : 4399.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
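This cpuinfo reports a single core (`cpu cores : 1`, `siblings : 1`), so splitting the work across processes would not be expected to help on this VM. On a multi-core machine, a common approach is to give each worker a byte range that starts and ends on a line boundary. Here is a sketch of such a splitter; `newline_aligned_ranges` and `worker_count` are hypothetical names. The first range still contains the CSV header, which the worker handling it would need to skip.

```python
import os

def newline_aligned_ranges(path, parts):
    """Split a file into at most `parts` contiguous (start, end) byte ranges ending on newlines."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            f.seek(size * i // parts)
            f.readline()             # move past the rest of the current line
            pos = f.tell()           # pos is now the start of a full line
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def worker_count():
    # os.cpu_count() can return None; on the 1-core VM above it should be 1
    return os.cpu_count() or 1
```

Each worker would then seek to its `start`, process lines until it reaches `end`, and return a partial `(count, total)` for the parent to sum.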
root@ubuntu-1gb-fra1-01:~# cat /proc/meminfo
MemTotal: 1016156 kB
MemFree: 68416 kB
MemAvailable: 818396 kB
Buffers: 1152 kB
Cached: 872720 kB
SwapCached: 0 kB
Active: 457612 kB
Inactive: 441676 kB
Active(anon): 28232 kB
Inactive(anon): 2728 kB
Active(file): 429380 kB
Inactive(file): 438948 kB
Unevictable: 3656 kB
Mlocked: 3656 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 29072 kB
Mapped: 12020 kB
Shmem: 3124 kB
Slab: 30380 kB
SReclaimable: 18772 kB
SUnreclaim: 11608 kB
KernelStack: 1840 kB
PageTables: 2160 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 508076 kB
Committed_AS: 202332 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 4096 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 53240 kB
DirectMap2M: 995328 kB
DirectMap1G: 0 kB
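This meminfo shows roughly 1 GB of RAM (`MemTotal: 1016156 kB`) for a 2.2GB input, so the file cannot be loaded whole and has to be processed as a stream. For the line-count half of the task, a minimal sketch reads fixed-size binary chunks and counts newline bytes with `bytes.count`, keeping memory bounded by the chunk size. The name `count_lines` and the 16 MiB default are assumptions. The result includes the header line and does not count a final line that lacks a trailing newline.

```python
def count_lines(path, chunk_size=1 << 24):
    """Count b'\\n' bytes in a file, holding at most one chunk in memory."""
    count = 0
    with open(path, "rb") as f:
        read = f.read                # bind the method once outside the loop
        chunk = read(chunk_size)
        while chunk:
            count += chunk.count(b"\n")
            chunk = read(chunk_size)
    return count
```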