Env
- test notebook
- Xeon Skylake (36 cores, 72 threads) >$7k (note: the linked part has half as many cores)
- Quadro P1000 $300
Model
- densenet121: 8M parameters
- feature image: 224x224x3, train epoch: 352, eval epoch: 40
Result
- Intel takes 975.458 secs for one training epoch plus one validation epoch
- Nvidia takes 382.901 secs for one training epoch plus one validation epoch
- Nvidia is 2.55x faster, even though it is >10x cheaper.
Note: when the Nvidia GPU is used, about 30 CPU threads run at 27% utilization to distribute and aggregate(?) tasks.
Note: GeForce GTX 1080 has 3x the FLOPS of the Quadro P1000 :( according to a CUDA benchmark
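
The speedup and price figures under Result can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the timings are copied from Result, the prices are the rough figures from Env (with $7000 used as a lower bound for ">$7k"), and the variable names are made up for illustration.

```python
# Timings copied from the Result section (one training + one validation epoch).
cpu_secs = 975.458   # Xeon Skylake
gpu_secs = 382.901   # Quadro P1000

# Rough prices from the Env section; 7000 is an assumed lower bound for ">$7k".
cpu_price_usd = 7000
gpu_price_usd = 300

# How many times faster the GPU run is than the CPU run.
speedup = cpu_secs / gpu_secs

# How many times more expensive the CPU is than the GPU (at least).
price_ratio = cpu_price_usd / gpu_price_usd

print(f"speedup: {speedup:.2f}x")
print(f"price ratio: at least {price_ratio:.1f}x")
```

With the $7000 lower bound the price ratio comes out well above 10x, which is consistent with the ">10x cheaper" statement.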