I was helping a few computer science students and enthusiasts understand “how” modern processors got to be “so fast” outside of clock speed increases.
Here is the main excerpt:
SIMD: Single Instruction, Multiple Data
Compiling/installing the mesa virtio-venus-driver (below done with new linux container)
For: Chrome OS crostini-default debian container bookworm
Best viewed in "raw" format
In chrome browser type or paste
chrome://flags
Imagination Creator CI20 board: A MIPS32 architecture evaluation board.
$ ./benchmark/benchncnn
thread_policy_set error 46
loop_count = 4
num_threads = 8
powersave = 0
gpu_device = -1
cooling_down = 1
squeezenet min = 5.64 max = 6.24 avg = 5.88
squeezenet_int8 min = 8.93 max = 8.97 avg = 8.94
mobilenet min = 8.86 max = 8.99 avg = 8.91
Model | Image Size | Target Size | Block Size | Total Time(sec) | GPU Memory(MB) |
---|---|---|---|---|---|
models-cunet | 200x200 | 400x400 | 400/200/100 | 0.93/0.30/0.33 | 615/615/173 |
models-cunet | 400x400 | 800x800 | 400/200/100 | 0.78/0.71/0.78 | 2408/615/174 |
models-cunet | 1000x1000 | 2000x2000 | 400/200/100 | 3.16/3.21/3.53 | 2416/618/175 |
models-cunet | 2000x2000 | 4000x4000 | 400/200/100 | 11.40/11.98/13.86 | 2420/669/193 |
models-cunet | 4000x4000 | 8000x8000 | 400/200/100 | 44.33/47.15/54.76 | 2452/644/197 |
models-upconv_7_anime_style_art_rgb | 200x200 | 400x400 | 400/200/100 | 0.16/0.16/0.15 | 459/459/119 |
models-upconv_7_anime_style_art_rgb | 400x400 | 800x800 | 400/200/100 | 0.43/0.37/0.37 | 1741/460/119 |
models-upconv_7_anime_style_art_rgb | 1000x1000 | 2000x2000 | 400/200/100 | 1.62/1.59/1.67 | 1764/462/120 |
NCNN adopts the factory pattern to create the layers of a neural network, as does the well-known library Caffe. The two differ in how the registry table is implemented. The Caffe registry is populated at runtime, as a side effect of initializing global variables (a popular approach to library initialization). The NCNN registry, by contrast, is determined at compile time: it is generated with CMake rather than maintained as a hand-crafted table. NCNN's approach provides several benefits compared to Caffe's approach.
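To make the contrast concrete, here is a minimal sketch of the two registry styles. All names (`Layer`, `Registerer`, `kLayerTable`, `create_from_table`, and so on) are hypothetical and chosen for illustration; this is not the actual NCNN or Caffe code, and the fixed table is written by hand here where NCNN would generate it with CMake.

```cpp
#include <cstring>
#include <map>
#include <memory>
#include <string>

// Hypothetical layer hierarchy, for illustration only.
struct Layer {
    virtual ~Layer() = default;
    virtual const char* type() const = 0;
};
struct ReLU : Layer { const char* type() const override { return "ReLU"; } };
struct Pooling : Layer { const char* type() const override { return "Pooling"; } };

using Creator = Layer* (*)();
static Layer* make_relu() { return new ReLU; }
static Layer* make_pooling() { return new Pooling; }

// Style 1: runtime registry. A global object's constructor inserts the
// creator into a map as a side effect of static initialization.
std::map<std::string, Creator>& runtime_registry() {
    static std::map<std::string, Creator> registry;  // built on first use
    return registry;
}
struct Registerer {
    Registerer(const char* name, Creator creator) { runtime_registry()[name] = creator; }
};
// Only ReLU registers itself. If the linker dropped this global,
// "ReLU" would silently disappear from the registry.
static Registerer g_relu_registerer("ReLU", make_relu);

// Style 2: compile-time table. The full list of layers is a constant array,
// so no global constructor has to run for an entry to exist.
struct RegistryEntry {
    const char* name;
    Creator creator;
};
static const RegistryEntry kLayerTable[] = {
    {"ReLU", make_relu},
    {"Pooling", make_pooling},
};

std::unique_ptr<Layer> create_from_table(const char* name) {
    for (const RegistryEntry& entry : kLayerTable) {
        if (std::strcmp(entry.name, name) == 0) {
            return std::unique_ptr<Layer>(entry.creator());
        }
    }
    return nullptr;  // unknown layer type
}

std::unique_ptr<Layer> create_from_runtime_registry(const char* name) {
    auto it = runtime_registry().find(name);
    if (it == runtime_registry().end()) return nullptr;
    return std::unique_ptr<Layer>(it->second());
}
```

In this sketch, `create_from_table("Pooling")` succeeds because the entry is part of the constant table, while `create_from_runtime_registry("Pooling")` returns a null pointer because no global `Registerer` was ever constructed for it.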
First, it is suitable for building a static library. When a static library is linked into a program, the linker strips any unused global variables to minimize binary size. This makes sense, but it also strips the global variables that need to be initialized to insert the layer creators into the registry. Tricky linker flags and related instructions are required to resolve this issue. By creating
/*!
 * Copyright (c) 2019 by Contributors
 * \file op.h
 * \brief definition of all the operators
 * \author Chuntao Hong, Xin Li
 */
#ifndef MXNET_CPP_OP_H_
#define MXNET_CPP_OP_H_
Follow the WORKAROUND:
1. Add a command to /etc/rc.local, add the following line above "exit 0":
setpci -s 00:1c.2 0x50.B=0x41
2. Add the same command to /etc/apm/resume.d/21aspm (which does not exist yet):
setpci -s 00:1c.2 0x50.B=0x41
3. Add the following to /etc/modprobe.d/sdhci.conf:
options sdhci debug_quirks2=4
4. Re-generate initrd:
sudo update-initramfs -u -k all
5. Reboot or reload sdhci module: