├── README.md
├── _config.yml
├── android-demo
│   ├── AndroidManifest.xml
│   ├── Camera.java
│   ├── Decoder.java
│   ├── Encoder.java
│   ├── MainActivity.java
│   ├── README.md
│   ├── main.c
│   ├── make
│   ├── res
│   │   └── drawable
│   │       └── icon.png
│   ├── shaders.c
│   └── shaders.h
├── benchmark.c
├── benchmark.py
├── bnn.py
├── c_ops.h
├── c_ops_neon.h
├── cifar_bnn.py
├── index.md
├── rpi-demo
│   └── names2.h
├── test_xnornet.c
├── tf_export.py
├── util.c
├── util.h
└── xnornet.py

/README.md:
--------------------------------------------------------------------------------
1 | # Binary networks from tensorflow to embedded devices
2 | 
3 | The goal of this project is to provide a way to take models trained in tensorflow and export them to a format suitable for embedded devices: C code and a flat binary file for the weights. The project has a special focus on binary networks and quantization because of their relevance to embedded systems.
4 | 
5 | ## bnn module
6 | 
7 | This module provides helper functions for creating binary models, namely a binary activation function and a `layer` function that combines weight binarization, batch normalization, pooling and activation.
8 | 
9 | The binary weight training is implemented as in [BinaryNet](https://github.com/MatthieuCourbariaux/BinaryNet).
10 | 
11 | ## tf_export module
12 | 
13 | This module provides an export function which generates C code and a weight file from a tensorflow model. It also implements a few optimizations:
14 | 
15 | * Detect binary activations and binary weights created with the helper functions
16 | * Combine linear operations between layers (e.g. bias addition and batch norm)
17 | * Convert linear operations preceding a binary activation to thresholds
18 | * Layers with binary input and binary weights use fast binary convolutions
19 | * Optional 8-bit quantization for other layers
20 | 
21 | ## Binary weights
22 | 
23 | Weight binarization reduces the size of the weights by a factor of 32 when compared to 32-bit floating-point weights. For most models this costs some accuracy, but for some models the loss can be minimal.
24 | 
25 | Technically, convolutions with binary weights require fewer operations, but they can be slower on modern CPUs because additional operations are needed to debinarize the weights (although this is mitigated by the reduced number of memory loads).
26 | 
27 | ## Binary convolution
28 | 
29 | Binary convolutions are possible when both the input and the weights are binary. 1-bit multiplications are implemented with XOR operations and accumulated with bit-count operations, which makes them very fast: a single SIMD instruction on ARM, for example, can perform 128 such 1-bit multiplications. A minimal sketch of this XOR + popcount pattern is shown under the XNOR-net example below.
30 | 
31 | ## Quantization
32 | 
33 | The quantization implemented in this project is relatively simple and very similar to [TensorFlow's quantization scheme](https://www.tensorflow.org/performance/quantization). It is applied after training, unlike the binary weights and activations, which require modifications at training time.
34 | 
35 | Quantization is especially interesting in the case of binary weights because the precision loss going from 32-bit float inputs to 8-bit quantized inputs is very low when combined with the low-precision weights. This provides a middle ground between full binarization and float inputs with binary weights.
36 | 
37 | ## Special convolutions
38 | 
39 | There are four types of "special" convolutions supported by this project:
40 | 
41 | * **Int8**: The quantized convolution. Uses 8-bit inputs and 8-bit weights and outputs 32-bit integer values.
42 | * **Float with binary weights**: The "BinaryConnect" / "BWN" convolution. Drastically reduces weight size but can be slower than a regular convolution on modern CPUs (the weights can be decompressed ahead of time in that case, but then the memory savings are lost).
43 | * **Int8 with binary weights**: The quantized version of the "BinaryConnect" / "BWN" convolution. The weight size is the same but the speed is better.
44 | * **Binary**: The "BinaryNet" / "XNORnet" convolution. Very fast.
45 | 
46 | ### Implementation
47 | 
48 | The convolution function is optimized for the case of a fully-connected layer with a batch size of 1. "Fast" versions of these operations are implemented using NEON intrinsics for ARM CPUs.
49 | 
50 | * A further significant improvement (on the order of 25%) could be gained with assembly implementations.
51 | 
52 | ### Analysis
53 | 
54 | Throughput is given in Gops, where 2 ops = 1 equivalent multiply-add operation.
55 | 
56 | | Convolution type | Weight bits | Gops on Nexus 5 | Gops on RPi 3 |
57 | | --- | --- | --- | --- |
58 | | Float | 32 | 0.617 | 1.14 |
59 | | Int8 | 8 | 9.99 | 4.06 |
60 | | Float-binary | 1 | 3.18 | 2.39 |
61 | | Int8-binary | 1 | 12.6 | 4.21 |
62 | | Binary | 1 | 62.4 | 38.8 |
63 | 
64 | * Single-thread performance
65 | * Nexus 5: 2.3 GHz Krait 400
66 | * RPi 3: 1.2 GHz Cortex-A53
67 | 
68 | ## Examples
69 | 
70 | ### BinaryNet CIFAR10
71 | 
72 | This example reimplements the CIFAR10 model from the BinaryNet paper. It also contains a basic test program (`test_cifar10.c`) to verify the exported weights and code. The model has fully binary weights and activations and achieves an accuracy of around 88.6%.
73 | 
74 | [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz)
75 | 
76 | ### XNOR-net
77 | 
78 | This example reimplements the BWN and XNORNet networks from the [XNOR-Net](https://github.com/allenai/XNOR-Net) repository. The BWN network is AlexNet with binarized weights for all but the first and last layers. The XNORNet network is similar but additionally uses binary activations, so that binary convolutions can be used for the middle layers.
79 | 
80 | This example does not perform training; instead it uses the pre-trained weights available here:
81 | 
82 | [BWN](https://s3-us-west-2.amazonaws.com/ai2-vision/xnornet/alexnet_BWN.t7)
83 | 
84 | [XNOR](https://s3-us-west-2.amazonaws.com/ai2-vision/xnornet/alexnet_XNOR.t7)
85 | 
86 | [cache](https://s3-us-west-2.amazonaws.com/ai2-vision/xnornet/cache.tar)
87 | 
88 | * These are Torch files, so PyTorch is required to read them. CUDA is also needed because the files contain CUDA objects, but the PyTorch deserializer (`read_lua_file.py`) can be modified to treat them as non-CUDA objects in order to run on a machine without CUDA.
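As an aside, to make the binary-convolution description above concrete: the core of a binary layer reduces to XOR plus a population count. The sketch below is only an illustration, assuming ±1 values packed 64 per `uint64_t` (bit set = +1, bit clear = -1); the function name and packing convention are hypothetical and this is not the project's actual `c_ops.h` implementation.

```c
#include <stdint.h>

/* Illustrative 1-bit dot product (hypothetical, not the c_ops.h code):
 * inputs are +/-1 vectors packed as bits. XOR marks mismatching positions,
 * popcount counts them, and the result is matches - mismatches,
 * i.e. total_bits - 2 * mismatches. */
static int binary_dot(const uint64_t *a, const uint64_t *b, int nwords)
{
    int mismatches = 0;
    for (int i = 0; i < nwords; i++)
        mismatches += __builtin_popcountll(a[i] ^ b[i]); /* GCC/Clang builtin */
    return 64 * nwords - 2 * mismatches;
}
```

On NEON the same pattern maps to `veorq_u8` followed by `vcntq_u8` and a horizontal add, which is one source of the large speedup shown in the table above.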
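Similarly, the post-training quantization described earlier boils down to choosing a scale per tensor and rounding to 8-bit integers. The following is a minimal sketch assuming simple symmetric scaling; the name `quantize_int8` is made up and the exported code may handle the scales differently.

```c
#include <math.h>
#include <stdint.h>

/* Illustrative symmetric post-training quantization (hypothetical):
 * the scale is chosen from the largest magnitude so values map into [-127, 127];
 * dequantization is simply q[i] * scale. */
static float quantize_int8(const float *x, int8_t *q, int n)
{
    float maxabs = 0.0f;
    for (int i = 0; i < n; i++)
        if (fabsf(x[i]) > maxabs)
            maxabs = fabsf(x[i]);

    float scale = (maxabs > 0.0f) ? maxabs / 127.0f : 1.0f;
    for (int i = 0; i < n; i++)
        q[i] = (int8_t) lrintf(x[i] / scale);
    return scale;
}
```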
89 | 90 | [Demo on Android](android-demo/) : Android demo featuring these two networks applied on camera input (and other features) 91 | 92 | [Demo on Raspberry Pi 3](rpi-demo/) : Simple linux demo using a webcam 93 | 94 | #### Analysis 95 | 96 | | Network | Weight Size (MiB) | Run time on Nexus 5 (ms) | Run time on RPi 3 (ms) | 97 | | --- | --- | --- | --- | 98 | | XNORNet | 22.7 | 102 | 216 | 99 | | BWN | 22.8 | 623 | 970 | 100 | | XNORNet (quantized) | 10.9 | 60 | 123 | 101 | | BWN (quantized) | 11.0 | 176 | 546 | 102 | 103 | * Measured with `test_xnornet` program 104 | * Single thread run times 105 | * Times for Nexus 5 are best of multiple runs 106 | * For XNORNet the quantization only affects the first and last layers 107 | * TODO pi 64 108 | 109 | 110 | ## References 111 | 112 | * [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/abs/1603.05279) 113 | * [BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1.](http://arxiv.org/abs/1602.02830) 114 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-slate -------------------------------------------------------------------------------- /android-demo/AndroidManifest.xml: -------------------------------------------------------------------------------- 1 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 15 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /android-demo/Camera.java: -------------------------------------------------------------------------------- 1 | package test.app; 2 | 3 | import android.util.Log; 4 | import android.view.Surface; 5 | import android.graphics.SurfaceTexture; 6 | import android.hardware.camera2.*; 7 | import java.util.Arrays; 8 | 9 | class Camera { 10 | static String TAG = "TESTAPP_Camera"; 11 | 12 | int num_opened = 0; 13 | CameraDevice camera = null; 14 | 15 | Surface surface; 16 | SurfaceTexture texture; 17 | 18 | CaptureRequest build_request() throws CameraAccessException { 19 | CaptureRequest.Builder req; 20 | 21 | req = camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW); 22 | req.addTarget(surface); 23 | req.set(CaptureRequest.CONTROL_AF_MODE, 24 | CaptureRequest.CONTROL_AF_MODE_CONTINUOUS_PICTURE); 25 | return req.build(); 26 | } 27 | 28 | final CameraCaptureSession.CaptureCallback capture_callback = 29 | new CameraCaptureSession.CaptureCallback() { 30 | @Override 31 | public void onCaptureProgressed(CameraCaptureSession session, 32 | CaptureRequest request, 33 | CaptureResult result) { 34 | } 35 | 36 | @Override 37 | public void onCaptureCompleted(CameraCaptureSession session, 38 | CaptureRequest request, 39 | TotalCaptureResult result) { 40 | float mtx[] = new float[16]; 41 | 42 | texture.updateTexImage(); 43 | texture.getTransformMatrix(mtx); 44 | MainActivity.encode(mtx); 45 | } 46 | }; 47 | 48 | final CameraCaptureSession.StateCallback capture_state_callback = 49 | new CameraCaptureSession.StateCallback() { 50 | @Override 51 | public void onConfigured(CameraCaptureSession session) { 52 | if (camera == null) 53 | return; 54 | 55 | try { 56 | session.setRepeatingRequest(build_request(), capture_callback, null); 57 | } catch (CameraAccessException e) { 58 | Log.e(TAG, "", e); 59 | } 60 | } 61 | 62 | @Override 63 | public void onConfigureFailed(CameraCaptureSession session) { 64 | } 65 
| }; 66 | 67 | final CameraDevice.StateCallback state_callback = new CameraDevice.StateCallback() { 68 | @Override 69 | public void onOpened(CameraDevice cam) { 70 | /* if multiple cameras opened, only keep the last one 71 | TODO: does this method always work? 72 | */ 73 | if (num_opened > 1) { 74 | num_opened--; 75 | cam.close(); 76 | return; 77 | } 78 | 79 | try { 80 | if (camera != null) 81 | throw new RuntimeException("camera is not null"); 82 | camera = cam; 83 | camera.createCaptureSession(Arrays.asList(surface), 84 | capture_state_callback, null); 85 | } catch (CameraAccessException e) { 86 | Log.e(TAG, "", e); 87 | } 88 | } 89 | 90 | /* is cam.close() necessary? */ 91 | @Override 92 | public void onDisconnected(CameraDevice cam) { 93 | num_opened--; 94 | if (camera == cam) 95 | camera = null; 96 | cam.close(); 97 | } 98 | 99 | @Override 100 | public void onError(CameraDevice cam, int error) { 101 | onDisconnected(cam); 102 | } 103 | }; 104 | 105 | Camera(Surface surface, SurfaceTexture texture) { 106 | this.surface = surface; 107 | this.texture = texture; 108 | } 109 | 110 | void open(CameraManager manager, String name) { 111 | close(); 112 | try { 113 | num_opened++; 114 | manager.openCamera(name, state_callback, null); 115 | } catch (CameraAccessException e) { 116 | Log.e(TAG, "", e); 117 | return; 118 | } 119 | } 120 | 121 | void close() { 122 | if (camera != null) { 123 | num_opened--; 124 | camera.close(); 125 | camera = null; 126 | } 127 | } 128 | } 129 | -------------------------------------------------------------------------------- /android-demo/Decoder.java: -------------------------------------------------------------------------------- 1 | package test.app; 2 | 3 | import android.util.Log; 4 | import android.view.Surface; 5 | import android.net.Uri; 6 | import android.content.Context; 7 | import android.media.*; 8 | import java.util.LinkedList; 9 | import java.io.IOException; 10 | 11 | class Decoder { 12 | static String TAG = "TESTAPP_Decoder"; 13 | 14 | MediaCodec codec; 15 | MediaExtractor extract; 16 | 17 | Surface surface; 18 | 19 | LinkedList fqueue = new LinkedList(); 20 | long prev_time, prev_pts; 21 | boolean first_frame; 22 | 23 | static class FrameInfo { 24 | int index; 25 | MediaCodec.BufferInfo info; 26 | 27 | FrameInfo(int index, MediaCodec.BufferInfo info) { 28 | this.index = index; 29 | this.info = info; 30 | } 31 | } 32 | 33 | boolean process() { 34 | long curr_time = System.nanoTime() / 1000; 35 | 36 | while (fqueue.peek() != null) { 37 | FrameInfo fi = fqueue.peek(); 38 | long delta = fi.info.presentationTimeUs - prev_pts; 39 | if (delta < 0) 40 | delta = 0; 41 | if (delta > 1000000) // max 1 second 42 | delta = 1000000; 43 | 44 | if (curr_time - prev_time < delta && !first_frame) 45 | break; 46 | 47 | fqueue.remove(); 48 | prev_time = curr_time; 49 | prev_pts = fi.info.presentationTimeUs; 50 | first_frame = false; 51 | 52 | codec.releaseOutputBuffer(fi.index, true); 53 | if ((fi.info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) 54 | return true; 55 | } 56 | return false; 57 | } 58 | 59 | final MediaCodec.Callback callback = new MediaCodec.Callback() { 60 | public void onError(MediaCodec codec, MediaCodec.CodecException e) { 61 | } 62 | 63 | public void onInputBufferAvailable(MediaCodec codec, int index) { 64 | if (extract == null) 65 | return; 66 | 67 | int size = extract.readSampleData(codec.getInputBuffer(index), 0); 68 | long timestamp = extract.getSampleTime(); 69 | if (extract.advance() && size > 0) { 70 | codec.queueInputBuffer(index, 0, 
size, timestamp, 0); 71 | } else { 72 | codec.queueInputBuffer(index, 0, 0, 0, 73 | MediaCodec.BUFFER_FLAG_END_OF_STREAM); 74 | extract.release(); 75 | extract = null; 76 | } 77 | } 78 | 79 | public void onOutputBufferAvailable(MediaCodec codec, int index, 80 | MediaCodec.BufferInfo info) { 81 | fqueue.offer(new FrameInfo(index, info)); 82 | } 83 | 84 | public void onOutputFormatChanged(MediaCodec codec, MediaFormat format) { 85 | } 86 | }; 87 | 88 | Decoder(Surface surface) { 89 | this.surface = surface; 90 | } 91 | 92 | boolean open(Context context, Uri uri) { 93 | extract = new MediaExtractor(); 94 | 95 | try { 96 | extract.setDataSource(context, uri, null); 97 | } catch (IOException e) { 98 | Log.e(TAG, "", e); 99 | return false; 100 | } 101 | 102 | for (int track = 0; track < extract.getTrackCount(); track++) { 103 | MediaFormat format = extract.getTrackFormat(track); 104 | String mime = format.getString(MediaFormat.KEY_MIME); 105 | if (mime.startsWith("video/")) { 106 | extract.selectTrack(track); 107 | try { 108 | codec = MediaCodec.createDecoderByType(mime); 109 | } catch (IOException e) { 110 | Log.e(TAG, "", e); 111 | continue; 112 | } 113 | codec.configure(format, surface, null, 0); 114 | codec.setCallback(callback, null); 115 | codec.start(); 116 | first_frame = true; 117 | return true; 118 | } 119 | } 120 | //failure 121 | extract.release(); 122 | extract = null; 123 | return false; 124 | } 125 | 126 | void close() { 127 | if (extract != null) { 128 | extract.release(); 129 | extract = null; 130 | } 131 | 132 | if (codec == null) 133 | return; 134 | 135 | codec.stop(); 136 | codec.release(); 137 | codec = null; 138 | 139 | while (fqueue.peek() != null) 140 | fqueue.remove(); 141 | } 142 | } 143 | -------------------------------------------------------------------------------- /android-demo/Encoder.java: -------------------------------------------------------------------------------- 1 | package test.app; 2 | 3 | import android.util.Log; 4 | import android.view.Surface; 5 | import android.media.*; 6 | import java.nio.ByteBuffer; 7 | import java.io.IOException; 8 | 9 | class Encoder { 10 | static String TAG = "TESTAPP_Encoder"; 11 | 12 | MediaCodec codec; 13 | Surface surface; 14 | 15 | static MediaFormat format(int width, int height) { 16 | MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height); 17 | format.setInteger(MediaFormat.KEY_BIT_RATE, 8000000); 18 | format.setInteger(MediaFormat.KEY_FRAME_RATE, 15); 19 | format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 0); 20 | format.setInteger(MediaFormat.KEY_COLOR_FORMAT, 21 | MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface); 22 | format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2); 23 | return format; 24 | } 25 | 26 | final MediaCodec.Callback callback = new MediaCodec.Callback() { 27 | public void onError(MediaCodec codec, MediaCodec.CodecException e) { 28 | } 29 | 30 | public void onInputBufferAvailable(MediaCodec codec, int index) { 31 | } 32 | 33 | public void onOutputBufferAvailable(MediaCodec codec, int index, 34 | MediaCodec.BufferInfo info) { 35 | ByteBuffer data = codec.getOutputBuffer(index); 36 | MainActivity.addbuffer(data, info.offset, info.size, 37 | info.presentationTimeUs, info.flags); 38 | codec.releaseOutputBuffer(index, false); 39 | } 40 | 41 | public void onOutputFormatChanged(MediaCodec codec, MediaFormat format) { 42 | } 43 | }; 44 | 45 | Encoder() { 46 | } 47 | 48 | void open(int width, int height) { 49 | close(); 50 | 51 | try { 52 | codec = 
MediaCodec.createEncoderByType("video/avc"); 53 | } catch (IOException e) { 54 | Log.e(TAG, "", e); 55 | } 56 | 57 | codec.configure(format(width, height), null, null, 58 | MediaCodec.CONFIGURE_FLAG_ENCODE); 59 | surface = codec.createInputSurface(); 60 | codec.setCallback(callback, null); 61 | codec.start(); 62 | } 63 | 64 | void close() { 65 | if (codec != null) { 66 | codec.stop(); 67 | surface.release(); 68 | codec = null; 69 | } 70 | } 71 | } 72 | -------------------------------------------------------------------------------- /android-demo/MainActivity.java: -------------------------------------------------------------------------------- 1 | /* 2 | Java part of the app, manages configuration, camera, video codecs, etc. 3 | 4 | two exclusive inputs: Camera and Decoder, both write to the same SurfaceTexture 5 | 6 | SurfaceTexture is used to render to a SurfaceView using opengl es2.0 (native code) 7 | 8 | rendering is done as often as possible (Vsync) while the SurfaceView is active: 9 | *when input is camera, the latest frame is displayed 10 | *when decoder is input, there is some logic to time frames correctly 11 | 12 | when camera is input the input is also rendered to an encoder surface for recording 13 | *this is done in response to onCaptureCompleted events 14 | 15 | the overlay text (class+probabilities) is updated every 100ms (CountDownTimer) to a 16 | string obtained from the native code 17 | 18 | encoded data is passed on to native code where it is stored in a circular buffer 19 | 20 | when the "snapshot" button is pressed, a filename /sdcard/clipXXXXX.mp4 is generated, 21 | where XXXXX is an incremental value stored using preferences, and passed on to native 22 | code which creates the mp4 file 23 | 24 | TODO: 25 | -detect cameras and resolutions instead of hardcoded options 26 | -way to stop clip (currently possible by pausing and resuming activity) 27 | -add some user control over codec parameters 28 | -split preferences stuff into seperate class 29 | -some cleaning up 30 | */ 31 | 32 | package test.app; 33 | 34 | import android.view.*; 35 | import android.widget.*; 36 | import android.util.*; 37 | import android.graphics.*; 38 | import android.hardware.camera2.*; 39 | import android.os.*; 40 | import android.content.*; 41 | import android.media.*; 42 | import android.app.*; 43 | import android.preference.*; 44 | import android.database.Cursor; 45 | import android.net.Uri; 46 | import android.provider.MediaStore; 47 | import javax.microedition.khronos.opengles.GL10; 48 | import javax.microedition.khronos.egl.*; 49 | import java.io.*; 50 | import java.nio.ByteBuffer; 51 | import java.util.HashMap; 52 | 53 | public class MainActivity extends Activity implements SurfaceHolder.Callback2, 54 | SharedPreferences.OnSharedPreferenceChangeListener { 55 | static { 56 | System.loadLibrary("hello-jni"); 57 | } 58 | 59 | static native int init(Surface surface); 60 | static native int exit(); 61 | 62 | static native void draw(float[] mtx); 63 | static native void encode(float[] mtx); 64 | 65 | static native void setcodecsurface(Surface surface); 66 | static native void created(Surface surface); 67 | static native void changed(Surface surface, int fmt, int width, int height); 68 | static native void destroyed(); 69 | 70 | static native void setnetwork(int id); 71 | 72 | static native void addbuffer(ByteBuffer buffer, int offset, int size, 73 | long timestamp, int flags); 74 | static native void writemux(String path); 75 | 76 | static native void drag(float x0, float y0, float x1, float 
y1); 77 | 78 | static native String getoverlay(); 79 | 80 | static String TAG = "MainActivityTESTAPP"; 81 | static String VERSION = "0"; 82 | 83 | Camera camera; 84 | Decoder decoder; 85 | Encoder encoder; 86 | 87 | Surface surface; 88 | SurfaceTexture surface_texture; 89 | 90 | SurfaceView view; 91 | boolean have_surface = false, draw_called = false; 92 | 93 | TextView overlay; 94 | ImageButton button; 95 | 96 | SharedPreferences prefs; 97 | 98 | String camera_id; 99 | int camera_width, camera_height; 100 | 101 | static void printf(String format, Object... arguments) { 102 | Log.e(TAG, String.format(format, arguments)); 103 | } 104 | 105 | public void surfaceCreated(SurfaceHolder holder) { 106 | have_surface = true; 107 | created(holder.getSurface()); 108 | 109 | // TODO 110 | if (decoder.codec == null) 111 | openCamera(); 112 | 113 | if (!draw_called) { 114 | draw_called = true; 115 | (new Handler()).post(draw_call); 116 | } 117 | } 118 | 119 | public void surfaceChanged(SurfaceHolder holder, int fmt, int width, int height) { 120 | changed(holder.getSurface(), fmt, width, height); 121 | } 122 | 123 | public void surfaceRedrawNeeded(SurfaceHolder holder) { 124 | } 125 | 126 | public void surfaceDestroyed(SurfaceHolder holder) { 127 | have_surface = false; 128 | destroyed(); 129 | camera.close(); 130 | decoder.close(); 131 | } 132 | 133 | final Runnable draw_call = new Runnable() { 134 | public void run() { 135 | if (!have_surface) { 136 | draw_called = false; 137 | return; 138 | } 139 | 140 | if (decoder.codec != null) { 141 | if (decoder.process()) { // TODO 142 | decoder.close(); 143 | openCamera(); 144 | } 145 | surface_texture.updateTexImage(); 146 | } 147 | 148 | float mtx[] = new float[16]; 149 | surface_texture.getTransformMatrix(mtx); 150 | draw(mtx); 151 | 152 | (new Handler()).post(draw_call); 153 | } 154 | }; 155 | 156 | static String[] keys = { 157 | "network_id", "camera_id", "camera_res", "area_select" 158 | }; 159 | static CharSequence[] titles = { 160 | "Network Type", "Camera", "Camera Resolution", "Area Select" 161 | }; 162 | static CharSequence[] networks = {"BWN", "XNORNET"}; 163 | static CharSequence[] cameras = {"0", "1"}; 164 | static CharSequence[] resolutions = {"640x480", "1920x1080"}; 165 | 166 | Preference pref[] = new Preference[4]; 167 | 168 | 169 | final PreferenceFragment settings = new PreferenceFragment() { 170 | @Override 171 | public void onCreate(Bundle savedInstanceState) { 172 | super.onCreate(savedInstanceState); 173 | Context context = getActivity(); 174 | PreferenceScreen screen = 175 | getPreferenceManager().createPreferenceScreen(context); 176 | 177 | ListPreference lp; 178 | SwitchPreference sp; 179 | 180 | pref[0] = new ListPreference(context); 181 | pref[1] = new ListPreference(context); 182 | pref[2] = new ListPreference(context); 183 | pref[3] = new SwitchPreference(context); 184 | 185 | lp = (ListPreference) pref[0]; 186 | lp.setEntries(networks); 187 | lp.setEntryValues(networks); 188 | 189 | lp = (ListPreference) pref[1]; 190 | lp.setEntries(cameras); 191 | lp.setEntryValues(cameras); 192 | 193 | lp = (ListPreference) pref[2]; 194 | lp.setEntries(resolutions); 195 | lp.setEntryValues(resolutions); 196 | 197 | 198 | for (int i = 0; i < pref.length; i++) { 199 | updateSummary(prefs, keys[i], i); 200 | pref[i].setKey(keys[i]); 201 | pref[i].setTitle(titles[i]); 202 | screen.addPreference(pref[i]); 203 | } 204 | 205 | Preference pr; 206 | 207 | pr = new Preference(context); 208 | pr.setTitle("Open Clip"); 209 | 
pr.setOnPreferenceClickListener(new Preference.OnPreferenceClickListener() { 210 | @Override 211 | public boolean onPreferenceClick(Preference preference) { 212 | Intent intent = new Intent(Intent.ACTION_OPEN_DOCUMENT); 213 | intent.addCategory(Intent.CATEGORY_OPENABLE); 214 | intent.setType("*/*"); 215 | MainActivity.this.startActivityForResult(intent, 0); 216 | return true; 217 | } 218 | }); 219 | screen.addPreference(pr); 220 | 221 | setPreferenceScreen(screen); 222 | } 223 | }; 224 | 225 | void updateSummary(SharedPreferences prefs, String key, int i) { 226 | String summary; 227 | if (i < 3) 228 | summary = prefs.getString(key, ""); 229 | else { 230 | printf("KEY %s %d\n", key, i); 231 | summary = prefs.getBoolean(key, false) ? "On" : "Off"; 232 | } 233 | pref[i].setSummary(summary); 234 | } 235 | 236 | void applySetting(SharedPreferences prefs, String key, int i) { 237 | if (i < 3) { 238 | String string = prefs.getString(key, "0"); 239 | if (i == 0) { 240 | setnetwork(string == "BWN" ? 0 : 1); 241 | } else if (i == 1) { 242 | camera_id = string; 243 | // camera_width / camera_height reset 244 | openCamera(); 245 | } else if (i == 2) { 246 | String[] str = string.split("x"); 247 | camera_width = Integer.parseInt(str[0]); 248 | camera_height = Integer.parseInt(str[1]); 249 | openCamera(); 250 | } 251 | } else { 252 | boolean bool = prefs.getBoolean(key, false); 253 | //use_area = bool; 254 | } 255 | } 256 | 257 | public void onSharedPreferenceChanged(SharedPreferences prefs, String key) { 258 | int i; 259 | for (i = 0; i < pref.length; i++) 260 | if (key == keys[i]) 261 | break; 262 | if (i == pref.length) 263 | return; 264 | 265 | updateSummary(prefs, key, i); 266 | applySetting(prefs, key, i); 267 | } 268 | 269 | void openCamera() { 270 | if (decoder.codec != null || !have_surface) 271 | return; 272 | 273 | camera.close(); 274 | 275 | encoder.open(camera_width, camera_height); 276 | setcodecsurface(encoder.surface); 277 | 278 | /* change capture resolution */ 279 | surface_texture.setDefaultBufferSize(camera_width, camera_height); 280 | camera.open((CameraManager)getSystemService(Context.CAMERA_SERVICE), camera_id); 281 | } 282 | 283 | final View.OnClickListener record_click = new View.OnClickListener() { 284 | @Override 285 | public void onClick(View v) { 286 | if (decoder.codec != null) { 287 | Toast.makeText(MainActivity.this, "stopped clip", 0).show(); 288 | return; 289 | } 290 | 291 | String path; 292 | int id; 293 | 294 | id = prefs.getInt("clip_id", 0); 295 | path = String.format("/sdcard/clip%05d.mp4", id); 296 | 297 | SharedPreferences.Editor edit = prefs.edit(); 298 | edit.putInt("clip_id", id + 1); 299 | edit.commit(); 300 | 301 | writemux(path); 302 | Toast.makeText(MainActivity.this, "saved as " + path, 0).show(); 303 | } 304 | }; 305 | 306 | HashMap prev = 307 | new HashMap(); 308 | 309 | final View.OnTouchListener touch_listener = new View.OnTouchListener() { 310 | @Override 311 | public boolean onTouch(View v, MotionEvent event) { 312 | int i = event.getActionIndex(); 313 | int id = event.getPointerId(i); 314 | MotionEvent.PointerCoords coord; 315 | switch (event.getActionMasked()) { 316 | case MotionEvent.ACTION_DOWN: 317 | case MotionEvent.ACTION_POINTER_DOWN: 318 | coord = new MotionEvent.PointerCoords(); 319 | event.getPointerCoords(i, coord); 320 | prev.put(id, coord); 321 | break; 322 | case MotionEvent.ACTION_UP: 323 | case MotionEvent.ACTION_POINTER_UP: 324 | prev.remove(id); 325 | break; 326 | case MotionEvent.ACTION_MOVE: 327 | for (i = 0; i < 
event.getPointerCount(); i++) { 328 | id = event.getPointerId(i); 329 | 330 | coord = prev.get(id); 331 | float x = coord.x, y = coord.y; 332 | event.getPointerCoords(i, coord); 333 | 334 | //printf("%d %f %f %f %f", id, x, y, coord.x, coord.y); 335 | drag(x, y, coord.x, coord.y); 336 | } 337 | break; 338 | } 339 | return true; 340 | } 341 | }; 342 | 343 | @Override 344 | protected void onCreate(Bundle savedInstanceState) { 345 | super.onCreate(savedInstanceState); 346 | 347 | prefs = PreferenceManager.getDefaultSharedPreferences(this); 348 | if (prefs.getString("version", "NONE") == "NONE") { 349 | SharedPreferences.Editor edit = prefs.edit(); 350 | edit.putString("version", VERSION); 351 | edit.putString(keys[0], "BWN"); 352 | edit.putString(keys[1], "0"); 353 | edit.putString(keys[2], "640x480"); 354 | edit.putInt("clip_id", 0); 355 | edit.commit(); 356 | } 357 | prefs.registerOnSharedPreferenceChangeListener(this); 358 | 359 | 360 | overlay = new TextView(this); 361 | overlay.setTextColor(Color.WHITE); 362 | overlay.setShadowLayer(2.0f, 0.0f, 0.0f, Color.BLACK); 363 | overlay.setTextSize(TypedValue.COMPLEX_UNIT_SP, 16); 364 | overlay.setText("test text"); 365 | FrameLayout.LayoutParams params = new FrameLayout.LayoutParams(RelativeLayout.LayoutParams.WRAP_CONTENT, RelativeLayout.LayoutParams.WRAP_CONTENT); 366 | params.gravity = Gravity.TOP | Gravity.LEFT; 367 | 368 | view = new SurfaceView(this); 369 | view.getHolder().addCallback(this); 370 | view.setOnTouchListener(touch_listener); 371 | 372 | setContentView(view); 373 | addContentView(overlay, params); 374 | 375 | button = new ImageButton(this); 376 | button.setOnClickListener(record_click); 377 | button.setImageResource(android.R.drawable.btn_radio); 378 | 379 | FrameLayout.LayoutParams params2 = new FrameLayout.LayoutParams(RelativeLayout.LayoutParams.WRAP_CONTENT, RelativeLayout.LayoutParams.WRAP_CONTENT); 380 | params2.gravity = Gravity.BOTTOM | Gravity.CENTER; 381 | addContentView(button, params2); 382 | 383 | /* the init function needs a surface for egl, get one from encoder 384 | *there is a probably a better way to get a temporary surface 385 | */ 386 | encoder = new Encoder(); 387 | 388 | encoder.open(640, 480); 389 | int texture_id = init(encoder.surface); 390 | encoder.close(); 391 | 392 | surface_texture = new SurfaceTexture(texture_id); 393 | surface = new Surface(surface_texture); 394 | 395 | camera = new Camera(surface, surface_texture); 396 | decoder = new Decoder(surface); 397 | 398 | for (int i = 0; i < pref.length; i++) 399 | applySetting(prefs, keys[i], i); 400 | 401 | final CountDownTimer timer = new CountDownTimer(30000000000l, 100) { 402 | public void onTick(long millisUntilFinished) { 403 | overlay.setText(getoverlay()); 404 | } 405 | 406 | public void onFinish() { 407 | //mTextField.setText("done!"); 408 | } 409 | }; 410 | timer.start(); 411 | } 412 | 413 | @Override 414 | protected void onActivityResult(int requestCode, int resultCode, Intent data) { 415 | // can only be result for clip open 416 | 417 | if (resultCode != -1) // -1 = success 418 | return; 419 | 420 | if (decoder.open(this, data.getData())) 421 | camera.close(); 422 | } 423 | 424 | /* use the menu button to toggle preferences 425 | this assumes there will be a menu button... 
*/ 426 | boolean toggle = false; 427 | 428 | @Override 429 | public boolean onPrepareOptionsMenu(Menu menu) { 430 | FragmentManager fm = getFragmentManager(); 431 | if (!toggle) 432 | fm.beginTransaction().replace(android.R.id.content, settings).commit(); 433 | else 434 | fm.beginTransaction().remove(settings).commit(); 435 | 436 | overlay.setVisibility(toggle ? View.VISIBLE : View.INVISIBLE); 437 | button.setVisibility(toggle ? View.VISIBLE : View.INVISIBLE); 438 | 439 | toggle = !toggle; 440 | return false; 441 | } 442 | } 443 | -------------------------------------------------------------------------------- /android-demo/README.md: -------------------------------------------------------------------------------- 1 | ImageNet example using camera on Android. Tested on Nexus 5. 2 | 3 | Install required tools (debian sid): 4 | ``` 5 | # clang 6 | apt install clang-5.0 lld-5.0 7 | # java 8 | apt install openjdk-8-jdk 9 | # android tools 10 | apt install dalvik-exchange zipalign aapt libandroid-23-java libandroid-tools-sdklib-java 11 | ``` 12 | 13 | You will also need NDK headers and libraries. 14 | 15 | 16 | You can either generate your own weights and code using the scripts or use these: 17 | 18 | [XNORNET_BWN_OUTPUT](https://github.com/jonathanmarek1/binarynet-tensorflow/releases/download/test/XNORNET_BWN_OUTPUT.zip) 19 | 20 | 21 | Build the apk: 22 | ``` 23 | ARCH="armeabi-v7a" CFLAGS="-target armv7a-none-linux-android -mcpu=krait -mfpu=neon-vfpv4 -DNUM_THREAD=2" NDK="/media/test/app/android-ndk-r15b" sh make 24 | ``` 25 | -------------------------------------------------------------------------------- /android-demo/main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | #include 11 | #define EGL_EGLEXT_PROTOTYPES 12 | #include 13 | 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | 25 | #include "shaders.h" 26 | 27 | #define printf(args...) __android_log_print(ANDROID_LOG_ERROR, "test_app", args) 28 | 29 | #define jni(ret, name, args...) \ 30 | JNIEXPORT ret JNICALL Java_test_app_MainActivity_ ## name(JNIEnv *env, jobject this, args) 31 | 32 | #define jni0(ret, name) \ 33 | JNIEXPORT ret JNICALL Java_test_app_MainActivity_ ## name(JNIEnv *env, jobject this) 34 | 35 | struct egl { 36 | EGLDisplay display; 37 | EGLContext context; 38 | EGLConfig config; 39 | }; 40 | 41 | struct gl { 42 | GLuint program, program_oes; 43 | GLuint pos, texture, texture_matrix, color; 44 | GLuint pos_oes, texture_oes, texture_matrix_oes; 45 | GLuint camera_texture, fbo_texture; 46 | GLuint fbo; 47 | }; 48 | 49 | uint64_t t0; 50 | int paused; 51 | 52 | #include 53 | 54 | uint8_t buf[227*227*4]; 55 | char out_string[1024] = "nothing.. 
yet"; 56 | pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; 57 | pthread_mutex_t mutex_out = PTHREAD_MUTEX_INITIALIZER; 58 | pthread_cond_t cond = PTHREAD_COND_INITIALIZER; 59 | 60 | struct { 61 | float x0, y0, x1, y1; 62 | } box; 63 | 64 | static uint64_t get_time(void) 65 | { 66 | struct timespec ts; 67 | 68 | clock_gettime(CLOCK_MONOTONIC, &ts); 69 | return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec; 70 | } 71 | 72 | void egl_init(struct egl *egl) 73 | { 74 | const EGLint attrib_list[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE}; 75 | const EGLint attribs[] = { EGL_SURFACE_TYPE, EGL_WINDOW_BIT, 76 | EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, 77 | EGL_BLUE_SIZE, 8, EGL_GREEN_SIZE, 8, 78 | EGL_RED_SIZE, 8, EGL_ALPHA_SIZE, 8, 79 | EGL_DEPTH_SIZE, 0, EGL_NONE }; 80 | 81 | EGLDisplay display; 82 | EGLContext context; 83 | EGLConfig config; 84 | EGLint count; 85 | 86 | display = eglGetDisplay(EGL_DEFAULT_DISPLAY); 87 | assert(display); 88 | 89 | if (!eglInitialize(display, 0, 0)) 90 | assert(0); 91 | 92 | if (!eglChooseConfig(display, attribs, &config, 1, &count) || !count) 93 | assert(0); 94 | 95 | context = eglCreateContext(display, config, 0, attrib_list); 96 | assert(context); 97 | 98 | *egl = (struct egl) {display, context, config}; 99 | } 100 | 101 | void gl_init(struct egl *egl, struct gl *gl, EGLSurface surface) 102 | { 103 | if (!eglMakeCurrent(egl->display, surface, surface, egl->context)) 104 | assert(0); 105 | 106 | gl->program = compile_program(vertex_shader, fragment_shader); 107 | gl->program_oes = compile_program(vertex_shader, fragment_shader_oes); 108 | assert(gl->program && gl->program_oes); 109 | 110 | gl->pos = glGetAttribLocation(gl->program, "pos"); 111 | gl->texture = glGetUniformLocation(gl->program, "tex"); 112 | gl->texture_matrix = glGetUniformLocation(gl->program, "texmat"); 113 | gl->color = glGetUniformLocation(gl->program, "color"); 114 | 115 | gl->pos_oes = glGetAttribLocation(gl->program_oes, "pos"); 116 | gl->texture_oes = glGetUniformLocation(gl->program_oes, "tex"); 117 | gl->texture_matrix_oes = glGetUniformLocation(gl->program_oes, "texmat"); 118 | 119 | glGenTextures(1, &gl->camera_texture); 120 | glGenFramebuffers(1, &gl->fbo); 121 | glGenTextures(1, &gl->fbo_texture); 122 | 123 | glBindTexture(GL_TEXTURE_2D, gl->fbo_texture); 124 | glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 227, 227, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0); 125 | glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); 126 | glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); 127 | 128 | glBindFramebuffer(GL_FRAMEBUFFER, gl->fbo); 129 | glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, 130 | gl->fbo_texture, 0); 131 | 132 | assert(glGetError() == 0); 133 | } 134 | 135 | static void matmul(float *r, float *a, float *b) 136 | { 137 | int i, j, m; 138 | float sum; 139 | 140 | for (i = 0; i < 4; i++) for (j = 0; j < 4; j++) { 141 | sum = 0.0f; 142 | for (m = 0; m < 4; m++) { 143 | sum += b[i * 4 + m] * a[m * 4 + j]; 144 | } 145 | r[i * 4 + j] = sum; 146 | } 147 | } 148 | 149 | void draw(struct egl *egl, struct gl *gl, float *mtx, 150 | EGLSurface surface, EGLSurface enc_surface) 151 | { 152 | float matrix[16]; 153 | 154 | memcpy(matrix, mtx, sizeof(matrix)); 155 | 156 | int width, height; 157 | float vert[] = {-1.0f, -1.0f, -1.0f, 1.0f, 1.0f, -1.0f, 1.0f, 1.0f}; 158 | 159 | if (enc_surface) { 160 | if (!eglMakeCurrent(egl->display, enc_surface, enc_surface, egl->context)) 161 | assert(0); 162 | 163 | eglQuerySurface(egl->display, 
enc_surface, EGL_WIDTH, &width); 164 | eglQuerySurface(egl->display, enc_surface, EGL_HEIGHT, &height); 165 | glViewport(0, 0, width, height); 166 | 167 | glBindFramebuffer(GL_FRAMEBUFFER, 0); 168 | glBindTexture(GL_TEXTURE_EXTERNAL_OES, gl->camera_texture); 169 | glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_LINEAR); 170 | glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_LINEAR); 171 | 172 | glUseProgram(gl->program_oes); 173 | glUniform1i(gl->texture_oes, 0); 174 | 175 | glVertexAttribPointer(gl->pos_oes, 2, GL_FLOAT, GL_FALSE, 0, vert); 176 | glEnableVertexAttribArray(gl->pos_oes); 177 | 178 | glUniformMatrix4fv(gl->texture_matrix_oes, 1, 1, matrix); 179 | glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); 180 | 181 | eglSwapBuffers(egl->display, enc_surface); 182 | } 183 | 184 | if (surface) { 185 | if (!eglMakeCurrent(egl->display, surface, surface, egl->context)) 186 | assert(0); 187 | 188 | eglQuerySurface(egl->display, surface, EGL_WIDTH, &width); 189 | eglQuerySurface(egl->display, surface, EGL_HEIGHT, &height); 190 | glViewport(0, 0, width, height); 191 | 192 | glBindFramebuffer(GL_FRAMEBUFFER, 0); 193 | glBindTexture(GL_TEXTURE_EXTERNAL_OES, gl->camera_texture); 194 | glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_LINEAR); 195 | glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_LINEAR); 196 | 197 | glUseProgram(gl->program_oes); 198 | glUniform1i(gl->texture_oes, 0); 199 | 200 | glVertexAttribPointer(gl->pos_oes, 2, GL_FLOAT, GL_FALSE, 0, vert); 201 | glEnableVertexAttribArray(gl->pos_oes); 202 | 203 | glUniformMatrix4fv(gl->texture_matrix_oes, 1, 1, matrix); 204 | glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); 205 | 206 | 207 | float m = (float) width / (float) height; 208 | matmul(matrix, mtx, (float[]) { 209 | (box.x1 - box.x0)/2.0f, 0.0f, 0.0f, 0.0f, 210 | 0.0f, (box.y1 - box.y0)/2.0f, 0.0f, 0.0f, 211 | 0.0f, 0.0f, 1.0f, 0.0f, 212 | (1.0f + box.x0)/2.0f, (1.0f + box.y0)/2.0f, 0.0f, 1.0f, 213 | }); 214 | 215 | glUniformMatrix4fv(gl->texture_matrix_oes, 1, 1, matrix); 216 | glBindFramebuffer(GL_FRAMEBUFFER, gl->fbo); 217 | glViewport(0, 0, 227, 227); 218 | glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); 219 | 220 | pthread_mutex_lock(&mutex); 221 | glReadPixels(0, 0, 227, 227, GL_RGBA, GL_UNSIGNED_BYTE, buf); 222 | pthread_mutex_unlock(&mutex); 223 | 224 | glBindFramebuffer(GL_FRAMEBUFFER, 0); 225 | 226 | glViewport(0, 0, width, height); 227 | 228 | float dx = 227.0f * 2.0f / width; 229 | float dy = 227.0f * 2.0f / height; 230 | float vert2[] = {-1.0f, -1.0f, 231 | -1.0f, -1.0f + dy, 232 | -1.0f + dx, -1.0f, 233 | -1.0f + dx, -1.0f + dy}; 234 | 235 | glUseProgram(gl->program); 236 | glUniform1i(gl->texture, 0); 237 | glUniform4f(gl->color, 0.0f, 0.0f, 0.0f, 1.0f); 238 | 239 | glBindTexture(GL_TEXTURE_2D, gl->fbo_texture); 240 | 241 | glVertexAttribPointer(gl->pos, 2, GL_FLOAT, GL_FALSE, 0, vert2); 242 | glEnableVertexAttribArray(gl->pos); 243 | 244 | 245 | glUniformMatrix4fv(gl->texture_matrix, 1, 1, (float[]) { 246 | 2.0f / dx, 0.0f, 0.0f, 0.0f, 247 | 0.0f, 2.0f / dy, 0.0f, 0.0f, 248 | 0.0f, 0.0f, 1.0f, 0.0f, 249 | 0.0f, 0.0f, 0.0f, 1.0f, 250 | }); 251 | glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); 252 | 253 | float vert3[] = { 254 | box.x0, box.y0, 255 | box.x1, box.y0, 256 | box.x1, box.y1, 257 | box.x0, box.y1 258 | }; 259 | glVertexAttribPointer(gl->pos, 2, GL_FLOAT, GL_FALSE, 0, vert3); 260 | 261 | 262 | glUniform4f(gl->color, 1.0f, 0.0f, 0.0f, 0.0f); 263 | glLineWidth(5.0f); 264 | glDrawArrays(GL_LINE_LOOP, 0, 4); 265 | 266 | 
eglSwapBuffers(egl->display, surface); 267 | } 268 | } 269 | 270 | #include "../util.h" 271 | 272 | ANativeWindow *window, *enc_window; 273 | EGLSurface surface, enc_surface; 274 | int camera_texture; 275 | struct egl egl; 276 | struct gl gl; 277 | 278 | int network_id; 279 | 280 | string weights, weights_bwn; 281 | 282 | pthread_t thread; 283 | 284 | #include "../xnornet_bwn.h" 285 | #include "../xnornet.h" 286 | 287 | #include "../rpi-demo/names2.h" 288 | 289 | void* work_thread(void *arg) 290 | { 291 | static float buf_in[227*227*3] __attribute__ ((aligned(16))); 292 | static uint8_t tmpbuf[xnornet_bwn_tmp_size] __attribute__ ((aligned(16))); // TODO 293 | float *y; 294 | int i, j, top[5], id; 295 | 296 | while (1) { 297 | pthread_mutex_lock(&mutex); 298 | if (paused) 299 | pthread_cond_wait(&cond, &mutex); 300 | 301 | for (i = 0; i < 227; i++) for (j = 0; j < 227; j++) { 302 | float m[] = {0.01735949, 0.01772787, 0.01774145}; 303 | float b[] = {-2.13645733, -2.04468092, -1.81410977}; 304 | buf_in[i*227*3+(226-j)*3+0] = (float) buf[j*227*4+i*4+0] * m[0] + b[0]; 305 | buf_in[i*227*3+(226-j)*3+1] = (float) buf[j*227*4+i*4+1] * m[1] + b[1]; 306 | buf_in[i*227*3+(226-j)*3+2] = (float) buf[j*227*4+i*4+2] * m[2] + b[2]; 307 | } 308 | 309 | id = network_id; 310 | 311 | /*{ 312 | float m[] = {0.01735949, 0.01772787, 0.01774145}; 313 | float b[] = {-2.13645733, -2.04468092, -1.81410977}; 314 | string test; 315 | file_mmap(&test, "/sdcard/image"); 316 | float *buf = test.ptr; 317 | for (i = 0; i < 227*227*3; i++) 318 | buf_in[i] = buf[i] * m[i % 3] + b[i % 3]; 319 | }*/ 320 | 321 | pthread_mutex_unlock(&mutex); 322 | 323 | if (network_id == 0) 324 | y = xnornet_bwn(buf_in, weights_bwn.ptr, tmpbuf); 325 | else 326 | y = xnornet(buf_in, weights.ptr, tmpbuf); 327 | 328 | softmax(y, 1000); 329 | top5(top, y, 1000); 330 | 331 | pthread_mutex_lock(&mutex_out); 332 | 333 | sprintf(out_string,"Network: %s\n%s:%f\n%s:%f\n%s:%f\n%s:%f\n%s:%f\n", 334 | (char*[]) {"BWN","XNORNET"}[network_id], 335 | names[top[0]], y[top[0]], 336 | names[top[1]], y[top[1]], 337 | names[top[2]], y[top[2]], 338 | names[top[3]], y[top[3]], 339 | names[top[4]], y[top[4]]); 340 | 341 | pthread_mutex_unlock(&mutex_out); 342 | } 343 | } 344 | 345 | jni(int, init, jobject _surface) 346 | { 347 | //EGLSurface surface; 348 | //ANativeWindow *window; 349 | int ret; 350 | 351 | printf("init\n"); 352 | egl_init(&egl); 353 | 354 | /*window = ANativeWindow_fromSurface(env, _surface); 355 | assert(window); 356 | surface = eglCreateWindowSurface(egl.display, egl.config, window, 0); 357 | assert(surface); */ 358 | 359 | enc_window = ANativeWindow_fromSurface(env, _surface); 360 | assert(enc_window); 361 | enc_surface = eglCreateWindowSurface(egl.display, egl.config, enc_window, 0); 362 | assert(enc_surface); 363 | 364 | gl_init(&egl, &gl, enc_surface); 365 | /*eglMakeCurrent(egl.display, 0, 0, 0); 366 | 367 | eglDestroySurface(egl.display, surface); 368 | ANativeWindow_release(window);*/ 369 | 370 | ret = file_mmap(&weights_bwn, "/sdcard/xnornet_bwn_weights"); 371 | assert(!ret && weights_bwn.size == xnornet_bwn_size); 372 | 373 | ret = file_mmap(&weights, "/sdcard/xnornet_weights"); 374 | assert(!ret && weights.size == xnornet_size); 375 | 376 | t0 = get_time(); 377 | 378 | pthread_create(&thread, 0, work_thread, 0); 379 | 380 | return gl.camera_texture; 381 | } 382 | 383 | jni0(int, exit) 384 | { 385 | return 0; 386 | } 387 | 388 | jni(void, draw, jfloatArray mtx) 389 | { 390 | float *m; 391 | 392 | m = (*env)->GetFloatArrayElements(env, 
mtx, 0); 393 | 394 | //assert(enc_window); 395 | draw(&egl, &gl, m, surface, 0); 396 | //eglMakeCurrent(egl.display, 0, 0, 0); 397 | 398 | (*env)->ReleaseFloatArrayElements(env, mtx, m, 0); 399 | } 400 | 401 | jni(void, encode, jfloatArray mtx) 402 | { 403 | float *m; 404 | 405 | m = (*env)->GetFloatArrayElements(env, mtx, 0); 406 | 407 | draw(&egl, &gl, m, 0, enc_surface); 408 | 409 | (*env)->ReleaseFloatArrayElements(env, mtx, m, 0); 410 | } 411 | 412 | jni0(jstring, getoverlay) 413 | { 414 | jstring out; 415 | pthread_mutex_lock(&mutex_out); 416 | out = (*env)->NewStringUTF(env, out_string); 417 | pthread_mutex_unlock(&mutex_out); 418 | return out; 419 | } 420 | 421 | jni(void, setcodecsurface, jobject _surface) 422 | { 423 | if (enc_window) { 424 | eglMakeCurrent(egl.display, 0, 0, 0); 425 | eglDestroySurface(egl.display, enc_surface); 426 | ANativeWindow_release(enc_window); 427 | } 428 | 429 | enc_window = ANativeWindow_fromSurface(env, _surface); 430 | assert(enc_window); 431 | enc_surface = eglCreateWindowSurface(egl.display, egl.config, enc_window, 0); 432 | assert(enc_surface); 433 | 434 | eglMakeCurrent(egl.display, enc_surface, enc_surface, egl.context); 435 | } 436 | 437 | void set_paused(int pause) 438 | { 439 | pthread_mutex_lock(&mutex); 440 | paused = pause; 441 | if (!pause) 442 | pthread_cond_broadcast(&cond); 443 | pthread_mutex_unlock(&mutex); 444 | } 445 | 446 | jni(void, created, jobject _surface) 447 | { 448 | set_paused(0); 449 | 450 | window = ANativeWindow_fromSurface(env, _surface); 451 | assert(window); 452 | 453 | printf("surface %p->%p\n", _surface, window); 454 | 455 | surface = eglCreateWindowSurface(egl.display, egl.config, window, 0); 456 | assert(surface); 457 | } 458 | 459 | float scalex, scaley; 460 | 461 | jni(void, changed, jobject surface, int format, int width, int height) 462 | { 463 | scalex = 2.0f / (float) width; 464 | scaley = 2.0f / (float) height; 465 | 466 | float dy = 1.0f - (float) width / (float) height; 467 | box.x0 = -1.0f; 468 | box.x1 = 1.0f; 469 | box.y0 = -1.0f; 470 | box.y1 = 1.0f; 471 | 472 | box.y0 += dy; 473 | box.y1 -= dy; 474 | } 475 | 476 | jni(void, destroyed, jobject null) 477 | { 478 | set_paused(1); 479 | 480 | eglDestroySurface(egl.display, surface); 481 | ANativeWindow_release(window); 482 | 483 | window = 0; 484 | surface = 0; 485 | } 486 | 487 | jni(void, setnetwork, jint id) 488 | { 489 | pthread_mutex_lock(&mutex); 490 | network_id = id; 491 | pthread_mutex_unlock(&mutex); 492 | } 493 | 494 | /* keep a lists of frames which start with keyframes to create clips 495 | these can then be muxed to create a video file 496 | also keep codec configuration data to use as starting information 497 | *uses the last recieved configuration so may be wrong in some cases 498 | */ 499 | 500 | #include 501 | #define FLAG_KEY_FRAME 1 502 | #define FLAG_CODEC_CONFIG 2 503 | 504 | struct frames { 505 | size_t num_frame, data_size; 506 | void *data; 507 | struct AMediaCodecBufferInfo *info; 508 | }; 509 | 510 | uint8_t codec_config[256]; 511 | int codec_config_size; 512 | int64_t last_timestamp; 513 | 514 | #define RING_SIZE 8 // currently used as lazy way to determine clip length 515 | 516 | struct { 517 | int first, last; 518 | struct frames data[RING_SIZE]; 519 | } frames; 520 | 521 | jni(void, addbuffer, jobject buf, int offset, int size, int64_t timestamp, int flags) 522 | { 523 | struct frames *f; 524 | void *ptr; 525 | int i; 526 | 527 | last_timestamp = timestamp; 528 | 529 | ptr = (*env)->GetDirectBufferAddress(env, buf); 530 | 
assert(ptr); 531 | 532 | assert(flags == FLAG_KEY_FRAME || flags == FLAG_CODEC_CONFIG || !flags); 533 | 534 | if (flags == FLAG_CODEC_CONFIG) { 535 | assert(size < 256); 536 | memcpy(codec_config, ptr + offset, size); 537 | codec_config_size = size; 538 | // clear everything 539 | for (i = frames.first;; ) { 540 | f = &frames.data[i]; 541 | free(f->data); 542 | free(f->info); 543 | f->data = 0; 544 | f->info = 0; 545 | f->num_frame = 0; 546 | f->data_size = 0; 547 | if (i == frames.last) 548 | break; 549 | i = (i + 1) % RING_SIZE; 550 | } 551 | frames.first = frames.last; 552 | return; 553 | } 554 | 555 | if (flags == FLAG_KEY_FRAME) { 556 | frames.last = (frames.last + 1) % RING_SIZE; 557 | if (frames.last == frames.first) { 558 | f = &frames.data[frames.first]; 559 | free(f->data); 560 | free(f->info); 561 | f->data = 0; 562 | f->info = 0; 563 | f->num_frame = 0; 564 | f->data_size = 0; 565 | frames.first = (frames.first + 1) % RING_SIZE; 566 | } 567 | } 568 | 569 | f = &frames.data[frames.last]; 570 | 571 | f->data = realloc(f->data, f->data_size + size); 572 | assert(f->data); 573 | f->info = realloc(f->info, (f->num_frame+1) * sizeof(*f->info)); 574 | assert(f->info); 575 | 576 | f->info[f->num_frame] = (struct AMediaCodecBufferInfo) {f->data_size, size, timestamp, flags}; 577 | memcpy(f->data + f->data_size, ptr + offset, size); 578 | 579 | f->num_frame += 1; 580 | f->data_size += size; 581 | } 582 | 583 | jni(void, writemux, jstring path) 584 | { 585 | AMediaMuxer *mux; 586 | ssize_t track; 587 | struct frames *f; 588 | int i, j, fd; 589 | 590 | const char *path_c = (*env)->GetStringUTFChars(env, path, 0); 591 | assert(path_c); 592 | 593 | fd = open(path_c, O_WRONLY | O_CREAT, 0666); 594 | assert(fd >= 0); 595 | 596 | (*env)->ReleaseStringUTFChars(env, path, path_c); 597 | 598 | mux = AMediaMuxer_new(fd, AMEDIAMUXER_OUTPUT_FORMAT_MPEG_4); 599 | assert(mux); 600 | 601 | { 602 | AMediaFormat *fmt; 603 | 604 | fmt = AMediaFormat_new(); 605 | 606 | AMediaFormat_setString(fmt, AMEDIAFORMAT_KEY_MIME, "video/avc"); 607 | 608 | // 609 | AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_WIDTH, 640); 610 | AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_HEIGHT, 480); 611 | 612 | // native media muxer doesnt support codec config sample data.. 
613 | // hack around it 614 | 615 | int split = -1; 616 | for (i = 4; i < codec_config_size - 3; i++) { 617 | if (!memcmp(codec_config + i, (char[]) {0, 0, 0, 1}, 4)) { 618 | split = i; 619 | break; 620 | } 621 | } 622 | assert(split >= 0); 623 | 624 | AMediaFormat_setBuffer(fmt, "csd-0", codec_config, split); 625 | AMediaFormat_setBuffer(fmt, "csd-1", codec_config + split, codec_config_size - split); 626 | 627 | track = AMediaMuxer_addTrack(mux, fmt); 628 | assert(track >= 0); 629 | 630 | AMediaFormat_delete(fmt); 631 | } 632 | 633 | AMediaMuxer_start(mux); 634 | 635 | //AMediaMuxer_writeSampleData(mux, track, codec_config, 636 | // &(struct AMediaCodecBufferInfo) {0, codec_config_size, 0, FLAG_CODEC_CONFIG}); 637 | 638 | for (i = frames.first;; ) { 639 | f = &frames.data[i]; 640 | printf("id %i %i\n", i, f->num_frame); 641 | for (j = 0; j < f->num_frame; j++) 642 | AMediaMuxer_writeSampleData(mux, track, f->data, &f->info[j]); 643 | if (i == frames.last) 644 | break; 645 | i = (i + 1) % RING_SIZE; 646 | } 647 | 648 | AMediaMuxer_stop(mux); 649 | AMediaMuxer_delete(mux); 650 | close(fd); 651 | } 652 | 653 | jni(void, drag, float x0, float y0, float x1, float y1) 654 | { 655 | float dx = (x1 - x0) * scalex, dy = (y1 - y0) * scaley; 656 | float x = x0 * scalex - 1.0f, y = -y0 * scaley + 1.0f; 657 | 658 | if (!(x >= box.x0 && x <= box.x1 && y >= box.y0 && y <= box.y1)) 659 | return; 660 | 661 | float mx = (box.x0 + box.x1) * 0.5f; 662 | float my = (box.y0 + box.y1) * 0.5f; 663 | 664 | if (x < mx) 665 | box.x0 += dx; 666 | else 667 | box.x1 += dx; 668 | 669 | if (y < my) 670 | box.y0 -= dy; 671 | else 672 | box.y1 -= dy; 673 | 674 | } 675 | 676 | -------------------------------------------------------------------------------- /android-demo/make: -------------------------------------------------------------------------------- 1 | set -e 2 | 3 | TOOLS="/usr/lib/android-sdk/build-tools/debian/" 4 | PLATFORM_JAR="/usr/lib/android-sdk/platforms/android-23/android.jar" 5 | NDK="$NDK/platforms/android-23/arch-arm/usr" 6 | 7 | rm -r build || true 8 | mkdir build build/test build/lib build/lib/$ARCH 9 | 10 | clang-5.0 -Wall -nostdlib -shared -Ofast -flto -fuse-ld=lld-5.0 -I$NDK/include -L$NDK/lib $CFLAGS -DNUM_THREAD=1 \ 11 | -lc -llog -lGLESv2 -landroid -lmediandk \ 12 | *.c ../xnornet_bwn.c ../xnornet.c ../util.c -I../ \ 13 | -o build/lib/$ARCH/libhello-jni.so 14 | 15 | 16 | $TOOLS/aapt package -M AndroidManifest.xml -I $PLATFORM_JAR -S res -F build/app.res.apk -J build/test -f 17 | 18 | javac -d build -source 7 -target 7 -classpath $PLATFORM_JAR build/test/R.java *.java 19 | 20 | $TOOLS/dx --dex --output=build/classes.dex build 21 | 22 | java -classpath /usr/share/java/com.android.tools.sdklib.jar \ 23 | com.android.sdklib.build.ApkBuilderMain \ 24 | build/app.tmp.apk -u -z build/app.res.apk -f build/classes.dex -nf build/lib 25 | 26 | keytool -genkeypair -v -dname "cn=none, ou=none, o=none, c=CA" -keystore build/keystore \ 27 | -keyalg RSA -keysize 2048 -validity 36500 -alias "alias" -keypass pass12 -storepass pass12 28 | 29 | jarsigner -sigalg SHA1withRSA -digestalg SHA1 -keystore build/keystore build/app.tmp.apk \ 30 | -keypass pass12 -storepass pass12 alias 31 | zipalign -f 4 build/app.tmp.apk build/app.apk 32 | -------------------------------------------------------------------------------- /android-demo/res/drawable/icon.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/jonathanmarek1/binarynet-tensorflow/b67ec553cec7843bce44b188f3ac843d10aba570/android-demo/res/drawable/icon.png -------------------------------------------------------------------------------- /android-demo/shaders.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | char *vertex_shader = 5 | "attribute vec2 pos;\n" 6 | "uniform mat4 texmat;\n" 7 | "varying vec2 coord;\n" 8 | "void main() {\n" 9 | " coord = (vec4((pos + vec2(1.0, 1.0)) * 0.5, 0.0, 1.0) * texmat).xy;\n" 10 | " gl_Position = vec4(pos, 0.0, 1.0);\n" 11 | "}\n"; 12 | 13 | char *fragment_shader = 14 | "precision mediump float;\n" 15 | "varying vec2 coord;\n" 16 | "uniform sampler2D tex;\n" 17 | "uniform vec4 color;\n" 18 | "void main() {\n" 19 | " gl_FragColor = vec4(texture2D(tex, coord).rgb * color.a + color.rgb, 1.0);\n" 20 | "}\n"; 21 | 22 | char *fragment_shader_oes = 23 | "#extension GL_OES_EGL_image_external : require\n" 24 | "precision mediump float;\n" 25 | "varying vec2 coord;\n" 26 | "uniform samplerExternalOES tex;\n" 27 | "void main() {\n" 28 | " gl_FragColor = vec4(texture2D(tex, coord).rgb, 1.0);\n" 29 | "}\n"; 30 | 31 | /*char* ver = 32 | "attribute vec2 vPosition;\n" 33 | "varying vec2 coord;\n" 34 | "void main() {\n" 35 | " coord = (vPosition * vec2(1.0, 1.0) + vec2(1.0, 1.0)) * 0.5;\n" 36 | " gl_Position = vec4((vPosition + vec2(1.0, 1.0)) * vec2(227.0 / 1920.0, 227.0 / 1080.0) - vec2(1.0, 1.0), 0.0, 1.0);\n" 37 | "}\n"; */ 38 | 39 | 40 | static GLuint compile_shader(GLenum type, const char* source) 41 | { 42 | GLuint shader; 43 | GLint status; 44 | 45 | shader = glCreateShader(type); 46 | if (!shader) 47 | return 0; 48 | 49 | glShaderSource(shader, 1, &source, 0); 50 | glCompileShader(shader); 51 | glGetShaderiv(shader, GL_COMPILE_STATUS, &status); 52 | if (!status) { 53 | glDeleteShader(shader); 54 | assert(0); 55 | } 56 | 57 | return shader; 58 | } 59 | 60 | GLuint compile_program(char *vertex_source, char *fragment_source) 61 | { 62 | GLuint vertex_shader, fragment_shader, program; 63 | GLint status; 64 | 65 | vertex_shader = compile_shader(GL_VERTEX_SHADER, vertex_source); 66 | fragment_shader = compile_shader(GL_FRAGMENT_SHADER, fragment_source); 67 | assert(vertex_shader && fragment_shader); 68 | 69 | program = glCreateProgram(); 70 | assert(program); 71 | 72 | glAttachShader(program, vertex_shader); 73 | glAttachShader(program, fragment_shader); 74 | glLinkProgram(program); 75 | glGetProgramiv(program, GL_LINK_STATUS, &status); 76 | if (!status) 77 | assert(0); 78 | 79 | glDeleteShader(vertex_shader); 80 | glDeleteShader(fragment_shader); 81 | return program; 82 | } 83 | -------------------------------------------------------------------------------- /android-demo/shaders.h: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | extern char *vertex_shader, *fragment_shader, *fragment_shader_oes; 4 | 5 | GLuint compile_program(char *vertex_source, char *fragment_source); 6 | -------------------------------------------------------------------------------- /benchmark.c: -------------------------------------------------------------------------------- 1 | /* 2 | clang-5.0 -target aarch64-linux-gnu -I /usr/aarch64-linux-gnu/include -Wno-builtin-requires-header -mcpu=cortex-a53 benchmark.c benchmark/*.c -I. 
-flto -Ofast -pthread -DPRINT_TIME 3 | */ 4 | #include 5 | #include 6 | #include "benchmark/benchmark_float.h" 7 | #include "benchmark/benchmark_int8.h" 8 | #include "benchmark/benchmark_float_bin.h" 9 | #include "benchmark/benchmark_int8_bin.h" 10 | #include "benchmark/benchmark_bin.h" 11 | 12 | static uint8_t data[benchmark_float_size] __attribute__((aligned(16))); 13 | static uint8_t x[1024*1024] __attribute__((aligned(16))); 14 | static uint8_t tmp[1024*1024] __attribute__((aligned(16))); 15 | 16 | #include 17 | 18 | int main(void) 19 | { 20 | int r = sched_setscheduler(getpid(), SCHED_FIFO, &(struct sched_param) {.sched_priority = 1}); 21 | printf("sched_setscheduler %i\n", r); 22 | 23 | printf("float:\n"); 24 | benchmark_float(x, data, tmp); 25 | printf("int8:\n"); 26 | benchmark_int8(x, data, tmp); 27 | printf("float_bin:\n"); 28 | benchmark_float_bin(x, data, tmp); 29 | printf("int8_bin:\n"); 30 | benchmark_int8_bin(x, data, tmp); 31 | printf("bin:\n"); 32 | benchmark_bin(x, data, tmp); 33 | return 0; 34 | } 35 | -------------------------------------------------------------------------------- /benchmark.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import bnn 3 | import tf_export 4 | 5 | x0 = tf.placeholder(tf.float32, [None, 1, 1, 4096]) 6 | x1 = bnn.activation(x0) 7 | 8 | y0 = bnn.layer(x0, 4096, activate='none', norm=False, binary=False) 9 | y1 = bnn.layer(x0, 4096, activate='none', norm=False) 10 | y2 = bnn.layer(x1, 4096, activate='none', norm=False) 11 | 12 | with tf.Session() as sess: 13 | sess.run(tf.global_variables_initializer()) 14 | 15 | tf_export.export(y0, x0, 'benchmark_float', False) 16 | tf_export.export(y0, x0, 'benchmark_int8', True) 17 | tf_export.export(y1, x0, 'benchmark_float_bin', False) 18 | tf_export.export(y1, x0, 'benchmark_int8_bin', True) 19 | tf_export.export(y2, x0, 'benchmark_bin', False) 20 | -------------------------------------------------------------------------------- /bnn.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | 4 | # code 5 | 6 | # binary activation function 7 | def activation(x): 8 | x = tf.clip_by_value(x, -1.0, 1.0) 9 | return x + tf.stop_gradient(tf.sign(x) - x) 10 | 11 | # create weight + bias variables with update op as in BinaryNet 12 | def weight_bias(shape, binary=True): 13 | print(shape) 14 | init = tf.random_uniform(shape, -1.0, 1.0) 15 | x = tf.Variable(init) 16 | 17 | if binary: 18 | y = tf.Variable(init) 19 | 20 | coeff = np.float32(1./np.sqrt(1.5/ (np.prod(shape[:-2]) * (shape[-2] + shape[-1])))) 21 | print(coeff) 22 | 23 | tmp = y + coeff * (x - y) 24 | tmp = tf.clip_by_value(tmp, -1.0, 1.0) 25 | tmp = tf.group(x.assign(tmp), y.assign(tmp)) 26 | tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, tmp) 27 | 28 | x = tf.clip_by_value(x, -1.0, 1.0) 29 | xbin = tf.sign(x) * tf.reduce_mean(tf.abs(x), axis=[0, 1, 2]) 30 | x = x + tf.stop_gradient(xbin - x) 31 | 32 | return x, tf.Variable(tf.constant(0.1, shape=[shape[-1]])) 33 | 34 | def batch_norm(x, epsilon, decay=0.9): 35 | train = tf.get_default_graph().get_tensor_by_name('is_training:0') 36 | return tf.contrib.layers.batch_norm(x, decay=decay, center=True, scale=True, 37 | epsilon=epsilon, updates_collections=None, is_training=train, trainable=True, 38 | fused=True) 39 | 40 | # a layer in BinaryNet 41 | def layer(x, num_output, filter_size=[1, 1], stride=[1, 1], pool=None, activate='bin', 42 | binary=True, norm=True, 
epsilon=0.0001, padding='SAME'): 43 | shape = filter_size + [x.shape[-1].value, num_output] 44 | 45 | W, b = weight_bias(shape, binary) 46 | 47 | x = tf.nn.conv2d(x, W, strides=[1, *stride, 1], padding=padding) + b 48 | 49 | if activate == 'bin': 50 | if pool is not None: 51 | x = tf.nn.max_pool(x, ksize=[1, *pool[0], 1], strides=[1, *pool[-1], 1], padding='VALID') 52 | 53 | if norm: 54 | x = batch_norm(x, epsilon) 55 | else: 56 | if norm: 57 | x = batch_norm(x, epsilon) 58 | 59 | if pool is not None: 60 | x = tf.nn.max_pool(x, ksize=[1, *pool[0], 1], strides=[1, *pool[-1], 1], padding='VALID') 61 | 62 | if activate == 'bin': 63 | return activation(x) 64 | elif activate == 'relu': 65 | return tf.nn.relu(x) 66 | 67 | assert(activate == 'none') 68 | return x 69 | -------------------------------------------------------------------------------- /c_ops.h: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /* 6 | implementation of the layers in C 7 | definitely needs some cleaning up 8 | */ 9 | 10 | typedef unsigned uint; 11 | #define sizearray(x) (sizeof(x)/sizeof(*x)) 12 | 13 | typedef struct { 14 | uint dim, shape[4]; 15 | void *data; 16 | void *storage[2]; 17 | uint type; 18 | } tensor; 19 | 20 | //static assert after optimizations 21 | #define _str(x) #x 22 | #define str(x) _str(x) 23 | #define _assert(x) if (!(x)) \ 24 | __asm__("assert fail (expression not constant or false), line=" str(__LINE__) " file="__FILE__); 25 | 26 | #define ptr(t, type, x, y...) ({ \ 27 | type _type; \ 28 | __auto_type _t = t; \ 29 | typeof(x) _x[] = {x, y}; \ 30 | _assert(sizearray(_x) == _t.dim); \ 31 | int offset = 0; \ 32 | for (int i = 0; i < sizearray(_x); i++) \ 33 | offset = offset * _t.shape[i] + _x[i]; \ 34 | (typeof(_type)*) (_t.data + _Generic(_type, \ 35 | bool: offset / 8, \ 36 | uint8_t: offset, \ 37 | uint16_t: offset * 2, \ 38 | float : offset * 4)); \ 39 | }) 40 | 41 | #define output(t, x, y...) ({ \ 42 | __auto_type _t = t; \ 43 | typeof(x) _x[] = {x, y}; \ 44 | tensor out; \ 45 | out.dim = sizearray(_x); \ 46 | for (int i = 0; i < sizearray(_x); i++) \ 47 | out.shape[i] = _x[i]; \ 48 | out.storage[0] = _t.storage[0]; \ 49 | out.storage[1] = _t.storage[1]; \ 50 | out.data = _t.data == _t.storage[0] ? 
_t.storage[1] : _t.storage[0]; \ 51 | out.type = _t.type; \ 52 | out; \ 53 | }) 54 | enum { 55 | FLOAT, 56 | INT8, 57 | INT16, 58 | BINARY, 59 | }; 60 | 61 | enum { 62 | ACTIVE_NONE, 63 | ACTIVE_BIN, 64 | ACTIVE_RELU, 65 | }; 66 | #include 67 | 68 | #include "c_ops_neon.h" 69 | //#include "cortexa53.h" 70 | 71 | __attribute__ ((always_inline)) 72 | static void binarize_float(uint32_t *output, float32x4_t *buf, uint size) 73 | { 74 | uint32x4_t *buf_u32 = (void*) buf; 75 | uint8x16_t *buf_u8 = (void*) buf; 76 | uint k; 77 | 78 | _assert(size % 8 == 0); 79 | 80 | //note: for some reason clang stores min as a list of 24 pointers on the stack instead of a single pointer 81 | 82 | //printf("%f %f %f %f\n", buf[0][0], min[0][0], beta[0][0], sum); 83 | //printf("%f\n", buf[0][0] + min[0][0] * sum); 84 | 85 | for (k = 0; k < size; k++) 86 | buf_u32[k] = vcltq_f32(buf[k], vdupq_n_f32(0.0f)); 87 | //buf_u32[k] = vcltq_f32(buf[k] + min[k] * vdupq_n_f32(sum), beta[k]); 88 | 89 | for (k = 0; k < size / 4; k++) { 90 | buf_u8[k] = vcombine_u8( 91 | vmovn_u16(vcombine_u16(vmovn_u32(buf_u32[k*4+0]), vmovn_u32(buf_u32[k*4+1]))), 92 | vmovn_u16(vcombine_u16(vmovn_u32(buf_u32[k*4+2]), vmovn_u32(buf_u32[k*4+3])))); 93 | } 94 | 95 | binarize(output, buf_u8, size / 4); 96 | } 97 | 98 | __attribute__ ((always_inline)) 99 | static void binarize_u16(uint8x16_t *output, uint16x8_t *buf, uint16x8_t *beta, uint size) 100 | { 101 | uint8x16_t buf_u8[size / 2]; 102 | uint k; 103 | 104 | _assert(size % 4 == 0); 105 | 106 | for (k = 0; k < size; k++) 107 | buf[k] = vcltq_u16(beta[k], buf[k]); 108 | 109 | for (k = 0; k < size / 2; k++) 110 | buf_u8[k] = vcombine_u8(vmovn_u16(buf[k*2]), vmovn_u16(buf[k*2+1])); 111 | 112 | binarize((void*) output, buf_u8, size / 2); 113 | } 114 | 115 | //#include 116 | //#define printf(args...) __android_log_print(ANDROID_LOG_ERROR, "test_app", args) 117 | 118 | __attribute__ ((always_inline)) 119 | static tensor conv2d(tensor input, tensor filter, tensor out_b, uint sx, uint sy, uint px, uint py, uint activation, float *qp, int *sync) 120 | { 121 | _assert(input.dim == 3 && filter.dim == 4); 122 | 123 | bool accum16 = (input.type == BINARY && filter.type == BINARY); 124 | #define f(x) register uint8x16_t v##x; 125 | for_each_reg(f) 126 | #undef f 127 | #ifdef __aarch64__ 128 | #define v_ptr (uint8x16_t[]) {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23} 129 | int split_size = accum16 ? 128 : 96; 130 | int split_num = (filter.shape[3] - 1) / split_size + 1; 131 | int num_reg_base = accum16 ? 16 : 24; 132 | #else 133 | int split_size = accum16 ? 64 : 32; 134 | int split_num = (filter.shape[3] - 1) / split_size + 1; 135 | int num_reg_base = accum16 ? 8 : 8; 136 | #define v_ptr (uint8x16_t[]) {v0, v1, v2, v3, v4, v5, v6, v7} 137 | #endif 138 | int num_reg; 139 | int num_reg_last = (filter.shape[3] / (accum16 ? 8 : 4) - 1) % num_reg_base + 1; 140 | 141 | _assert(filter.shape[3] % (accum16 ? 
8 : 4) == 0); 142 | 143 | int i, j, k, x, y, z, u, v, out_w, out_h, xl, yl; 144 | int32_t sum; 145 | 146 | out_w = (input.shape[0] + px * 2 - filter.shape[0]) / sx + 1; 147 | out_h = (input.shape[1] + py * 2 - filter.shape[1]) / sy + 1; 148 | 149 | _assert((input.shape[0] + px * 2 - filter.shape[0]) % sx == 0); 150 | _assert((input.shape[1] + py * 2 - filter.shape[1]) % sy == 0); 151 | 152 | tensor output = output(input, out_w, out_h, filter.shape[3]); 153 | 154 | int kk; 155 | do { 156 | kk = __sync_fetch_and_add(sync, 1); 157 | if (kk >= out_w * out_h * split_num) 158 | break; 159 | 160 | k = kk % split_num; kk /= split_num; 161 | j = kk % out_h; kk /= out_h; 162 | i = kk; 163 | 164 | xl = i * sx - px; 165 | yl = j * sy - py; 166 | num_reg = (k + 1 == split_num) ? num_reg_last : num_reg_base; 167 | 168 | #define f(x) if (x < num_reg) v##x = (uint8x16_t) {}; 169 | for_each_reg(f) 170 | #undef f 171 | 172 | sum = 0; 173 | 174 | for (u = 0; u < filter.shape[0]; u++) for (v = 0; v < filter.shape[1]; v++) { 175 | x = xl + u; 176 | y = yl + v; 177 | if (x < 0 || x >= input.shape[0] || y < 0 || y >= input.shape[1]) { 178 | if (input.type == BINARY && filter.type == BINARY) { 179 | #define f(x) if (x < num_reg) v##x = \ 180 | vreinterpretq_u8_u16(vreinterpretq_u16_u8(v##x) + vdupq_n_u16(filter.shape[2] / 2)); 181 | for_each_reg(f) 182 | #undef f 183 | } 184 | continue; 185 | } 186 | 187 | if (input.type == FLOAT && filter.type == FLOAT) { 188 | float_kernel(ptr(input, float, x, y, 0), 189 | //ptr(filter, float, u, v, 0, 0) + k * split_size, 190 | ptr(filter, float, u, v, 0, 0) + k * filter.shape[2] * split_size, 191 | filter.shape[2]); 192 | } else if (input.type == FLOAT && filter.type == BINARY) { 193 | float_bin_kernel(ptr(input, float, x, y, 0), 194 | ptr(filter, bool, u, v, 0, 0) + k * split_size * filter.shape[2] / 8, 195 | filter.shape[2]); 196 | } else if (input.type == INT8 && filter.type == BINARY) { 197 | _assert(num_reg % 2 == 0); 198 | 199 | for (z = 0; z < filter.shape[2]; z++) 200 | sum += *ptr(input, uint8_t, x, y, z); 201 | int8_bin_kernel(ptr(input, uint8_t, x, y, 0), 202 | ptr(filter, bool, u, v, 0, 0) + k * split_size * filter.shape[2] / 8, 203 | filter.shape[2]); 204 | } else if (input.type == INT8 && filter.type == INT8) { 205 | //sum += 128 * filter.shape[2]; 206 | for (z = 0; z < filter.shape[2]; z++) 207 | sum += (int8_t) *ptr(input, uint8_t, x, y, z); 208 | 209 | int8_kernel(ptr(input, uint8_t, x, y, 0), 210 | ptr(filter, uint8_t, u, v, 0, 0) + k * split_size * filter.shape[2], 211 | filter.shape[2]); 212 | 213 | } else if (input.type == BINARY && filter.type == BINARY) { 214 | bin_kernel(ptr(input, bool, x, y, 0), 215 | ptr(filter, bool, u, v, 0, 0) + k * split_size * filter.shape[2] / 8, 216 | filter.shape[2]); 217 | } else { 218 | _assert(0); 219 | } 220 | } 221 | 222 | void *out; 223 | void *b, *m, *c, *d; 224 | 225 | if (input.type == FLOAT || input.type == INT8) { //float accum 226 | out = ptr(output, float, i, j, k * split_size); 227 | m = ptr(out_b, float, 0, k * split_size); 228 | b = ptr(out_b, float, 1, k * split_size); 229 | c = ptr(out_b, float, 2, k * split_size); 230 | d = ptr(out_b, float, 3, k * split_size); 231 | 232 | float sum2; 233 | if (filter.type == INT8 && input.type == INT8) 234 | sum2 = (float) sum + (filter.shape[0]*filter.shape[1]*filter.shape[2]) * qp[1]; 235 | 236 | #define f(x) if (x < num_reg) ({ \ 237 | float32x4_t tmp; \ 238 | tmp = vreinterpretq_f32_u8(v##x); \ 239 | if (input.type == INT8) { \ 240 | int32x4_t i = 
vreinterpretq_s32_u8(v##x); \ 241 | if (filter.type == BINARY) \ 242 | i = -i * vdupq_n_s32(2) + vdupq_n_s32(sum); \ 243 | tmp = vcvtq_f32_s32(i); \ 244 | } \ 245 | if (filter.type == FLOAT) \ 246 | tmp = tmp + ((float32x4_t*)m)[x]; \ 247 | else if (filter.type == INT8 && input.type == INT8) \ 248 | tmp = vmulq_n_f32(tmp + ((float32x4_t*)c)[x] * vdupq_n_f32(sum2) + vdupq_n_f32(qp[1]) * ((float32x4_t*)d)[x], qp[0]) * ((float32x4_t*)b)[x] + ((float32x4_t*)m)[x]; \ 249 | else if (input.type == FLOAT) \ 250 | tmp = tmp * ((float32x4_t*)m)[x] + ((float32x4_t*)b)[x];\ 251 | else \ 252 | tmp = tmp * vmulq_n_f32(((float32x4_t*)m)[x], qp[0]) + ((float32x4_t*)b)[x]; \ 253 | if (activation == ACTIVE_RELU) \ 254 | tmp = vmaxq_f32(tmp, vdupq_n_f32(0.0f)); \ 255 | if (activation == ACTIVE_BIN) \ 256 | v##x = tmp; \ 257 | else \ 258 | ((float32x4_t*)out)[x] = tmp; \ 259 | }); 260 | for_each_reg(f) 261 | #undef f 262 | if (activation == ACTIVE_BIN) 263 | binarize_float((void*) ptr(output, bool, i, j, k * split_size), (void*) v_ptr, num_reg); 264 | } else if (input.type == BINARY) { 265 | _assert(filter.type == BINARY);; 266 | 267 | if (activation == ACTIVE_BIN) { 268 | out = ptr(output, bool, i, j, k * split_size); 269 | b = ptr(out_b, uint16_t, 0, k * split_size); 270 | binarize_u16(out, (void*) v_ptr, b, num_reg); 271 | } else { 272 | out = ptr(output, float, i, j, k * split_size); 273 | m = ptr(out_b, float, 0, k * split_size); 274 | b = ptr(out_b, float, 1, k * split_size); 275 | #if 1 276 | int z; 277 | for (z = 0; z < num_reg * 8; z++) { 278 | ((float*) out)[z] = (float) (int) (filter.shape[0]*filter.shape[1]*filter.shape[2] - 2 * vreinterpretq_u16_u8(v_ptr[z / 8])[z % 8]) 279 | * ((float*) m)[z] + ((float*) b)[z]; 280 | 281 | if (activation == ACTIVE_RELU) 282 | ((float*) out)[z] = __builtin_fmaxf(((float*) out)[z], 0.0f); 283 | } 284 | 285 | #else 286 | #define f(x) if (x < num_reg) ({ \ 287 | float32x4_t tmp0, tmp1, in_size; \ 288 | in_size = vdupq_n_f32(filter.shape[0]*filter.shape[1]*filter.shape[2]); \ 289 | tmp0 = vcvtq_f32_u32(vmovl_u16(vget_low_u16(v##x))); \ 290 | tmp1 = vcvtq_f32_u32(vmovl_u16(vget_high_u16(v##x))); \ 291 | tmp0 = in_size - tmp0 * vdupq_n_f32(2.0f); \ 292 | tmp1 = in_size - tmp1 * vdupq_n_f32(2.0f); \ 293 | tmp0 = tmp0 * ((float32x4_t*)m)[x*2+0] + ((float32x4_t*)b)[x*2+0]; \ 294 | tmp1 = tmp1 * ((float32x4_t*)m)[x*2+1] + ((float32x4_t*)b)[x*2+1]; \ 295 | ((float32x4_t*)out)[x*2+0] = tmp0; \ 296 | ((float32x4_t*)out)[x*2+1] = tmp1; \ 297 | }); 298 | for_each_reg(f) 299 | #undef f 300 | #endif 301 | } 302 | } else { 303 | _assert(0); 304 | } 305 | } while (1); 306 | 307 | if (activation == ACTIVE_BIN) { 308 | output.type = BINARY; 309 | } else { 310 | output.type = FLOAT; 311 | //_assert(input.type == FLOAT); 312 | } 313 | 314 | return output; 315 | } 316 | 317 | static __attribute__ ((always_inline)) 318 | tensor maxpool(tensor input, uint w, uint h, uint sx, uint sy, void *xor, int *sync) 319 | { 320 | int out_w, out_h; 321 | int i, j, k, x, y; 322 | 323 | uint8x16_t *in[w * h], *out; 324 | uint8x16_t *xorp = xor; 325 | 326 | out_w = (input.shape[0] - w) / sx + 1; 327 | out_h = (input.shape[1] - h) / sy + 1; 328 | 329 | tensor output = output(input, out_w, out_h, input.shape[2]); 330 | 331 | int kk; 332 | do { 333 | kk = __sync_fetch_and_add(sync, 1); 334 | if (kk >= out_w * out_h) 335 | break; 336 | 337 | j = kk % out_h; kk /= out_h; 338 | i = kk; 339 | 340 | if (input.type == BINARY) { 341 | 342 | for (x = 0; x < w; x++) 343 | for (y = 0; y < h; y++) 344 | in[x * h + y] 
= (void*) ptr(input, bool, i * sx + x, j * sy + y, 0); 345 | 346 | out = (void*) ptr(output, bool, i, j, 0); 347 | if (input.shape[2] % 128 == 0) { 348 | for (k = 0; k < input.shape[2] / 128; k++) { 349 | out[k] = in[0][k]; 350 | for (x = 1; x < w * h; x++) 351 | out[k] &= in[x][k]; 352 | 353 | if (xorp) 354 | out[k] ^= xorp[k]; 355 | } 356 | } else if (input.shape[2] == 96) { 357 | uint32_t *ptr = (uint32_t*) in[0]; 358 | uint32x4_t tmp;// = { ptr[0], ptr[1], ptr[2], 0}; 359 | 360 | tmp = (uint32x4_t) {ptr[0], ptr[1], ptr[2]}; 361 | for (x = 1; x < w * h; x++) { 362 | ptr = (uint32_t*) in[x]; 363 | tmp &= (uint32x4_t) {ptr[0], ptr[1], ptr[2]}; 364 | } 365 | if (xorp) 366 | tmp ^= vreinterpretq_u32_u8(*xorp); 367 | 368 | ptr = (uint32_t*) out; 369 | ptr[0] = tmp[0]; 370 | ptr[1] = tmp[1]; 371 | ptr[2] = tmp[2]; 372 | } else { 373 | _assert(0); 374 | } 375 | } else { 376 | _assert(input.type == FLOAT); 377 | float32x4_t *in[w * h], *out; 378 | for (x = 0; x < w; x++) 379 | for (y = 0; y < h; y++) 380 | in[x * h + y] = (void*) ptr(input, float, i * sx + x, j * sy + y, 0); 381 | 382 | _assert(input.shape[2] % 4 == 0); 383 | out = (void*) ptr(output, float, i, j, 0); 384 | for (k = 0; k < input.shape[2] / 4; k++) { 385 | out[k] = in[0][k]; 386 | for (x = 1; x < w * h; x++) 387 | out[k] = vmaxq_f32(out[k], in[x][k]); 388 | } 389 | } 390 | } while (1); 391 | 392 | return output; 393 | } 394 | 395 | static __attribute__ ((always_inline)) 396 | tensor xnornet_fix(tensor input, uint8_t *xor) 397 | { 398 | int i, j, k; 399 | _assert(input.type == BINARY); 400 | _assert(input.shape[2] % 8 == 0); 401 | 402 | tensor output = output(input, input.shape[0], input.shape[1], input.shape[2]); 403 | 404 | for (i = 0; i < input.shape[0]; i++) for (j = 0; j < input.shape[1]; j++) { 405 | for (k = 0; k < input.shape[2]; k += 8) 406 | *(uint8_t*)ptr(output, bool, i, j, k) = 407 | *(uint8_t*)ptr(input, bool, i, j, k) ^ xor[k / 8]; 408 | } 409 | return output; 410 | } 411 | 412 | static __attribute__ ((always_inline)) 413 | tensor quantize(tensor input, float *qparam, bool uint8, bool need_min) 414 | { 415 | float min = 0.0f, max = 0.0f, m; 416 | float *in; 417 | int i; 418 | uint8_t *out, tmp; 419 | 420 | tensor output = output(input, input.shape[0], input.shape[1], input.shape[2]); 421 | output.type = INT8; 422 | 423 | in = input.data; 424 | out = output.data; 425 | 426 | for (i = 0; i < input.shape[0] * input.shape[1] * input.shape[2]; i++) { 427 | max = __builtin_fmaxf(max, in[i]); 428 | if (need_min) 429 | min = __builtin_fminf(min, in[i]); 430 | } 431 | 432 | qparam[0] = (max - min) / 256.0f; 433 | m = 256.0f / (max - min); 434 | qparam[1] = min * m + (uint8 ? 0.0f : 128.0f); 435 | 436 | for (i = 0; i < input.shape[0] * input.shape[1] * input.shape[2]; i++) { 437 | tmp = __builtin_fminf(__builtin_fmaxf((in[i] - (need_min ? min : 0.0f)) * m, 0.0f), 255.0f); 438 | out[i] = tmp - (uint8 ? 
0 : 128); 439 | } 440 | 441 | /*_assert(input.dim == 3); 442 | _assert(input.type == FLOAT); 443 | _assert(input.shape[0] * input.shape[1] * input.shape[2] % 16 == 0); 444 | 445 | tensor output = output(input, input.shape[0], input.shape[1], input.shape[2]); 446 | output.type = INT8; 447 | 448 | int i; 449 | float32x4_t *in;//, *out; 450 | uint8x16_t *out; 451 | float32x4_t max[4] = {0}, min[4] = {0}, m, b; 452 | uint32x4_t u32[4]; 453 | uint16x8_t u16[2]; 454 | uint8x16_t u8; 455 | float x, _min; 456 | 457 | //todo: max/min initialation when need_min is true 458 | 459 | in = input.data; 460 | 461 | for (i = 0; i < input.shape[0] * input.shape[1] * input.shape[2] / 16; i++) { 462 | max[0] = vmaxq_f32(max[0], in[0]); 463 | max[1] = vmaxq_f32(max[1], in[1]); 464 | max[2] = vmaxq_f32(max[2], in[2]); 465 | max[3] = vmaxq_f32(max[3], in[3]); 466 | 467 | if (need_min) { 468 | min[0] = vminq_f32(min[0], in[0]); 469 | min[1] = vminq_f32(min[1], in[1]); 470 | min[2] = vminq_f32(min[2], in[2]); 471 | min[3] = vminq_f32(min[3], in[3]); 472 | } 473 | 474 | in += 4; 475 | } 476 | 477 | max[0] = vmaxq_f32(max[0], max[1]); 478 | max[1] = vmaxq_f32(max[2], max[3]); 479 | max[0] = vmaxq_f32(max[0], max[1]); 480 | x = fmaxf(fmaxf(max[0][0], max[0][1]), fmaxf(max[0][2], max[0][3])); 481 | qparam[0] = x / 256.0f; 482 | m = vdupq_n_f32(256.0f / x); 483 | 484 | if (need_min) { 485 | min[0] = vminq_f32(min[0], min[1]); 486 | min[1] = vminq_f32(min[2], min[3]); 487 | min[0] = vminq_f32(min[0], min[1]); 488 | x = fminf(fminf(min[0][0], min[0][1]), fminf(min[0][2], min[0][3])); 489 | qparam[1] = x; 490 | b = vdupq_n_f32(x); 491 | } else { 492 | qparam[1] = 0.0f; 493 | } 494 | 495 | in = input.data; 496 | out = output.data; 497 | 498 | for (i = 0; i < input.shape[0] * input.shape[1] * input.shape[2] / 16; i++) { 499 | if (!need_min) { 500 | u32[0] = vcvtq_u32_f32(in[0] * m); 501 | u32[1] = vcvtq_u32_f32(in[1] * m); 502 | u32[2] = vcvtq_u32_f32(in[2] * m); 503 | u32[3] = vcvtq_u32_f32(in[3] * m); 504 | } else { 505 | u32[0] = vcvtq_u32_f32((in[0] - b) * m); 506 | u32[1] = vcvtq_u32_f32((in[1] - b) * m); 507 | u32[2] = vcvtq_u32_f32((in[2] - b) * m); 508 | u32[3] = vcvtq_u32_f32((in[3] - b) * m); 509 | } 510 | 511 | u16[0] = vcombine_u16(vqmovn_u32(u32[0]), vqmovn_u32(u32[1])); 512 | u16[1] = vcombine_u16(vqmovn_u32(u32[2]), vqmovn_u32(u32[3])); 513 | 514 | u8 = vcombine_u8(vqmovn_u16(u16[0]), vqmovn_u16(u16[1])); 515 | if (!uint8) 516 | u8 -= vdupq_n_u8(128); 517 | *out++ = u8; 518 | 519 | in += 4; 520 | } */ 521 | 522 | return output; 523 | } 524 | 525 | #ifdef PRINT_TIME 526 | #include 527 | #include 528 | #define TIME() ({ t1 = get_time(); printf("%f\n", (double) (t1 - t0) / 1000000.0); t0 = t1; }) 529 | static uint64_t get_time(void) 530 | { 531 | struct timespec ts; 532 | 533 | clock_gettime(CLOCK_MONOTONIC, &ts); 534 | return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec; 535 | } 536 | #else 537 | #define TIME() 538 | #endif 539 | 540 | #define w_float_bin_float(x, y, z) struct { float w[(x)*(y)]; float b[y*z]; } 541 | #define w_bin(x, y) struct { uint8_t w[(x)*(y)/8]; uint16_t b[y]; } 542 | #define w_bin_float_bin(x, y) struct { uint8_t w[(x)*(y)/8]; float m[y]; float b[y]; } 543 | 544 | #define w_float(x, y) struct { float w[(x)*(y)]; float b[y]; } 545 | 546 | #define w_bin_float(x, y) struct { uint8_t w[(x)*(y)/8]; float b[y*2]; } 547 | #define w_int8(x, y) struct { int8_t w[(x)*(y)]; float b[y*4]; } 548 | 549 | #include 550 | #ifndef NUM_THREAD 551 | #define NUM_THREAD 1 552 | #endif 553 | struct 
thread_arg { 554 | void *in, *weights, *tmp; 555 | int *sync; 556 | uint id; 557 | int *wait_cnt; 558 | pthread_mutex_t *mutex; 559 | pthread_cond_t *cond; 560 | float quant_param[2]; 561 | }; 562 | 563 | __attribute__ ((always_inline)) 564 | static void wait(struct thread_arg *arg, int i) 565 | { 566 | pthread_mutex_lock(arg->mutex); 567 | *arg->wait_cnt += 1; 568 | if (*arg->wait_cnt < i * NUM_THREAD) { 569 | pthread_cond_wait(arg->cond, arg->mutex); 570 | } else { 571 | *arg->sync = 0; 572 | pthread_cond_broadcast(arg->cond); 573 | } 574 | pthread_mutex_unlock(arg->mutex); 575 | } 576 | 577 | static void* worker(void *_arg); 578 | 579 | __attribute__ ((always_inline)) 580 | void* FUNCTION_NAME(void *in, void *weights, void *tmp) { 581 | pthread_t thread[NUM_THREAD]; 582 | void *ret; 583 | int sync = 0, i, wait_cnt = 0; 584 | pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; 585 | pthread_cond_t cond = PTHREAD_COND_INITIALIZER; 586 | 587 | //_assert(((unsigned long) in & 15) == 0); 588 | //_assert(((unsigned long) tmp & 15) == 0); 589 | //_assert(((unsigned long) weights & 15) == 0); 590 | 591 | struct thread_arg arg[NUM_THREAD]; 592 | for (i = 0; i < NUM_THREAD; i++) { 593 | arg[i] = (struct thread_arg) {in, weights, tmp, &sync, i, &wait_cnt, &mutex, &cond}; 594 | pthread_create(&thread[i], 0, worker, &arg[i]); 595 | } 596 | 597 | for (i = 0; i < NUM_THREAD; i++) 598 | pthread_join(thread[i], &ret); 599 | return ret; 600 | } 601 | -------------------------------------------------------------------------------- /c_ops_neon.h: -------------------------------------------------------------------------------- 1 | #ifdef __aarch64__ 2 | #define for_each_reg(f) f(0) f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) f(9) f(10) f(11) \ 3 | f(12) f(13) f(14) f(15) f(16) f(17) f(18) f(19) f(20) f(21) f(22) f(23) 4 | #define for_each_reg2(f) f(0) f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) f(9) f(10) f(11) \ 5 | f(12) f(13) f(14) f(15) f(16) f(17) f(18) f(19) f(20) f(21) f(22) f(23) 6 | #else 7 | #define for_each_reg(f) f(0) f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) 8 | #define for_each_reg2(f) f(0) f(1) f(2) f(3) f(4) f(5) f(6) f(7) f(8) 9 | #endif 10 | 11 | #define touch_reg(x) asm ("" : "+w"(v##x) ::); 12 | 13 | // float 14 | #ifdef __aarch64__ 15 | // cortexa53 implementation based on google's gemmlowp 16 | #define reg_use(x) , "+w"(v##x) 17 | #define float_f(a, w, wn, wnn, x0, off) \ 18 | "ldr "wnn", [%[w], #("#off"*16)]\n" \ 19 | "ins "wn".d[1], "x0"\n" \ 20 | "fmla "a".4s, "w".4s, v31.s[0]\n" \ 21 | "ldr "x0", [%[w], #("#off"*16+8)]\n" 22 | 23 | #define float_kernel(i, weight, depth) ({ \ 24 | void *_input = i; \ 25 | void *_filter = weight; \ 26 | uint _size = depth; \ 27 | _assert(_size >= 2); \ 28 | _size -= 1; \ 29 | asm volatile ( \ 30 | "ld1 {v24.4s}, [%[w]]\n" \ 31 | "ldr d25, [%[w], #16]\n" \ 32 | "ldr x0, [%[w], #24]\n" \ 33 | "ldr d26, [%[w], #32]\n" \ 34 | "ldr x1, [%[w], #40]\n" \ 35 | "add %[w], %[w], #48\n" \ 36 | "ld1r {v31.4s}, [%[in]], #4\n" \ 37 | "2:\n" \ 38 | float_f("v0", "v24", "v25", "d27", "x0", 0) \ 39 | float_f("v1", "v25", "v26", "d24", "x1", 1) \ 40 | float_f("v2", "v26", "v27", "d25", "x0", 2) \ 41 | float_f("v3", "v27", "v24", "d26", "x1", 3) \ 42 | float_f("v4", "v24", "v25", "d27", "x0", 4) \ 43 | float_f("v5", "v25", "v26", "d24", "x1", 5) \ 44 | float_f("v6", "v26", "v27", "d25", "x0", 6) \ 45 | float_f("v7", "v27", "v24", "d26", "x1", 7) \ 46 | float_f("v8", "v24", "v25", "d27", "x0", 8) \ 47 | float_f("v9", "v25", "v26", "d24", "x1", 9) \ 48 | float_f("v10", "v26", 
"v27", "d25", "x0", 10) \ 49 | float_f("v11", "v27", "v24", "d26", "x1", 11) \ 50 | float_f("v12", "v24", "v25", "d27", "x0", 12) \ 51 | float_f("v13", "v25", "v26", "d24", "x1", 13) \ 52 | float_f("v14", "v26", "v27", "d25", "x0", 14) \ 53 | float_f("v15", "v27", "v24", "d26", "x1", 15) \ 54 | float_f("v16", "v24", "v25", "d27", "x0", 16) \ 55 | float_f("v17", "v25", "v26", "d24", "x1", 17) \ 56 | float_f("v18", "v26", "v27", "d25", "x0", 18) \ 57 | float_f("v19", "v27", "v24", "d26", "x1", 19) \ 58 | float_f("v20", "v24", "v25", "d27", "x0", 20) \ 59 | float_f("v21", "v25", "v26", "d24", "x1", 21) \ 60 | float_f("v22", "v26", "v27", "d25", "x0", 22) \ 61 | float_f("v23", "v27", "v24", "d26", "x1", 23) \ 62 | "ld1r {v31.4s}, [%[in]], #4\n" \ 63 | "add %[w], %[w], #(4*96)\n" \ 64 | "subs %w[d], %w[d], #1\n" \ 65 | "bne 2b\n" \ 66 | float_f("v0", "v24", "v25", "d27", "x0", 0) \ 67 | float_f("v1", "v25", "v26", "d24", "x1", 1) \ 68 | float_f("v2", "v26", "v27", "d25", "x0", 2) \ 69 | float_f("v3", "v27", "v24", "d26", "x1", 3) \ 70 | float_f("v4", "v24", "v25", "d27", "x0", 4) \ 71 | float_f("v5", "v25", "v26", "d24", "x1", 5) \ 72 | float_f("v6", "v26", "v27", "d25", "x0", 6) \ 73 | float_f("v7", "v27", "v24", "d26", "x1", 7) \ 74 | float_f("v8", "v24", "v25", "d27", "x0", 8) \ 75 | float_f("v9", "v25", "v26", "d24", "x1", 9) \ 76 | float_f("v10", "v26", "v27", "d25", "x0", 10) \ 77 | float_f("v11", "v27", "v24", "d26", "x1", 11) \ 78 | float_f("v12", "v24", "v25", "d27", "x0", 12) \ 79 | float_f("v13", "v25", "v26", "d24", "x1", 13) \ 80 | float_f("v14", "v26", "v27", "d25", "x0", 14) \ 81 | float_f("v15", "v27", "v24", "d26", "x1", 15) \ 82 | float_f("v16", "v24", "v25", "d27", "x0", 16) \ 83 | float_f("v17", "v25", "v26", "d24", "x1", 17) \ 84 | float_f("v18", "v26", "v27", "d25", "x0", 18) \ 85 | float_f("v19", "v27", "v24", "d26", "x1", 19) \ 86 | float_f("v20", "v24", "v25", "d27", "x0", 20) \ 87 | "ins v26.d[1], x1\n" \ 88 | "fmla v21.4s, v25.4s, v31.s[0]\n" \ 89 | "ins v27.d[1], x0\n" \ 90 | "fmla v22.4s, v26.4s, v31.s[0]\n" \ 91 | "fmla v23.4s, v27.4s, v31.s[0]\n" \ 92 | : [d] "+r"(_size), [w] "+r"(_filter), [in] "+r"(_input) for_each_reg(reg_use): \ 93 | : "cc", "x0", "x1", "v24", "v25", "v26", "v27", "v31"); \ 94 | }) 95 | #else 96 | #define float_op(x) if (x < num_reg) ({ \ 97 | v##x = vreinterpretq_u8_f32(vmlaq_f32(vreinterpretq_f32_u8(v##x), in0, *_w++)); \ 98 | }); 99 | 100 | #define float_kernel(in, weight, depth) ({ \ 101 | float *_i = (void*) (in); \ 102 | float32x4_t *_w = (void*) (weight); \ 103 | uint _d = (depth); \ 104 | for (int z = 0; z < _d; z++) { \ 105 | float32x4_t in0 = vdupq_n_f32(_i[z]); \ 106 | for_each_reg(float_op) \ 107 | } \ 108 | }) 109 | #endif 110 | 111 | // int8 112 | // could accumulate two 16bit results with a single pairwise accumulate instead of 2x addw 113 | #define int8_op(x) if (x < num_reg) ({ \ 114 | int16x8_t t[4]; \ 115 | t[0] = vmull_s8(vget_low_s8(_w[x /8*4]), in0); \ 116 | t[1] = vmull_s8(vget_high_s8(_w[x /8*4]), in0); \ 117 | t[2] = vmull_s8(vget_low_s8(_w[x /8*4 + 1]), in0); \ 118 | t[3] = vmull_s8(vget_high_s8(_w[x /8*4 + 1]), in0); \ 119 | t[0] = vmlal_s8(t[0], vget_low_s8(_w[x /8*4 + 2]), in1); \ 120 | t[1] = vmlal_s8(t[1], vget_high_s8(_w[x /8*4 + 2]), in1); \ 121 | t[2] = vmlal_s8(t[2], vget_low_s8(_w[x /8*4 + 3]), in1); \ 122 | t[3] = vmlal_s8(t[3], vget_high_s8(_w[x /8*4 + 3]), in1); \ 123 | v##x = vreinterpretq_u8_s32(vaddw_s16(vreinterpretq_s32_u8(v##x), (x & 1) ? 
vget_high_s16(t[x / 2 & 3]) : vget_low_s16(t[x / 2 & 3]))); \ 124 | }); 125 | 126 | #define int8_op2(x) if (x < num_reg) ({ \ 127 | int16x8_t t[2]; \ 128 | t[0] = vmull_s8(vget_low_s8(_w[x / 4]), in0); \ 129 | t[1] = vmull_s8(vget_high_s8(_w[x / 4]), in0); \ 130 | v##x = vreinterpretq_u8_s32(vaddw_s16(vreinterpretq_s32_u8(v##x), (x & 1) ? vget_high_s16(t[x / 2 & 1]) : vget_low_s16(t[x / 2 & 1]))); \ 131 | }); 132 | 133 | #define int8_kernel(in, weight, depth) ({ \ 134 | int8_t *_i = (void*) (in); \ 135 | int8x16_t *_w = (void*) (weight); \ 136 | uint _d = (depth); \ 137 | int z; \ 138 | for (z = 0; z < (_d & ~1); z += 2) { \ 139 | int8x8_t in0 = vdup_n_s8(_i[z]); \ 140 | int8x8_t in1 = vdup_n_s8(_i[z+1]); \ 141 | for_each_reg(int8_op); \ 142 | _w += num_reg / 2; \ 143 | } \ 144 | for (; z < _d; z++) { \ 145 | int8x8_t in0 = vdup_n_s8(_i[z]); \ 146 | for_each_reg(int8_op2); \ 147 | _w += num_reg / 4; \ 148 | } \ 149 | }) 150 | 151 | // XOR 152 | // TODO 153 | #define bin_op(x) if (x < num_reg) ({ \ 154 | uint8x16_t tmp = vcntq_u8(_w[x / 2] ^ in0); \ 155 | v##x = vreinterpretq_u8_u16(vaddw_u8(vreinterpretq_u16_u8(v##x), (x & 1) ? vget_high_u8(tmp) : vget_low_u8(tmp))); \ 156 | }); 157 | 158 | #define bin_kernel(in, weight, depth) ({ \ 159 | uint8_t *_i = (void*) (in); \ 160 | uint8x16_t *_w = (void*) (weight); \ 161 | uint _d = (depth); \ 162 | int z; \ 163 | for (z = 0; z < _d / 8; z++) { \ 164 | uint8x16_t in0 = vdupq_n_u8(_i[z]); \ 165 | for_each_reg(bin_op); \ 166 | _w += num_reg / 2; \ 167 | } \ 168 | }) 169 | 170 | // float-binary 171 | // debinarization is costly 172 | #define float_bin_op(x) if (x < num_reg) ({ \ 173 | float32x4_t sel; \ 174 | uint32x4_t tmp[4]; \ 175 | tmp[0] = vtstq_u32(vdupq_n_u32(_w[x / 4]), mask0); \ 176 | tmp[1] = vtstq_u32(vdupq_n_u32(_w[x / 4]), mask1); \ 177 | tmp[2] = vtstq_u32(vdupq_n_u32(_w[x / 4]), mask2); \ 178 | tmp[3] = vtstq_u32(vdupq_n_u32(_w[x / 4]), mask3); \ 179 | sel = vbslq_f32(tmp[x % 4], in1, in0); \ 180 | v##x = vreinterpretq_u8_f32(vaddq_f32(vreinterpretq_f32_u8(v##x), sel)); \ 181 | }); 182 | 183 | #define float_bin_kernel(in, weight, depth) ({ \ 184 | float *_i = (void*) (in); \ 185 | uint16_t *_w = (void*) (weight); \ 186 | uint32x4_t mask0 = {128, 64, 32, 16}; \ 187 | uint32x4_t mask1 = {8, 4, 2, 1}; \ 188 | uint32x4_t mask2 = {32768, 16384, 8192, 4096}; \ 189 | uint32x4_t mask3 = {2048, 1024, 512, 256}; \ 190 | uint _d = (depth); \ 191 | int z; \ 192 | for (z = 0; z < _d; z++) { \ 193 | float32x4_t in0 = vdupq_n_f32(_i[z]); \ 194 | float32x4_t in1 = vdupq_n_f32(-_i[z]); \ 195 | for_each_reg(float_bin_op); \ 196 | _w += num_reg / 4; \ 197 | } \ 198 | }) 199 | 200 | 201 | // uint8-binary 202 | // 1. debinarize into 8-bit masks with cmtst 203 | // 2. multiply by 0/1 using AND instruction (the result must be corrected later) 204 | // 3. accumulate pairwise into 16-bit accumulators 205 | // -accumulate up to 256 results into 16-bit registers then add to 32-bit register 206 | // -in some cases only 16-bit accumulators would be needed.. 207 | // -clang likes to replace cmtst instruction with slower sequences... 
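//
// A hedged scalar sketch (not part of this header) of what the sequence below
// computes along the channel axis for one output: each weight bit selects
// whether the unsigned 8-bit input is added (the "multiply by 0/1 using AND"
// trick), and conv2d later undoes the {0,1} masking with
// result = sum_of_inputs - 2 * masked, which maps the bits back onto {+1,-1}
// weights. The MSB-first bit order used here follows binarize(); the
// interleaved packing actually consumed by int8_bin_op is an implementation
// detail and is only assumed to match.
//
// static inline int32_t int8_bin_ref(const uint8_t *in, const uint8_t *wbits,
//                                    int depth, int32_t sum_of_inputs)
// {
//     int32_t masked = 0;
//     for (int z = 0; z < depth; z++)
//         if ((wbits[z / 8] >> (7 - z % 8)) & 1)   // set bit ~ weight -1 (assumed convention)
//             masked += in[z];                     // AND with 0xff/0x00 mask
//     return sum_of_inputs - 2 * masked;           // correction applied later in conv2d
// }
//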
208 | // 209 | #ifdef __aarch64__ 210 | #define int8_bin_op(x) if (x < num_reg / 2) ({ \ 211 | uint8x16_t m = vreinterpretq_u8_u16(vdupq_n_u16(_w[x])); \ 212 | asm ("cmtst %[a].16b, %[a].16b, %[b].16b\n" : [a] "+w"(m), [b] "+w"(mask) ::); \ 213 | tmp##x = vpadalq_u8(tmp##x, m & in0); \ 214 | }); 215 | #else 216 | #define int8_bin_op(x) if (x < num_reg / 2) ({ \ 217 | uint8x16_t m = vreinterpretq_u8_u16(vdupq_n_u16(_w[x])); \ 218 | m = vtstq_u8(m, mask); \ 219 | tmp##x = vpadalq_u8(tmp##x, m & in0); \ 220 | }); 221 | #endif 222 | 223 | #define readtmp(x) if (_x == x) r = tmp##x; 224 | #define deftmp(x) uint16x8_t tmp##x = {}; 225 | 226 | #define int8_bin_op2(x) if (x < num_reg) ({ \ 227 | uint16x8_t t = ({ int _x = (x/2); uint16x8_t r; for_each_reg2(readtmp); r; }); \ 228 | v##x = vreinterpretq_u8_u32(vaddw_u16(vreinterpretq_u32_u8(v##x), \ 229 | (x & 1) ? vget_high_u16(t) : vget_low_u16(t))); \ 230 | }); 231 | 232 | #define int8_bin_kernel(in, weight, depth) ({ \ 233 | uint16_t *_i = (void*) (in); \ 234 | uint16_t *_w = (void*) (weight); \ 235 | uint8x16_t mask = {128, 128, 64, 64, 32, 32, 16, 16, 8, 8, 4, 4, 2, 2, 1, 1}; \ 236 | uint _d = (depth); \ 237 | int z, i; \ 238 | for (z = 0; z < _d / 2; ) { \ 239 | for_each_reg(deftmp); \ 240 | for (i = 0; i < 128 && z < _d / 2; i++, z++) { \ 241 | uint8x16_t in0 = vreinterpretq_u8_u16(vdupq_n_u16(_i[z])); \ 242 | for_each_reg(int8_bin_op); \ 243 | _w += num_reg / 2; \ 244 | } \ 245 | for_each_reg(int8_bin_op2); \ 246 | } \ 247 | }) 248 | 249 | #ifndef __aarch64__ 250 | #define vpaddq_u8(a, b) vcombine_u8( \ 251 | vpadd_u8(vget_low_u8(a), vget_high_u8(a)), \ 252 | vpadd_u8(vget_low_u8(b), vget_high_u8(b))); 253 | #endif 254 | 255 | __attribute__ ((always_inline)) 256 | static void binarize(uint32_t *output, uint8x16_t *buf_u8, uint size) 257 | { 258 | uint8x16_t mask = {128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1}; 259 | uint k; 260 | 261 | _assert(size % 2 == 0); 262 | 263 | for (k = 0; k < size; k++) 264 | buf_u8[k] &= mask; 265 | 266 | for (k = 0; k < size / 2; k++) 267 | buf_u8[k] = vpaddq_u8(buf_u8[k*2], buf_u8[k*2+1]); 268 | 269 | for (k = 0; k < (size + 3) / 4; k++) 270 | buf_u8[k] = vpaddq_u8(buf_u8[k*2], buf_u8[k*2+1]); 271 | 272 | for (k = 0; k < (size + 7) / 8; k++) 273 | buf_u8[k] = vpaddq_u8(buf_u8[k*2], buf_u8[k*2+1]); 274 | 275 | for (k = 0; k < size / 2; k++) 276 | output[k] = vreinterpretq_u32_u8(buf_u8[k / 4])[k % 4]; 277 | } 278 | -------------------------------------------------------------------------------- /cifar_bnn.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import time 4 | import os.path 5 | 6 | import bnn 7 | #import tf_export 8 | 9 | def dense_to_one_hot(labels_dense, num_classes): 10 | """Convert class labels from scalars to one-hot vectors.""" 11 | num_labels = labels_dense.shape[0] 12 | index_offset = np.arange(num_labels) * num_classes 13 | labels_one_hot = np.zeros((num_labels, num_classes)) 14 | labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1 15 | return labels_one_hot 16 | 17 | def cifar10(): 18 | batch = [] 19 | labels = [] 20 | size = 32*32*3+1 21 | for path in ['data_batch_%i' % (i + 1) for i in range(5)] + ['test_batch']: 22 | d = open('data/cifar-10-batches-bin/' + path + '.bin', 'rb').read() 23 | assert(len(d) % size == 0) 24 | for i in range(0, len(d), size): 25 | e = d[i:i+size] 26 | labels += [e[0]] 27 | batch += [np.frombuffer(e[1:], dtype=np.uint8)] 28 | 29 | data = 
np.concatenate(batch) 30 | data = data.astype(np.float32) 31 | data = np.multiply(data, 2.0 / 255.0) 32 | data = np.add(data, -1.0) 33 | 34 | data = np.reshape(data, (-1, 3, 32, 32)) 35 | data = np.transpose(data, (0, 2, 3, 1)) 36 | data = np.reshape(data, (-1, 32*32*3)) 37 | 38 | label = dense_to_one_hot(np.asarray(labels), 10) 39 | return data[:50000], label[:50000], data[50000:], label[50000:] 40 | 41 | 42 | train_x, train_y, test_x, test_y = cifar10() 43 | 44 | x0 = tf.placeholder(tf.float32, [None, 32*32*3]) 45 | y0 = tf.placeholder(tf.float32, [None, 10]) 46 | train = tf.placeholder(tf.bool, name='is_training') 47 | lr = tf.placeholder(tf.float32) 48 | 49 | # convolutions 50 | x = tf.reshape(x0, [-1, 32, 32, 3]) 51 | x = bnn.layer(x, 128, filter_size=[3, 3]) 52 | x = bnn.layer(x, 128, filter_size=[3, 3], pool=([2, 2], [2, 2])) 53 | x = bnn.layer(x, 256, filter_size=[3, 3]) 54 | x = bnn.layer(x, 256, filter_size=[3, 3], pool=([2, 2], [2, 2])) 55 | x = bnn.layer(x, 512, filter_size=[3, 3]) 56 | x = bnn.layer(x, 512, filter_size=[3, 3], pool=([2, 2], [2, 2])) 57 | 58 | # fully connected 59 | x = bnn.layer(x, 1024, filter_size=[4, 4], padding='VALID') 60 | x = bnn.layer(x, 1024) 61 | x = bnn.layer(x, 10, activate='none') 62 | _y = tf.identity(x) 63 | y = tf.reshape(_y, [-1, 10]) 64 | 65 | loss = tf.reduce_mean(tf.square(tf.losses.hinge_loss(y0, y))) 66 | train_step = tf.train.AdamOptimizer(lr).minimize(loss) 67 | 68 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 69 | update_op = tf.group(*update_ops) 70 | 71 | correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y0,1)) 72 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 73 | 74 | saver = tf.train.Saver() 75 | 76 | gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) 77 | 78 | with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess: 79 | sess.run(tf.global_variables_initializer()) 80 | 81 | if os.path.isfile("modelcifarx.ckpt.meta"): 82 | print('loading model') 83 | saver.restore(sess, "modelcifarx.ckpt") 84 | 85 | #bnn.export(y, x0, (1, 3072), 2.0 / 255.0, -1.0, 'cifar') 86 | 87 | #yy = sess.run([y], feed_dict={x0: test_x[0:1], y0: test_y[0:1], train : False}) 88 | #print(yy) 89 | 90 | print('training...') 91 | 92 | #Training 93 | EPOCHS = 500 94 | BATCH_SIZE = 50 95 | LR = 0.001 96 | LR_DECAY = (0.0000003/LR)**(1.0/EPOCHS) 97 | 98 | print(LR_DECAY) 99 | 100 | num_batch = len(train_x) / BATCH_SIZE 101 | 102 | from sklearn.utils import shuffle 103 | 104 | for i in range(EPOCHS): 105 | tx, ty = shuffle(train_x, train_y) 106 | total_loss = 0.0 107 | 108 | t0 = time.perf_counter() 109 | 110 | for off in range(0, len(train_x), BATCH_SIZE): 111 | end = off + BATCH_SIZE 112 | x, y = tx[off:end], ty[off:end] 113 | 114 | _, l = sess.run([train_step, loss], feed_dict={x0: x, y0: y, train : True, lr : LR}) 115 | total_loss += l 116 | 117 | sess.run(update_op) 118 | 119 | t1 = time.perf_counter() 120 | 121 | total_loss /= num_batch 122 | LR *= LR_DECAY 123 | 124 | # split in batches of 100 because memory 125 | ac2 = 0.0 126 | l2 = 0.0 127 | for j in range(0, 10000, 100): 128 | ac, l = sess.run([accuracy, loss], feed_dict={x0: test_x[j:j+100], y0: test_y[j:j+100], train : False}) 129 | ac2 += ac 130 | l2 += l 131 | ac2 /= 100.0 132 | l2 /= 100.0 133 | print("epoch %i: accuracy=%f,%f loss=%f time=%f" % (i, ac2, l2, total_loss, t1 - t0)) 134 | save_path = saver.save(sess, "modelcifarx.ckpt") 135 | 136 | -------------------------------------------------------------------------------- 
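The script above only trains and checkpoints the model; running it on a device goes through the exported C code. Below is a hedged sketch of such a driver, modeled on benchmark.c and the generated entry point in c_ops.h. It assumes the export step was run with the name 'cifar' (as in the commented-out export call above), that the generated header defines cifar() and a cifar_size constant the way the benchmark_* headers do, that cifar_size is the size of the weight blob, and that the weights were saved to a file named cifar_weights.bin. None of these names are guaranteed by the repository; adjust them to whatever the exporter actually emits.

/* hypothetical test driver for the exported CIFAR-10 model (names and sizes assumed, see above) */
#include <stdint.h>
#include <stdio.h>
#include "cifar.h"                  /* assumed generated header declaring cifar() and cifar_size */

static uint8_t weights[cifar_size] __attribute__((aligned(16)));
static uint8_t input[32*32*3*sizeof(float)] __attribute__((aligned(16)));
static uint8_t tmp[1024*1024] __attribute__((aligned(16)));   /* scratch size copied from benchmark.c */

int main(void)
{
    FILE *f = fopen("cifar_weights.bin", "rb");               /* assumed weight file name */
    if (!f || fread(weights, 1, sizeof(weights), f) != sizeof(weights))
        return 1;
    fclose(f);

    /* ...fill `input` with one 32x32x3 image scaled to [-1, 1], as in cifar_bnn.py... */

    float *y = cifar(input, weights, tmp);                    /* 10 class scores (float output layer) */

    int best = 0;
    for (int i = 1; i < 10; i++)
        if (y[i] > y[best])
            best = i;
    printf("predicted class %i, score %f\n", best, y[best]);
    return 0;
}

One caveat on the input: the commented-out export call passes a scale of 2/255 and an offset of -1, which suggests the generated code may apply that scaling to raw uint8 input itself, so whether the driver should pre-scale the image is also an assumption.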
/index.md: -------------------------------------------------------------------------------- 1 | # Running neural networks on embedded systems 2 | 3 | ## AlexNet 4 | 5 | *AlexNet* is a good example of what a real neural network may look like. It has a raw floating point weight size of 238 MiB, and the size can be much larger if using a *tensorflow* checkpoint. The weights can be compressed, but neural network weights typically have high entropy and do not compress (losslessly) very well. If we wanted to build an embedded or mobile application with it, this would mean 238 MiB of storage and memory used for only the weights and the application would need a few seconds to load the weights from storage into memory. The mobile application would also require a 238 MiB download every time the weights need to be updated. For embedded systems, such a weight size can simply make it impossible to use the neural network and for a mobile application it would be pushing the limits of reasonable user expectations. 6 | 7 | As for inference time, inference on a single image in *AlexNet* requires over 2 billion floating point operations (1 multiply-add operation counted as 2 operations). In this case a typical smartphone CPU can achieve this at a real-time rate. In fact, my implementation takes around 1 second on a single Nexus 5 core, a 2013 phone. Reducing the amount of work required to perform inference is nevertheless interesting: reducing battery usage, allowing applications that require a more strict definition of real-time and allowing larger models. 8 | 9 | ## Binarized AlexNet 10 | 11 | One method of reducing the weight size and inference time is binarization. Weights can be binarized to contain only sign information, reducing the weight size by a factor of 32. While it may seem like binarizing weights would reduce accuracy drastically, a basic training example using LeNet-5 on MNIST shows that binarizing weights can actually improve accuracy due to better generalization. In addition to binarizing weights, it is possible to use binary activation functions which have a binary output of -1 or +1. This doesn't reduce weight size and reduces accuracy, but it allows for layers with both binary inputs and binary weights, in which case it is possible to use XOR and bitcount operations for the convolution. 12 | 13 | [XNOR-Net](https://github.com/allenai/XNOR-Net) provides two pre-trained binarized variations of *AlexNet*, in the form of *Torch* models, which have a size of 476 MiB each and store weights as floating point values. To use these models in *tensorflow*, I first reimplemented both models in *tensorflow* and imported the weights using *pytorch*. The *tensorflow* models however still use floating point values to store the weights and do not have any support for fast binary convolutions. To run the model using the binarized weights I first created my own implementation of the operations required to run these two models in C. Then, I wrote a script which parses the *tensorflow* graph and generates C code which implements the forward pass calling the C functions I implemented. 14 | 15 | The first variation, *BWN*, has binarized weights for all intermediary layers. This introduces layers which have floating point inputs and binary weights. The second variation, *XNOR-Net*, also features binary activations. This introduces layers which have both binary inputs and binary weights. 
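To make the XOR-and-bitcount convolution concrete: when both the inputs and the weights of a layer are packed as sign bits, the ±1 dot product over n channels is n - 2 * popcount(a XOR b), since equal sign bits contribute +1 and differing bits contribute -1. The snippet below is a plain-C sketch of that identity (the helper name and the 64-bit packing are illustrative, not code from this repository); the NEON kernels in c_ops_neon.h apply the same identity with vector XOR, vcntq_u8 and 16-bit accumulators, and c_ops.h converts the accumulated count back with size - 2 * count.

    /* sketch: +/-1 dot product of two bit-packed vectors, 64 channels per word */
    /* (illustrative only; assumes nbits is a multiple of 64)                   */
    #include <stdint.h>

    static int binary_dot(const uint64_t *a, const uint64_t *b, int nbits)
    {
        int differ = 0;                        /* positions where the signs differ */
        for (int i = 0; i < nbits / 64; i++)
            differ += __builtin_popcountll(a[i] ^ b[i]);
        return nbits - 2 * differ;             /* equal signs count +1, different -1 */
    }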
16 | 17 | ## Adding quantization 18 | 19 | Quantization is another method for reducing weight size and improving inference time, and it is already [possible with tensorflow](https://www.tensorflow.org/performance/quantization). However, for this project I am mostly interested in the case where it is combined with binarization. In both **XNOR-Net** models, the first and last layers aren't binarized, which makes them available for quantization. The first layer accounts for a large portion of the total inference time (around half for the *XNOR-Net* model), while the last layer accounts for 16 MiB of the total 23 MiB weight size. By applying quantization to the first and last layers, the weight size is reduced to 11 MiB and the inference time is lowered. Quantization introduces layers with 8-bit input and 8-bit weights. 20 | 21 | For the *BWN* model, quantization is also possible for the intermediary layers, where the weights are already binarized but the input values are floating point values. This allows for 8-bit input with 1-bit weight convolutions, which run faster with a negligible difference in the network's output. This strikes a balance of speed and accuracy, somewhere in between float-binary operations and binary-binary operations. 22 | 23 | ## Android app 24 | 25 | To showcase my work, I created an Android App which runs inference on camera input using the selected binarized variation of *AlexNet*. The application also allows capturing clips and playing them back, running inference again on the recorded clip. 26 | 27 | ![alt](http://i.imgur.com/KrW94y0.jpg) 28 | 29 | -------------------------------------------------------------------------------- /rpi-demo/names2.h: -------------------------------------------------------------------------------- 1 | const char *names[] = { 2 | [0] = "grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus", 3 | [1] = "cricket", 4 | [2] = "ping-pong ball", 5 | [3] = "beer bottle", 6 | [4] = "baseball", 7 | [5] = "combination lock", 8 | [6] = "comic book", 9 | [7] = "Lhasa, Lhasa apso", 10 | [8] = "rhinoceros beetle", 11 | [9] = "dome", 12 | [10] = "tricycle, trike, velocipede", 13 | [11] = "marimba, xylophone", 14 | [12] = "peacock", 15 | [13] = "guinea pig, Cavia cobaya", 16 | [14] = "assault rifle, assault gun", 17 | [15] = "lakeside, lakeshore", 18 | [16] = "balloon", 19 | [17] = "tiger beetle", 20 | [18] = "trimaran", 21 | [19] = "packet", 22 | [20] = "toaster", 23 | [21] = "whiskey jug", 24 | [22] = "agaric", 25 | [23] = "artichoke, globe artichoke", 26 | [24] = "magnetic compass", 27 | [25] = "fire screen, fireguard", 28 | [26] = "binoculars, field glasses, opera glasses", 29 | [27] = "ambulance", 30 | [28] = "pirate, pirate ship", 31 | [29] = "envelope", 32 | [30] = "Afghan hound, Afghan", 33 | [31] = "otter", 34 | [32] = "acorn", 35 | [33] = "crib, cot", 36 | [34] = "barn", 37 | [35] = "beaver", 38 | [36] = "hip, rose hip, rosehip", 39 | [37] = "backpack, back pack, knapsack, packsack, rucksack, haversack", 40 | [38] = "cliff, drop, drop-off", 41 | [39] = "tiger, Panthera tigris", 42 | [40] = "redshank, Tringa totanus", 43 | [41] = "wreck", 44 | [42] = "wallaby, brush kangaroo", 45 | [43] = "tray", 46 | [44] = "Newfoundland, Newfoundland dog", 47 | [45] = "bib", 48 | [46] = "bee eater", 49 | [47] = "projectile, missile", 50 | [48] = "puck, hockey puck", 51 | [49] = "dough", 52 | [50] = "shopping cart", 53 | [51] = "bucket, pail", 54 | [52] = "pillow", 55 | [53] = "bison", 56 | [54] = "vault", 57 | [55] = "sorrel", 58 | 
[56] = "window shade", 59 | [57] = "cassette", 60 | [58] = "trifle", 61 | [59] = "EntleBucher", 62 | [60] = "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca", 63 | [61] = "African elephant, Loxodonta africana", 64 | [62] = "tractor", 65 | [63] = "standard schnauzer", 66 | [64] = "isopod", 67 | [65] = "siamang, Hylobates syndactylus, Symphalangus syndactylus", 68 | [66] = "freight car", 69 | [67] = "oxcart", 70 | [68] = "ski mask", 71 | [69] = "redbone", 72 | [70] = "parachute, chute", 73 | [71] = "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa", 74 | [72] = "boathouse", 75 | [73] = "pug, pug-dog", 76 | [74] = "web site, website, internet site, site", 77 | [75] = "red-breasted merganser, Mergus serrator", 78 | [76] = "prairie chicken, prairie grouse, prairie fowl", 79 | [77] = "bookshop, bookstore, bookstall", 80 | [78] = "boxer", 81 | [79] = "piggy bank, penny bank", 82 | [80] = "submarine, pigboat, sub, U-boat", 83 | [81] = "drum, membranophone, tympan", 84 | [82] = "bakery, bakeshop, bakehouse", 85 | [83] = "butternut squash", 86 | [84] = "black and gold garden spider, Argiope aurantia", 87 | [85] = "bow", 88 | [86] = "shovel", 89 | [87] = "guacamole", 90 | [88] = "letter opener, paper knife, paperknife", 91 | [89] = "sundial", 92 | [90] = "plate rack", 93 | [91] = "maillot, tank suit", 94 | [92] = "cheetah, chetah, Acinonyx jubatus", 95 | [93] = "laptop, laptop computer", 96 | [94] = "Eskimo dog, husky", 97 | [95] = "tiger shark, Galeocerdo cuvieri", 98 | [96] = "cock", 99 | [97] = "dingo, warrigal, warragal, Canis dingo", 100 | [98] = "velvet", 101 | [99] = "rule, ruler", 102 | [100] = "pot, flowerpot", 103 | [101] = "hoopskirt, crinoline", 104 | [102] = "vizsla, Hungarian pointer", 105 | [103] = "harvester, reaper", 106 | [104] = "flat-coated retriever", 107 | [105] = "daisy", 108 | [106] = "Brittany spaniel", 109 | [107] = "rubber eraser, rubber, pencil eraser", 110 | [108] = "long-horned beetle, longicorn, longicorn beetle", 111 | [109] = "slug", 112 | [110] = "photocopier", 113 | [111] = "electric guitar", 114 | [112] = "pinwheel", 115 | [113] = "meat loaf, meatloaf", 116 | [114] = "digital clock", 117 | [115] = "bookcase", 118 | [116] = "reel", 119 | [117] = "loupe, jeweler's loupe", 120 | [118] = "wombat", 121 | [119] = "Siamese cat, Siamese", 122 | [120] = "frying pan, frypan, skillet", 123 | [121] = "china cabinet, china closet", 124 | [122] = "refrigerator, icebox", 125 | [123] = "gasmask, respirator, gas helmet", 126 | [124] = "obelisk", 127 | [125] = "American coot, marsh hen, mud hen, water hen, Fulica americana", 128 | [126] = "lotion", 129 | [127] = "triumphal arch", 130 | [128] = "thresher, thrasher, threshing machine", 131 | [129] = "chime, bell, gong", 132 | [130] = "strawberry", 133 | [131] = "Bernese mountain dog", 134 | [132] = "Tibetan mastiff", 135 | [133] = "spider monkey, Ateles geoffroyi", 136 | [134] = "porcupine, hedgehog", 137 | [135] = "aircraft carrier, carrier, flattop, attack aircraft carrier", 138 | [136] = "bell pepper", 139 | [137] = "tank, army tank, armored combat vehicle, armoured combat vehicle", 140 | [138] = "snorkel", 141 | [139] = "lacewing, lacewing fly", 142 | [140] = "streetcar, tram, tramcar, trolley, trolley car", 143 | [141] = "water buffalo, water ox, Asiatic buffalo, Bubalus bubalis", 144 | [142] = "mink", 145 | [143] = "Saint Bernard, St Bernard", 146 | [144] = "red wine", 147 | [145] = "Indian cobra, Naja naja", 148 | [146] = "suit, suit of clothes", 149 | [147] = "bullfrog, Rana 
catesbeiana", 150 | [148] = "trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi", 151 | [149] = "sombrero", 152 | [150] = "beacon, lighthouse, beacon light, pharos", 153 | [151] = "greenhouse, nursery, glasshouse", 154 | [152] = "Pomeranian", 155 | [153] = "spaghetti squash", 156 | [154] = "yurt", 157 | [155] = "ice lolly, lolly, lollipop, popsicle", 158 | [156] = "military uniform", 159 | [157] = "wolf spider, hunting spider", 160 | [158] = "yawl", 161 | [159] = "loggerhead, loggerhead turtle, Caretta caretta", 162 | [160] = "bee", 163 | [161] = "sunscreen, sunblock, sun blocker", 164 | [162] = "panpipe, pandean pipe, syrinx", 165 | [163] = "church, church building", 166 | [164] = "baboon", 167 | [165] = "little blue heron, Egretta caerulea", 168 | [166] = "coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch", 169 | [167] = "corn", 170 | [168] = "digital watch", 171 | [169] = "jinrikisha, ricksha, rickshaw", 172 | [170] = "scabbard", 173 | [171] = "crutch", 174 | [172] = "Persian cat", 175 | [173] = "dock, dockage, docking facility", 176 | [174] = "robin, American robin, Turdus migratorius", 177 | [175] = "screen, CRT screen", 178 | [176] = "stretcher", 179 | [177] = "lab coat, laboratory coat", 180 | [178] = "malamute, malemute, Alaskan malamute", 181 | [179] = "vending machine", 182 | [180] = "African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus", 183 | [181] = "chainlink fence", 184 | [182] = "hourglass", 185 | [183] = "gown", 186 | [184] = "toilet seat", 187 | [185] = "steam locomotive", 188 | [186] = "cleaver, meat cleaver, chopper", 189 | [187] = "ram, tup", 190 | [188] = "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens", 191 | [189] = "pomegranate", 192 | [190] = "Madagascar cat, ring-tailed lemur, Lemur catta", 193 | [191] = "gorilla, Gorilla gorilla", 194 | [192] = "Ibizan hound, Ibizan Podenco", 195 | [193] = "coyote, prairie wolf, brush wolf, Canis latrans", 196 | [194] = "microphone, mike", 197 | [195] = "safe", 198 | [196] = "sweatshirt", 199 | [197] = "barrow, garden cart, lawn cart, wheelbarrow", 200 | [198] = "bittern", 201 | [199] = "jaguar, panther, Panthera onca, Felis onca", 202 | [200] = "chain saw, chainsaw", 203 | [201] = "Irish water spaniel", 204 | [202] = "chocolate sauce, chocolate syrup", 205 | [203] = "pedestal, plinth, footstall", 206 | [204] = "go-kart", 207 | [205] = "dining table, board", 208 | [206] = "water bottle", 209 | [207] = "barbershop", 210 | [208] = "croquet ball", 211 | [209] = "plow, plough", 212 | [210] = "shopping basket", 213 | [211] = "mosquito net", 214 | [212] = "airliner", 215 | [213] = "palace", 216 | [214] = "rock python, rock snake, Python sebae", 217 | [215] = "Indian elephant, Elephas maximus", 218 | [216] = "consomme", 219 | [217] = "drake", 220 | [218] = "gas pump, gasoline pump, petrol pump, island dispenser", 221 | [219] = "chiton, coat-of-mail shell, sea cradle, polyplacophore", 222 | [220] = "bell cote, bell cot", 223 | [221] = "malinois", 224 | [222] = "banded gecko", 225 | [223] = "wok", 226 | [224] = "sea slug, nudibranch", 227 | [225] = "tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui", 228 | [226] = "American egret, great white heron, Egretta albus", 229 | [227] = "Bedlington terrier", 230 | [228] = "dung beetle", 231 | [229] = "paddlewheel, paddle wheel", 232 | [230] = "wing", 233 | [231] = "toilet tissue, toilet paper, bathroom tissue", 234 | [232] = "cinema, movie theater, movie theatre, movie house, picture palace", 235 | [233] 
= "Airedale, Airedale terrier", 236 | [234] = "CD player", 237 | [235] = "half track", 238 | [236] = "limousine, limo", 239 | [237] = "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea", 240 | [238] = "bulbul", 241 | [239] = "Welsh springer spaniel", 242 | [240] = "fur coat", 243 | [241] = "promontory, headland, head, foreland", 244 | [242] = "library", 245 | [243] = "feather boa, boa", 246 | [244] = "swing", 247 | [245] = "radio, wireless", 248 | [246] = "European gallinule, Porphyrio porphyrio", 249 | [247] = "bolo tie, bolo, bola tie, bola", 250 | [248] = "honeycomb", 251 | [249] = "cowboy boot", 252 | [250] = "tusker", 253 | [251] = "dam, dike, dyke", 254 | [252] = "horizontal bar, high bar", 255 | [253] = "whiptail, whiptail lizard", 256 | [254] = "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias", 257 | [255] = "chest", 258 | [256] = "lorikeet", 259 | [257] = "cassette player", 260 | [258] = "orange", 261 | [259] = "rock beauty, Holocanthus tricolor", 262 | [260] = "carton", 263 | [261] = "pineapple, ananas", 264 | [262] = "triceratops", 265 | [263] = "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk", 266 | [264] = "parking meter", 267 | [265] = "grasshopper, hopper", 268 | [266] = "joystick", 269 | [267] = "French loaf", 270 | [268] = "acoustic guitar", 271 | [269] = "academic gown, academic robe, judge's robe", 272 | [270] = "meerkat, mierkat", 273 | [271] = "drilling platform, offshore rig", 274 | [272] = "lampshade, lamp shade", 275 | [273] = "coil, spiral, volute, whorl, helix", 276 | [274] = "gar, garfish, garpike, billfish, Lepisosteus osseus", 277 | [275] = "German shepherd, German shepherd dog, German police dog, alsatian", 278 | [276] = "volleyball", 279 | [277] = "iPod", 280 | [278] = "harp", 281 | [279] = "Tibetan terrier, chrysanthemum dog", 282 | [280] = "cheeseburger", 283 | [281] = "harmonica, mouth organ, harp, mouth harp", 284 | [282] = "Crock Pot", 285 | [283] = "sea urchin", 286 | [284] = "volcano", 287 | [285] = "dogsled, dog sled, dog sleigh", 288 | [286] = "basenji", 289 | [287] = "schooner", 290 | [288] = "Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis", 291 | [289] = "washbasin, handbasin, washbowl, lavabo, wash-hand basin", 292 | [290] = "binder, ring-binder", 293 | [291] = "space heater", 294 | [292] = "grocery store, grocery, food market, market", 295 | [293] = "hummingbird", 296 | [294] = "sock", 297 | [295] = "scorpion", 298 | [296] = "miniature pinscher", 299 | [297] = "Irish wolfhound", 300 | [298] = "chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour", 301 | [299] = "jay", 302 | [300] = "sandbar, sand bar", 303 | [301] = "canoe", 304 | [302] = "tree frog, tree-frog", 305 | [303] = "water jug", 306 | [304] = "thatch, thatched roof", 307 | [305] = "vestment", 308 | [306] = "cucumber, cuke", 309 | [307] = "banana", 310 | [308] = "teapot", 311 | [309] = "badger", 312 | [310] = "carpenter's kit, tool kit", 313 | [311] = "Appenzeller", 314 | [312] = "flute, transverse flute", 315 | [313] = "school bus", 316 | [314] = "paper towel", 317 | [315] = "stove", 318 | [316] = "hen", 319 | [317] = "sleeping bag", 320 | [318] = "fly", 321 | [319] = "Pekinese, Pekingese, Peke", 322 | [320] = "bathing cap, swimming cap", 323 | [321] = "hand blower, blow dryer, blow drier, hair dryer, hair drier", 324 | [322] = "thunder snake, worm snake, Carphophis amoenus", 325 | [323] = 
"snowplow, snowplough", 326 | [324] = "broccoli", 327 | [325] = "bull mastiff", 328 | [326] = "slot, one-armed bandit", 329 | [327] = "spatula", 330 | [328] = "flatworm, platyhelminth", 331 | [329] = "wild boar, boar, Sus scrofa", 332 | [330] = "monastery", 333 | [331] = "swimming trunks, bathing trunks", 334 | [332] = "spindle", 335 | [333] = "fireboat", 336 | [334] = "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum", 337 | [335] = "buckeye, horse chestnut, conker", 338 | [336] = "green snake, grass snake", 339 | [337] = "moving van", 340 | [338] = "liner, ocean liner", 341 | [339] = "bloodhound, sleuthhound", 342 | [340] = "muzzle", 343 | [341] = "pole", 344 | [342] = "forklift", 345 | [343] = "beagle", 346 | [344] = "toy terrier", 347 | [345] = "wood rabbit, cottontail, cottontail rabbit", 348 | [346] = "folding chair", 349 | [347] = "curly-coated retriever", 350 | [348] = "amphibian, amphibious vehicle", 351 | [349] = "tiger cat", 352 | [350] = "oboe, hautboy, hautbois", 353 | [351] = "ground beetle, carabid beetle", 354 | [352] = "viaduct", 355 | [353] = "modem", 356 | [354] = "scale, weighing machine", 357 | [355] = "indigo bunting, indigo finch, indigo bird, Passerina cyanea", 358 | [356] = "otterhound, otter hound", 359 | [357] = "agama", 360 | [358] = "notebook, notebook computer", 361 | [359] = "rifle", 362 | [360] = "unicycle, monocycle", 363 | [361] = "marmot", 364 | [362] = "English springer, English springer spaniel", 365 | [363] = "picket fence, paling", 366 | [364] = "black stork, Ciconia nigra", 367 | [365] = "handkerchief, hankie, hanky, hankey", 368 | [366] = "shower cap", 369 | [367] = "zebra", 370 | [368] = "Sussex spaniel", 371 | [369] = "carbonara", 372 | [370] = "vase", 373 | [371] = "stole", 374 | [372] = "upright, upright piano", 375 | [373] = "bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis", 376 | [374] = "lens cap, lens cover", 377 | [375] = "barber chair", 378 | [376] = "Samoyed, Samoyede", 379 | [377] = "koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus", 380 | [378] = "overskirt", 381 | [379] = "bustard", 382 | [380] = "Yorkshire terrier", 383 | [381] = "American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier", 384 | [382] = "thimble", 385 | [383] = "four-poster", 386 | [384] = "keeshond", 387 | [385] = "ice cream, icecream", 388 | [386] = "groom, bridegroom", 389 | [387] = "indri, indris, Indri indri, Indri brevicaudatus", 390 | [388] = "black swan, Cygnus atratus", 391 | [389] = "face powder", 392 | [390] = "hamper", 393 | [391] = "hornbill", 394 | [392] = "Border collie", 395 | [393] = "goblet", 396 | [394] = "goldfish, Carassius auratus", 397 | [395] = "sea cucumber, holothurian", 398 | [396] = "ear, spike, capitulum", 399 | [397] = "giant schnauzer", 400 | [398] = "vine snake", 401 | [399] = "printer", 402 | [400] = "colobus, colobus monkey", 403 | [401] = "groenendael", 404 | [402] = "football helmet", 405 | [403] = "soap dispenser", 406 | [404] = "puffer, pufferfish, blowfish, globefish", 407 | [405] = "recreational vehicle, RV, R.V.", 408 | [406] = "snail", 409 | [407] = "bubble", 410 | [408] = "chow, chow chow", 411 | [409] = "street sign", 412 | [410] = "water snake", 413 | [411] = "wallet, billfold, notecase, pocketbook", 414 | [412] = "suspension bridge", 415 | [413] = "mountain tent", 416 | [414] = "pelican", 417 | [415] = "tennis ball", 418 | [416] = "wardrobe, closet, press", 419 | [417] = 
"syringe", 420 | [418] = "green mamba", 421 | [419] = "lifeboat", 422 | [420] = "cornet, horn, trumpet, trump", 423 | [421] = "common newt, Triturus vulgaris", 424 | [422] = "lipstick, lip rouge", 425 | [423] = "broom", 426 | [424] = "dhole, Cuon alpinus", 427 | [425] = "Labrador retriever", 428 | [426] = "pencil box, pencil case", 429 | [427] = "alligator lizard", 430 | [428] = "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria", 431 | [429] = "French bulldog", 432 | [430] = "albatross, mollymawk", 433 | [431] = "screw", 434 | [432] = "wig", 435 | [433] = "barracouta, snoek", 436 | [434] = "standard poodle", 437 | [435] = "umbrella", 438 | [436] = "garter snake, grass snake", 439 | [437] = "Siberian husky", 440 | [438] = "missile", 441 | [439] = "shower curtain", 442 | [440] = "tobacco shop, tobacconist shop, tobacconist", 443 | [441] = "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon", 444 | [442] = "English foxhound", 445 | [443] = "jersey, T-shirt, tee shirt", 446 | [444] = "head cabbage", 447 | [445] = "necklace", 448 | [446] = "bearskin, busby, shako", 449 | [447] = "nipple", 450 | [448] = "typewriter keyboard", 451 | [449] = "rock crab, Cancer irroratus", 452 | [450] = "eggnog", 453 | [451] = "basketball", 454 | [452] = "worm fence, snake fence, snake-rail fence, Virginia fence", 455 | [453] = "switch, electric switch, electrical switch", 456 | [454] = "dishwasher, dish washer, dishwashing machine", 457 | [455] = "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus", 458 | [456] = "moped", 459 | [457] = "plate", 460 | [458] = "cellular telephone, cellular phone, cellphone, cell, mobile phone", 461 | [459] = "tub, vat", 462 | [460] = "West Highland white terrier", 463 | [461] = "mousetrap", 464 | [462] = "oil filter", 465 | [463] = "Windsor tie", 466 | [464] = "nematode, nematode worm, roundworm", 467 | [465] = "odometer, hodometer, mileometer, milometer", 468 | [466] = "sunglasses, dark glasses, shades", 469 | [467] = "neck brace", 470 | [468] = "tripod", 471 | [469] = "spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish", 472 | [470] = "potpie", 473 | [471] = "hippopotamus, hippo, river horse, Hippopotamus amphibius", 474 | [472] = "miniature poodle", 475 | [473] = "park bench", 476 | [474] = "cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM", 477 | [475] = "pier", 478 | [476] = "vulture", 479 | [477] = "Gila monster, Heloderma suspectum", 480 | [478] = "bannister, banister, balustrade, balusters, handrail", 481 | [479] = "lemon", 482 | [480] = "crate", 483 | [481] = "megalith, megalithic structure", 484 | [482] = "rugby ball", 485 | [483] = "drumstick", 486 | [484] = "starfish, sea star", 487 | [485] = "studio couch, day bed", 488 | [486] = "caldron, cauldron", 489 | [487] = "Japanese spaniel", 490 | [488] = "proboscis monkey, Nasalis larvatus", 491 | [489] = "fountain pen", 492 | [490] = "knot", 493 | [491] = "espresso maker", 494 | [492] = "oscilloscope, scope, cathode-ray oscilloscope, CRO", 495 | [493] = "ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle", 496 | [494] = "hotdog, hot dog, red hot", 497 | [495] = "Pembroke, Pembroke Welsh corgi", 498 | [496] = "butcher shop, meat market", 499 | [497] = "Doberman, Doberman pinscher", 500 | [498] = "vacuum, vacuum cleaner", 501 | [499] = "rapeseed", 502 | [500] = "computer keyboard, keypad", 503 | [501] = "cannon", 504 | [502] = "Band Aid", 505 | [503] = "kite", 
506 | [504] = "tape player", 507 | [505] = "breastplate, aegis, egis", 508 | [506] = "chickadee", 509 | [507] = "rain barrel", 510 | [508] = "oystercatcher, oyster catcher", 511 | [509] = "spotlight, spot", 512 | [510] = "car mirror", 513 | [511] = "mouse, computer mouse", 514 | [512] = "medicine chest, medicine cabinet", 515 | [513] = "soccer ball", 516 | [514] = "bow tie, bow-tie, bowtie", 517 | [515] = "red-backed sandpiper, dunlin, Erolia alpina", 518 | [516] = "Arctic fox, white fox, Alopex lagopus", 519 | [517] = "running shoe", 520 | [518] = "oxygen mask", 521 | [519] = "lumbermill, sawmill", 522 | [520] = "pay-phone, pay-station", 523 | [521] = "coffeepot", 524 | [522] = "kit fox, Vulpes macrotis", 525 | [523] = "potter's wheel", 526 | [524] = "Petri dish", 527 | [525] = "remote control, remote", 528 | [526] = "barrel, cask", 529 | [527] = "analog clock", 530 | [528] = "garbage truck, dustcart", 531 | [529] = "electric ray, crampfish, numbfish, torpedo", 532 | [530] = "banjo", 533 | [531] = "mortar", 534 | [532] = "minibus", 535 | [533] = "mantis, mantid", 536 | [534] = "safety pin", 537 | [535] = "weasel", 538 | [536] = "eft", 539 | [537] = "prayer rug, prayer mat", 540 | [538] = "hot pot, hotpot", 541 | [539] = "Leonberg", 542 | [540] = "cabbage butterfly", 543 | [541] = "toyshop", 544 | [542] = "miniature schnauzer", 545 | [543] = "fig", 546 | [544] = "mailbag, postbag", 547 | [545] = "common iguana, iguana, Iguana iguana", 548 | [546] = "swab, swob, mop", 549 | [547] = "affenpinscher, monkey pinscher, monkey dog", 550 | [548] = "valley, vale", 551 | [549] = "steel arch bridge", 552 | [550] = "maze, labyrinth", 553 | [551] = "miniskirt, mini", 554 | [552] = "Sealyham terrier, Sealyham", 555 | [553] = "macaque", 556 | [554] = "motor scooter, scooter", 557 | [555] = "sidewinder, horned rattlesnake, Crotalus cerastes", 558 | [556] = "patio, terrace", 559 | [557] = "pool table, billiard table, snooker table", 560 | [558] = "black-footed ferret, ferret, Mustela nigripes", 561 | [559] = "car wheel", 562 | [560] = "Loafer", 563 | [561] = "radiator", 564 | [562] = "diaper, nappy, napkin", 565 | [563] = "hay", 566 | [564] = "great grey owl, great gray owl, Strix nebulosa", 567 | [565] = "television, television system", 568 | [566] = "goose", 569 | [567] = "hair slide", 570 | [568] = "mosque", 571 | [569] = "squirrel monkey, Saimiri sciureus", 572 | [570] = "schipperke", 573 | [571] = "measuring cup", 574 | [572] = "bath towel", 575 | [573] = "kuvasz", 576 | [574] = "bolete", 577 | [575] = "grand piano, grand", 578 | [576] = "teddy, teddy bear", 579 | [577] = "ptarmigan", 580 | [578] = "strainer", 581 | [579] = "African crocodile, Nile crocodile, Crocodylus niloticus", 582 | [580] = "conch", 583 | [581] = "titi, titi monkey", 584 | [582] = "milk can", 585 | [583] = "jackfruit, jak, jack", 586 | [584] = "horse cart, horse-cart", 587 | [585] = "African chameleon, Chamaeleo chamaeleon", 588 | [586] = "Model T", 589 | [587] = "pop bottle, soda bottle", 590 | [588] = "toy poodle", 591 | [589] = "axolotl, mud puppy, Ambystoma mexicanum", 592 | [590] = "papillon", 593 | [591] = "gyromitra", 594 | [592] = "fountain", 595 | [593] = "Saluki, gazelle hound", 596 | [594] = "trolleybus, trolley coach, trackless trolley", 597 | [595] = "lionfish", 598 | [596] = "walking stick, walkingstick, stick insect", 599 | [597] = "black widow, Latrodectus mactans", 600 | [598] = "guenon, guenon monkey", 601 | [599] = "Norfolk terrier", 602 | [600] = "Cardigan, Cardigan Welsh corgi", 603 | [601] = "breakwater, 
groin, groyne, mole, bulwark, seawall, jetty", 604 | [602] = "dugong, Dugong dugon", 605 | [603] = "admiral", 606 | [604] = "trombone", 607 | [605] = "jack-o'-lantern", 608 | [606] = "pajama, pyjama, pj's, jammies", 609 | [607] = "brambling, Fringilla montifringilla", 610 | [608] = "lycaenid, lycaenid butterfly", 611 | [609] = "speedboat", 612 | [610] = "bikini, two-piece", 613 | [611] = "cauliflower", 614 | [612] = "frilled lizard, Chlamydosaurus kingi", 615 | [613] = "garden spider, Aranea diademata", 616 | [614] = "whippet", 617 | [615] = "chambered nautilus, pearly nautilus, nautilus", 618 | [616] = "alp", 619 | [617] = "hatchet", 620 | [618] = "ruffed grouse, partridge, Bonasa umbellus", 621 | [619] = "barbell", 622 | [620] = "tarantula", 623 | [621] = "dalmatian, coach dog, carriage dog", 624 | [622] = "pick, plectrum, plectron", 625 | [623] = "Weimaraner", 626 | [624] = "chimpanzee, chimp, Pan troglodytes", 627 | [625] = "traffic light, traffic signal, stoplight", 628 | [626] = "American chameleon, anole, Anolis carolinensis", 629 | [627] = "box turtle, box tortoise", 630 | [628] = "spoonbill", 631 | [629] = "power drill", 632 | [630] = "gazelle", 633 | [631] = "golden retriever", 634 | [632] = "Blenheim spaniel", 635 | [633] = "crayfish, crawfish, crawdad, crawdaddy", 636 | [634] = "lynx, catamount", 637 | [635] = "electric fan, blower", 638 | [636] = "monitor", 639 | [637] = "snowmobile", 640 | [638] = "chiffonier, commode", 641 | [639] = "parallel bars, bars", 642 | [640] = "catamaran", 643 | [641] = "window screen", 644 | [642] = "desk", 645 | [643] = "Great Dane", 646 | [644] = "sarong", 647 | [645] = "sea anemone, anemone", 648 | [646] = "book jacket, dust cover, dust jacket, dust wrapper", 649 | [647] = "pill bottle", 650 | [648] = "Bouvier des Flandres, Bouviers des Flandres", 651 | [649] = "stopwatch, stop watch", 652 | [650] = "Kerry blue terrier", 653 | [651] = "Scotch terrier, Scottish terrier, Scottie", 654 | [652] = "can opener, tin opener", 655 | [653] = "fire engine, fire truck", 656 | [654] = "water ouzel, dipper", 657 | [655] = "theater curtain, theatre curtain", 658 | [656] = "partridge", 659 | [657] = "orangutan, orang, orangutang, Pongo pygmaeus", 660 | [658] = "soup bowl", 661 | [659] = "howler monkey, howler", 662 | [660] = "mask", 663 | [661] = "Chesapeake Bay retriever", 664 | [662] = "stage", 665 | [663] = "bagel, beigel", 666 | [664] = "beer glass", 667 | [665] = "Angora, Angora rabbit", 668 | [666] = "throne", 669 | [667] = "jean, blue jean, denim", 670 | [668] = "diamondback, diamondback rattlesnake, Crotalus adamanteus", 671 | [669] = "seashore, coast, seacoast, sea-coast", 672 | [670] = "soft-coated wheaten terrier", 673 | [671] = "sandal", 674 | [672] = "wire-haired fox terrier", 675 | [673] = "coucal", 676 | [674] = "rotisserie", 677 | [675] = "zucchini, courgette", 678 | [676] = "European fire salamander, Salamandra salamandra", 679 | [677] = "screwdriver", 680 | [678] = "Norwich terrier", 681 | [679] = "sports car, sport car", 682 | [680] = "airship, dirigible", 683 | [681] = "passenger car, coach, carriage", 684 | [682] = "sliding door", 685 | [683] = "tabby, tabby cat", 686 | [684] = "green lizard, Lacerta viridis", 687 | [685] = "hermit crab", 688 | [686] = "Shetland sheepdog, Shetland sheep dog, Shetland", 689 | [687] = "ant, emmet, pismire", 690 | [688] = "cicada, cicala", 691 | [689] = "langur", 692 | [690] = "stone wall", 693 | [691] = "carousel, carrousel, merry-go-round, roundabout, whirligig", 694 | [692] = "Gordon setter", 695 | [693] = 
"container ship, containership, container vessel", 696 | [694] = "maypole", 697 | [695] = "reflex camera", 698 | [696] = "tench, Tinca tinca", 699 | [697] = "bald eagle, American eagle, Haliaeetus leucocephalus", 700 | [698] = "guillotine", 701 | [699] = "totem pole", 702 | [700] = "jacamar", 703 | [701] = "American black bear, black bear, Ursus americanus, Euarctos americanus", 704 | [702] = "brown bear, bruin, Ursus arctos", 705 | [703] = "night snake, Hypsiglena torquata", 706 | [704] = "collie", 707 | [705] = "desktop computer", 708 | [706] = "minivan", 709 | [707] = "water tower", 710 | [708] = "prison, prison house", 711 | [709] = "Dutch oven", 712 | [710] = "shoe shop, shoe-shop, shoe store", 713 | [711] = "balance beam, beam", 714 | [712] = "pickup, pickup truck", 715 | [713] = "macaw", 716 | [714] = "stethoscope", 717 | [715] = "electric locomotive", 718 | [716] = "black grouse", 719 | [717] = "cab, hack, taxi, taxicab", 720 | [718] = "Polaroid camera, Polaroid Land camera", 721 | [719] = "plunger, plumber's helper", 722 | [720] = "candle, taper, wax light", 723 | [721] = "slide rule, slipstick", 724 | [722] = "Great Pyrenees", 725 | [723] = "trilobite", 726 | [724] = "ringlet, ringlet butterfly", 727 | [725] = "komondor", 728 | [726] = "polecat, fitch, foulmart, foumart, Mustela putorius", 729 | [727] = "mobile home, manufactured home", 730 | [728] = "file, file cabinet, filing cabinet", 731 | [729] = "jigsaw puzzle", 732 | [730] = "crane", 733 | [731] = "earthstar", 734 | [732] = "jellyfish", 735 | [733] = "toucan", 736 | [734] = "Chihuahua", 737 | [735] = "damselfly", 738 | [736] = "padlock", 739 | [737] = "llama", 740 | [738] = "monarch, monarch butterfly, milkweed butterfly, Danaus plexippus", 741 | [739] = "scoreboard", 742 | [740] = "ruddy turnstone, Arenaria interpres", 743 | [741] = "jeep, landrover", 744 | [742] = "iron, smoothing iron", 745 | [743] = "Scottish deerhound, deerhound", 746 | [744] = "American lobster, Northern lobster, Maine lobster, Homarus americanus", 747 | [745] = "borzoi, Russian wolfhound", 748 | [746] = "English setter", 749 | [747] = "Irish terrier", 750 | [748] = "racket, racquet", 751 | [749] = "sloth bear, Melursus ursinus, Ursus ursinus", 752 | [750] = "centipede", 753 | [751] = "silky terrier, Sydney silky", 754 | [752] = "cairn, cairn terrier", 755 | [753] = "cockroach, roach", 756 | [754] = "hognose snake, puff adder, sand viper", 757 | [755] = "horned viper, cerastes, sand viper, horned asp, Cerastes cornutus", 758 | [756] = "pizza, pizza pie", 759 | [757] = "poncho", 760 | [758] = "gong, tam-tam", 761 | [759] = "wool, woolen, woollen", 762 | [760] = "hare", 763 | [761] = "Brabancon griffon", 764 | [762] = "torch", 765 | [763] = "castle", 766 | [764] = "Egyptian cat", 767 | [765] = "white stork, Ciconia ciconia", 768 | [766] = "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor", 769 | [767] = "ocarina, sweet potato", 770 | [768] = "purse", 771 | [769] = "menu", 772 | [770] = "apron", 773 | [771] = "spotted salamander, Ambystoma maculatum", 774 | [772] = "plastic bag", 775 | [773] = "home theater, home theatre", 776 | [774] = "armadillo", 777 | [775] = "birdhouse", 778 | [776] = "bulletproof vest", 779 | [777] = "Greater Swiss Mountain dog", 780 | [778] = "capuchin, ringtail, Cebus capucinus", 781 | [779] = "tile roof", 782 | [780] = "holster", 783 | [781] = "spider web, spider's web", 784 | [782] = "killer whale, killer, orca, grampus, sea wolf, Orcinus orca", 785 | [783] = "hamster", 786 | [784] = "marmoset", 787 | 
[785] = "red fox, Vulpes vulpes", 788 | [786] = "abacus", 789 | [787] = "hammer", 790 | [788] = "beaker", 791 | [789] = "mashed potato", 792 | [790] = "hog, pig, grunter, squealer, Sus scrofa", 793 | [791] = "ox", 794 | [792] = "espresso", 795 | [793] = "three-toed sloth, ai, Bradypus tridactylus", 796 | [794] = "projector", 797 | [795] = "ski", 798 | [796] = "knee pad", 799 | [797] = "buckle", 800 | [798] = "ladle", 801 | [799] = "cradle", 802 | [800] = "dumbbell", 803 | [801] = "wall clock", 804 | [802] = "turnstile", 805 | [803] = "paintbrush", 806 | [804] = "sturgeon", 807 | [805] = "coral reef", 808 | [806] = "steel drum", 809 | [807] = "black-and-tan coonhound", 810 | [808] = "pencil sharpener", 811 | [809] = "Staffordshire bullterrier, Staffordshire bull terrier", 812 | [810] = "crane", 813 | [811] = "Maltese dog, Maltese terrier, Maltese", 814 | [812] = "mud turtle", 815 | [813] = "racer, race car, racing car", 816 | [814] = "coffee mug", 817 | [815] = "platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus", 818 | [816] = "washer, automatic washer, washing machine", 819 | [817] = "bassinet", 820 | [818] = "mixing bowl", 821 | [819] = "pretzel", 822 | [820] = "paddle, boat paddle", 823 | [821] = "bottlecap", 824 | [822] = "dishrag, dishcloth", 825 | [823] = "briard", 826 | [824] = "Lakeland terrier", 827 | [825] = "revolver, six-gun, six-shooter", 828 | [826] = "trench coat", 829 | [827] = "clumber, clumber spaniel", 830 | [828] = "maraca", 831 | [829] = "cup", 832 | [830] = "nail", 833 | [831] = "whistle", 834 | [832] = "timber wolf, grey wolf, gray wolf, Canis lupus", 835 | [833] = "terrapin", 836 | [834] = "gibbon, Hylobates lar", 837 | [835] = "dowitcher", 838 | [836] = "flamingo", 839 | [837] = "bassoon", 840 | [838] = "Norwegian elkhound, elkhound", 841 | [839] = "tow truck, tow car, wrecker", 842 | [840] = "kelpie", 843 | [841] = "bathtub, bathing tub, bath, tub", 844 | [842] = "altar", 845 | [843] = "brass, memorial tablet, plaque", 846 | [844] = "king snake, kingsnake", 847 | [845] = "stupa, tope", 848 | [846] = "cuirass", 849 | [847] = "golf ball", 850 | [848] = "restaurant, eating house, eating place, eatery", 851 | [849] = "bobsled, bobsleigh, bob", 852 | [850] = "quill, quill pen", 853 | [851] = "tick", 854 | [852] = "mushroom", 855 | [853] = "waffle iron", 856 | [854] = "solar dish, solar collector, solar furnace", 857 | [855] = "shield, buckler", 858 | [856] = "acorn squash", 859 | [857] = "cocker spaniel, English cocker spaniel, cocker", 860 | [858] = "stinkhorn, carrion fungus", 861 | [859] = "bluetick", 862 | [860] = "crash helmet", 863 | [861] = "ibex, Capra ibex", 864 | [862] = "radio telescope, radio reflector", 865 | [863] = "Boston bull, Boston terrier", 866 | [864] = "patas, hussar monkey, Erythrocebus patas", 867 | [865] = "Walker hound, Walker foxhound", 868 | [866] = "king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica", 869 | [867] = "stingray", 870 | [868] = "table lamp", 871 | [869] = "African grey, African gray, Psittacus erithacus", 872 | [870] = "wine bottle", 873 | [871] = "German short-haired pointer", 874 | [872] = "house finch, linnet, Carpodacus mexicanus", 875 | [873] = "cardoon", 876 | [874] = "grille, radiator grille", 877 | [875] = "space bar", 878 | [876] = "anemone fish", 879 | [877] = "cliff dwelling", 880 | [878] = "Old English sheepdog, bobtail", 881 | [879] = "manhole cover", 882 | [880] = "microwave, microwave oven", 883 | [881] = "sax, saxophone", 884 | [882] = 
"ringneck snake, ring-necked snake, ring snake", 885 | [883] = "ostrich, Struthio camelus", 886 | [884] = "disk brake, disc brake", 887 | [885] = "Australian terrier", 888 | [886] = "Italian greyhound", 889 | [887] = "goldfinch, Carduelis carduelis", 890 | [888] = "planetarium", 891 | [889] = "red wolf, maned wolf, Canis rufus, Canis niger", 892 | [890] = "Dandie Dinmont, Dandie Dinmont terrier", 893 | [891] = "Arabian camel, dromedary, Camelus dromedarius", 894 | [892] = "abaya", 895 | [893] = "rocking chair, rocker", 896 | [894] = "burrito", 897 | [895] = "ballpoint, ballpoint pen, ballpen, Biro", 898 | [896] = "hair spray", 899 | [897] = "Rottweiler", 900 | [898] = "hyena, hyaena", 901 | [899] = "matchstick", 902 | [900] = "hartebeest", 903 | [901] = "sewing machine", 904 | [902] = "leopard, Panthera pardus", 905 | [903] = "corkscrew, bottle screw", 906 | [904] = "Shih-Tzu", 907 | [905] = "chain", 908 | [906] = "organ, pipe organ", 909 | [907] = "sunglass", 910 | [908] = "Mexican hairless", 911 | [909] = "lawn mower, mower", 912 | [910] = "bullet train, bullet", 913 | [911] = "space shuttle", 914 | [912] = "mountain bike, all-terrain bike, off-roader", 915 | [913] = "Dungeness crab, Cancer magister", 916 | [914] = "entertainment center", 917 | [915] = "basset, basset hound", 918 | [916] = "mortarboard", 919 | [917] = "punching bag, punch bag, punching ball, punchball", 920 | [918] = "accordion, piano accordion, squeeze box", 921 | [919] = "wooden spoon", 922 | [920] = "coral fungus", 923 | [921] = "gondola", 924 | [922] = "magpie", 925 | [923] = "mitten", 926 | [924] = "dial telephone, dial phone", 927 | [925] = "geyser", 928 | [926] = "barn spider, Araneus cavaticus", 929 | [927] = "custard apple", 930 | [928] = "pitcher, ewer", 931 | [929] = "maillot", 932 | [930] = "bicycle-built-for-two, tandem bicycle, tandem", 933 | [931] = "bonnet, poke bonnet", 934 | [932] = "king penguin, Aptenodytes patagonica", 935 | [933] = "quilt, comforter, comfort, puff", 936 | [934] = "cocktail shaker", 937 | [935] = "saltshaker, salt shaker", 938 | [936] = "hard disc, hard disk, fixed disk", 939 | [937] = "harvestman, daddy longlegs, Phalangium opilio", 940 | [938] = "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system", 941 | [939] = "ballplayer, baseball player", 942 | [940] = "echidna, spiny anteater, anteater", 943 | [941] = "violin, fiddle", 944 | [942] = "hammerhead, hammerhead shark", 945 | [943] = "skunk, polecat, wood pussy", 946 | [944] = "brassiere, bra, bandeau", 947 | [945] = "impala, Aepyceros melampus", 948 | [946] = "junco, snowbird", 949 | [947] = "French horn, horn", 950 | [948] = "fox squirrel, eastern fox squirrel, Sciurus niger", 951 | [949] = "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita", 952 | [950] = "cloak", 953 | [951] = "pickelhaube", 954 | [952] = "Granny Smith", 955 | [953] = "scuba diver", 956 | [954] = "boa constrictor, Constrictor constrictor", 957 | [955] = "mongoose", 958 | [956] = "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin", 959 | [957] = "leafhopper", 960 | [958] = "grey fox, gray fox, Urocyon cinereoargenteus", 961 | [959] = "sea lion", 962 | [960] = "crossword puzzle, crossword", 963 | [961] = "sea snake", 964 | [962] = "golfcart, golf cart", 965 | [963] = "cardigan", 966 | [964] = "brain coral", 967 | [965] = "hook, claw", 968 | [966] = "apiary, bee house", 969 | [967] = "mailbox, letter box", 970 | [968] = "doormat, welcome mat", 971 | [969] = "clog, geta, patten, 
sabot", 972 | [970] = "American alligator, Alligator mississipiensis", 973 | [971] = "limpkin, Aramus pictus", 974 | [972] = "warthog", 975 | [973] = "Border terrier", 976 | [974] = "seat belt, seatbelt", 977 | [975] = "leaf beetle, chrysomelid", 978 | [976] = "lion, king of beasts, Panthera leo", 979 | [977] = "lighter, light, igniter, ignitor", 980 | [978] = "quail", 981 | [979] = "hand-held computer, hand-held microcomputer", 982 | [980] = "perfume, essence", 983 | [981] = "cello, violoncello", 984 | [982] = "Rhodesian ridgeback", 985 | [983] = "flagpole, flagstaff", 986 | [984] = "kimono", 987 | [985] = "sulphur butterfly, sulfur butterfly", 988 | [986] = "cowboy hat, ten-gallon hat", 989 | [987] = "barometer", 990 | [988] = "Christmas stocking", 991 | [989] = "snow leopard, ounce, Panthera uncia", 992 | [990] = "confectionery, confectionary, candy store", 993 | [991] = "weevil", 994 | [992] = "warplane, military plane", 995 | [993] = "eel", 996 | [994] = "convertible", 997 | [995] = "white wolf, Arctic wolf, Canis lupus tundrarum", 998 | [996] = "Irish setter, red setter", 999 | [997] = "fiddler crab", 1000 | [998] = "plane, carpenter's plane, woodworking plane", 1001 | [999] = "shoji", 1002 | }; 1003 | -------------------------------------------------------------------------------- /test_xnornet.c: -------------------------------------------------------------------------------- 1 | #include "util.h" 2 | #include "xnornet.h" 3 | #include "xnornet_bwn.h" 4 | #include 5 | #include 6 | #include 7 | 8 | /* 9 | clang-5.0 -target aarch64-linux-gnu -I /usr/aarch64-linux-gnu/include -Wno-builtin-requires-header test_xnornet.c util.c xnornet.c xnornet_bwn.c -lm -I. -flto -Ofast -pthread -mcpu=cortex-a53 10 | */ 11 | 12 | static uint8_t tmp[xnornet_tmp_size] __attribute__((aligned(16))); 13 | static uint8_t tmp2[xnornet_bwn_tmp_size] __attribute__((aligned(16))); 14 | 15 | #include 16 | static uint64_t get_time(void) 17 | { 18 | struct timespec ts; 19 | 20 | clock_gettime(CLOCK_MONOTONIC, &ts); 21 | return (uint64_t) ts.tv_sec * 1000000000ull + ts.tv_nsec; 22 | } 23 | #define TIME(x...) 
({ uint64_t t0 = get_time(); (x); uint64_t t1 = get_time(); \ 24 | printf("time: %f\n", (float) (t1 - t0) / 1000000.0f); }) 25 | 26 | int main(void) 27 | { 28 | int err; 29 | string weights, weights2, image; 30 | float xf[227*227*3], *y; 31 | int top[5]; 32 | 33 | err = sched_setscheduler(getpid(), SCHED_FIFO, &(struct sched_param) {.sched_priority = 1}); 34 | if (err) 35 | printf("failed to set priority\n"); 36 | 37 | err = file_mmap(&weights, "xnornet_weights"); 38 | assert(!err); 39 | 40 | err = file_mmap(&weights2, "xnornet_bwn_weights"); 41 | assert(!err); 42 | 43 | err = file_mmap(&image, "image"); 44 | assert(!err); 45 | 46 | assert(weights.size == xnornet_size); 47 | assert(weights2.size == xnornet_bwn_size); 48 | 49 | { 50 | float m[] = {0.01735949, 0.01772787, 0.01774145}; 51 | float b[] = {-2.13645733, -2.04468092, -1.81410977}; 52 | float *ptr = image.ptr; 53 | for (int i = 0; i < 227*227*3; i++) 54 | xf[i] = ptr[i] * m[i % 3] + b[i % 3]; 55 | } 56 | TIME(y = xnornet(xf, weights.ptr, tmp)); 57 | softmax(y, 1000); 58 | top5(top, y, 1000); 59 | 60 | printf("XNORNET:\n%u:%f\n%u:%f\n%u:%f\n%u:%f\n%u:%f\n", 61 | top[0], y[top[0]], 62 | top[1], y[top[1]], 63 | top[2], y[top[2]], 64 | top[3], y[top[3]], 65 | top[4], y[top[4]]); 66 | 67 | TIME(y = xnornet_bwn(xf, weights2.ptr, tmp2)); 68 | softmax(y, 1000); 69 | top5(top, y, 1000); 70 | 71 | printf("BWN:\n%u:%f\n%u:%f\n%u:%f\n%u:%f\n%u:%f\n", 72 | top[0], y[top[0]], 73 | top[1], y[top[1]], 74 | top[2], y[top[2]], 75 | top[3], y[top[3]], 76 | top[4], y[top[4]]); 77 | 78 | return 0; 79 | } 80 | -------------------------------------------------------------------------------- /tf_export.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.python.framework import tensor_util 3 | import numpy as np 4 | import copy 5 | 6 | # attempt to evaluate a tensorflow variable 7 | # (needed because batch norm has non-fetchable variables) 8 | def eval(var): 9 | op = var.op 10 | if op.type == 'Identity': 11 | val = op.inputs[0].eval(feed_dict={'is_training:0':False}) 12 | elif op.type == 'Switch': 13 | val = op.inputs[0].eval(feed_dict={'is_training:0':False}) 14 | elif op.type == 'Const': 15 | val = tensor_util.MakeNdarray(op.get_attr('value')) 16 | else: 17 | val = var.eval(feed_dict={'is_training:0':False}) 18 | return val 19 | 20 | # layer tracking when parsing graph 21 | class Layer(object): 22 | def __init__(self): 23 | self.m = np.ones(1) 24 | self.b = np.zeros(1) 25 | self.act = 'none' 26 | self.pool = None 27 | self.binary = False 28 | self.relu_fix = None 29 | 30 | def mul(self, m): 31 | self.m = self.m * m 32 | 33 | def add(self, b): 34 | self.b = self.b + self.m * b 35 | 36 | def activate(self, act): 37 | if act == 'relu' and self.act == 'bin': # xnornet workaround 38 | assert(self.relu_fix is None) 39 | self.relu_fix = -self.b / self.m < 0.0 40 | return 41 | assert(np.array_equal(self.m, np.ones(1))) 42 | assert(np.array_equal(self.b, np.zeros(1))) 43 | self.act = act; 44 | 45 | def max_pool(self, pool): 46 | assert(self.pool == None) 47 | self.pool = pool 48 | self.pool_xor = self.m < 0.0 # for xnornet workaround 49 | 50 | def finish(self, w, strides, padding, binary, num_output): 51 | print('finish') 52 | self.w = w 53 | self.strides = strides 54 | self.padding = padding 55 | self.binary = binary 56 | self.num_output = num_output 57 | # 58 | def clipto(x, type): 59 | info = np.iinfo(np.int32) 60 | return np.clip(x, info.min, info.max) 61 | 62 | # 63 | def 
reorder(x, group_size, pairing): 64 | assert(x.shape[2] % pairing == 0) 65 | 66 | x0 = None 67 | if x.shape[3] % group_size: 68 | off = x.shape[3] // group_size * group_size; 69 | x0 = x[:,:,:,off:] 70 | x = x[:,:,:,:off] 71 | 72 | oldshape = x0.shape 73 | x0 = np.reshape(x0, (*x0.shape[:2], x0.shape[2] // pairing, pairing, 1, x0.shape[3])) 74 | x0 = np.transpose(x0, (0, 1, 4, 2, 5, 3)) 75 | x0 = np.reshape(x0, oldshape) 76 | 77 | oldshape = x.shape 78 | x = np.reshape(x, (*x.shape[:2], x.shape[2] // pairing, pairing, \ 79 | x.shape[3] // group_size, group_size)) 80 | x = np.transpose(x, (0, 1, 4, 2, 5, 3)) 81 | x = np.reshape(x, oldshape) 82 | 83 | x = np.reshape(x, (*x.shape[:2], -1)) 84 | 85 | if x0 is not None: 86 | x0 = np.reshape(x0, (*x0.shape[:2], -1)) 87 | x = np.concatenate([x, x0], axis=2) 88 | return x 89 | 90 | """ 91 | Notes 92 | - some optimizations missing 93 | - only supports convolutional networks 94 | """ 95 | 96 | def export(output, input, prefix, quantize): 97 | 98 | weight_data = bytearray() 99 | layer_param = [] 100 | 101 | processed = [] 102 | layers = [] 103 | input_scale = None 104 | input_offset = None 105 | input_type = 'FLOAT' 106 | 107 | var = [(output, Layer())] 108 | 109 | binary = False 110 | 111 | while var: 112 | _var = var 113 | var = [] 114 | for (v, layer) in _var: 115 | if v in processed: 116 | print('TODO: VARIABLE IS REUSED') 117 | continue 118 | processed += [v] 119 | 120 | op = v.op 121 | 122 | if op.type == 'Identity': 123 | assert(len(op.inputs) == 1 and len(op.outputs) == 1) 124 | elif op.type == 'Merge': 125 | assert(v == op.outputs[0]) 126 | var += [(i, copy.copy(layer)) for i in op.inputs] 127 | continue 128 | elif op.type == 'Switch': 129 | assert(len(op.inputs) == 2 and len(op.outputs) == 2) 130 | pred = eval(op.inputs[1]) 131 | if v == op.outputs[pred*1]: 132 | var += [(op.inputs[0], layer)] 133 | continue 134 | elif op.type == 'FusedBatchNorm': 135 | assert(len(op.inputs) == 5 and len(op.outputs) == 5) 136 | 137 | epsilon = op.get_attr('epsilon') 138 | scale = eval(op.inputs[1]) 139 | offset = eval(op.inputs[2]) 140 | mean = eval(op.inputs[3]) 141 | variance = eval(op.inputs[4]) 142 | 143 | if mean.size == 0: 144 | mean = np.zeros(1) 145 | if variance.size == 0: 146 | variance = np.ones(1) 147 | 148 | m = scale / np.sqrt(variance + epsilon) 149 | b = -mean * m + offset 150 | 151 | layer.add(b) 152 | layer.mul(m) 153 | elif op.type == 'Add' or op.type == 'Sub' or op.type == 'Mul': 154 | assert(len(op.inputs) == 2 and len(op.outputs) == 1) 155 | 156 | # detect binary activation (hacky) 157 | if op.type == 'Add' and op.inputs[1].op.type == 'StopGradient': 158 | x = op.inputs[0].op 159 | y = x.inputs[0].op 160 | assert(x.type == 'Maximum') 161 | assert(y.type == 'Minimum') 162 | assert(layer.m == np.ones(1) and layer.b == np.zeros(1)) 163 | 164 | layer.activate('bin') 165 | op = y 166 | elif op.type == 'Add': 167 | layer.add(op.inputs[1].eval()) 168 | elif op.type == 'Sub': 169 | layer.add(-op.inputs[1].eval()) 170 | else: 171 | layer.mul(op.inputs[1].eval()) 172 | elif op.type == 'Placeholder': 173 | assert(len(op.inputs) == 0 and len(op.outputs) == 1) 174 | assert(v == input) 175 | print('input parameters', layer.m, layer.b) 176 | input_scale = layer.m 177 | input_offset = layer.b 178 | if layer.act == 'bin': 179 | binary = True 180 | input_type = 'BINARY' 181 | else: 182 | assert(layer.act == 'none') 183 | continue 184 | elif op.type == 'MaxPool': 185 | assert(len(op.inputs) == 1 and len(op.outputs) == 1) 186 | 
assert(op.get_attr('padding') == b'VALID') 187 | layer.max_pool((op.get_attr('ksize'), op.get_attr('strides'))) 188 | elif op.type == 'Conv2D': 189 | assert(len(op.inputs) == 2 and len(op.outputs) == 1) 190 | op2 = op.inputs[0].op 191 | inp = op.inputs[0] 192 | 193 | padding = [0, 0] 194 | W = op.inputs[1].eval() 195 | 196 | if op2.type == 'Pad': 197 | assert(len(op2.inputs) == 2 and len(op2.outputs) == 1) 198 | #assert(op2.get_attr('mode') == 'CONSTANT') 199 | inp = op2.inputs[0] 200 | pad = op2.inputs[1].eval() 201 | assert(np.all(pad[0] == 0) and np.all(pad[3] == 0)) 202 | assert(pad[1][0] == pad[1][1] and pad[2][0] == pad[2][1]) 203 | padding = [pad[1][0], pad[2][0]] 204 | 205 | if op.get_attr('padding') == b'SAME': 206 | padding[0] += W.shape[0] // 2 207 | padding[1] += W.shape[1] // 2 208 | 209 | # detect binary weights (TODO) 210 | bw = False 211 | if op.inputs[1].op.type == 'Add': 212 | bw = True 213 | 214 | print(padding, bw, W.shape) 215 | 216 | stride = op.get_attr('strides') 217 | assert(stride[0] == 1 and stride[3] == 1) 218 | strides = [stride[1], stride[2]] 219 | 220 | layer.finish(W, strides, padding, bw, np.prod([x.value for x in op.outputs[0].shape[1:]])) 221 | layers += [layer] 222 | var += [(inp, Layer())] 223 | continue 224 | elif op.type == 'Relu': 225 | assert(len(op.inputs) == 1 and len(op.outputs) == 1) 226 | layer.activate('relu') 227 | else: 228 | print('Unknown operation:', op.type) 229 | assert(False) 230 | var += [(op.inputs[0], layer)] 231 | 232 | weight_data = bytearray() 233 | layers = layers[::-1] 234 | code = [] 235 | codew = [] 236 | 237 | tmp_size = 0 # TODO initialize to size of input 238 | 239 | for i, layer in enumerate(layers): 240 | shape = layer.w.shape 241 | 242 | # update required size of temp memory 243 | k = layer.num_output 244 | if layer.act == 'bin': 245 | k = (layer.num_output + 7) // 8 246 | else: 247 | k = layer.num_output * 4 248 | tmp_size = max(tmp_size, k) 249 | 250 | # 251 | in_size = np.prod(layer.w.shape[:-1]) 252 | 253 | if layer.binary: 254 | layer.m = layer.m * np.mean(np.abs(layer.w), axis=(0, 1, 2)) 255 | layer.wb = layer.w < 0.0 256 | layer.w = None 257 | else: 258 | layer.w *= layer.m 259 | layer.m = None 260 | 261 | xnornet_fix = None 262 | if layer.act == 'bin' and layer.binary: 263 | # xnornet negative scaling workarounds 264 | if layer.pool is not None: 265 | xnornet_fix = layer.pool_xor 266 | assert(np.all((layer.m < 0.0) == layer.pool_xor)) 267 | else: 268 | xnornet_fix = layer.m < 0.0 269 | 270 | if not np.any(xnornet_fix): 271 | xnornet_fix = None 272 | elif layer.act == 'bin' and layer.pool is not None and np.any(layer.pool_xor): 273 | print('fix', layer.pool_xor[0]) 274 | layer.w *= np.where(layer.pool_xor, -1.0, 1.0) 275 | layer.b *= np.where(layer.pool_xor, -1.0, 1.0) 276 | xnornet_fix = layer.pool_xor 277 | 278 | # extra parameters 279 | if binary: 280 | assert(layer.binary) 281 | if layer.act == 'bin': 282 | k = -layer.b / layer.m 283 | k = (in_size - k) / 2.0 284 | k = np.floor(np.clip(k, 0, np.iinfo(np.uint16).max)).astype(np.uint16) 285 | name = 'bin' 286 | else: 287 | k = np.concatenate([layer.m, layer.b]).astype(np.float32) 288 | name = 'bin_float' 289 | else: 290 | if layer.binary: 291 | assert(layer.act != 'bin') # TODO 292 | k = np.concatenate([layer.m, layer.b]).astype(np.float32) 293 | name = 'bin_float' 294 | else: 295 | if layer.act == 'bin' and layer.relu_fix is not None: 296 | layer.b = np.where(layer.relu_fix, np.inf, layer.b) 297 | layer.relu_fix = None 298 | 299 | if quantize: 300 | _min = 
np.min(layer.w, axis=(0,1,2)) 301 | _max = np.max(layer.w, axis=(0,1,2)) 302 | m = 255.0 / (_max - _min) 303 | 304 | # used floored min so integer arithmetic can be used 305 | layer_min = np.floor(_min * m) 306 | _min = layer_min * _max / (255.0 + layer_min) 307 | m = 255.0 / (_max - _min) 308 | 309 | layer.w = np.round(layer.w * m - layer_min) - 128.0 310 | layer.w = np.clip(layer.w, -128.0, 127.0) # shouldnt be necessary 311 | 312 | k = np.concatenate([layer.b, 1.0 / m, layer_min + 128.0, np.sum(layer.w, axis=(0,1,2))]) 313 | else: 314 | k = layer.b 315 | 316 | k = k.astype(np.float32) 317 | name = 'int8' if quantize else 'float' 318 | 319 | assert(layer.relu_fix is None) 320 | 321 | # reordering 322 | # we can gain some performance by ordering the weights in a way 323 | # specific to the implementation 324 | # currently set for armv7a implementation 325 | group_size = 32 # 96 326 | if layer.binary: 327 | if binary: 328 | group_size = 64 # 128 329 | assert(layer.wb.shape[3] % group_size == 0) 330 | wd = np.packbits(reorder(layer.wb, group_size, 8)) 331 | else: 332 | # float input with binary weights 333 | assert(layer.wb.shape[3] % group_size == 0) 334 | if quantize: 335 | assert(layer.wb.shape[2] % 2 == 0) 336 | assert(group_size % 8 == 0) 337 | 338 | x = layer.wb 339 | x = np.reshape(x, (*x.shape[:2], x.shape[2] // 2, 2, \ 340 | x.shape[3] // group_size, group_size // 8, 8)) 341 | x = np.transpose(x, (0, 1, 4, 2, 5, 3, 6)) 342 | wd = np.packbits(x) 343 | else: 344 | wd = np.packbits(reorder(layer.wb, group_size, 1)) 345 | else: 346 | if quantize: 347 | wd = reorder(layer.w, group_size, 1).astype(np.int8) 348 | else: 349 | wd = reorder(layer.w, group_size, 1).astype(np.float32) 350 | 351 | 352 | #code for this layer 353 | if quantize and not binary: 354 | code += ['x = quantize(x, arg->quant_param, %i, %i);' % (1 if layer.binary else 0, 1 if i == 0 else 0)] 355 | 356 | code += ['x = conv2d(x, (tensor) {4, {%i, %i, %i, %i}, w->layer%i.w, .type=%s}, (tensor) {2, {1, %i}, w->layer%i.b}, %i, %i, %i, %i, %s, %s, arg->sync);' % (*shape, i, 'BINARY' if layer.binary else 'INT8' if quantize else 'FLOAT', shape[3], i, layer.strides[0], layer.strides[1], layer.padding[0], layer.padding[1], 'ACTIVE_' + layer.act.upper(), 'arg->quant_param' if quantize and not binary else '0')] 357 | 358 | if layer.pool is not None: 359 | code += ['x = maxpool(x, %i, %i, %i, %i, %s, arg->sync);' % (layer.pool[0][1], layer.pool[0][2], layer.pool[1][1], layer.pool[1][2], ('w->xor%i' % i) if xnornet_fix is not None else '0')] 360 | elif xnornet_fix is not None: 361 | code += ['x = xnornet_fix(x, w->xor%i);' % i] 362 | 363 | codew += ['w_%s(%i, %i) layer%i;' % (name, np.prod(shape[:-1]), shape[-1], i)] 364 | 365 | 366 | weight_data += wd.tobytes() + k.tobytes() 367 | 368 | if xnornet_fix is not None: 369 | # pad to 16byte multiple 370 | size = ((shape[-1] - 1) // 128) + 1 371 | xnornet_fix = np.resize(xnornet_fix, size * 128) 372 | codew += ['uint8_t xor%i[%i/8];' % (i, size*128)] 373 | weight_data += np.packbits(xnornet_fix).tobytes() 374 | 375 | print('layer', i, name) 376 | binary = layer.act == 'bin' 377 | 378 | print(len(weight_data)) 379 | 380 | assert(tmp_size % 16 == 0) 381 | 382 | code = [x+' wait(arg, %i);'%(i+1) for (i, x) in enumerate(code)] 383 | 384 | c = '#define FUNCTION_NAME %s\n#include \n' % prefix 385 | c += 'struct weights {\n' + '\n'.join(codew) + '};\n' 386 | c += '_Static_assert(sizeof(struct weights) == %i, "");\n' % len(weight_data) 387 | 388 | c += 'static void* worker(void *_arg) {\n' 389 | c 
+= 'struct thread_arg *arg = _arg;;\n' 390 | c += 'struct weights *w = arg->weights;\n' 391 | c += '#ifdef PRINT_TIME\nuint64_t t0 = get_time(), t1;\n#endif\n' 392 | c += 'tensor x = (tensor) {3, {%s}, __builtin_assume_aligned(arg->in, 16), {__builtin_assume_aligned(arg->tmp, 16), __builtin_assume_aligned(arg->tmp, 16) + %i}, .type=%s};\n' % (', '.join([str(x) for x in input.shape[1:]]), tmp_size, input_type) 393 | c += ' TIME();\n'.join(code) 394 | c += ' TIME();\nreturn x.data;\n}\n' 395 | #output.shape[1].value 396 | 397 | h = '#include \n' 398 | h += '#define %s_size %i\n' % (prefix, len(weight_data)) 399 | h += '#define %s_tmp_size (2*%i+16)\n' % (prefix, tmp_size) 400 | h += 'void* %s(void *in, void *weights, void *tmp);\n' % prefix 401 | 402 | f = open(prefix + '_weights', 'wb') 403 | f.write(weight_data) 404 | f.close() 405 | 406 | f = open(prefix + '.c', 'w') 407 | f.write(c) 408 | f.close() 409 | 410 | f = open(prefix + '.h', 'w') 411 | f.write(h) 412 | f.close() 413 | -------------------------------------------------------------------------------- /util.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "util.h" 7 | 8 | int file_mmap(string *res, char *path) 9 | { 10 | int fd; 11 | struct stat stat; 12 | void *map; 13 | 14 | fd = open(path, O_RDONLY); 15 | if (fd < 0) 16 | return -1; 17 | 18 | map = fstat(fd, &stat) ? 19 | MAP_FAILED : 20 | mmap(0, stat.st_size, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0); 21 | close(fd); 22 | if (map == MAP_FAILED) 23 | return -1; 24 | 25 | *res = (string) {map, stat.st_size}; 26 | return 0; 27 | } 28 | 29 | void top5(int *top, float *y, int num) 30 | { 31 | int i, j; 32 | for (i = 0; i < num; i++) { 33 | for (j = 0; j < 5 && j < i && y[top[j]] >= y[i]; j++); 34 | 35 | if (j == 0) { 36 | top[4] = top[3]; 37 | top[3] = top[2]; 38 | top[2] = top[1]; 39 | top[1] = top[0]; 40 | top[0] = i; 41 | } 42 | 43 | if (j == 1) { 44 | top[4] = top[3]; 45 | top[3] = top[2]; 46 | top[2] = top[1]; 47 | top[1] = i; 48 | } 49 | 50 | if (j == 2) { 51 | top[4] = top[3]; 52 | top[3] = top[2]; 53 | top[2] = i; 54 | } 55 | 56 | if (j == 3) { 57 | top[4] = top[3]; 58 | top[3] = i; 59 | } 60 | 61 | if (j == 4) { 62 | top[4] = i; 63 | } 64 | } 65 | } 66 | 67 | #include 68 | 69 | void softmax(float *out, int num) 70 | { 71 | uint i, j; 72 | double d, q; 73 | 74 | for (i = 0, d = 0.0, q = -INFINITY; i < num; i++) { 75 | d += exp((double) out[i] / 1.0); 76 | if (q < out[i]) { 77 | q = out[i]; 78 | j = i; 79 | } 80 | } 81 | 82 | for (i = 0; i < num; i++) 83 | out[i] = (exp((double) out[i] / 1.0) / d); 84 | } 85 | -------------------------------------------------------------------------------- /util.h: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | typedef struct { 4 | void *ptr; 5 | size_t size; 6 | } string; 7 | 8 | int file_mmap(string *res, char *path); 9 | 10 | void top5(int *top, float *y, int num); 11 | void softmax(float *out, int num); 12 | -------------------------------------------------------------------------------- /xnornet.py: -------------------------------------------------------------------------------- 1 | """ 2 | XNORNet / BWN implementation, importing pretrained weights with pytorch 3 | """ 4 | 5 | import tensorflow as tf 6 | import numpy as np 7 | 8 | import torch 9 | import torch.legacy.nn as nn 10 | nn.BinActiveZ = nn.ReLU # so the deserialization doesnt fail 11 | from torch.utils.serialization 
import load_lua 12 | 13 | import bnn 14 | import tf_export 15 | 16 | BWN = True 17 | quantize = True 18 | 19 | cache = load_lua('data/cache/meanstdCache.t7') 20 | model = load_lua('data/alexnet_BWN.t7' if BWN else 'data/alexnet_XNOR.t7') 21 | 22 | x0 = tf.placeholder(tf.float32, [None, 227, 227, 3]) 23 | train = tf.placeholder(tf.bool, name='is_training') 24 | 25 | x = (x0 * (1.0 / 255.0) - np.array(cache.mean)) * (1.0 / np.array(cache.std)) 26 | 27 | x = tf.pad(x, [[0, 0], [2, 2], [2, 2], [0, 0]], 'CONSTANT') 28 | 29 | 30 | x = bnn.layer(x, 96, filter_size=[11, 11], stride=[4, 4], pool=([3, 3], [2, 2]), 31 | epsilon=0.001 if BWN else 0.00001, binary=False, padding='VALID', activate='relu') 32 | 33 | if not BWN: 34 | x = bnn.batch_norm(x, 0.0001) 35 | x = bnn.activation(x) 36 | 37 | if BWN: 38 | act = 'relu' 39 | eps = 0.001 40 | else: 41 | act = 'bin' 42 | eps = 0.0001 43 | 44 | x = bnn.layer(x, 256, filter_size=[5, 5], pool=([3, 3], [2, 2]), activate=act, epsilon=eps) 45 | 46 | x = bnn.layer(x, 384, filter_size=[3, 3], activate=act, epsilon=eps) 47 | x = bnn.layer(x, 384, filter_size=[3, 3], activate=act, epsilon=eps) 48 | x = bnn.layer(x, 256, filter_size=[3, 3], pool=([3, 3], [2, 2]), activate=act, epsilon=eps) 49 | 50 | x = bnn.layer(x, 4096, filter_size=[6, 6], padding='VALID', activate=act, epsilon=eps) 51 | x = bnn.layer(x, 4096, activate='relu', epsilon=0.001) 52 | x = bnn.layer(x, 1000, activate='none', norm=False, binary=False) 53 | 54 | y = tf.identity(x) 55 | 56 | softmax = tf.nn.softmax(y) 57 | 58 | # helper function to load torch batch norm weights into tensorflow batch norm variables 59 | def load_batch_norm(scope, module): 60 | beta = gamma = mean = variance = None 61 | for i in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=scope+'/'): 62 | if i.name == scope+'/beta:0': 63 | assert(beta == None) 64 | beta = i 65 | elif i.name == scope+'/gamma:0': 66 | assert(gamma == None) 67 | gamma = i 68 | elif i.name == scope+'/moving_mean:0': 69 | assert(mean == None) 70 | mean = i 71 | elif i.name == scope+'/moving_variance:0': 72 | assert(variance == None) 73 | variance = i 74 | else: 75 | assert(False) 76 | 77 | assert(beta is not None and gamma is not None) 78 | assert(mean is not None and variance is not None) 79 | 80 | sess.run(tf.group(tf.assign(beta, module.bias.numpy()), 81 | tf.assign(gamma, module.weight.numpy()), 82 | tf.assign(mean, module.running_mean.numpy()), 83 | tf.assign(variance, module.running_var.numpy()))) 84 | 85 | # helper function to load convolution weights into tensorflow variables 86 | def load_conv_param(w, b, module): 87 | w = [i for i in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) if i.name == w][0] 88 | b = [i for i in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) if i.name == b][0] 89 | W = module.weight.numpy() 90 | W = np.transpose(W, (3, 2, 1, 0)) 91 | 92 | sess.run(tf.group(tf.assign(w, W), tf.assign(b, module.bias.numpy()))) 93 | 94 | from scipy.misc import imread 95 | 96 | img = (imread('../summer/xnor/laska.png')[:,:,:3]).astype(np.float32) 97 | 98 | img = np.transpose(img, (1, 0, 2)) 99 | img = img[np.newaxis] 100 | print(img.shape) 101 | 102 | with tf.Session() as sess: 103 | if BWN: 104 | load_batch_norm('BatchNorm', model.modules[0].modules[1]) 105 | load_batch_norm('BatchNorm_1', model.modules[2].modules[1]) 106 | load_batch_norm('BatchNorm_2', model.modules[4].modules[1]) 107 | load_batch_norm('BatchNorm_3', model.modules[5].modules[1]) 108 | load_batch_norm('BatchNorm_4', model.modules[6].modules[1]) 109 | 
load_batch_norm('BatchNorm_5', model.modules[9].modules[1]) 110 | load_batch_norm('BatchNorm_6', model.modules[11].modules[1]) 111 | 112 | load_conv_param('Variable:0', 'Variable_1:0', model.modules[0].modules[0]) 113 | load_conv_param('Variable_2:0', 'Variable_4:0', model.modules[2].modules[0]) 114 | load_conv_param('Variable_5:0', 'Variable_7:0', model.modules[4].modules[0]) 115 | load_conv_param('Variable_8:0', 'Variable_10:0', model.modules[5].modules[0]) 116 | load_conv_param('Variable_11:0', 'Variable_13:0', model.modules[6].modules[0]) 117 | load_conv_param('Variable_14:0', 'Variable_16:0', model.modules[9].modules[0]) 118 | load_conv_param('Variable_17:0', 'Variable_19:0', model.modules[11].modules[0]) 119 | load_conv_param('Variable_20:0', 'Variable_21:0', model.modules[12]) 120 | else: 121 | load_batch_norm('BatchNorm', model.modules[1]) 122 | load_batch_norm('BatchNorm_1', model.modules[4].modules[0]) 123 | load_batch_norm('BatchNorm_2', model.modules[5].modules[0]) 124 | load_batch_norm('BatchNorm_3', model.modules[6].modules[0]) 125 | load_batch_norm('BatchNorm_4', model.modules[7].modules[0]) 126 | load_batch_norm('BatchNorm_5', model.modules[8].modules[0]) 127 | load_batch_norm('BatchNorm_6', model.modules[9].modules[0]) 128 | load_batch_norm('BatchNorm_7', model.modules[10]) 129 | 130 | load_conv_param('Variable:0', 'Variable_1:0', model.modules[0]) 131 | load_conv_param('Variable_2:0', 'Variable_4:0', model.modules[4].modules[2]) 132 | load_conv_param('Variable_5:0', 'Variable_7:0', model.modules[5].modules[2]) 133 | load_conv_param('Variable_8:0', 'Variable_10:0', model.modules[6].modules[2]) 134 | load_conv_param('Variable_11:0', 'Variable_13:0', model.modules[7].modules[2]) 135 | load_conv_param('Variable_14:0', 'Variable_16:0', model.modules[8].modules[2]) 136 | load_conv_param('Variable_17:0', 'Variable_19:0', model.modules[9].modules[2]) 137 | load_conv_param('Variable_20:0', 'Variable_21:0', model.modules[12]) 138 | 139 | output = sess.run(softmax, feed_dict={x0 : img, train: False}) 140 | output = output[0,0,0,:] 141 | 142 | """ 143 | for i in range(3): 144 | k = 0 145 | for j in range(32): 146 | if x1[0,0,0,i*32+j] < 0: 147 | k |= (128 >> (j % 8)) << (j // 8 * 8) 148 | print('%X' % k) 149 | 150 | print(x1[0,0,0,0]) 151 | """ 152 | 153 | order = sorted(range(len(output)), key=lambda x: output[x], reverse=True) 154 | for i in range(5): 155 | print(order[i], output[order[i]]) 156 | 157 | name = 'xnornet' 158 | if BWN: 159 | name += '_bwn' 160 | if quantize: 161 | name += '_q' 162 | tf_export.export(y, x0, name, quantize) 163 | --------------------------------------------------------------------------------
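
The binary-weight branch of `tf_export.py` folds the weight magnitudes into the per-channel scale (`layer.m * np.mean(np.abs(layer.w), axis=(0, 1, 2))`) and keeps only the sign bits (`layer.wb = layer.w < 0.0`), i.e. the BinaryConnect/BWN approximation W ≈ α·sign(W) with α the per-output-channel mean absolute value. A minimal numpy sketch of that approximation follows; the filter shape, random seed and input are hypothetical and not part of the repository:

```python
import numpy as np

# Hypothetical 3x3x8 filter bank with 4 output channels (not from the repository).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3, 8, 4)).astype(np.float32)
x = rng.normal(size=(3, 3, 8)).astype(np.float32)

alpha = np.mean(np.abs(W), axis=(0, 1, 2))       # per-output-channel scale (what gets folded into layer.m)
signs = np.where(W < 0.0, -1.0, 1.0)             # what the packed sign bits represent

exact  = np.tensordot(x, W, axes=3)              # float dot product at one output position
approx = np.tensordot(x, signs, axes=3) * alpha  # +/-1 accumulation, then one scale per channel
print(np.abs(exact - approx))                    # approximation error of W ~ alpha * sign(W)
```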
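
The threshold computation in `tf_export.py` (`k = -layer.b / layer.m` followed by `k = (in_size - k) / 2.0`) can be read as follows: with ±1 inputs and weights, the dot product over `in_size` elements equals `in_size - 2*popcount(x XOR w)`, so for a positive scale `m` the binary activation of `m*dot + b` reduces to comparing the popcount against a fixed integer threshold (negative scales go through the `xnornet_fix` path instead). The exact comparison used by the generated C code lives in `c_ops.h` and is not shown in this listing; the sketch below only checks the arithmetic equivalence, with hypothetical values for `m`, `b` and the vectors:

```python
import numpy as np

# Hypothetical +/-1 input and weight vectors for a single output unit.
in_size = 16
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], in_size)
w = rng.choice([-1.0, 1.0], in_size)
m, b = 0.8, -2.5                            # assumed positive per-channel scale and bias

# Float path: binary activation applied after the linear transform.
dot = float(x @ w)
act_float = (m * dot + b) >= 0.0

# Bit path: XOR + popcount, compared against the exported threshold.
xb = (x < 0).astype(np.uint8)               # sign bit is 1 for negative, as packed by the exporter
wb = (w < 0).astype(np.uint8)
popcount = int(np.count_nonzero(xb ^ wb))   # dot = in_size - 2*popcount
k = (in_size - (-b / m)) / 2.0              # same formula as in tf_export.py
act_bits = popcount <= np.floor(k)

assert act_float == act_bits
```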
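
The 8-bit path in `tf_export.py` maps each output channel's weights to int8 with `m = 255 / (max - min)`, a floored offset `layer_min`, and the adjustment of `_min` so that the minimum lands exactly on -128; it then stores `1/m` and `layer_min + 128` alongside the biases and weight sums. One way to read those two stored parameters is as an ordinary scale/zero-point pair. The following is a minimal sketch of that round trip under that reading, using a hypothetical single-channel weight vector (the real exporter works per output channel over `axis=(0, 1, 2)`, and the consuming C code is not shown in this listing):

```python
import numpy as np

# Hypothetical float weights for one output channel (not from the repository).
w = np.array([-0.7, -0.1, 0.25, 0.6], dtype=np.float32)

_min, _max = w.min(), w.max()
m = 255.0 / (_max - _min)
layer_min = np.floor(_min * m)              # floored min so integer arithmetic can be used
_min = layer_min * _max / (255.0 + layer_min)
m = 255.0 / (_max - _min)                   # after this, _min maps exactly to -128

q = np.clip(np.round(w * m - layer_min) - 128.0, -128.0, 127.0).astype(np.int8)

# Inverting the mapping with the two stored parameters 1/m and layer_min + 128:
w_back = (q.astype(np.float32) + (layer_min + 128.0)) * (1.0 / m)
print(np.max(np.abs(w - w_back)))           # rounding error, bounded by 0.5/m
```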