├── README.md
├── mnist_example.ipynb
├── mnist_model.h5
├── mnist_tsne.ipynb
├── morad_scratch.ipynb
├── tsne_clean.ipynb
├── tsne_train_only.ipynb
└── tsne_utils.py
/README.md:
--------------------------------------------------------------------------------
1 | # Signal Modulation Classification Using Machine Learning
2 | ## Morad Shefa, Gerry Zhang, Steve Croft
3 | If you want to skip the background reading and just see what we provide and how you can use our code, feel free to skip to the final section. You can also reach me at moradshefa@berkeley.edu
4 | ## Background
5 |
6 | A deep convolutional neural network architecture is used for signal modulation classification. Signal modulation classification matters for several reasons. For example, radio-frequency interference (RFI) is a major problem in radio astronomy. This is especially prevalent in SETI, where RFI plagues collected data and can exhibit the very characteristics we look for in SETI signals.
7 | As instrumentation expands beyond the frequencies allocated to radio astronomy and human-generated technology fills more of the wireless spectrum, classifying RFI as such becomes increasingly important.
8 | Modulation schemes are methods of encoding information onto a high-frequency carrier wave that is more practical for transmission. Human-generated RFI tends to utilize one of a limited number of modulation schemes, most of which modulate the amplitude, frequency, or phase of the carrier wave. Thus, one way of classifying RFI is by its modulation scheme.
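As a toy illustration (not from the original repo; all parameters are made up), the three basic modulation types can be generated in a few lines of numpy:

```python
import numpy as np

fs, fc = 10_000, 1_000                       # sample rate and carrier frequency (Hz)
t = np.arange(0, 0.01, 1 / fs)               # 10 ms of samples
msg = np.sign(np.sin(2 * np.pi * 200 * t))   # toy +/-1 binary message

am = (1 + 0.5 * msg) * np.cos(2 * np.pi * fc * t)              # amplitude modulation
fm = np.cos(2 * np.pi * (fc * t + 100 * np.cumsum(msg) / fs))  # frequency modulation
pm = np.cos(2 * np.pi * fc * t + (np.pi / 2) * msg)            # phase modulation
```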
9 |
10 |
11 |
12 |
13 |
Examples of how information can be transmitted by changing the shape of a carrier wave. Picture credit: Tait Radio Academy
14 |
15 |
16 | ## Methods and Materials
17 | * Datasets provided by the Army Rapid Capabilities Office’s Artificial Intelligence Signal Classification challenge
18 | * Simulated signals of 24 different modulations: 16PSK, 2FSK_5KHz, 2FSK_75KHz, 8PSK, AM_DSB, AM_SSB, APSK16_c34, APSK32_c34, BPSK, CPFSK_5KHz, CPFSK_75KHz, FM_NB, FM_WB, GFSK_5KHz, GFSK_75KHz, GMSK, MSK, NOISE, OQPSK, PI4QPSK, QAM16, QAM32, QAM64, QPSK
19 | * 6 different signal-to-noise ratios (SNRs): -10 dB, -6 dB, -2 dB, 2 dB, 6 dB, 10 dB
20 | * Used deep convolutional neural networks for classification
21 | * CNNs are widely used and have advanced the state of the art in computer vision
22 | * Convolutions with learned filters are used to extract features in the data
23 | * Hierarchical classification: Classify signals into subgroups, then use another classifier to identify the modulation
24 | * Data augmentation: Perturbing the data during training to avoid overfitting
25 | * Ensemble training: Train multiple models and average their predictions
26 | * Residual connections: Allow for deeper networks by mitigating vanishing gradients (a minimal sketch follows this list)
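A minimal sketch of a residual block in Keras, assuming 1-D convolutions over I/Q time series; the layer sizes are illustrative and not taken from our actual architecture:

```python
from keras.layers import Conv1D, BatchNormalization, Activation, Add

def residual_block(x, filters, kernel_size=3):
    # Two conv layers plus an identity shortcut; the shortcut lets gradients
    # flow directly to earlier layers, mitigating the vanishing-gradient problem.
    shortcut = x
    y = Conv1D(filters, kernel_size, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv1D(filters, kernel_size, padding='same')(y)
    y = BatchNormalization()(y)
    y = Add()([shortcut, y])  # assumes x already has `filters` channels
    return Activation('relu')(y)
```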
27 |
28 |
29 |
30 |
31 | Inception layers:
32 | * Parallel branches with filters of different sizes
33 | * 1x1 convolutions to reduce the channel dimension (a sketch follows the figure credit below)
34 |
Picture credit: GoogLeNet
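A hedged sketch of an inception-style module in Keras (1-D version; the branch widths are illustrative, not our exact configuration):

```python
from keras.layers import Conv1D, MaxPooling1D, Concatenate

def inception_module(x, f1=64, f3=64, f5=64, fp=32):
    # Parallel branches with different filter sizes; 1x1 convolutions
    # reduce the channel dimension before the larger filters.
    b1 = Conv1D(f1, 1, padding='same', activation='relu')(x)
    b3 = Conv1D(f3 // 2, 1, padding='same', activation='relu')(x)
    b3 = Conv1D(f3, 3, padding='same', activation='relu')(b3)
    b5 = Conv1D(f5 // 2, 1, padding='same', activation='relu')(x)
    b5 = Conv1D(f5, 5, padding='same', activation='relu')(b5)
    bp = MaxPooling1D(pool_size=3, strides=1, padding='same')(x)
    bp = Conv1D(fp, 1, padding='same', activation='relu')(bp)
    return Concatenate()([b1, b3, b5, bp])  # concatenate along the channel axis
```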
35 |
36 |
37 |
38 |
39 |
40 |
41 | We used dimensionality reduction via t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA) to visualize feature extraction and to diagnose problems with the architecture.
42 |
These t-SNE plots helped us evaluate our models on unlabelled test data that was distributed differently from the training data.
43 |
Dimensionality reduction after extracting features of 16PSK (red), 2FSK_5KHz (green), AM_DSB (blue)
44 |
45 |
46 |
47 |
48 |
49 |
50 | Example of a vanilla convolutional neural network.
Picture credit: MDPI: "A Framework for Designing the Architectures of Deep Convolutional Neural Networks"
51 |
52 |
53 |
54 |
55 |
56 | Embedding:
57 | * Extracting the output of the final inception layer; 100 samples per modulation (dimension: 5120)
58 | * Reducing the dimension using principal component analysis (dimension: 50)
59 | * Reducing the dimension using t-distributed stochastic neighbor embedding (dimension: 2), as sketched below
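A minimal sketch of this pipeline, assuming a trained Keras `model`; the layer name `'final_inception'` and the input array `data` are illustrative, not from our actual code:

```python
from keras.models import Model
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Extract activations of the final inception layer (5120 features per sample).
feature_extractor = Model(inputs=model.input,
                          outputs=model.get_layer('final_inception').output)
features = feature_extractor.predict(data).reshape(len(data), -1)

# PCA to 50 dimensions first (cheap), then t-SNE down to 2 for plotting.
features_50 = PCA(n_components=50).fit_transform(features)
embedding_2d = TSNE(n_components=2).fit_transform(features_50)
```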
60 |
61 |
*Embedding of 24 modulations using one of our models. As we can see, different modulations map to different clusters even in 2-dimensional space, indicating that our model extracts features specific to the different modulation schemes. The axes have no physical meaning; they merely represent the space found by t-SNE, in which points that are close in the high-dimensional feature space remain close in the low-dimensional embedding.*
62 |
63 |
64 | ## Results
65 | Results for one of our models without hierarchical inference.
66 |
67 |
68 |
69 | CNNs are able to achieve high accuracy in classifying signal modulations across different SNR values.
70 |
71 |
72 |
73 |
74 |
75 |
76 |
Confusion matrices for different SNR values (-10 dB, -6 dB, 10 dB)
77 |
As SNR increases, accuracy increases and more predicted labels match the true labels, producing stronger diagonals.
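Such per-SNR confusion matrices can be computed as sketched below with scikit-learn (an assumption for illustration, not the code we used; `y_true`, `y_pred`, and `snrs` are hypothetical arrays):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_snr_confusion(y_true, y_pred, snrs, snr_values=(-10, -6, 10)):
    # y_true, y_pred: integer class labels; snrs: per-example SNR in dB.
    y_true, y_pred, snrs = map(np.asarray, (y_true, y_pred, snrs))
    return {s: confusion_matrix(y_true[snrs == s], y_pred[snrs == s])
            for s in snr_values}
```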
78 |
79 |
80 |
81 | ## Conclusion
82 | * The ability of CNNs to classify signal modulations at high accuracy shows great promise for the future use of CNNs and other machine learning methods to classify RFI
83 | * Future work can focus on extending these methods to classify modulations in real data
84 | * One can use machine learning methods to extend these models to real data:
85 |   * Use domain adaptation to find a well-performing model for a target distribution that differs from the source distribution (the training data)
86 |   * Label real data and train a classifier
87 |
88 |
89 | Adapting models to domains that are related to but different from those they were trained on is a common challenge for machine learning systems. Picture credit: Oxford Robotics Institute
90 |
91 |
92 |
93 | When the target distribution is different, classification performance can suffer. Domain adaptation methods aim to find a representation space in which the discrepancy between the two domains is low. Picture credit: Science Direct: “Unsupervised domain adaptation techniques based on auto-encoder for non-stationary EEG-based emotion recognition”
94 |
95 |
96 | ## Provided
97 | Unfortunately, under the Army challenge rules we are not allowed to distribute any of the provided datasets.
98 | However, we will provide:
99 | * the notebook that we used to experiment with different models and that is able to achieve
100 | our results with our data (morad_scratch.ipynb)
101 | * a notebook that builds a similar model, simplified to classify handwritten digits on the MNIST dataset, achieving 99.43% accuracy (mnist_example.ipynb)
102 | * the notebook we used to get the t-SNE embeddings on training and unlabelled test data to evaluate models (tsne_clean.ipynb)
103 | * simplified code that can be used to get your own t-SNE embeddings on your own Keras models and plot them interactively using Bokeh if you desire (tsne_utils.py); a hypothetical usage sketch follows this list
104 | * a notebook that uses tsne_utils.py and one of our models to get embeddings for signal modulation data on training data only (tsne_train_only.ipynb)
105 | * the mnist model (mnist_model.h5)
106 | * a notebook to do t-SNE on the mnist data and model (mnist_tsne.ipynb)
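Since the function names inside tsne_utils.py are not reproduced here, the following is only a hypothetical sketch of plotting a 2-D embedding interactively with Bokeh; `embedding_2d` and `labels` are assumed to come from the pipeline sketched earlier:

```python
from bokeh.plotting import figure, show, output_notebook, ColumnDataSource
from bokeh.models import HoverTool

output_notebook()

source = ColumnDataSource(data=dict(
    x=embedding_2d[:, 0],
    y=embedding_2d[:, 1],
    label=[str(l) for l in labels],
))
p = figure(title="t-SNE embedding", tools="pan,wheel_zoom,reset")
p.circle('x', 'y', source=source, size=6, alpha=0.7)
p.add_tools(HoverTool(tooltips=[("label", "@label")]))  # show the class on hover
show(p)
```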
107 |
108 |
109 |
110 | Simple embedding of our small MNIST model (no legend, no prediction probability). As we can see, the data maps decently into 10 distinct clusters.
111 |
112 |
113 |
114 |
115 | Embedding showing the legend and the predicted probability for each point. The point over which we hover is labelled 1 with predicted probability 0.822.
116 |
--------------------------------------------------------------------------------
/mnist_model.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/moradshefa/ml_signal_modulation_classification/d476f867b7e593eb68fd84ab80f966db13bf8a44/mnist_model.h5
--------------------------------------------------------------------------------
/tsne_train_only.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "This notebook does t-SNE on the army signal classification data set using our model. \n",
8 | "It takes 100 points from each modulation from a training set which was used as a validation set during training."
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "metadata": {},
15 | "outputs": [
16 | {
17 | "name": "stderr",
18 | "output_type": "stream",
19 | "text": [
20 | "/home/morads/anaconda3/envs/py36/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
21 | " from ._conv import register_converters as _register_converters\n",
22 | "Using TensorFlow backend.\n"
23 | ]
24 | }
25 | ],
26 | "source": [
27 | "from data_loader import *\n",
28 | "from keras.models import Model, load_model\n",
29 | "from keras import backend as K\n",
30 | "from sklearn.manifold import TSNE\n",
31 | "from sklearn.decomposition import PCA\n",
32 | "\n",
33 | "from bokeh.plotting import figure, output_notebook, show, ColumnDataSource\n",
34 | "from bokeh.models import HoverTool\n",
35 | "\n",
36 | "from tsne_utils import *"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 2,
42 | "metadata": {
43 | "collapsed": true
44 | },
45 | "outputs": [],
46 | "source": [
47 | "CLASSES_24 = ['16PSK', '2FSK_5KHz', '2FSK_75KHz', '8PSK', 'AM_DSB', 'AM_SSB', 'APSK16_c34',\n",
48 | " 'APSK32_c34', 'BPSK', 'CPFSK_5KHz', 'CPFSK_75KHz', 'FM_NB', 'FM_WB',\n",
49 | " 'GFSK_5KHz', 'GFSK_75KHz', 'GMSK', 'MSK', 'NOISE', 'OQPSK', 'PI4QPSK', 'QAM16',\n",
50 | " 'QAM32', 'QAM64', 'QPSK']\n",
51 | "\n",
52 | "BOOKEH_COLORS = {\n",
53 | " '16PSK': 'aqua', \n",
54 | " '2FSK_5KHz': 'aquamarine', \n",
55 | " '2FSK_75KHz': 'bisque', \n",
56 | " '8PSK': 'black', \n",
57 | " 'AM_DSB': 'blue', \n",
58 | " 'AM_SSB':'blueviolet', \n",
59 | " 'APSK16_c34': 'brown',\n",
60 | " 'APSK32_c34': 'burlywood', \n",
61 | " 'BPSK': 'cadetblue', \n",
62 | " 'CPFSK_5KHz': 'chartreuse', \n",
63 | " 'CPFSK_75KHz': 'chocolate', \n",
64 | " 'FM_NB': 'cornflowerblue', \n",
65 | " 'FM_WB': 'crimson',\n",
66 | " 'GFSK_5KHz': 'darkcyan', \n",
67 | " 'GFSK_75KHz': 'darkgoldenrod', \n",
68 | " 'GMSK': 'darkgray', \n",
69 | " 'MSK': 'darkgreen', \n",
70 | " 'NOISE': 'darkorange', \n",
71 | " 'OQPSK': 'deeppink', \n",
72 | " 'PI4QPSK': 'fuchsia', \n",
73 | " 'QAM16': 'gold',\n",
74 | " 'QAM32': 'lightblue', \n",
75 | " 'QAM64': 'magenta', \n",
76 | " 'QPSK': 'plum'\n",
77 | "}\n",
78 | "\n",
79 | "\n",
80 | "BOOKEH_SHAPES = {\n",
81 | " '16PSK':1,\n",
82 | " '2FSK_5KHz':1,\n",
83 | " '2FSK_75KHz':1,\n",
84 | " '8PSK':1,\n",
85 | " 'AM_DSB':1,\n",
86 | " 'AM_SSB':1,\n",
87 | " 'APSK16_c34':1,\n",
88 | " 'APSK32_c34':1,\n",
89 | " 'BPSK':1,\n",
90 | " 'CPFSK_5KHz':1,\n",
91 | " 'CPFSK_75KHz':1,\n",
92 | " 'FM_NB':1,\n",
93 | " 'FM_WB':1,\n",
94 | " 'GFSK_5KHz':1,\n",
95 | " 'GFSK_75KHz':1,\n",
96 | " 'GMSK':1,\n",
97 | " 'MSK':1,\n",
98 | " 'NOISE':1,\n",
99 | " 'OQPSK':1,\n",
100 | " 'PI4QPSK':1,\n",
101 | " 'QAM16':2,\n",
102 | " 'QAM32':2,\n",
103 | " 'QAM64':2,\n",
104 | " 'QPSK':2,\n",
105 | "}"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 3,
111 | "metadata": {
112 | "collapsed": true
113 | },
114 | "outputs": [],
115 | "source": [
116 | "def load_training_data(data_file,num_samples=100, mods = None, spectrum=False):\n",
117 | " testdata = LoadModRecData(data_file, 1., 0., 0., load_snrs=[10], num_samples_per_key=num_samples, load_mods = mods,spectrum=spectrum)\n",
118 | " train_data = testdata.signalData\n",
119 | " train_labels = testdata.signalLabels[:,0]\n",
120 | " return train_data, train_labels\n"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 4,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stderr",
130 | "output_type": "stream",
131 | "text": [
132 | "/home/morads/.local/lib/python3.6/site-packages/keras/engine/saving.py:270: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.\n",
133 | " warnings.warn('No training configuration found in save file: '\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "model = load_model('modulation_classification_example_model.h5')\n",
139 | "train_file_path = \"/datax/yzhang/training_data/training_data_chunk_14.pkl\"\n",
140 | "train_file_path = \"/datax/yzhang/army_challenge/training_data/training_data_chunk_14.pkl\"\n",
141 | "\n",
142 | "num_samples_from_train = 20"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 5,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "name": "stdout",
152 | "output_type": "stream",
153 | "text": [
154 | "[Data Loader] - Loading Datafile, /datax/yzhang/army_challenge/training_data/training_data_chunk_14.pkl (time series)\n",
155 | "[Data Loader] - Counting Number of Examples in Dataset...\n",
156 | "[Data Loader] - Number of Examples in Dataset: 480\n",
157 | "[Data Loader] - [Modulation Dataset] Adding Collects for: 16PSK\n",
158 | "[Data Loader] - [Modulation Dataset] Adding Collects for: 2FSK_5KHz\n",
159 | "[Data Loader] - [Modulation Dataset] Adding Collects for: 2FSK_75KHz\n",
160 | "[Data Loader] - [Modulation Dataset] Adding Collects for: 8PSK\n",
161 | "[Data Loader] - [Modulation Dataset] Adding Collects for: AM_DSB\n",
162 | "[Data Loader] - [Modulation Dataset] Adding Collects for: AM_SSB\n",
163 | "[Data Loader] - [Modulation Dataset] Adding Collects for: APSK16_c34\n",
164 | "[Data Loader] - [Modulation Dataset] Adding Collects for: APSK32_c34\n",
165 | "[Data Loader] - [Modulation Dataset] Adding Collects for: BPSK\n",
166 | "[Data Loader] - [Modulation Dataset] Adding Collects for: CPFSK_5KHz\n",
167 | "[Data Loader] - [Modulation Dataset] Adding Collects for: CPFSK_75KHz\n",
168 | "[Data Loader] - [Modulation Dataset] Adding Collects for: FM_NB\n",
169 | "[Data Loader] - [Modulation Dataset] Adding Collects for: FM_WB\n",
170 | "[Data Loader] - [Modulation Dataset] Adding Collects for: GFSK_5KHz\n",
171 | "[Data Loader] - [Modulation Dataset] Adding Collects for: GFSK_75KHz\n",
172 | "[Data Loader] - [Modulation Dataset] Adding Collects for: GMSK\n",
173 | "[Data Loader] - [Modulation Dataset] Adding Collects for: MSK\n",
174 | "[Data Loader] - [Modulation Dataset] Adding Collects for: NOISE\n",
175 | "[Data Loader] - [Modulation Dataset] Adding Collects for: OQPSK\n",
176 | "[Data Loader] - [Modulation Dataset] Adding Collects for: PI4QPSK\n",
177 | "[Data Loader] - [Modulation Dataset] Adding Collects for: QAM16\n",
178 | "[Data Loader] - [Modulation Dataset] Adding Collects for: QAM32\n",
179 | "[Data Loader] - [Modulation Dataset] Adding Collects for: QAM64\n",
180 | "[Data Loader] - [Modulation Dataset] Adding Collects for: QPSK\n",
181 | "[Data Loader] - Converting to numpy arrays...\n",
182 | "[Data Loader] - Shuffling Data...\n",
183 | "[Data Loader] - Splitting Data...\n",
184 | "[Data Loader] - Train Size: 480 Validation Size: 0 Test Size: 0\n",
185 | "[Data Loader] - Done.\n",
186 | "\n"
187 | ]
188 | }
189 | ],
190 | "source": [
191 | "data, labels = load_training_data(train_file_path,num_samples=num_samples_from_train,mods=None)"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": 7,
197 | "metadata": {
198 | "scrolled": false
199 | },
200 | "outputs": [
201 | {
202 | "data": {
203 | "text/html": [
204 | "\n",
205 | "
\\n\"+\n", 240 | " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", 241 | " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", 242 | " \"
\\n\"+\n", 243 | " \"\\n\"+\n",
248 | " \"from bokeh.resources import INLINE\\n\"+\n",
249 | " \"output_notebook(resources=INLINE)\\n\"+\n",
250 | " \"
\\n\"+\n",
251 | " \"