30 | ---
31 |
32 | ## Introduction
33 | This is a fun little project I made out of boredom. After seeing [Kyubyong's] text-to-speech model, I decided to create an Android application that reads what I write in my own voice. If you copy the code and follow my steps, you'll be able to do the same.
34 |
35 | [Kyubyong's]: https://github.com/Kyubyong/dc_tts
36 | [Click here]: https://www.youtube.com/watch?v=NrzY_js8yZ4
37 |
38 | ## Getting Started
39 |
40 | To get a local copy up and running, follow these simple steps.
41 |
42 | ### Prerequisites
43 |
44 | This is the most important part. If you want this to work, make sure you have the following things installed. Since the model uses old versions of Python and TensorFlow, I suggest creating a virtual environment and installing everything there.
45 |
46 | * Python 3.6
47 | * TensorFlow 1.15.0
48 | * librosa
49 | * tqdm
50 | * matplotlib
51 | * scipy
52 | * Android Studio
53 | * An Android phone (?)
54 | * Some time to lose
55 |
56 | ### Data preparation
57 |
58 | To clone your voice you need around 200 samples of it, each between 2 and 10 seconds long. Thanks to transfer learning, this means you can clone anyone's voice with only 15-20 minutes of audio.
59 | 1. First, download the [pretrained model] if you want to make an English voice. Otherwise, find an online text-to-speech dataset in the desired language and train the model from scratch. For example, I made an Italian version of my voice, starting from [this] dataset.
60 | [Here] you can download the Italian pre-trained model I generated.
61 | Make sure to put the pretrained model inside the 'logdir' directory.
62 | 2. Inside LJSpeech-1.1 you have to edit the transcript.csv file to match your audio samples. Each line must have this format: `audio name|sentence|normalized sentence`, where the audio name is without the extension and the normalized sentence has numbers converted to words. Take a look at the original transcript.csv and you'll understand it easily. Then, copy your audio samples into the wavs folder. To make the data generation process less painful, I suggest writing the transcript file first, then recording the sentences using record.py.
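
Before running the preprocessing, it can help to check that the transcript and the audio samples agree. The following is only a minimal sketch, not part of the repository: `check_transcript` is a hypothetical helper, and it assumes the pipe-separated LJSpeech layout (audio name, sentence, normalized sentence) and a `LJSpeech-1.1/wavs` folder next to the script.

```python
import os


def check_transcript(lines, wav_names):
    """Return a list of problems found in the transcript rows.

    lines: iterable of rows shaped like "name|sentence|normalized sentence"
           (assumed layout, mirroring the LJSpeech metadata file).
    wav_names: set of audio file names without the .wav extension.
    """
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore empty rows
        fields = line.split("|")
        if len(fields) != 3:
            problems.append(f"line {number}: expected 3 fields, got {len(fields)}")
            continue
        name, _, normalized = fields
        if name.lower().endswith(".wav"):
            problems.append(f"line {number}: '{name}' should not include the extension")
            name = name[:-4]
        if name not in wav_names:
            problems.append(f"line {number}: no audio sample named '{name}'")
        if any(ch.isdigit() for ch in normalized):
            problems.append(f"line {number}: normalized sentence still contains digits")
        seen.add(name)
    # Audio files that never appear in the transcript
    for name in sorted(wav_names - seen):
        problems.append(f"audio sample '{name}' has no transcript line")
    return problems


if __name__ == "__main__":
    data_dir = "LJSpeech-1.1"  # assumed dataset location, adjust to your data path
    transcript = os.path.join(data_dir, "transcript.csv")
    wav_dir = os.path.join(data_dir, "wavs")
    if os.path.isfile(transcript) and os.path.isdir(wav_dir):
        names = {os.path.splitext(f)[0] for f in os.listdir(wav_dir)
                 if f.lower().endswith(".wav")}
        with open(transcript, encoding="utf-8") as handle:
            for problem in check_transcript(handle, names):
                print(problem)
```

If the script prints nothing, every transcript row has a matching sample and every sample has a row.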
63 |
64 | [pretrained model]: https://www.dropbox.com/s/1oyipstjxh2n5wo/LJ_logdir.tar?dl=0
65 | [this]: https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/
66 | [here]: https://www.dropbox.com/s/36t6l3c1192mgw4/logdir.rar?dl=0
67 |
68 | ## Training
69 |
70 | If you want to understand how the model works, you should read [this paper]. Otherwise, treat it as a black box and mechanically follow my steps.
71 |
72 | 1. Edit hyperparams.py and make sure that prepro is set to True. Also, edit the data path to match the correct location on your local PC. Set the batch size to 16 or 32 depending on your RAM. You can also tune max_N and max_T.
73 | 2. Run prepro.py only once. After this step you should see two new folders, 'mels' and 'mags'. If you change the dataset, delete both folders and run prepro.py again.
74 | 3. Run 'python train.py 1'. The number of steps needed differs for each voice, but the result is usually decent after 10k steps.
75 | 4. Run 'python train.py 2'. Train it for at least 2k steps, otherwise the voice will not sound human.
76 |
77 | [this paper]: https://arxiv.org/abs/1710.08969
78 |
79 | ## Testing
80 |
81 | Open harvard_sentences.txt and edit the lines as you desire. Then, run 'python synthesize.py'. If everything is correct, a 'samples' directory should appear.
82 |
83 | ## Creating the android app
84 |
85 | As you can see, generating sentences this way is not very convenient. That's why I decided to make the process more user-friendly.
86 | The Android app is basically a wrapper that lets you generate audio clips, save them locally on the phone and share them.
87 | When you write something and press the play button in the app, the message is sent to server.py, which launches synthesize.py and then sends the audio back to the Android application.
88 | If you want to use the application outside your local network, set up port forwarding for the port defined in server.py. The default port is '1234'. You can change it, but remember to change the port in MainActivity.java as well. You also have to set your IP address in the same file.
89 | By default the model only handles sentences shorter than 10 seconds, but in server.py I worked around this by splitting the input message into short sentences, running the synthesis on each one and merging the resulting audio.
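
The split-and-merge workaround can be sketched roughly as follows. This is not the code from server.py (which is not shown here): `split_message` and `merge_wavs` are hypothetical helpers, the 150-character limit is an arbitrary stand-in for "shorter than 10 seconds", and the merge assumes PCM WAV files because Python's standard `wave` module cannot read floating-point WAVs.

```python
import re
import wave


def split_message(text, max_chars=150):
    """Split text into sentence-sized chunks no longer than max_chars."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?;:])\s+", text) if s.strip()]
    chunks = []
    for sentence in sentences:
        # Very long sentences are further split on word boundaries.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


def merge_wavs(paths, out_path):
    """Concatenate PCM WAV files that share the same format into out_path."""
    params = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as src:
            current = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path} has a different audio format")
            frames.append(src.readframes(src.getnframes()))
    if params is None:
        raise ValueError("no input files")
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params[0])
        dst.setsampwidth(params[1])
        dst.setframerate(params[2])
        dst.writeframes(b"".join(frames))
```

In this sketch each chunk returned by `split_message` would be synthesized to its own file, and `merge_wavs` would then join those files in order into the single clip that is sent back to the phone.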
90 |
91 | ## Usage
92 |
93 | 1. Import the Android_App folder into Android Studio and edit the IP address in MainActivity.java to match yours.
94 | 2. Run 'python server.py' on your local PC, then leave it on for as long as you need.
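
To try the server without the phone, a small client can mimic what MainActivity.java does: connect, send the text prefixed with "ITA" or "ENG", then read bytes until the server closes the connection. This is a sketch under assumptions: `build_message` and `request_audio` are hypothetical names, and the text is assumed to be sent as UTF-8.

```python
import socket


def build_message(text, italian=True):
    """Prefix the text with the language tag used by MainActivity.java."""
    return ("ITA" if italian else "ENG") + text


def request_audio(host, port, text, italian=True,
                  connect_timeout=3.0, read_timeout=120.0):
    """Send one message and return the raw bytes the server answers with."""
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # UTF-8 is an assumption about what the server expects.
        sock.sendall(build_message(text, italian).encode("utf-8"))
        # Synthesis can be slow, so allow a long wait for the reply.
        sock.settimeout(read_timeout)
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection: the audio is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `request_audio("192.168.1.10", 1234, "Hello", italian=False)` (with a placeholder address) would return the WAV bytes, which can then be written to a file opened in binary mode.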
95 |
96 | ## Notes
97 | * If something is not clear or you run into a weird error, don't be afraid to ask.
98 | * This is my first Android project and I had no prior experience in mobile development, so the code is probably not optimal, but it works.
99 | * The application runs both an Italian and an English model because I have cloned my voice in both languages. I think the code still works with a single model without any tweaks, though.
100 |
--------------------------------------------------------------------------------
/Android_App/app/src/main/java/com/android/simsax/MainActivity.java:
--------------------------------------------------------------------------------
1 | package com.android.simsax;
2 |
3 | import android.content.Intent;
4 | import android.graphics.Color;
5 | import android.media.MediaPlayer;
6 | import android.os.Bundle;
7 | import android.util.Log;
8 | import android.view.Gravity;
9 | import android.view.Menu;
10 | import android.view.MenuInflater;
11 | import android.view.MenuItem;
12 | import android.view.View;
13 | import android.widget.ImageButton;
14 | import android.widget.ProgressBar;
15 | import android.widget.Toast;
16 | import androidx.annotation.NonNull;
17 | import androidx.appcompat.app.AppCompatActivity;
18 | import androidx.appcompat.widget.Toolbar;
19 |
20 | import com.google.android.material.textfield.TextInputEditText;
21 | import com.google.android.material.textfield.TextInputLayout;
22 | import java.io.BufferedInputStream;
23 | import java.io.DataInputStream;
24 | import java.io.File;
25 | import java.io.FileOutputStream;
26 | import java.io.IOException;
27 | import java.io.PrintWriter;
28 | import java.net.ConnectException;
29 | import java.net.InetSocketAddress;
30 | import java.net.Socket;
31 | import java.net.SocketTimeoutException;
32 | import java.text.SimpleDateFormat;
33 | import java.util.Date;
34 |
35 | public class MainActivity extends AppCompatActivity {
36 |
37 | Toolbar toolbar;
38 | ImageButton playButton;
39 | ImageButton flagButton;
40 | ProgressBar pb;
41 | boolean ita;
42 |
43 | @Override
44 | protected void onCreate(Bundle savedInstanceState) {
45 | super.onCreate(savedInstanceState);
46 | setContentView(R.layout.activity_main);
47 | toolbar = findViewById(R.id.main_toolbar);
48 | toolbar.setTitleTextColor(Color.WHITE);
49 | setSupportActionBar(toolbar);
50 | pb = findViewById(R.id.progressBar);
51 | pb.setVisibility(View.INVISIBLE);
52 | ita = true;
53 | }
54 |
55 | public void playButtonPressed(View v) throws IOException {
56 | playButton = (ImageButton) v;
57 | TextInputLayout til = findViewById(R.id.text_input);
58 | String msg = til.getEditText().getText().toString();
59 | if (!msg.isEmpty()) {
60 | ConnectionThread connect = new ConnectionThread(msg);
61 | connect.start();
62 | playButton.setVisibility(View.INVISIBLE);
63 | pb.setVisibility(View.VISIBLE);
64 | }
65 | }
66 |
67 | @Override
68 | public boolean onCreateOptionsMenu(Menu menu) {
69 | MenuInflater inflater = getMenuInflater();
70 | inflater.inflate(R.menu.menu, menu);
71 | return true;
72 | }
73 |
74 | @Override
75 | public boolean onOptionsItemSelected(@NonNull MenuItem item) {
76 | switch (item.getItemId()) {
77 | case R.id.recordings:
78 | Intent i = new Intent(this, RecordingsActivity.class);
79 | startActivity(i);
80 | return true;
81 | //case R.id.settings:
82 | // return true;
83 | }
84 | return super.onOptionsItemSelected(item);
85 | }
86 |
87 | public void flagPressed(View view) {
88 | flagButton = (ImageButton) view;
89 | TextInputEditText text = findViewById(R.id.text_input_edit);
90 | ita = !ita;
91 | if (ita) {
92 | text.setHint("Scrivi qualcosa...");
93 | flagButton.setImageDrawable(getResources().getDrawable(R.drawable.ic_italy));
94 | } else {
95 | text.setHint("Write something...");
96 | flagButton.setImageDrawable(getResources().getDrawable(R.drawable.ic_united_kingdom));
97 | }
98 | }
99 |
100 |
101 | private class ConnectionThread extends Thread {
102 | private String msg;
103 |
104 | ConnectionThread(String msg) {
105 | this.msg = msg;
106 | }
107 |
108 | @Override
109 | public void run() {
110 | Socket s = null;
111 | try {
112 | s = new Socket();
113 | s.connect(new InetSocketAddress("youripaddress", 1234), 3000);
114 | }
115 | catch (IOException e) {
116 | runOnUiThread(new Runnable() {
117 | @Override
118 | public void run() {
119 | Toast toast = Toast.makeText(getApplicationContext(), "Can't connect to the server", Toast.LENGTH_LONG);
120 | toast.setGravity(Gravity.CENTER_VERTICAL, 0, -400);
121 | toast.show();
122 | pb.setVisibility(View.INVISIBLE);
123 | playButton.setVisibility(View.VISIBLE);
124 | }
125 | });
126 | return;
127 | }
128 | try {
129 | PrintWriter pr = new PrintWriter(s.getOutputStream(), true);
130 | if (ita) {
131 | msg = "ITA" + msg;
132 | } else {
133 | msg = "ENG" + msg;
134 | }
135 | pr.print(msg);
136 | Log.d("msg_sent", msg);
137 | pr.flush();
138 | s.setSoTimeout(120 * 1000); // 2 minutes max to let the server neural network do the feed forward
139 | receiveAudio(s);
140 | s.close();
141 | runOnUiThread(new Runnable() {
142 | @Override
143 | public void run() {
144 | pb.setVisibility(View.INVISIBLE);
145 | playButton.setVisibility(View.VISIBLE);
146 | }
147 | });
148 | } catch (IOException e) {
149 | Log.d("ERROR", e.toString());
150 | runOnUiThread(new Runnable() {
151 | @Override
152 | public void run() {
153 | Toast toast = Toast.makeText(getApplicationContext(), "The server is taking too long to respond", Toast.LENGTH_LONG);
154 | toast.setGravity(Gravity.CENTER_VERTICAL, 0, -400);
155 | toast.show();
156 | }
157 | });
158 | return;
159 | }
160 | }
161 |
162 | private void receiveAudio(Socket s) throws IOException {
163 | DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()));
164 | byte[] msgByte = new byte[10000000]; // 10 MB (it would be better if the server sent a header in order to calculate the length, but I was lazy)
165 | String directoryToStore = getExternalFilesDir("/").getAbsolutePath();
166 |
167 | SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy_MM_dd_HH_mm_ss"); // HH: 24-hour clock, so AM and PM recordings get distinct names
168 | String timestamp = simpleDateFormat.format(new Date());
169 | File dstFile = new File(directoryToStore + "/" + timestamp + ".wav");
170 | FileOutputStream out = new FileOutputStream(dstFile);
171 |
172 | int len;
173 | while ((len = in.read(msgByte)) > 0) {
174 | out.write(msgByte, 0, len);
175 | }
176 | out.close();
177 |
178 | MediaPlayer mediaPlayer = new MediaPlayer();
179 | mediaPlayer.setDataSource(dstFile.getAbsolutePath());
180 | mediaPlayer.prepare();
181 | mediaPlayer.start();
182 | }
183 | }
184 | }
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # /usr/bin/python3
3 | '''
4 | By kyubyong park. kbpark.linguist@gmail.com.
5 | https://www.github.com/kyubyong/dc_tts
6 | '''
7 |
8 | from __future__ import print_function
9 |
10 | from tqdm import tqdm
11 |
12 | from data_load import get_batch, load_vocab
13 | from hyperparams import Hyperparams as hp
14 | from modules import *
15 | from networks import TextEnc, AudioEnc, AudioDec, Attention, SSRN
16 | import tensorflow as tf
17 | from utils import *
18 | import sys
19 | import os
20 | class Graph:
21 |     def __init__(self, num=1, mode="train"):
22 |         '''
23 |         Args:
24 |           num: Either 1 or 2. 1 for Text2Mel 2 for SSRN.
25 |           mode: Either "train" or "synthesize".
26 |         '''
27 |         # Load vocabulary
28 |         self.char2idx, self.idx2char = load_vocab()
29 |
30 |         # Set flag
31 |         training = (mode == "train")
32 |
33 |         # Graph
34 |         # Data Feeding
35 |         ## L: Text. (B, N), int32
36 |         ## mels: Reduced melspectrogram. (B, T/r, n_mels) float32
37 |         ## mags: Magnitude. (B, T, n_fft//2+1) float32
38 |         if mode=="train":
39 |             self.L, self.mels, self.mags, self.fnames, self.num_batch = get_batch()
40 |             self.prev_max_attentions = tf.ones(shape=(hp.B,), dtype=tf.int32)
41 |             self.gts = tf.convert_to_tensor(guided_attention())
42 |         else: # Synthesize
43 |             self.L = tf.placeholder(tf.int32, shape=(None, None))
44 |             self.mels = tf.placeholder(tf.float32, shape=(None, None, hp.n_mels))
45 |             self.prev_max_attentions = tf.placeholder(tf.int32, shape=(None,))
46 |
47 |         if num==1 or (not training):
48 |             with tf.variable_scope("Text2Mel"):
49 |                 # Get S or decoder inputs. (B, T//r, n_mels)
50 |                 self.S = tf.concat((tf.zeros_like(self.mels[:, :1, :]), self.mels[:, :-1, :]), 1)
51 |
52 |                 # Networks
53 |                 with tf.variable_scope("TextEnc"):
54 |                     self.K, self.V = TextEnc(self.L, training=training) # (N, Tx, e)
55 |
56 |                 with tf.variable_scope("AudioEnc"):
57 |                     self.Q = AudioEnc(self.S, training=training)
58 |
59 |                 with tf.variable_scope("Attention"):
60 |                     # R: (B, T/r, 2d)
61 |                     # alignments: (B, N, T/r)
62 |                     # max_attentions: (B,)
63 |                     self.R, self.alignments, self.max_attentions = Attention(self.Q, self.K, self.V,
64 |                                                                              mononotic_attention=(not training),
65 |                                                                              prev_max_attentions=self.prev_max_attentions)
66 |                 with tf.variable_scope("AudioDec"):
67 |                     self.Y_logits, self.Y = AudioDec(self.R, training=training) # (B, T/r, n_mels)
68 |         else: # num==2 & training. Note that during training,
69 |             # the ground truth melspectrogram values are fed.
70 |             with tf.variable_scope("SSRN"):
71 |                 self.Z_logits, self.Z = SSRN(self.mels, training=training)
72 |
73 |         if not training:
74 |             # During inference, the predicted melspectrogram values are fed.
75 |             with tf.variable_scope("SSRN"):
76 |                 self.Z_logits, self.Z = SSRN(self.Y, training=training)
77 |
78 |         with tf.variable_scope("gs"):
79 |             self.global_step = tf.Variable(0, name='global_step', trainable=False)
80 |
81 |         if training:
82 |             if num==1: # Text2Mel
83 |                 # mel L1 loss
84 |                 self.loss_mels = tf.reduce_mean(tf.abs(self.Y - self.mels))
85 |
86 |                 # mel binary divergence loss
87 |                 self.loss_bd1 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.Y_logits, labels=self.mels))
88 |
89 |                 # guided_attention loss
90 |                 self.A = tf.pad(self.alignments, [(0, 0), (0, hp.max_N), (0, hp.max_T)], mode="CONSTANT", constant_values=-1.)[:, :hp.max_N, :hp.max_T]
91 |                 self.attention_masks = tf.to_float(tf.not_equal(self.A, -1))
92 |                 self.loss_att = tf.reduce_sum(tf.abs(self.A * self.gts) * self.attention_masks)
93 |                 self.mask_sum = tf.reduce_sum(self.attention_masks)
94 |                 self.loss_att /= self.mask_sum
95 |
96 |                 # total loss
97 |                 self.loss = self.loss_mels + self.loss_bd1 + self.loss_att
98 |
99 |                 tf.summary.scalar('train/loss_mels', self.loss_mels)
100 |                 tf.summary.scalar('train/loss_bd1', self.loss_bd1)
101 |                 tf.summary.scalar('train/loss_att', self.loss_att)
102 |                 tf.summary.image('train/mel_gt', tf.expand_dims(tf.transpose(self.mels[:1], [0, 2, 1]), -1))
103 |                 tf.summary.image('train/mel_hat', tf.expand_dims(tf.transpose(self.Y[:1], [0, 2, 1]), -1))
104 |             else: # SSRN
105 |                 # mag L1 loss
106 |                 self.loss_mags = tf.reduce_mean(tf.abs(self.Z - self.mags))
107 |
108 |                 # mag binary divergence loss
109 |                 self.loss_bd2 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.Z_logits, labels=self.mags))
110 |
111 |                 # total loss
112 |                 self.loss = self.loss_mags + self.loss_bd2
113 |
114 |                 tf.summary.scalar('train/loss_mags', self.loss_mags)
115 |                 tf.summary.scalar('train/loss_bd2', self.loss_bd2)
116 |                 tf.summary.image('train/mag_gt', tf.expand_dims(tf.transpose(self.mags[:1], [0, 2, 1]), -1))
117 |                 tf.summary.image('train/mag_hat', tf.expand_dims(tf.transpose(self.Z[:1], [0, 2, 1]), -1))
118 |
119 |             # Training Scheme
120 |             self.lr = learning_rate_decay(hp.lr, self.global_step)
121 |             self.optimizer = tf.train.AdamOptimizer(learning_rate=self.lr)
122 |             tf.summary.scalar("lr", self.lr)
123 |
124 |             ## gradient clipping
125 |             self.gvs = self.optimizer.compute_gradients(self.loss)
126 |             self.clipped = []
127 |             for grad, var in self.gvs:
128 |                 grad = tf.clip_by_value(grad, -1., 1.)
129 |                 self.clipped.append((grad, var))
130 |             self.train_op = self.optimizer.apply_gradients(self.clipped, global_step=self.global_step)
131 |
132 |             # Summary
133 |             self.merged = tf.summary.merge_all()
134 |
135 |
136 | if __name__ == '__main__':
137 |     # argument: 1 or 2. 1 for Text2mel, 2 for SSRN.
138 |     num = int(sys.argv[1])
139 |
140 |     g = Graph(num=num); print("Training Graph loaded")
141 |
142 |     logdir = hp.logdir + "-" + str(num) + hp.lang
143 |     try:
144 |         os.mkdir(logdir)
145 |         print(f"Created Folder for {hp.lang}")
146 |     except OSError as error:
147 |         print(error)
148 |     sv = tf.train.Supervisor(logdir=logdir, save_model_secs=0, global_step=g.global_step)
149 |     with sv.managed_session() as sess:
150 |         for i in range(0,hp.num_iterations):
151 |             print(f"Step {i+1}")
152 |             for _ in tqdm(range(g.num_batch), total=g.num_batch, ncols=70, leave=False, unit='b'):
153 |                 gs, _ = sess.run([g.global_step, g.train_op])
154 |
155 |                 # Write checkpoint files at every 1k steps
156 |                 if gs % 1000 == 0:
157 |                     print("Reached 1k")
158 |                     sv.saver.save(sess, logdir + '/model_gs_{}'.format(str(gs // 1000).zfill(3) + "k"))
159 |
160 |                     if num==1:
161 |                         # plot alignment
162 |                         alignments = sess.run(g.alignments)
163 |                         plot_alignment(alignments[0], str(gs // 1000).zfill(3) + "k", logdir)
164 |
165 |     print("Done")
166 |
--------------------------------------------------------------------------------
/Android_App/app/src/main/java/com/android/simsax/AudioListAdapter.java:
--------------------------------------------------------------------------------
1 | package com.android.simsax;
2 |
3 | import android.app.Activity;
4 | import android.graphics.Color;
5 | import android.view.ActionMode;
6 | import android.view.LayoutInflater;
7 | import android.view.Menu;
8 | import android.view.MenuInflater;
9 | import android.view.MenuItem;
10 | import android.view.View;
11 | import android.view.ViewGroup;
12 | import android.widget.ImageView;
13 | import android.widget.TextView;
14 | import androidx.annotation.NonNull;
15 | import androidx.appcompat.app.AppCompatActivity;
16 | import androidx.fragment.app.FragmentActivity;
17 | import androidx.lifecycle.LifecycleOwner;
18 | import androidx.lifecycle.Observer;
19 | import androidx.lifecycle.ViewModelProviders;
20 | import androidx.recyclerview.widget.RecyclerView;
21 | import java.util.ArrayList;
22 |
23 | public class AudioListAdapter extends RecyclerView.Adapter<AudioListAdapter.AudioViewHolder> {
24 |
25 | private ArrayList<ExampleRow> exampleList;
26 | private OnItemClickListener mListener;
27 | private boolean isEnable = false;
28 | private boolean allSelected = false;
29 | private ArrayList selected = new ArrayList<>();
30 | private ArrayList<ExampleRow> selectedItems = new ArrayList<>();
31 | private MainViewModel mainViewModel;
32 | private Activity activity;
33 |
34 |
35 | public class AudioViewHolder extends RecyclerView.ViewHolder {
36 |
37 | public ImageView buttonPlay;
38 | public TextView fileName;
39 | public ImageView shareButton;
40 |
41 | public AudioViewHolder(@NonNull View itemView, OnItemClickListener listener) {
42 | super(itemView);
43 |
44 | buttonPlay = itemView.findViewById(R.id.button1);
45 | fileName = itemView.findViewById(R.id.fileName1);
46 | shareButton = itemView.findViewById(R.id.share_button);
47 |
48 | itemView.setOnClickListener(new View.OnClickListener() {
49 | @Override
50 | public void onClick(View v) {
51 | if (listener != null) {
52 | if (!isEnable) {
53 | int position = getAdapterPosition();
54 | if (position != RecyclerView.NO_POSITION) {
55 | listener.onItemClick(position);
56 | }
57 | } else { // Action mode
58 | ClickItem(itemView, getAdapterPosition());
59 | }
60 | }
61 | }
62 | });
63 |
64 | shareButton.setOnClickListener(new View.OnClickListener() {
65 | @Override
66 | public void onClick(View v) {
67 | if (listener != null) {
68 | if (!isEnable) {
69 | int position = getAdapterPosition();
70 | if (position != RecyclerView.NO_POSITION) {
71 | listener.onShareClick(position);
72 | }
73 | } else { // Action mode
74 | ClickItem(itemView, getAdapterPosition());
75 | }
76 | }
77 | }
78 | });
79 | }
80 | }
81 |
82 | public AudioListAdapter(Activity activity, ArrayList<ExampleRow> exampleList) {
83 | this.exampleList = exampleList;
84 | this.activity = activity;
85 | for (int i=0; i() {
122 | @Override
123 | public void onChanged(String s) {
124 | if (s.equals("0"))
125 | mode.finish();
126 | else
127 | mode.setTitle(String.format("%s", s));
128 | }
129 | });
130 | return true;
131 | }
132 |
133 | @Override
134 | public boolean onActionItemClicked(ActionMode mode, MenuItem item) {
135 | int id = item.getItemId();
136 |
137 | if (id == R.id.menu_delete) {
138 | for (ExampleRow f : selectedItems) {
139 | exampleList.remove(f);
140 | f.getFile().delete();
141 | }
142 | mode.finish();
143 | } else {
144 | if (selectedItems.size() == exampleList.size()) {
145 | selectedItems.clear();
146 | for (int i=0; i (B, T/r, c)
226 |     tensor = conv1d(Y,
227 |                     filters=hp.c,
228 |                     size=1,
229 |                     rate=1,
230 |                     dropout_rate=hp.dropout_rate,
231 |                     training=training,
232 |                     scope="C_{}".format(i)); i += 1
233 |     for j in range(2):
234 |         tensor = hc(tensor,
235 |                     size=3,
236 |                     rate=3**j,
237 |                     dropout_rate=hp.dropout_rate,
238 |                     training=training,
239 |                     scope="HC_{}".format(i)); i += 1
240 |     for _ in range(2):
241 |         # -> (B, T/2, c) -> (B, T, c)
242 |         tensor = conv1d_transpose(tensor,
243 |                                   scope="D_{}".format(i),
244 |                                   dropout_rate=hp.dropout_rate,
245 |                                   training=training,); i += 1
246 |         for j in range(2):
247 |             tensor = hc(tensor,
248 |                         size=3,
249 |                         rate=3**j,
250 |                         dropout_rate=hp.dropout_rate,
251 |                         training=training,
252 |                         scope="HC_{}".format(i)); i += 1
253 |     # -> (B, T, 2*c)
254 |     tensor = conv1d(tensor,
255 |                     filters=2*hp.c,
256 |                     size=1,
257 |                     rate=1,
258 |                     dropout_rate=hp.dropout_rate,
259 |                     training=training,
260 |                     scope="C_{}".format(i)); i += 1
261 |     for _ in range(2):
262 |         tensor = hc(tensor,
263 |                     size=3,
264 |                     rate=1,
265 |                     dropout_rate=hp.dropout_rate,
266 |                     training=training,
267 |                     scope="HC_{}".format(i)); i += 1
268 |     # -> (B, T, 1+n_fft/2)
269 |     tensor = conv1d(tensor,
270 |                     filters=1+hp.n_fft//2,
271 |                     size=1,
272 |                     rate=1,
273 |                     dropout_rate=hp.dropout_rate,
274 |                     training=training,
275 |                     scope="C_{}".format(i)); i += 1
276 |
277 |     for _ in range(2):
278 |         tensor = conv1d(tensor,
279 |                         size=1,
280 |                         rate=1,
281 |                         dropout_rate=hp.dropout_rate,
282 |                         activation_fn=tf.nn.relu,
283 |                         training=training,
284 |                         scope="C_{}".format(i)); i += 1
285 |     logits = conv1d(tensor,
286 |                     size=1,
287 |                     rate=1,
288 |                     dropout_rate=hp.dropout_rate,
289 |                     training=training,
290 |                     scope="C_{}".format(i))
291 |     Z = tf.nn.sigmoid(logits)
292 |     return logits, Z
293 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/LJSpeech-1.1/transcript.csv:
--------------------------------------------------------------------------------
1 | parte_1_000|In this book, we've focused on the nuts and bolts of neural networks|In this book, we've focused on the nuts and bolts of neural networks;;;
2 | parte_1_001|how they work, and how they can be used to solve pattern recognition problems.|how they work, and how they can be used to solve pattern recognition problems.;;;
3 | parte_1_002|This is material with many immediate practical applications.|This is material with many immediate practical applications.;;;
4 | parte_1_003|But, of course, one reason for interest in neural nets|But, of course, one reason for interest in neural nets;;;
5 | parte_1_004|is the hope that one day they will go far beyond such basic pattern recognition problems.|is the hope that one day they will go far beyond such basic pattern recognition problems.;;;
6 | parte_1_005|Perhaps they, or some other approach based on digital computers,|Perhaps they, or some other approach based on digital computers,;;;
7 | parte_1_105|will eventually be used to build thinking machines, machines that match or surpass human intelligence|will eventually be used to build thinking machines, machines that match or surpass human intelligence;;;
8 | parte_1_006|This notion far exceeds the material discussed in the book - or what anyone in the world knows how to do.|This notion far exceeds the material discussed in the book - or what anyone in the world knows how to do.;;;
9 | parte_1_106|But it's fun to speculate.|But it's fun to speculate.;;;
10 | parte_1_007|There has been much debate about whether it's even possible for computers to match human intelligence.|There has been much debate about whether it's even possible for computers to match human intelligence.;;;
11 | parte_1_008|I'm not going to engage with that question.|I'm not going to engage with that question.;;;
12 | parte_1_108|Despite ongoing dispute, I believe it's not in serious doubt that an intelligent computer is possible|Despite ongoing dispute, I believe it's not in serious doubt that an intelligent computer is possible;;;
13 | parte_1_009|and perhaps far beyond current technology|and perhaps far beyond current technology;;;
14 | parte_1_010|and current naysayers will one day seem much more like the vitalists.|and current naysayers will one day seem much more like the vitalists.;;;
15 | parte_1_011|The idea that there is a truly simple algorithm for intelligence is a bold idea.|The idea that there is a truly simple algorithm for intelligence is a bold idea.;;;
16 | parte_1_111|It perhaps sounds too optimistic to be true.|It perhaps sounds too optimistic to be true.;;;
17 | parte_1_211|Many people have a strong intuitive sense that intelligence has considerable irreducible complexity.|Many people have a strong intuitive sense that intelligence has considerable irreducible complexity.;;;
18 | parte_1_012|They're so impressed by the amazing variety and flexibility of human thought|They're so impressed by the amazing variety and flexibility of human thought;;;
19 | parte_1_112|that they conclude that a simple algorithm for intelligence must be impossible.|that they conclude that a simple algorithm for intelligence must be impossible.;;;
20 | parte_1_212|Despite this intuition, I don't think it's wise to rush to judgement.|Despite this intuition, I don't think it's wise to rush to judgement.;;;
21 | parte_1_312|The history of science is filled with instances where a phenomenon initially appeared extremely complex,|The history of science is filled with instances where a phenomenon initially appeared extremely complex,;;;
22 | parte_1_412|but was later explained by some simple but powerful set of ideas.|but was later explained by some simple but powerful set of ideas.;;;
23 | parte_1_013|Consider, for example, the early days of astronomy.|Consider, for example, the early days of astronomy.;;;
24 | parte_1_014|Humans have known since ancient times that there is a menagerie of objects in the sky|Humans have known since ancient times that there is a menagerie of objects in the sky;;;
25 | parte_1_114|the sun, the moon, the planets, the comets, and the stars.|the sun, the moon, the planets, the comets, and the stars.;;;
26 | parte_1_214|These objects behave in very different ways|These objects behave in very different ways;;;
27 | parte_1_314|streak across the sky, and then disappear.|streak across the sky, and then disappear.;;;
28 | parte_1_015|But in the 17th century Newton formulated his theory of universal gravitation,|But in the seventeenth century Newton formulated his theory of universal gravitation,;;;
29 | parte_1_115|which not only explained all these motions, but also explained terrestrial phenomena|which not only explained all these motions, but also explained terrestrial phenomena;;;
30 | parte_1_016|The 16th century's foolish optimist seems in retrospect like a pessimist, asking for too little.|The sixteenth century's foolish optimist seems in retrospect like a pessimist, asking for too little.;;;
31 | parte_1_017|Of course, science contains many more such examples.|Of course, science contains many more such examples.;;;
32 | parte_1_018|Or the puzzle of how there is so much complexity and diversity in the biological world,|Or the puzzle of how there is so much complexity and diversity in the biological world,;;;
33 | parte_1_118|whose origin turns out to lie in the principle of evolution by natural selection.|whose origin turns out to lie in the principle of evolution by natural selection.;;;
34 | parte_1_218|These and many other examples suggest that it would not be wise to rule out a simple explanation of intelligence|These and many other examples suggest that it would not be wise to rule out a simple explanation of intelligence;;;
35 | parte_1_318|merely on the grounds that what our brains - currently the best examples of intelligence - are doing appears to be very complicated.|merely on the grounds that what our brains - currently the best examples of intelligence - are doing appears to be very complicated.;;;
36 | parte_2_000|Contrariwise, and despite these optimistic examples|Contrariwise, and despite these optimistic examples;;;
37 | parte_2_001|it is also logically possible that intelligence can only be explained|it is also logically possible that intelligence can only be explained;;;
38 | parte_2_100|by a large number of fundamentally distinct mechanisms|by a large number of fundamentally distinct mechanisms;;;
39 | parte_2_002|In the case of our brains, those many mechanisms may perhaps have evolved in response to many different selection|In the case of our brains, those many mechanisms may perhaps have evolved in response to many different selection;;;
40 | parte_2_102|in our species' evolutionary history.|in our species' evolutionary history.;;;
41 | parte_2_003|If this point of view is correct, then intelligence involves considerable irreducible complexity, and no simple algorithm for intelligence is possible.|If this point of view is correct, then intelligence involves considerable irreducible complexity, and no simple algorithm for intelligence is possible.;;;
42 | parte_2_004|Which of these two points of view is correct?|Which of these two points of view is correct?;;;
43 | parte_2_005|To get insight into this question, let's ask a closely related question,|To get insight into this question, let's ask a closely related question,;;;
44 | parte_2_105|which is whether there's a simple explanation of how human brains work.|which is whether there's a simple explanation of how human brains work.;;;
45 | parte_2_006|In particular, let's look at some ways of quantifying the complexity of the brain.|In particular, let's look at some ways of quantifying the complexity of the brain.;;;
46 | parte_2_007|Our first approach is the view of the brain from connectomics.|Our first approach is the view of the brain from connectomics.;;;
47 | parte_2_008|This is all about the raw wiring|This is all about the raw wiring;;;
48 | parte_2_009|how many neurons there are in the brain, how many glial cells, and how many connections there are between the neurons.|how many neurons there are in the brain, how many glial cells, and how many connections there are between the neurons.;;;
49 | parte_2_010|You've probably heard the numbers before|You've probably heard the numbers before;;;
50 | parte_2_110|the brain contains on the order of 100 billion neurons, 100 billion glial cells|the brain contains on the order of a hundred billion neurons, a hundred billion glial cells;;;
51 | parte_2_210|Those numbers are staggering. They're also intimidating.|Those numbers are staggering. They're also intimidating.;;;
53 | parte_2_011|in order to understand how the brain works,|in order to understand how the brain works,;;;
54 | parte_2_111|then we're certainly not going to end up with a simple algorithm for intelligence.|then we're certainly not going to end up with a simple algorithm for intelligence.;;;
55 | parte_2_012|There's a second, more optimistic point of view, the view of the brain from molecular biology.|There's a second, more optimistic point of view, the view of the brain from molecular biology.;;;
56 | parte_2_013|The idea is to ask how much genetic information is needed to describe the brain's architecture.|The idea is to ask how much genetic information is needed to describe the brain's architecture.;;;
57 | parte_2_014|To get a handle on this question, we'll start by considering the genetic differences between humans and chimpanzees.|To get a handle on this question, we'll start by considering the genetic differences between humans and chimpanzees.;;;
58 | parte_2_016|This saying is sometimes varied - popular variations also give the number as 95 or 99 percent.|This saying is sometimes varied - popular variations also give the number as ninety-five or ninety-nine percent.;;;
59 | parte_2_017|The variations occur because the numbers were originally estimated by comparing samples of the human and chimp genomes,|The variations occur because the numbers were originally estimated by comparing samples of the human and chimp genomes,;;;
60 | parte_2_117|not the entire genomes.|not the entire genomes.;;;
61 | parte3_001|In the last few paragraphs I've ignored the fact that 125 million bits merely quantifies the genetic difference between human and chimp brains.|In the last few paragraphs I've ignored the fact that one twenty-five million bits merely quantifies the genetic difference between human and chimp brains.;;;
62 | parte3_002|Not all of our brain function is due to those 125 million bits.|Not all of our brain function is due to those one twenty-five million bits.;;;
63 | parte3_003|Chimps are remarkable thinkers in their own right.|Chimps are remarkable thinkers in their own right.;;;
64 | parte3_004|Maybe the key to intelligence lies mostly in the mental abilities|Maybe the key to intelligence lies mostly in the mental abilities;;;
65 | parte3_104|and genetic information|and genetic information;;;
66 | parte3_005|If this is correct, then human brains might be just a minor upgrade to chimpanzee brains|If this is correct, then human brains might be just a minor upgrade to chimpanzee brains;;;
67 | parte3_105|at least in terms of the complexity of the underlying principles.|at least in terms of the complexity of the underlying principles.;;;
68 | parte3_006|Despite the conventional human chauvinism about our unique capabilities, this isn't inconceivable|Despite the conventional human chauvinism about our unique capabilities, this isn't inconceivable;;;
69 | parte3_106|the chimpanzee and human genetic lines diverged just 5 million years ago|the chimpanzee and human genetic lines diverged just five million years ago;;;
70 | parte3_206|a blink in evolutionary timescales.|a blink in evolutionary timescales.;;;
71 | parte3_010|in the absence of a more compelling argument, I'm sympathetic to the conventional human chauvinism|in the absence of a more compelling argument, I'm sympathetic to the conventional human chauvinism;;;
72 | parte3_110|my guess is that the most interesting principles|my guess is that the most interesting principles;;;
73 | parte3_210|underlying human thought|underlying human thought;;;
74 | parte3_310|lie in that 125 million bits|lie in that one twenty-five million bits;;;
75 | parte3_011|Adopting the view of the brain from molecular biology gave us a reduction of roughly nine orders of magnitude in the complexity of our description.|Adopting the view of the brain from molecular biology gave us a reduction of roughly nine orders of magnitude in the complexity of our description.;;;
76 | parte3_012|While encouraging, it doesn't tell us whether or not a truly simple algorithm for intelligence is possible.|While encouraging, it doesn't tell us whether or not a truly simple algorithm for intelligence is possible.;;;
77 | parte3_013|Can we get any further reductions in complexity?|Can we get any further reductions in complexity?;;;
78 | parte3_014|And, more to the point, can we settle the question of whether a simple algorithm for intelligence is possible?|And, more to the point, can we settle the question of whether a simple algorithm for intelligence is possible?;;;
79 | parte3_016|Among the evidence suggesting that there may be a simple algorithm for intelligence|Among the evidence suggesting that there may be a simple algorithm for intelligence;;;
80 | parte3_116|is an experiment reported in April 2000 in the journal Nature.|is an experiment reported in April two thousand in the journal Nature.;;;
81 | parte3_021|The visual cortex contains many orientation columns.|The visual cortex contains many orientation columns.;;;
82 | parte3_022|These are little slabs of neurons, each of which responds to visual stimuli from some particular direction.|These are little slabs of neurons, each of which responds to visual stimuli from some particular direction.;;;
83 | parte3_023|You can think of the orientation columns|You can think of the orientation columns;;;
84 | parte3_123|when someone shines a bright light from some particular direction, a corresponding orientation column is activated.|when someone shines a bright light from some particular direction, a corresponding orientation column is activated.;;;
85 | parte3_024|If the light is moved, a different orientation column is activated.|If the light is moved, a different orientation column is activated.;;;
86 | parte3_025|which charts how the orientation columns are laid out.|which charts how the orientation columns are laid out.;;;
87 | parte4_000|What the scientists found is that|What the scientists found is that;;;
88 | parte4_001|was rerouted to the auditory cortex,|was rerouted to the auditory cortex,;;;
89 | parte4_002|the auditory cortex changed.|the auditory cortex changed.;;;
90 | parte4_003|Orientation columns and an orientation map began to emerge in the auditory cortex.|Orientation columns and an orientation map began to emerge in the auditory cortex.;;;
91 | parte4_004|It was more disorderly than the orientation map usually found in the visual cortex,|It was more disorderly than the orientation map usually found in the visual cortex,;;;
92 | parte4_005|but unmistakably similar.|but unmistakably similar.;;;
93 | parte4_006|Furthermore, the scientists did some simple tests of how the ferrets responded to visual stimuli|Furthermore, the scientists did some simple tests of how the ferrets responded to visual stimuli;;;
94 | parte4_007|training them to respond differently when lights flashed from different directions.|training them to respond differently when lights flashed from different directions.;;;
95 | parte4_008|These tests suggested that the ferrets could still learn to "see",|These tests suggested that the ferrets could still learn to "see",;;;
96 | parte4_009|at least in a rudimentary fashion,|at least in a rudimentary fashion,;;;
97 | parte4_010|This is an astonishing result.|This is an astonishing result.;;;
98 | parte4_011|It suggests that there are common principles underlying how different parts of the brain|It suggests that there are common principles underlying how different parts of the brain;;;
99 | parte4_012|learn to respond to sensory data.|learn to respond to sensory data.;;;
100 | parte4_013|That commonality provides at least some support for the idea that there is a set of simple principles|That commonality provides at least some support for the idea that there is a set of simple principles;;;
101 | parte4_014|underlying intelligence. However, we shouldn't kid ourselves about how good the ferrets' vision was in these experiments.|underlying intelligence. However, we shouldn't kid ourselves about how good the ferrets' vision was in these experiments.;;;
102 | parte4_015|The behavioural tests tested only very gross aspects of vision.|The behavioural tests tested only very gross aspects of vision.;;;
103 | parte4_016|And, of course, we can't ask the ferrets if they've "learned to see".|And, of course, we can't ask the ferrets if they've "learned to see".;;;
104 | parte4_017|So the experiments don't prove that the rewired auditory cortex was giving the ferrets a high-fidelity visual experience.|So the experiments don't prove that the rewired auditory cortex was giving the ferrets a high-fidelity visual experience.;;;
105 | parte4_018|And so they provide only limited evidence in favour of the idea|And so they provide only limited evidence in favour of the idea;;;
106 | parte4_019|that common principles underlie how different parts of the brain learn.|that common principles underlie how different parts of the brain learn.;;;
107 | parte4_020|What evidence is there against the idea of a simple algorithm for intelligence?|What evidence is there against the idea of a simple algorithm for intelligence?;;;
108 | parte4_021|Some evidence comes from the fields of evolutionary psychology and neuroanatomy.|Some evidence comes from the fields of evolutionary psychology and neuroanatomy.;;;
109 | parte4_022|Since the 1960s evolutionary psychologists have discovered a wide range of human universals|Since the nineteen sixties evolutionary psychologists have discovered a wide range of human universals;;;
110 | parte4_023|complex behaviours common to all humans|complex behaviours common to all humans;;;
111 | parte4_024|across cultures and upbringing.|across cultures and upbringing.;;;
112 | parte4_025|These human universals include the incest taboo between mother and son,|These human universals include the incest taboo between mother and son,;;;
113 | parte4_026|the use of music and dance, as well as much complex linguistic structure,|the use of music and dance, as well as much complex linguistic structure,;;;
114 | parte4_027|such as the use of swear words|such as the use of swear words;;;
115 | parte4_028|pronouns, and even structures as basic as the verb.|pronouns, and even structures as basic as the verb.;;;
116 | parte4_029|Complementing these results, a great deal of evidence from neuroanatomy shows that many human behaviours|Complementing these results, a great deal of evidence from neuroanatomy shows that many human behaviours;;;
117 | parte4_030|are controlled by particular localized areas of the brain,|are controlled by particular localized areas of the brain,;;;
118 | parte4_031|and those areas seem to be similar in all people.|and those areas seem to be similar in all people.;;;
119 | parte4_032|Taken together, these findings suggest that many very specialized behaviours|Taken together, these findings suggest that many very specialized behaviours;;;
120 | parte4_033|are hardwired into particular parts of our brains.|are hardwired into particular parts of our brains.;;;
121 | parte4_034|Some people conclude from these results that separate explanations must be required|Some people conclude from these results that separate explanations must be required;;;
122 | parte4_035|for these many brain functions,|for these many brain functions,;;;
123 | parte4_036|and that as a consequence there is an irreducible complexity to the brain's function,|and that as a consequence there is an irreducible complexity to the brain's function,;;;
124 | parte4_037|a complexity that makes a simple explanation for the brain's operation|a complexity that makes a simple explanation for the brain's operation;;;
125 | parte4_038|and, perhaps, a simple algorithm for intelligence|and, perhaps, a simple algorithm for intelligence;;;
126 | parte4_039|For example, one well-known artificial intelligence researcher with this point of view is Marvin Minsky.|For example, one well-known artificial intelligence researcher with this point of view is Marvin Minsky.;;;
127 | parte4_040|In the 1970s and 1980s|In the nineteen seventies and nineteen eighties;;;
128 | parte4_041|Minsky developed his "Society of Mind" theory|Minsky developed his "Society of Mind" theory;;;
129 | parte4_042|based on the idea that human intelligence is the result of a large society of individually simple|based on the idea that human intelligence is the result of a large society of individually simple;;;
130 | parte4_043|but very different, computational processes|but very different, computational processes;;;
131 | parte4_044|which Minsky calls agents.|which Minsky calls agents.;;;
132 | parte4_045|In his book describing the theory, Minsky sums up what he sees as the power of this point of view|In his book describing the theory, Minsky sums up what he sees as the power of this point of view;;;
133 | parte4_046|What magical trick makes us intelligent?|What magical trick makes us intelligent?;;;
134 | parte4_047|The trick is that there is no trick.|The trick is that there is no trick.;;;
135 | parte4_048|The power of intelligence stems from our vast diversity,|The power of intelligence stems from our vast diversity,;;;
136 | parte4_049|not from any single, perfect principle.|not from any single, perfect principle.;;;
137 | parte4_050|In a response to reviews of his book, Minsky elaborated on the motivation for the Society of Mind|In a response to reviews of his book, Minsky elaborated on the motivation for the Society of Mind;;;
138 | parte4_051|giving an argument similar to that stated above,|giving an argument similar to that stated above,;;;
139 | parte4_052|based on neuroanatomy and evolutionary psychology|based on neuroanatomy and evolutionary psychology;;;
140 | parte4_053|We now know that the brain itself is composed of hundreds of different regions and nuclei,|We now know that the brain itself is composed of hundreds of different regions and nuclei,;;;
141 | parte4_054|each with significantly different architectural elements and arrangements,|each with significantly different architectural elements and arrangements,;;;
142 | parte4_055|and that many of them are involved with demonstrably different aspects of our mental activities.|and that many of them are involved with demonstrably different aspects of our mental activities.;;;
143 | parte4_056|This modern mass of knowledge shows that many phenomena|This modern mass of knowledge shows that many phenomena;;;
144 | parte4_057|traditionally described by commonsense terms like "intelligence" or "understanding"|traditionally described by commonsense terms like "intelligence" or "understanding";;;
145 | parte4_058|actually involve complex assemblies of machinery.|actually involve complex assemblies of machinery.;;;
146 | parte4_059|Minsky is, of course, not the only person to hold a point of view along these lines|Minsky is, of course, not the only person to hold a point of view along these lines;;;
147 | parte4_060|I'm merely giving him as an example of a supporter of this line of argument.|I'm merely giving him as an example of a supporter of this line of argument.;;;
148 | parte4_061|I find the argument interesting, but don't believe the evidence is compelling.|I find the argument interesting, but don't believe the evidence is compelling.;;;
149 | parte4_062|While it's true that the brain is composed of a large number of different regions,|While it's true that the brain is composed of a large number of different regions,;;;
150 | parte4_063|with different functions|with different functions;;;
151 | parte4_064|it does not therefore follow that a simple explanation for the brain's function is impossible.|it does not therefore follow that a simple explanation for the brain's function is impossible.;;;
152 | parte4_065|Perhaps those architectural differences arise out of common underlying principles,|Perhaps those architectural differences arise out of common underlying principles,;;;
153 | parte4_066|much as the motion of comets, the planets, the sun and the stars|much as the motion of comets, the planets, the sun and the stars;;;
154 | parte4_067|all arise from a single gravitational force.|all arise from a single gravitational force.;;;
155 | parte4_068|Neither Minsky nor anyone else|Neither Minsky nor anyone else;;;
156 | parte4_069|has argued convincingly against such underlying principles.|has argued convincingly against such underlying principles.;;;
157 | parte4_070|My own prejudice is in favour of there being a simple algorithm for intelligence.|My own prejudice is in favour of there being a simple algorithm for intelligence.;;;
158 | parte4_071|And the main reason I like the idea, above and beyond the inconclusive arguments above, is that it's an optimistic idea.|And the main reason I like the idea, above and beyond the inconclusive arguments above, is that it's an optimistic idea.;;;
159 | parte4_072|When it comes to research, an unjustified optimism is often more productive than a seemingly better justified pessimism,|When it comes to research, an unjustified optimism is often more productive than a seemingly better justified pessimism,;;;
160 | parte4_073|for an optimist has the courage to set out and try new things.|for an optimist has the courage to set out and try new things.;;;
161 | parte4_074|That's the path to discovery|That's the path to discovery;;;
162 | parte4_075|not what was originally hoped|not what was originally hoped;;;
163 | parte4_076|A pessimist may be more "correct" in some narrow sense|A pessimist may be more "correct" in some narrow sense;;;
164 | parte4_077|but will discover less than the optimist.|but will discover less than the optimist.;;;
165 | parte4_078|This point of view is in stark contrast to the way we usually judge ideas|This point of view is in stark contrast to the way we usually judge ideas;;;
166 | parte4_079|by attempting to figure out whether they are right or wrong.|by attempting to figure out whether they are right or wrong.;;;
167 | parte4_080|But it can be the wrong way of judging a big, bold idea|But it can be the wrong way of judging a big, bold idea;;;
168 | parte4_081|the sort of idea that defines an entire research program|the sort of idea that defines an entire research program;;;
169 | parte4_082|Sometimes, we have only weak evidence about whether such an idea is correct or not.|Sometimes, we have only weak evidence about whether such an idea is correct or not.;;;
170 | parte4_083|We can meekly refuse to follow the idea, instead spending all our time squinting at the available evidence|We can meekly refuse to follow the idea, instead spending all our time squinting at the available evidence;;;
171 | parte4_084|trying to discern what's true.|trying to discern what's true.;;;
172 | parte4_085|Or we can accept that no-one yet knows|Or we can accept that no-one yet knows;;;
173 | parte4_086|and instead work hard on developing the big, bold idea|and instead work hard on developing the big, bold idea;;;
174 | parte4_087|in the understanding that while we have no guarantee of success|in the understanding that while we have no guarantee of success;;;
175 | parte4_088|it is only thus that our understanding advances|it is only thus that our understanding advances;;;
176 | parte4_089|With all that said, in its most optimistic form|With all that said, in its most optimistic form;;;
177 | parte4_090|I don't believe we'll ever find a simple algorithm for intelligence.|I don't believe we'll ever find a simple algorithm for intelligence.;;;
178 | parte4_091|To be more concrete, I don't believe we'll ever find a really short Python|To be more concrete, I don't believe we'll ever find a really short Python;;;
179 | parte4_092|or C or Lisp, or whatever program|or C or Lisp, or whatever program;;;
180 | parte4_093|let's say, anywhere up to a thousand lines of code - which implements artificial intelligence|let's say, anywhere up to a thousand lines of code - which implements artificial intelligence;;;
181 | parte4_094|Nor do I think we'll ever find a really easily-described neural network that can implement artificial intelligence.|Nor do I think we'll ever find a really easily-described neural network that can implement artificial intelligence.;;;
182 | parte4_095|But I do believe it's worth acting as though we could find such a program or network.|But I do believe it's worth acting as though we could find such a program or network.;;;
183 | parte4_096|That's the path to insight, and by pursuing that path we may one day understand enough|That's the path to insight, and by pursuing that path we may one day understand enough;;;
184 | parte4_097|to write a longer program or build a more sophisticated network|to write a longer program or build a more sophisticated network;;;
185 | parte4_098|which does exhibit intelligence.|which does exhibit intelligence.;;;
186 | parte4_099|And so it's worth acting as though an extremely simple algorithm for intelligence exists.|And so it's worth acting as though an extremely simple algorithm for intelligence exists.;;;
187 | parte4_100|In the 1980s, the eminent mathematician and computer scientist Jack Schwartz|In the nineteen eighties, the eminent mathematician and computer scientist Jack Schwartz;;;
188 | parte4_101|was invited to a debate between artificial intelligence proponents and artificial intelligence skeptics.|was invited to a debate between artificial intelligence proponents and artificial intelligence skeptics.;;;
189 | parte4_102|The debate became unruly|The debate became unruly;;;
190 | parte4_103|with the proponents making over-the-top claims about the amazing things just round the corner|with the proponents making over-the-top claims about the amazing things just round the corner;;;
191 | parte4_104|and the skeptics doubling down on their pessimism|and the skeptics doubling down on their pessimism;;;
192 | parte4_105|claiming artificial intelligence was outright impossible.|claiming artificial intelligence was outright impossible.;;;
193 | parte4_106|Schwartz was an outsider to the debate|Schwartz was an outsider to the debate;;;
194 | parte4_107|and remained silent as the discussion heated up|and remained silent as the discussion heated up;;;
195 | parte4_108|During a lull, he was asked to speak up and state his thoughts on the issues under discussion.|During a lull, he was asked to speak up and state his thoughts on the issues under discussion.;;;
196 | parte4_109|He said: "Well, some of these developments may lie one hundred Nobel prizes away"|He said: "Well, some of these developments may lie one hundred Nobel prizes away";;;
197 | parte4_110|It seems to me a perfect response.|It seems to me a perfect response.;;;
198 | parte4_111|The key to artificial intelligence is simple, powerful ideas|The key to artificial intelligence is simple, powerful ideas;;;
199 | parte4_112|and we can and should search optimistically for those ideas.|and we can and should search optimistically for those ideas.;;;
200 | parte4_113|But we're going to need many such ideas,|But we're going to need many such ideas,;;;
201 | parte4_114|and we've still got a long way to go|and we've still got a long way to go;;;
--------------------------------------------------------------------------------