# NIPS2016
This project collects the accepted papers for NIPS 2016 and their links to arXiv or GitXiv.

* **Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much**

Bryan He*, Stanford University; Christopher De Sa, Stanford University; Ioannis Mitliagkas, ; Christopher Ré, Stanford University

https://arxiv.org/abs/1606.03432

Abstract:
>Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
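The two scan orders compared in the abstract are easy to see on a toy model. Below is a minimal sketch of both scans on a ferromagnetic Ising chain (the model, sizes, and function name are illustrative, not from the paper):

```python
import numpy as np

def gibbs_ising_chain(n=20, coupling=0.5, sweeps=1000, scan="systematic", seed=0):
    """Gibbs sampler for a toy Ising chain, with either scan order."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n)
    for _ in range(sweeps):
        # systematic scan visits sites in a fixed order; random scan
        # draws n sites uniformly at random per sweep
        sites = range(n) if scan == "systematic" else rng.integers(0, n, size=n)
        for i in sites:
            field = coupling * (x[i - 1] if i > 0 else 0) + \
                    coupling * (x[i + 1] if i < n - 1 else 0)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(x_i = +1 | neighbors)
            x[i] = 1 if rng.random() < p_plus else -1
    return x

print(gibbs_ising_chain(scan="systematic"))
print(gibbs_ising_chain(scan="random"))
```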
* **Deep ADMM-Net for Compressive Sensing MRI**

Yan Yang, Xi'an Jiaotong University; Jian Sun*, Xi'an Jiaotong University; Huibin Li, ; Zongben Xu,

* **A scaled Bregman theorem with applications**

Richard Nock, Data61 and ANU; Aditya Menon*, ; Cheng Soon Ong, Data61

http://arxiv.org/abs/1607.00360

Abstract:
>Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices.
>Our theorem allows one to leverage the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

* **Swapout: Learning an ensemble of deep architectures**

Saurabh Singh*, UIUC; Derek Hoiem, UIUC; David Forsyth, UIUC

http://arxiv.org/abs/1605.06465

Abstract:
>We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100, matching state-of-the-art accuracy. Remarkably, our 32 layer wider model performs similarly to a 1001 layer ResNet model.
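The abstract's stochastic unit combines a skip path and a transformed path, each gated by an independent per-unit Bernoulli mask. A minimal sketch of that combination, with p1/p2 and the stand-in layer F chosen only for illustration:

```python
import numpy as np

def swapout(x, F, p1=0.8, p2=0.8, rng=np.random.default_rng(0)):
    """One Swapout-style unit: y = Theta1*x + Theta2*F(x) with independent
    per-unit Bernoulli masks (a sketch of the paper's formulation)."""
    theta1 = rng.random(x.shape) < p1   # keep the skip path per unit
    theta2 = rng.random(x.shape) < p2   # keep the transformed path per unit
    return theta1 * x + theta2 * F(x)

x = np.random.randn(4)
y = swapout(x, F=np.tanh)
# Deterministic masks recover special cases: both masks all-ones gives a
# residual unit; an all-zero skip mask leaves dropout on F(x); sharing one
# per-layer mask behaves like stochastic depth.
print(y)
```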
* **On Regularizing Rademacher Observation Losses**

Richard Nock*, Data61 and ANU

http://users.cecs.anu.edu.au/~rnock/nips2016-n-web.pdf

Abstract:
>It has recently been shown that supervised learning of linear classifiers with two of the most popular losses, the logistic and square loss, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados). It has also been shown that learning over rados brings solutions to two prominent problems for which the state of the art of learning from examples can be comparatively inferior and in fact less convenient: (i) protecting and learning from private examples, (ii) learning from distributed datasets without entity resolution. Bis repetita placent: the two proofs of equivalence are different and rely on specific properties of the corresponding losses, so whether these can be unified and generalized inevitably comes to mind. This is our first contribution: we show how they can be fit into the same theory for the equivalence between example and rado losses. As a second contribution, we show that the generalization unveils a surprising new connection to regularized learning, and in particular a sufficient condition under which regularizing the loss over examples is equivalent to regularizing the rados (i.e. the data) in the equivalent rado loss, in such a way that an efficient algorithm for one regularized rado loss may be as efficient when changing the regularizer. This is our third contribution: we give a formal boosting algorithm for the regularized exponential rado-loss which boosts with any of the ridge, lasso, SLOPE, ℓ1, or elastic net regularizer, using the same master routine for all. Because the regularized exponential rado-loss is the equivalent of the regularized logistic loss over examples, we obtain the first efficient proxy to the minimization of the regularized logistic loss over examples using such a wide spectrum of regularizers. Experiments show that regularization significantly improves rado-based learning and compares favourably with example-based learning.

* **Without-Replacement Sampling for Stochastic Gradient Methods**

Ohad Shamir*, Weizmann Institute of Science

https://arxiv.org/abs/1603.00570

Abstract:
>Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size (up to logarithmic factors). Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.

* **Fast and Provably Good Seedings for k-Means**

Olivier Bachem*, ETH Zurich; Mario Lucic, ETH Zurich; Hamed Hassani, ETH Zurich; Andreas Krause,

Abstract:
>Seeding - the task of finding initial cluster centers - is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state-of-the-art algorithm, does not scale well to massive datasets as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++. However, this result requires assumptions on the data generating distribution. We propose a simple yet fast seeding algorithm that produces *provably* good clusterings even *without assumptions* on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude. We validate our theoretical results in extensive experiments on a variety of real-world data sets.
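For context, here is a minimal sketch of the sequential k-means++ (D²) seeding that the abstract describes as the scalability bottleneck: each of the k rounds needs a full pass over the data. The paper's own MCMC-based approximation is not reproduced here.

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=np.random.default_rng(0)):
    """Standard k-means++ seeding: sample each new center with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                  # the D^2 distribution
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.random.randn(1000, 2)
print(kmeans_pp_seeding(X, k=5).shape)  # (5, 2)
```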
* **Unsupervised Learning for Physical Interaction through Video Prediction**

Chelsea Finn*, Google, Inc.; Ian Goodfellow, ; Sergey Levine, University of Washington

http://arxiv.org/abs/1605.07157

Abstract:
>A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 50,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that our proposed method not only produces more accurate video predictions, but also more accurately predicts object motion, when compared to prior methods.

* **Matrix Completion and Clustering in Self-Expressive Models**

Ehsan Elhamifar*,

* **Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling**

Chengkai Zhang, ; Jiajun Wu*, MIT; Tianfan Xue, ; William Freeman, ; Joshua Tenenbaum,

* **Probabilistic Modeling of Future Frames from a Single Image**

Tianfan Xue*, ; Jiajun Wu, MIT; Katherine Bouman, MIT; William Freeman,

* **Human Decision-Making under Limited Time**

Pedro Ortega*, ; Alan Stocker,

* **Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition**

Shizhong Han*, University of South Carolina; Zibo Meng, University of South Carolina; Ahmed Shehab Khan, University of South Carolina; Yan Tong, University of South Carolina

https://cse.sc.edu/~mengz/papers/NIPS2016.pdf

Abstract:
>Recognizing facial action units (AUs) from spontaneous facial expressions is still a challenging problem. Most recently, CNNs have shown promise on facial AU recognition. However, the learned CNNs are often overfitted and do not generalize well to unseen subjects due to limited AU-coded training images. We proposed a novel Incremental Boosting CNN (IB-CNN) to integrate boosting into the CNN via an incremental boosting layer that selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incremental boosted classifier and individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on two benchmark AU databases have demonstrated that the IB-CNN yields significant improvement over the traditional CNN and the one without incremental learning, as well as outperforming the state-of-the-art CNN-based methods in AU recognition. The improvement is more impressive for the AUs that have the lowest frequencies in the databases.

* **Natural-Parameter Networks: A Class of Probabilistic Neural Networks**

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

* **Tree-Structured Reinforcement Learning for Sequential Object Localization**

Zequn Jie*, National Univ of Singapore; Xiaodan Liang, Sun Yat-sen University; Jiashi Feng, National University of Singapore; Xiaojie Jin, NUS; Wen Feng Lu, National Univ of Singapore; Shuicheng Yan,

* **Unsupervised Domain Adaptation with Residual Transfer Networks**

Mingsheng Long*, Tsinghua University; Han Zhu, Tsinghua University; Jianmin Wang, Tsinghua University; Michael Jordan,

http://arxiv.org/abs/1602.04433

Abstract:
>The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can simultaneously learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into the deep network to explicitly learn the residual function with reference to the target classifier. We embed features of multiple layers into reproducing kernel Hilbert spaces (RKHSs) and match feature distributions for feature adaptation. The adaptation behaviors can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently using standard back-propagation. Empirical evidence shows that the approach outperforms state-of-the-art methods on standard domain adaptation datasets.
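The residual-classifier assumption above (source classifier = target classifier + a small learned residual) is concrete enough to sketch. A minimal forward pass, assuming a linear target classifier and a two-layer residual on top of its logits (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_t = rng.standard_normal((10, 5))           # shared classifier: features -> logits
W_r1 = rng.standard_normal((5, 5))           # residual-block parameters
W_r2 = rng.standard_normal((5, 5))

def f_target(feat):
    return feat @ W_t                        # f_T(x), used on the target domain

def f_source(feat):
    logits = f_target(feat)
    residual = np.tanh(logits @ W_r1) @ W_r2 # learned residual Delta f
    return logits + residual                 # f_S(x) = f_T(x) + Delta f(x)

feat = rng.standard_normal((4, 10))
print(f_source(feat).shape)  # (4, 5)
```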
* **Verification Based Solution for Structured MAB Problems**

Zohar Karnin*,

* **Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games**

Maximilian Balandat*, UC Berkeley; Walid Krichene, UC Berkeley; Claire Tomlin, UC Berkeley; Alexandre Bayen, UC Berkeley

* **Linear dynamical neural population models through nonlinear embeddings**

Yuanjun Gao, Columbia University; Evan Archer*, ; John Cunningham, ; Liam Paninski,

https://arxiv.org/abs/1605.08454

Abstract:
>A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.
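The generative side of the fLDS model described above is easy to simulate: linear latent dynamics, a nonlinear map from latent state to per-neuron firing rates, and Poisson spike counts. A sketch with illustrative dimensions and an arbitrary stand-in nonlinearity (the paper uses a learned network):

```python
import numpy as np

def sample_flds(T=100, dz=2, dn=30, seed=0):
    """Simulate from an fLDS-style generative model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    A = 0.99 * np.eye(dz) + 0.01 * rng.standard_normal((dz, dz))  # dynamics matrix
    C = rng.standard_normal((dn, dz))
    z = np.zeros((T, dz))
    for t in range(1, T):
        z[t] = A @ z[t - 1] + 0.1 * rng.standard_normal(dz)  # linear latent state
    rates = np.exp(np.tanh(z @ C.T))   # smooth nonlinear embedding of the state
    spikes = rng.poisson(rates)        # observed spike counts
    return z, spikes

z, spikes = sample_flds()
print(z.shape, spikes.shape)  # (100, 2) (100, 30)
```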
* **SURGE: Surface Regularized Geometry Estimation from a Single Image**

Peng Wang*, UCLA; Xiaohui Shen, Adobe Research; Bryan Russell, ; Scott Cohen, Adobe Research; Brian Price, ; Alan Yuille,

* **Interpretable Distribution Features with Maximum Testing Power**

Wittawat Jitkrittum*, Gatsby Unit, UCL; Zoltan Szabo, ; Kacper Chwialkowski, Gatsby Unit, UCL; Arthur Gretton,

https://arxiv.org/abs/1605.06796

Abstract:
>Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ locally. An empirical estimate of the test power criterion converges with increasing sample size, ensuring the quality of the returned features. In real-world benchmarks on high-dimensional text and image data, linear-time tests using the proposed semimetrics achieve comparable performance to the state-of-the-art quadratic-time maximum mean discrepancy test, while returning human-interpretable features that explain the test results.

* **Sorting out typicality with the inverse moment matrix SOS polynomial**

Edouard Pauwels*, ; Jean-Bernard Lasserre, LAAS-CNRS

http://arxiv.org/abs/1606.03858

Abstract:
>We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple manner from the inverse of the empirical moment matrix. In fact, this SOS polynomial is directly related to orthogonal polynomials and the Christoffel function. This allows us to generalize and interpret extremality properties of orthogonal polynomials and to provide a mathematical rationale for the observed phenomenon. Among diverse potential applications, we illustrate the relevance of our results on a network intrusion detection task for which we obtain performances similar to existing dedicated methods reported in the literature.
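The distinguished polynomial in this abstract is simple to compute: with v(x) the vector of monomials up to some degree and M the empirical moment matrix, the score is Q(x) = v(x)ᵀ M⁻¹ v(x). A minimal 2-D, degree-2 sketch (sizes and threshold use are illustrative):

```python
import numpy as np

def monomials(X):
    """Degree-2 monomial features in 2-D: v(x) = (1, x, y, x^2, xy, y^2)."""
    x, y = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x), x, y, x**2, x*y, y**2], axis=1)

X = np.random.default_rng(0).standard_normal((500, 2))
V = monomials(X)
M = V.T @ V / len(X)                      # empirical moment matrix
Minv = np.linalg.inv(M)
Q = np.einsum('ij,jk,ik->i', V, Minv, V)  # SOS polynomial v(x)^T M^{-1} v(x)
# Sublevel sets {x : Q(x) <= threshold} trace the shape of the cloud; large
# Q flags atypical points (e.g., a simple intrusion-detection style score).
print(Q.min(), Q.max())
```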
* **Multi-armed Bandits: Competing with Optimal Sequences**

Zohar Karnin*, ; Oren Anava, Technion

* **Multivariate tests of association based on univariate tests**

Ruth Heller*, Tel-Aviv University; Yair Heller,

http://arxiv.org/abs/1603.03418

Abstract:
>For testing two random vectors for independence, we consider testing whether the distance of one vector from a center point is independent from the distance of the other vector from a center point by a univariate test. In this paper we provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may be distribution-free for specific aggregation methods (if the univariate test is distribution-free). We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.

* **Learning What and Where to Draw**

Scott Reed*, University of Michigan; Zeynep Akata, Max Planck Institute for Informatics; Santosh Mohan, University of Michigan; Samuel Tenka, University of Michigan; Bernt Schiele, ; Honglak Lee, University of Michigan

* **The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM**

Damek Davis*, Cornell University; Brent Edmunds, University of California, Los Angeles; Madeleine Udell,

https://arxiv.org/abs/1606.02338

Abstract:
>We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence --- among synchronous or asynchronous methods --- on this problem class. We provide upper bounds on the number of workers for which we can expect to see a linear speedup, which match the best bounds known for less complex problems, and show that in practice SAPALM achieves this linear speedup. We demonstrate state-of-the-art performance on several matrix factorization problems.

* **Integrator Nets**

Hakan Bilen*, University of Oxford; Andrea Vedaldi,

* **Combining Low-Density Separators with CNNs**

Yu-Xiong Wang*, Carnegie Mellon University; Martial Hebert, Carnegie Mellon University

* **CNNpack: Packing Convolutional Neural Networks in the Frequency Domain**

Yunhe Wang*, Peking University; Shan You, ; Dacheng Tao, ; Chao Xu, ; Chang Xu,

* **Cooperative Graphical Models**

Josip Djolonga*, ETH Zurich; Stefanie Jegelka, MIT; Sebastian Tschiatschek, ETH Zurich; Andreas Krause,

* **f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization**

Sebastian Nowozin*, Microsoft Research; Botond Cseke, Microsoft Research; Ryota Tomioka, MSRC

https://arxiv.org/abs/1606.00709

Abstract:
>Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
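The variational objective behind f-GAN has the form F = E_P[g_f(T(x))] - E_Q[f*(g_f(T(x)))], where f* is the convex conjugate of the chosen f and g_f an output activation. A minimal sketch of this value function for the KL divergence, where g_f(v) = v and f*(t) = exp(t - 1); the discriminator outputs here are random placeholders:

```python
import numpy as np

def fgan_objective(T_real, T_fake, divergence="kl"):
    """f-GAN variational bound for the KL divergence: the discriminator
    maximizes this quantity, the generator minimizes it (sketch only)."""
    if divergence != "kl":
        raise NotImplementedError  # other f choices swap in a different g_f and f*
    return T_real.mean() - np.exp(T_fake - 1.0).mean()

# T_real / T_fake stand for discriminator outputs on data and generator samples
print(fgan_objective(np.random.randn(64), np.random.randn(64)))
```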
* **Bayesian Optimization for Probabilistic Programs**

Tom Rainforth*, University of Oxford; Tuan Anh Le, University of Oxford; Jan-Willem van de Meent, University of Oxford; Michael Osborne, ; Frank Wood,

* **Hierarchical Question-Image Co-Attention for Visual Question Answering**

Jiasen Lu*, Virginia Tech; Jianwei Yang, Virginia Tech; Dhruv Batra, ; Devi Parikh, Virginia Tech

https://arxiv.org/abs/1606.00061

Abstract:
>A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question and consequently the image via the co-attention mechanism in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN) model. Our final model outperforms all reported methods, improving the state-of-the-art on the VQA dataset from 60.4% to 62.1%, and from 61.6% to 65.4% on the COCO-QA dataset.

* **Optimal Sparse Linear Encoders and Sparse PCA**

Malik Magdon-Ismail*, Rensselaer; Christos Boutsidis,

* **FPNN: Field Probing Neural Networks for 3D Data**

Yangyan Li*, Stanford University; Soeren Pirk, Stanford University; Hao Su, Stanford University; Charles Qi, Stanford University; Leonidas Guibas, Stanford University

https://arxiv.org/abs/1605.06240

Abstract:
>Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points --- sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space "intelligently", rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3DCNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.
* **CRF-CNN: Modeling Structured Information in Human Pose Estimation**

Xiao Chu*, CUHK; Wanli Ouyang, ; Hongsheng Li, CUHK; Xiaogang Wang, Chinese University of Hong Kong

* **Fairness in Learning: Classic and Contextual Bandits**

Matthew Joseph, University of Pennsylvania; Michael Kearns, ; Jamie Morgenstern*, University of Pennsylvania; Aaron Roth,

https://arxiv.org/abs/1605.07139

Abstract:
>We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types.
>First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case.
>In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
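The "chained" confidence intervals mentioned in the abstract suggest a simple selection rule: treat all arms whose intervals overlap, directly or through a chain, with the top arm as indistinguishable and play uniformly among them. A hedged sketch of one such round, assuming the chain is grown from the arm with the highest upper confidence bound (the paper's exact algorithm may differ):

```python
import numpy as np

def fair_round(means, widths, rng=np.random.default_rng(0)):
    """Pick an arm uniformly from the set chained to the top arm (sketch)."""
    lo, hi = means - widths, means + widths
    linked = {int(np.argmax(hi))}       # start from the highest upper bound
    changed = True
    while changed:                      # grow the chain of overlapping intervals
        changed = False
        for i in range(len(means)):
            if i not in linked and any(lo[j] <= hi[i] and lo[i] <= hi[j]
                                       for j in linked):
                linked.add(i)
                changed = True
    return rng.choice(sorted(linked))   # uniform over the chained set

print(fair_round(np.array([0.5, 0.45, 0.1]), np.array([0.1, 0.1, 0.05])))
```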
* **Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization**

Alexander Kirillov*, TU Dresden; Alexander Shekhovtsov, ; Carsten Rother, ; Bogdan Savchynskyy,

* **Domain Separation Networks**

Dilip Krishnan, Google; George Trigeorgis, Google; Konstantinos Bousmalis*, ; Nathan Silberman, Google; Dumitru Erhan, Google

* **DISCO Nets: DISsimilarity COefficients Networks**

Diane Bouchacourt*, University of Oxford; M. Pawan Kumar, University of Oxford; Sebastian Nowozin,

* **Multimodal Residual Learning for Visual QA**

Jin-Hwa Kim*, Seoul National University; Sang-Woo Lee, Seoul National University; Dong-Hyun Kwak, Seoul National University; Min-Oh Heo, Seoul National University; Jeonghee Kim, Naver Labs; Jung-Woo Ha, Naver Labs; Byoung-Tak Zhang, Seoul National University

* **CMA-ES with Optimal Covariance Update and Storage Complexity**

Dídac Rodríguez Arbonès, University of Copenhagen; Oswin Krause, ; Christian Igel*,

* **R-FCN: Object Detection via Region-based Fully Convolutional Networks**

Jifeng Dai, Microsoft; Yi Li, Tsinghua University; Kaiming He*, Microsoft; Jian Sun, Microsoft

* **GAP Safe Screening Rules for Sparse-Group Lasso**

Eugene Ndiaye, Télécom ParisTech; Olivier Fercoq, ; Alexandre Gramfort, ; Joseph Salmon*,

* **Learning and Forecasting Opinion Dynamics in Social Networks**

Abir De, IIT Kharagpur; Isabel Valera, ; Niloy Ganguly, IIT Kharagpur; Sourangshu Bhattacharya, IIT Kharagpur; Manuel Gomez Rodriguez*, MPI-SWS

* **Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares**

Rong Zhu*, Chinese Academy of Sciences

* **Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks**

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

* **Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula**

Jean Barbier, EPFL; Mohamad Dia, EPFL; Florent Krzakala*, ; Thibault Lesieur, IPHT Saclay; Nicolas Macris, EPFL; Lenka Zdeborova,

* **A Unified Approach for Learning the Parameters of Sum-Product Networks**

Han Zhao*, Carnegie Mellon University; Pascal Poupart, ; Geoff Gordon,

* **Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images**

Junhua Mao*, UCLA; Jiajing Xu, ; Kevin Jing, ; Alan Yuille,

* **Stochastic Online AUC Maximization**

Yiming Ying*, ; Longyin Wen, State University of New York at Albany; Siwei Lyu, State University of New York at Albany

* **The Generalized Reparameterization Gradient**

Francisco Ruiz*, Columbia University; Michalis K. Titsias, ; David Blei,

* **Coupled Generative Adversarial Networks**

Ming-Yu Liu*, MERL; Oncel Tuzel, Mitsubishi Electric Research Labs (MERL)

* **Exponential Family Embeddings**

Maja Rudolph*, Columbia University; Francisco J. R. Ruiz, ; Stephan Mandt, Disney Research; David Blei,
* **Variational Information Maximization for Feature Selection**

Shuyang Gao*, ; Greg Ver Steeg, ; Aram Galstyan,

* **Operator Variational Inference**

Rajesh Ranganath*, Princeton University; Dustin Tran, Columbia University; Jaan Altosaar, Princeton University; David Blei,

* **Fast learning rates with heavy-tailed losses**

Vu Dinh*, Fred Hutchinson Cancer Center; Lam Ho, UCLA; Binh Nguyen, University of Science, Vietnam; Duy Nguyen, University of Wisconsin-Madison

* **Budgeted stream-based active learning via adaptive submodular maximization**

Kaito Fujii*, Kyoto University; Hisashi Kashima, Kyoto University

* **Learning feed-forward one-shot learners**

Luca Bertinetto, University of Oxford; Joao Henriques, University of Oxford; Jack Valmadre*, University of Oxford; Philip Torr, ; Andrea Vedaldi,

* **Learning User Perceived Clusters with Feature-Level Supervision**

Ting-Yu Cheng, ; Kuan-Hua Lin, ; Xinyang Gong, Baidu Inc.; Kang-Jun Liu, ; Shan-Hung Wu*, National Tsing Hua University

* **Robust Spectral Detection of Global Structures in the Data by Learning a Regularization**

Pan Zhang*, ITP, CAS

* **Residual Networks are Exponential Ensembles of Relatively Shallow Networks**

Andreas Veit*, Cornell University; Michael Wilber, ; Serge Belongie, Cornell University

* **Adversarial Multiclass Classification: A Risk Minimization Perspective**

Rizal Fathony*, U. of Illinois at Chicago; Anqi Liu, ; Kaiser Asif, ; Brian Ziebart,

* **Solving Random Systems of Quadratic Equations via Truncated Generalized Gradient Flow**

Gang Wang*, University of Minnesota; Georgios Giannakis, University of Minnesota

* **Coin Betting and Parameter-Free Online Learning**

Francesco Orabona*, Yahoo Research; David Pal,

* **Deep Learning without Poor Local Minima**

Kenji Kawaguchi*, MIT

* **Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity**

Eugene Belilovsky*, CentraleSupelec; Gael Varoquaux, ; Matthew Blaschko, KU Leuven

* **A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++**

Dennis Wei*, IBM Research

* **Generating Videos with Scene Dynamics**

Carl Vondrick*, MIT; Hamed Pirsiavash, ; Antonio Torralba,

* **Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs**

Daniel Ritchie*, Stanford University; Anna Thomas, Stanford University; Pat Hanrahan, Stanford University; Noah Goodman,

* **A Powerful Generative Model Using Random Weights for the Deep Image Representation**

Kun He, Huazhong University of Science and Technology; Yan Wang*, Huazhong University of Science and Technology; John Hopcroft, Cornell University

* **Optimizing affinity-based binary hashing using auxiliary coordinates**

Ramin Raziperchikolaei, UC Merced; Miguel Carreira-Perpinan*, UC Merced

* **Double Thompson Sampling for Dueling Bandits**

Huasen Wu*, University of California at Davis; Xin Liu, University of California, Davis

* **Generating Images with Perceptual Similarity Metrics based on Deep Networks**

Alexey Dosovitskiy*, ; Thomas Brox, University of Freiburg

* **Dynamic Filter Networks**

Xu Jia*, KU Leuven; Bert De Brabandere, ; Tinne Tuytelaars, KU Leuven; Luc Van Gool, ETH Zürich

* **A Simple Practical Accelerated Method for Finite Sums**

Aaron Defazio*, Ambiata
* **Barzilai-Borwein Step Size for Stochastic Gradient Descent**

Conghui Tan*, The Chinese University of HK; Shiqian Ma, ; Yu-Hong Dai, ; Yuqiu Qian, The University of Hong Kong

* **On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability**

Guillaume Papa, Télécom ParisTech; Aurélien Bellet*, ; Stephan Clémencon,

* **Optimal spectral transportation with application to music transcription**

Rémi Flamary, ; Cédric Févotte*, CNRS; Nicolas Courty, ; Valentin Emiya, Aix-Marseille University

* **Regularized Nonlinear Acceleration**

Damien Scieur*, INRIA - ENS; Alexandre D'Aspremont, ; Francis Bach,

* **SPALS: Fast Alternating Least Squares via Implicit Leverage Scores Sampling**

Dehua Cheng*, Univ. of Southern California; Richard Peng, ; Yan Liu, ; Ioakeim Perros, Georgia Institute of Technology

* **Single-Image Depth Perception in the Wild**

Weifeng Chen*, University of Michigan; Zhao Fu, University of Michigan; Dawei Yang, University of Michigan; Jia Deng,

* **Computational and Statistical Tradeoffs in Learning to Rank**

Ashish Khetan*, University of Illinois Urbana-Champaign; Sewoong Oh,

* **Learning to Poke by Poking: Experiential Learning of Intuitive Physics**

Pulkit Agrawal*, UC Berkeley; Ashvin Nair, UC Berkeley; Pieter Abbeel, ; Jitendra Malik, ; Sergey Levine, University of Washington

* **Online Convex Optimization with Unconstrained Domains and Losses**

Ashok Cutkosky*, Stanford University; Kwabena Boahen, Stanford University

* **An ensemble diversity approach to supervised binary hashing**

Miguel Carreira-Perpinan*, UC Merced; Ramin Raziperchikolaei, UC Merced

* **Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis**

Weiran Wang*, ; Jialei Wang, University of Chicago; Dan Garber, ; Nathan Srebro,

* **The Power of Adaptivity in Identifying Statistical Alternatives**

Kevin Jamieson*, UC Berkeley; Daniel Haas, ; Ben Recht,

* **On Explore-Then-Commit strategies**

Aurelien Garivier, ; Tor Lattimore, ; Emilie Kaufmann*,

* **Sublinear Time Orthogonal Tensor Decomposition**

Zhao Song*, UT-Austin; David Woodruff, ; Huan Zhang, UC-Davis

* **DECOrrelated feature space partitioning for distributed sparse regression**

Xiangyu Wang*, Duke University; David Dunson, Duke University; Chenlei Leng, University of Warwick

* **Deep Alternative Neural Networks: Exploring Contexts as Early as Possible for Action Recognition**

Jinzhuo Wang*, PKU; Wenmin Wang, Peking University; Xiongtao Chen, Peking University; Ronggang Wang, Peking University; Wen Gao, Peking University

* **Machine Translation Through Learning From a Communication Game**

Di He*, Microsoft; Yingce Xia, USTC; Tao Qin, Microsoft; Liwei Wang, ; Nenghai Yu, USTC; Tie-Yan Liu, Microsoft; Wei-Ying Ma, Microsoft

* **Dialog-based Language Learning**

Jason Weston*,

* **Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition**

Theodore Bluche*, A2iA

* **Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction**

Hsiang-Fu Yu*, University of Texas at Austin; Nikhil Rao, ; Inderjit Dhillon,

* **Active Nearest-Neighbor Learning in Metric Spaces**

Aryeh Kontorovich, ; Sivan Sabato*, Ben-Gurion University of the Negev; Ruth Urner, MPI Tuebingen

* **Proximal Deep Structured Models**

Shenlong Wang*, University of Toronto; Sanja Fidler, ; Raquel Urtasun,
* **Faster Projection-free Convex Optimization over the Spectrahedron**

Dan Garber*,

* **Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach**

Remi Lam*, MIT; Karen Willcox, MIT; David Wolpert,

* **Learning Sound Representations from Unlabeled Video**

Yusuf Aytar, MIT; Carl Vondrick*, MIT; Antonio Torralba,

* **Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks**

Tim Salimans*, ; Diederik Kingma,
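The reparameterization named in this title is w = g · v / ‖v‖, which decouples the direction of each weight vector from its norm so both can be trained by ordinary gradient descent. A minimal sketch of the forward pass (the data and parameter values are illustrative):

```python
import numpy as np

def weightnorm_forward(x, v, g):
    """Linear layer with weight normalization: w = g * v / ||v||."""
    w = g * v / np.linalg.norm(v)
    return x @ w

x = np.random.randn(8, 3)
v = np.random.randn(3)     # direction parameters
g = 2.0                    # scalar scale parameter
print(weightnorm_forward(x, v, g))
```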
* **Efficient Second Order Online Learning by Sketching**

Haipeng Luo*, Princeton University; Alekh Agarwal, Microsoft; Nicolò Cesa-Bianchi, ; John Langford,

* **Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis**

Yoshinobu Kawahara*, Osaka University

* **Distributed Flexible Nonlinear Tensor Factorization**

Shandian Zhe*, Purdue University; Kai Zhang, Lawrence Berkeley Lab; Pengyuan Wang, Yahoo! Research; Kuang-chih Lee, ; Zenglin Xu, ; Alan Qi, ; Zoubin Ghahramani,

* **The Robustness of Estimator Composition**

Pingfan Tang*, University of Utah; Jeff Phillips, University of Utah

* **Efficient and Robust Spiking Neural Circuit for Navigation Inspired by Echolocating Bats**

Bipin Rajendran*, NJIT; Pulkit Tandon, IIT Bombay; Yash Malviya, IIT Bombay

* **PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions**

Michael Figurnov*, Skolkovo Institute of Science and Technology; Aijan Ibraimova, Skolkovo Institute of Science and Technology; Dmitry P. Vetrov, ; Pushmeet Kohli,

* **Differential Privacy without Sensitivity**

Kentaro Minami*, The University of Tokyo; Hitomi Arai, The University of Tokyo; Issei Sato, The University of Tokyo; Hiroshi Nakagawa,

* **Optimal Cluster Recovery in the Labeled Stochastic Block Model**

Se-Young Yun*, Los Alamos National Laboratory; Alexandre Proutiere,

* **Even Faster SVD Decomposition Yet Without Agonizing Pain**

Zeyuan Allen-Zhu*, Princeton University; Yuanzhi Li, Princeton University

* **An algorithm for L1 nearest neighbor search via monotonic embedding**

Xinan Wang*, UCSD; Sanjoy Dasgupta,

* **Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations**

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Junier Oliva, ; Jeff Schneider, CMU; Barnabas Poczos,

* **Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes**

Dan Garber*, ; Ofer Meshi,

* **Efficient Nonparametric Smoothness Estimation**

Shashank Singh*, Carnegie Mellon University; Simon Du, Carnegie Mellon University; Barnabas Poczos,

* **A Theoretically Grounded Application of Dropout in Recurrent Neural Networks**

Yarin Gal*, University of Cambridge; Zoubin Ghahramani,

* **Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation**

George Papamakarios*, University of Edinburgh; Iain Murray, University of Edinburgh

* **Direct Feedback Alignment Provides Learning In Deep Neural Networks**

Arild Nøkland*, None

* **Safe and Efficient Off-Policy Reinforcement Learning**

Remi Munos, Google DeepMind; Thomas Stepleton, Google DeepMind; Anna Harutyunyan, Vrije Universiteit Brussel; Marc Bellemare*, Google DeepMind

* **A Multi-Batch L-BFGS Method for Machine Learning**

Albert Berahas*, Northwestern University; Jorge Nocedal, Northwestern University; Martin Takac, Lehigh University

* **Semiparametric Differential Graph Models**

Pan Xu*, University of Virginia; Quanquan Gu, University of Virginia

* **Rényi Divergence Variational Inference**

Yingzhen Li*, University of Cambridge; Richard E. Turner,

* **Doubly Convolutional Neural Networks**

Shuangfei Zhai*, Binghamton University; Yu Cheng, IBM Research; Zhongfei Zhang, Binghamton University

* **Density Estimation via Discrepancy Based Adaptive Sequential Partition**

Dangna Li*, Stanford University; Kun Yang, Google Inc; Wing Wong, Stanford University

* **How Deep is the Feature Analysis underlying Rapid Visual Categorization?**

Sven Eberhardt*, Brown University; Jonah Cader, Brown University; Thomas Serre,

* **Variational Information Maximizing Exploration**

Rein Houthooft*, Ghent University - iMinds; UC Berkeley; OpenAI; Xi Chen, UC Berkeley; OpenAI; Yan Duan, UC Berkeley; John Schulman, OpenAI; Filip De Turck, Ghent University - iMinds; Pieter Abbeel,

* **Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain**

Timothy Rubin*, Indiana University; Sanmi Koyejo, UIUC; Michael Jones, Indiana University; Tal Yarkoni, University of Texas at Austin

* **Solving Marginal MAP Problems with NP Oracles and Parity Constraints**

Yexiang Xue*, Cornell University; Zhiyuan Li, Tsinghua University; Stefano Ermon, ; Carla Gomes, Cornell University; Bart Selman,

* **Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models**

Tomoharu Iwata*, ; Makoto Yamada,

* **Fast Stochastic Methods for Nonsmooth Nonconvex Optimization**

Sashank Jakkam Reddi*, Carnegie Mellon University; Suvrit Sra, MIT; Barnabas Poczos, ; Alexander J. Smola,

* **Variance Reduction in Stochastic Gradient Langevin Dynamics**

Kumar Dubey*, Carnegie Mellon University; Sashank Jakkam Reddi, Carnegie Mellon University; Sinead Williamson, ; Barnabas Poczos, ; Alexander J. Smola, ; Eric Xing, Carnegie Mellon University
* **Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning**

Mehdi Sajjadi*, University of Utah; Mehran Javanmardi, University of Utah; Tolga Tasdizen, University of Utah

* **Dense Associative Memory for Pattern Recognition**

Dmitry Krotov*, Institute for Advanced Study; John Hopfield, Princeton Neuroscience Institute

* **Causal Bandits: Learning Good Interventions via Causal Inference**

Finnian Lattimore, Australian National University; Tor Lattimore*, ; Mark Reid,

* **Refined Lower Bounds for Adversarial Bandits**

Sébastien Gerchinovitz, ; Tor Lattimore*,

* **Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning**

Gang Niu*, University of Tokyo; Marthinus du Plessis, ; Tomoya Sakai, ; Yao Ma, ; Masashi Sugiyama, RIKEN / University of Tokyo

* **Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ϵ)**

Yi Xu*, The University of Iowa; Yan Yan, University of Technology Sydney; Qihang Lin, ; Tianbao Yang, University of Iowa

* **Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators**

Shashank Singh*, Carnegie Mellon University; Barnabas Poczos,

* **A state-space model of cross-region dynamic connectivity in MEG/EEG**

Ying Yang*, Carnegie Mellon University; Elissa Aminoff, Carnegie Mellon University; Michael Tarr, Carnegie Mellon University; Robert Kass, Carnegie Mellon University

* **What Makes Objects Similar: A Unified Multi-Metric Learning Approach**

Han-Jia Ye, ; De-Chuan Zhan*, ; Xue-Min Si, Nanjing University; Yuan Jiang, Nanjing University; Zhi-Hua Zhou,

* **Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint**

Nguyen Viet Cuong*, National University of Singapore; Huan Xu, NUS

* **Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions**

Siddartha Ramamohan, Indian Institute of Science; Arun Rajkumar, ; Shivani Agarwal*, Radcliffe Institute, Harvard

* **Local Similarity-Aware Deep Feature Embedding**

Chen Huang*, Chinese University of Hong Kong; Chen Change Loy, The Chinese University of HK; Xiaoou Tang, The Chinese University of Hong Kong

* **A Communication-Efficient Parallel Algorithm for Decision Tree**

Qi Meng*, Peking University; Guolin Ke, Microsoft Research; Taifeng Wang, Microsoft Research; Wei Chen, Microsoft Research; Qiwei Ye, Microsoft Research; Zhi-Ming Ma, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Tie-Yan Liu, Microsoft Research

* **Convex Two-Layer Modeling with Latent Structure**

Vignesh Ganapathiraman, University of Illinois at Chicago; Xinhua Zhang*, UIC; Yaoliang Yu, ; Junfeng Wen, UofA

* **Sampling for Bayesian Program Learning**

Kevin Ellis*, MIT; Armando Solar-Lezama, MIT; Joshua Tenenbaum,

* **Learning Kernels with Random Features**

Aman Sinha*, Stanford University; John Duchi,

* **Optimal Tagging with Markov Chain Optimization**

Nir Rosenfeld*, Hebrew University of Jerusalem; Amir Globerson, Tel Aviv University

* **Crowdsourced Clustering: Querying Edges vs Triangles**

Ramya Korlakai Vinayak*, Caltech; Babak Hassibi, Caltech

* **Mixed vine copulas as joint models of spike counts and local field potentials**

Arno Onken*, IIT; Stefano Panzeri, IIT

* **Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation**

Emmanuel Abbe*, ; Colin Sandon,
* **Adaptive Concentration Inequalities for Sequential Decision Problems**

Shengjia Zhao*, Tsinghua University; Enze Zhou, Tsinghua University; Ashish Sabharwal, Allen Institute for AI; Stefano Ermon,

* **Fast mini-batch k-means by nesting**

James Newling*, Idiap Research Institute; Francois Fleuret, Idiap Research Institute

* **Deep Learning Models of the Retinal Response to Natural Scenes**

Lane McIntosh*, Stanford University; Niru Maheswaranathan, Stanford University; Aran Nayebi, Stanford University; Surya Ganguli, Stanford; Stephen Baccus, Stanford University

* **Preference Completion from Partial Rankings**

Suriya Gunasekar*, UT Austin; Sanmi Koyejo, UIUC; Joydeep Ghosh, UT Austin

* **Dynamic Network Surgery for Efficient DNNs**

Yiwen Guo*, Intel Labs China; Anbang Yao, ; Yurong Chen,

* **Learning a Metric Embedding for Face Recognition using the Multibatch Method**

Oren Tadmor, OrCam; Tal Rosenwein, OrCam; Shai Shalev-Shwartz, OrCam; Yonatan Wexler*, OrCam; Amnon Shashua, OrCam

* **A Pseudo-Bayesian Algorithm for Robust PCA**

Tae-Hyun Oh*, KAIST; David Wipf, ; Yasuyuki Matsushita, Osaka University; In So Kweon, KAIST

* **End-to-End Kernel Learning with Supervised Convolutional Kernel Networks**

Julien Mairal*, Inria

* **Stochastic Variance Reduction Methods for Saddle-Point Problems**

P. Balamurugan, ; Francis Bach*,

* **Flexible Models for Microclustering with Applications to Entity Resolution**

Brenda Betancourt, Duke University; Giacomo Zanella, The University of Warwick; Jeffrey Miller, Duke University; Hanna Wallach, Microsoft Research New England; Abbas Zaidi, Duke University; Rebecca C. Steorts*, Duke University

* **Catching heuristics are optimal control policies**

Boris Belousov*, TU Darmstadt; Gerhard Neumann, ; Constantin Rothkopf, ; Jan Peters,

* **Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian**

Victor Picheny, Institut National de la Recherche Agronomique; Robert Gramacy*, ; Stefan Wild, Argonne National Lab; Sebastien Le Digabel, École Polytechnique de Montréal

* **Adaptive Neural Compilation**

Rudy Bunel*, Oxford University; Alban Desmaison, Oxford; M. Pawan Kumar, University of Oxford; Pushmeet Kohli, ; Philip Torr,
* **Synthesis of MCMC and Belief Propagation**

Sung-Soo Ahn*, KAIST; Misha Chertkov, Los Alamos National Laboratory; Jinwoo Shin, KAIST

* **Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables**

Mauro Scanagatta*, IDSIA; Giorgio Corani, IDSIA; Cassio Polpo de Campos, Queen's University Belfast; Marco Zaffalon, IDSIA

* **Unifying Count-Based Exploration and Intrinsic Motivation**

Marc Bellemare*, Google DeepMind; Srinivasan Sriram, ; Georg Ostrovski, Google DeepMind; Tom Schaul, ; David Saxton, Google DeepMind; Remi Munos, Google DeepMind

* **Large Margin Discriminant Dimensionality Reduction in Prediction Space**

Mohammad Saberian*, Netflix; Jose Costa Pereira, UC San Diego; Nuno Vasconcelos, UC San Diego

* **Stochastic Structured Prediction under Bandit Feedback**

Artem Sokolov, Heidelberg University; Julia Kreutzer, Heidelberg University; Stefan Riezler*, Heidelberg University

* **Simple and Efficient Weighted Minwise Hashing**

Anshumali Shrivastava*, Rice University

* **Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation**

Ilija Bogunovic*, EPFL Lausanne; Jonathan Scarlett, ; Andreas Krause, ; Volkan Cevher,

* **Structured Sparse Regression via Greedy Hard Thresholding**

Prateek Jain, Microsoft Research; Nikhil Rao*, ; Inderjit Dhillon,

* **Understanding Probabilistic Sparse Gaussian Process Approximations**

Matthias Bauer*, University of Cambridge; Mark van der Wilk, University of Cambridge; Carl Rasmussen, University of Cambridge

* **SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques**

Elad Richardson*, Technion; Rom Herskovitz, ; Boris Ginsburg, ; Michael Zibulevsky,

* **Long-Term Trajectory Planning Using Hierarchical Memory Networks**

Stephan Zheng*, Caltech; Yisong Yue, ; Patrick Lucey, Stats

* **Learning Tree Structured Potential Games**

Vikas Garg*, MIT; Tommi Jaakkola,

* **Observational-Interventional Priors for Dose-Response Learning**

Ricardo Silva*,

* **Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs**

Shahin Jabbari*, University of Pennsylvania; Ryan Rogers, University of Pennsylvania; Aaron Roth, ; Steven Wu, University of Pennsylvania

* **Identification and Overidentification of Linear Structural Equation Models**

Bryant Chen*, UCLA

* **Adaptive Skills Adaptive Partitions (ASAP)**

Daniel Mankowitz*, Technion; Timothy Mann, Google DeepMind; Shie Mannor, Technion

* **Multiple-Play Bandits in the Position-Based Model**

Paul Lagrée*, Université Paris Sud; Claire Vernade, Université Paris Saclay; Olivier Cappe,

* **Optimal Black-Box Reductions Between Optimization Objectives**

Zeyuan Allen-Zhu*, Princeton University; Elad Hazan,

* **On Valid Optimal Assignment Kernels and Applications to Graph Classification**

Nils Kriege*, TU Dortmund; Pierre-Louis Giscard, University of York; Richard Wilson, University of York

* **Robustness of classifiers: from adversarial to random noise**

Alhussein Fawzi, ; Seyed-Mohsen Moosavi-Dezfooli*, EPFL; Pascal Frossard, EPFL

* **A Non-convex One-Pass Framework for Factorization Machines and Rank-One Matrix Sensing**

Ming Lin*, ; Jieping Ye,

* **Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters**

Zeyuan Allen-Zhu*, Princeton University; Yang Yuan, Cornell University; Karthik Sridharan, University of Pennsylvania
* **Combinatorial Multi-Armed Bandit with General Reward Functions**

Wei Chen*, ; Wei Hu, Princeton University; Fu Li, The University of Texas at Austin; Jian Li, Tsinghua University; Yu Liu, Tsinghua University; Pinyan Lu, Shanghai University of Finance and Economics

* **Boosting with Abstention**

Corinna Cortes, ; Giulia DeSalvo*, ; Mehryar Mohri,

* **Regret of Queueing Bandits**

Subhashini Krishnasamy, The University of Texas at Austin; Rajat Sen, The University of Texas at Austin; Ramesh Johari, ; Sanjay Shakkottai*, The University of Texas at Austin

* **Deep Learning Games**

Dale Schuurmans*, ; Martin Zinkevich, Google

* **Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods**

Antoine Gautier*, Saarland University; Quynh Nguyen, Saarland University; Matthias Hein, Saarland University

* **Learning Volumetric 3D Object Reconstruction from Single-View with Projective Transformations**

Xinchen Yan*, University of Michigan; Jimei Yang, ; Ersin Yumer, Adobe Research; Yijie Guo, University of Michigan; Honglak Lee, University of Michigan

* **A Credit Assignment Compiler for Joint Prediction**

Kai-Wei Chang*, ; He He, University of Maryland; Stephane Ross, Google; Hal Daumé III, ; John Langford,

* **Accelerating Stochastic Composition Optimization**

Mengdi Wang*, ; Ji Liu,

* **Reward Augmented Maximum Likelihood for Neural Structured Prediction**

Mohammad Norouzi*, ; Dale Schuurmans, ; Samy Bengio, ; Zhifeng Chen, ; Navdeep Jaitly, ; Mike Schuster, ; Yonghui Wu,

* **Consistent Kernel Mean Estimation for Functions of Random Variables**

Adam Scibior*, University of Cambridge; Carl-Johann Simon-Gabriel, MPI Tuebingen; Iliya Tolstikhin, ; Bernhard Schoelkopf,

* **Towards Unifying Hamiltonian Monte Carlo and Slice Sampling**

Yizhe Zhang*, Duke University; Xiangyu Wang, Duke University; Changyou Chen, ; Ricardo Henao, ; Kai Fan, Duke University; Lawrence Carin,

* **Scalable Adaptive Stochastic Optimization Using Random Projections**

Gabriel Krummenacher*, ETH Zurich; Brian Mcwilliams, Disney Research; Yannic Kilcher, ETH Zurich; Joachim Buhmann, ETH Zurich; Nicolai Meinshausen,

* **Variational Inference in Mixed Probabilistic Submodular Models**

Josip Djolonga, ETH Zurich; Sebastian Tschiatschek*, ETH Zurich; Andreas Krause,

* **Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated**

Namrata Vaswani*, ; Han Guo, Iowa State University

* **The Multi-fidelity Multi-armed Bandit**

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Barnabas Poczos, ; Jeff Schneider, CMU

* **Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm**

Kejun Huang*, University of Minnesota; Xiao Fu, University of Minnesota; Nicholas Sidiropoulos, University of Minnesota

* **Bootstrap Model Aggregation for Distributed Statistical Learning**

Jun Han, Dartmouth College; Qiang Liu*,

* **A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification**

Steven Cheng-Xian Li*, UMass Amherst; Benjamin Marlin,

* **A Bandit Framework for Strategic Regression**

Yang Liu*, Harvard University; Yiling Chen,

* **Architectural Complexity Measures of Recurrent Neural Networks**

Saizheng Zhang*, University of Montreal; Yuhuai Wu, University of Toronto; Tong Che, IHES; Zhouhan Lin, University of Montreal; Roland Memisevic, University of Montreal; Ruslan Salakhutdinov, University of Toronto; Yoshua Bengio, U. Montreal
Saizheng Zhang*, University of Montreal; Yuhuai Wu, University of Toronto; Tong Che, IHES; Zhouhan Lin, University of Montreal; Roland Memisevic, University of Montreal; Ruslan Salakhutdinov, University of Toronto; Yoshua Bengio, U. Montreal

Statistical Inference for Cluster Trees
Jisu Kim*, Carnegie Mellon University; Yen-Chi Chen, Carnegie Mellon University; Sivaraman Balakrishnan, Carnegie Mellon University; Alessandro Rinaldo, Carnegie Mellon University; Larry Wasserman, Carnegie Mellon University

Contextual-MDPs for PAC Reinforcement Learning with Rich Observations
Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; John Langford,

Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Kihyuk Sohn*,

Only H is left: Near-tight Episodic PAC RL
Christoph Dann*, Carnegie Mellon University; Emma Brunskill, Carnegie Mellon University

Stacked Approximated Regression Machine: A Simple Deep Learning Approach
Zhangyang Wang*, UIUC; Shiyu Chang, UIUC; Qing Ling, USTC; Shuai Huang, UW; Xia Hu, ; Honghui Shi, UIUC; Thomas Huang, UIUC

Unsupervised Learning of Spoken Language with Visual Context
David Harwath*, MIT CSAIL; Antonio Torralba, MIT CSAIL; James Glass, MIT CSAIL

Low-Rank Regression with Tensor Responses
Guillaume Rabusseau*, Aix-Marseille University; Hachem Kadri,

PAC-Bayesian Theory Meets Bayesian Inference
Pascal Germain*, ; Francis Bach, ; Alexandre Lacoste, ; Simon Lacoste-Julien, INRIA

Data Poisoning Attacks on Factorization-Based Collaborative Filtering
Bo Li*, Vanderbilt University; Yining Wang, Carnegie Mellon University; Aarti Singh, Carnegie Mellon University; Yevgeniy Vorobeychik, Vanderbilt University

Learned Region Sparsity and Diversity Also Predicts Visual Attention
Zijun Wei*, Stony Brook; Hossein Adeli, ; Minh Hoai, ; Gregory Zelinsky, ; Dimitris Samaras,

End-to-End Goal-Driven Web Navigation
Rodrigo Frassetto Nogueira*, New York University; Kyunghyun Cho, University of Montreal

Automated scalable segmentation of neurons from multispectral images
Uygar Sümbül*, Columbia University; Douglas Roossien, University of Michigan, Ann Arbor; Dawen Cai, University of Michigan, Ann Arbor; John Cunningham, Columbia University; Liam Paninski,

Privacy Odometers and Filters: Pay-as-you-Go Composition
Ryan Rogers*, University of Pennsylvania; Salil Vadhan, Harvard University; Aaron Roth, ; Jonathan Robert Ullman,

Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
Iliya Tolstikhin*, ; Bharath Sriperumbudur, ; Bernhard Schoelkopf,

Adaptive optimal training of animal behavior
Ji Hyun Bak*, Princeton University; Jung Yoon Choi, ; Ilana Witten, ; Jonathan Pillow,

Hierarchical Object Representation for Open-Ended Object Category Learning and Recognition
Hamidreza Kasaei*, IEETA, University of Aveiro

Relevant sparse codes with variational information bottleneck
Matthew Chalk*, IST Austria; Olivier Marre, Institut de la vision; Gašper Tkačik, Institute of Science and Technology Austria

Combinatorial Energy Learning for Image Segmentation
Jeremy Maitin-Shepard*, Google; Viren Jain, Google; Michal Januszewski, Google; Peter Li, ; Pieter Abbeel,

Orthogonal Random Features
Felix Xinnan Yu*, ; Ananda Theertha Suresh, ; Krzysztof Choromanski, ; Dan Holtmann-Rice, ; Sanjiv Kumar, Google

Fast Active Set Methods for Online Spike Inference from Calcium Imaging
Johannes Friedrich*, Columbia University; Liam Paninski,

Diffusion-Convolutional Neural Networks
James Atwood*, UMass Amherst

Bayesian latent structure discovery from multi-neuron recordings
Scott Linderman*, ; Ryan Adams, ; Jonathan Pillow,

A Probabilistic Programming Approach To Probabilistic Data Analysis
Feras Saad*, MIT; Vikash Mansinghka, MIT

A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics
William Hoiles*, University of California, Los Angeles; Mihaela Van Der Schaar,

Inference by Reparameterization in Neural Population Codes
Rajkumar Vasudeva Raju, Rice University; Xaq Pitkow*,

Tensor Switching Networks
Chuan-Yung Tsai*, ; Andrew Saxe, ; David Cox,

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
Alain Durmus, Telecom ParisTech; Umut Simsekli*, ; Eric Moulines, Ecole Polytechnique; Roland Badeau, Telecom ParisTech; Gaël Richard, Telecom ParisTech

Coordinate-wise Power Method
Qi Lei*, UT Austin; Kai Zhong, UT Austin; Inderjit Dhillon,

Learning Influence Functions from Incomplete Observations
Xinran He*, USC; Ke Xu, USC; David Kempe, USC; Yan Liu,

Learning Structured Sparsity in Deep Neural Networks
Wei Wen*, University of Pittsburgh; Chunpeng Wu, University of Pittsburgh; Yandan Wang, University of Pittsburgh; Yiran Chen, University of Pittsburgh; Hai Li, University of Pittsburgh

Sample Complexity of Automated Mechanism Design
Nina Balcan, ; Tuomas Sandholm, Carnegie Mellon University; Ellen Vitercik*, Carnegie Mellon University

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
Sanghamitra Dutta*, Carnegie Mellon University; Viveck Cadambe, Pennsylvania State University; Pulkit Grover, Carnegie Mellon University

Brains on Beats
Umut Güçlü*, Radboud University; Jordy Thielen, Radboud University; Michael Hanke, Otto-von-Guericke University Magdeburg; Marcel Van Gerven, Radboud University

Learning Transferrable Representations for Unsupervised Domain Adaptation
Ozan Sener*, Cornell University; Hyun Oh Song, Google Research; Ashutosh Saxena, Brain of Things; Silvio Savarese, Stanford University

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Stefan Lee*, Indiana University; Senthil Purushwalkam, Carnegie Mellon; Michael Cogswell, Virginia Tech; Viresh Ranjan, Virginia Tech; David Crandall, Indiana University; Dhruv Batra,

Active Learning from Imperfect Labelers
Songbai Yan*, University of California, San Diego; Kamalika Chaudhuri, University of California, San Diego; Tara Javidi, University of California, San Diego

Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Jakob Foerster*, University of Oxford; Yannis Assael, University of Oxford; Nando de Freitas, University of Oxford; Shimon Whiteson,

Value Iteration Networks
Aviv Tamar*, ; Sergey Levine, ; Pieter Abbeel, ; Yi Wu, UC Berkeley; Garrett Thomas, UC Berkeley

Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering
Dogyoon Song*, MIT; Christina Lee, MIT; Yihua Li, MIT; Devavrat Shah,

On the Recursive Teaching Dimension of VC Classes
Bo Tang*, University of Oxford; Xi Chen, Columbia University; Yu Cheng, U of Southern California

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen*, UC Berkeley and OpenAI; Yan Duan, UC Berkeley; Rein Houthooft, Ghent University - iMinds, UC Berkeley and OpenAI; John Schulman, OpenAI; Ilya Sutskever, ; Pieter Abbeel,

Hardness of Online Sleeping Combinatorial Optimization Problems
Satyen Kale*, ; Chansoo Lee, ; David Pal,

Mixed Linear Regression with Multiple Components
Kai Zhong*, UT Austin; Prateek Jain, Microsoft Research; Inderjit Dhillon,

Sequential Neural Models with Stochastic Layers
Marco Fraccaro*, DTU; Søren Sønderby, KU; Ulrich Paquet, ; Ole Winther, DTU

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
Hongseok Namkoong*, Stanford University; John Duchi,

Minimizing Quadratic Functions in Constant Time
Kohei Hayashi*, AIST; Yuichi Yoshida, NII

Improved Techniques for Training GANs
Tim Salimans*, ; Ian Goodfellow, OpenAI; Wojciech Zaremba, OpenAI; Vicki Cheung, OpenAI; Alec Radford, OpenAI; Xi Chen, UC Berkeley and OpenAI

DeepMath - Deep Sequence Models for Premise Selection
Geoffrey Irving*, ; Christian Szegedy, ; Alexander Alemi, Google; Francois Chollet, ; Josef Urban, Czech Technical University in Prague

Learning Multiagent Communication with Backpropagation
Sainbayar Sukhbaatar, NYU; Arthur Szlam, ; Rob Fergus*, New York University

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely*, ; Roy Frostig, Stanford University; Yoram Singer, Google

Learning the Number of Neurons in Deep Networks
Jose Alvarez*, NICTA; Mathieu Salzmann, EPFL

Finding significant combinations of features in the presence of categorical covariates
Laetitia Papaxanthos*, ETH Zurich; Felipe Llinares, ETH Zurich; Dean Bodenham, ETH Zurich; Karsten Borgwardt,

Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning
Been Kim*, ; Rajiv Khanna, UT Austin; Sanmi Koyejo, UIUC

Optimistic Bandit Convex Optimization
Scott Yang*, New York University; Mehryar Mohri,

Safe Policy Improvement by Minimizing Robust Baseline Regret
Mohamad Ghavamzadeh*, ; Marek Petrik, ; Yinlam Chow, Stanford University

Graphons, mergeons, and so on!
Justin Eldridge*, The Ohio State University; Mikhail Belkin, ; Yusu Wang, The Ohio State University

Hierarchical Clustering via Spreading Metrics
Aurko Roy*, Georgia Tech; Sebastian Pokutta, Georgia Tech

Learning Bayesian networks with ancestral constraints
Eunice Yuh-Jie Chen*, UCLA; Yujia Shen, ; Arthur Choi, ; Adnan Darwiche,

Pruning Random Forests for Prediction on a Budget
Feng Nan*, Boston University; Joseph Wang, Boston University; Venkatesh Saligrama,

Clustering with Bregman Divergences: an Asymptotic Analysis
Chaoyue Liu*, The Ohio State University; Mikhail Belkin,

Variational Autoencoder for Deep Learning of Images, Labels and Captions
Yunchen Pu*, Duke University; Zhe Gan, Duke; Ricardo Henao, ; Xin Yuan, Bell Labs; Chunyuan Li, Duke; Andrew Stevens, Duke University; Lawrence Carin,

Encode, Review, and Decode: Reviewer Module for Caption Generation
Zhilin Yang*, Carnegie Mellon University; Ye Yuan, Carnegie Mellon University; Yuexin Wu, Carnegie Mellon University; William Cohen, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto

Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Qiang Liu*, ; Dilin Wang, Dartmouth College

A Bio-inspired Redundant Sensing Architecture
Anh Tuan Nguyen*, University of Minnesota; Jian Xu, University of Minnesota; Zhi Yang, University of Minnesota

Contextual semibandits via supervised learning oracles
Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; Miro Dudik,

Blind Attacks on Machine Learners
Alex Beatson*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Universal Correspondence Network
Christopher Choy*, Stanford University; Manmohan Chandraker, NEC Labs America; JunYoung Gwak, Stanford University; Silvio Savarese, Stanford University

Satisfying Real-world Goals with Dataset Constraints
Gabriel Goh*, UC Davis; Andy Cotter, ; Maya Gupta, ; Michael Friedlander, UC Davis

Deep Learning for Predicting Human Strategic Behavior
Jason Hartford*, University of British Columbia; Kevin Leyton-Brown, ; James Wright, University of British Columbia

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games
Sougata Chaudhuri*, University of Michigan; Ambuj Tewari, University of Michigan

Eliciting and Aggregating Categorical Data
Yiling Chen, ; Rafael Frongillo, ; Chien-Ju Ho*,

Measuring the reliability of MCMC inference with Bidirectional Monte Carlo
Roger Grosse, ; Siddharth Ancha, University of Toronto; Daniel Roy*,

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Weihao Gao, UIUC; Sewoong Oh*, ; Pramod Viswanath, UIUC

Selective inference for group-sparse linear models
Fan Yang, University of Chicago; Rina Foygel Barber*, ; Prateek Jain, Microsoft Research; John Lafferty,

Graph Clustering: Block-models and model free results
Yali Wan*, University of Washington; Marina Meila, University of Washington

Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution
Christopher Lynn*, University of Pennsylvania; Dan Lee, University of Pennsylvania

Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Neuroscience
Hao Zhou, University of Wisconsin Madison; Vamsi Ithapu*, University of Wisconsin Madison; Sathya Ravi, University of Wisconsin Madison; Vikas Singh, UW Madison; Grace Wahba, University of Wisconsin Madison; Sterling Johnson, University of Wisconsin Madison

Geometric Dirichlet Means Algorithm for Topic Inference
Mikhail Yurochkin*, University of Michigan; Long Nguyen,

Structured Prediction Theory Based on Factor Graph Complexity
Corinna Cortes, ; Vitaly Kuznetsov*, Courant Institute; Mehryar Mohri, ; Scott Yang, New York University

Improved Dropout for Shallow and Deep Learning
Zhe Li, The University of Iowa; Boqing Gong, University of Central Florida; Tianbao Yang*, University of Iowa

Constraints Based Convex Belief Propagation
Yaniv Tenzer*, The Hebrew University; Alexander Schwing, ; Kevin Gimpel, ; Tamir Hazan,

Error Analysis of Generalized Nyström Kernel Regression
Hong Chen, University of Texas; Haifeng Xia, Huazhong Agricultural University; Heng Huang*, University of Texas Arlington

A Probabilistic Framework for Deep Learning
Ankit Patel, Baylor College of Medicine and Rice University; Tan Nguyen*, Rice University; Richard Baraniuk,

General Tensor Spectral Co-clustering for Higher-Order Data
Tao Wu*, Purdue University; Austin Benson, Stanford University; David Gleich,

Cyclades: Conflict-free Asynchronous Machine Learning
Xinghao Pan*, UC Berkeley; Stephen Tu, UC Berkeley; Maximilian Lam, UC Berkeley; Dimitris Papailiopoulos, ; Ce Zhang, Stanford; Michael Jordan, ; Kannan Ramchandran, ; Christopher Re, ; Ben Recht,

Single Pass PCA of Matrix Products
Shanshan Wu*, UT Austin; Srinadh Bhojanapalli, TTI Chicago; Sujay Sanghavi, ; Alexandros G. Dimakis,

Stochastic Variational Deep Kernel Learning
Andrew Wilson*, Carnegie Mellon University; Zhiting Hu, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto; Eric Xing, Carnegie Mellon University

Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models
Marc Vuffray*, Los Alamos National Laboratory; Sidhant Misra, Los Alamos National Laboratory; Andrey Lokhov, Los Alamos National Laboratory; Misha Chertkov, Los Alamos National Laboratory

Long-term Causal Effects via Behavioral Game Theory
Panos Toulis*, University of Chicago; David Parkes, Harvard University

Measuring Neural Net Robustness with Constraints
Osbert Bastani*, Stanford University; Yani Ioannou, University of Cambridge; Leonidas Lampropoulos, University of Pennsylvania; Dimitrios Vytiniotis, Microsoft Research; Aditya Nori, Microsoft Research; Antonio Criminisi,

Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations
Huishuai Zhang*, Syracuse University; Yingbin Liang, Syracuse University

Nearly Isometric Embedding by Relaxation
James McQueen*, University of Washington; Marina Meila, University of Washington; Dominique Joncas, Google

Probabilistic Inference with Generating Functions for Poisson Latent Variable Models
Kevin Winner*, UMass CICS; Daniel Sheldon,

Causal meets Submodular: Subset Selection with Directed Information
Yuxun Zhou*, UC Berkeley; Costas Spanos,

Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
Ayan Chakrabarti*, ; Jingyu Shao, UCLA; Greg Shakhnarovich,

Deep Neural Networks with Inexact Matching for Person Re-Identification
Arulkumar Subramaniam, IIT Madras; Moitreya Chatterjee*, IIT Madras; Anurag Mittal, IIT Madras

Global Analysis of Expectation Maximization for Mixtures of Two Gaussians
Ji Xu, Columbia University; Daniel Hsu*, ; Arian Maleki, Columbia University

Estimating the class prior and posterior from noisy positives and unlabeled data
Shantanu Jain*, Indiana University; Martha White, ; Predrag Radivojac,

Kronecker Determinantal Point Processes
Zelda Mariet*, MIT; Suvrit Sra, MIT

Finite Sample Prediction and Recovery Bounds for Ordinal Embedding
Lalit Jain*, University of Wisconsin-Madison; Kevin Jamieson, UC Berkeley; Robert Nowak, University of Wisconsin Madison

Feature-distributed sparse regression: a screen-and-clean approach
Jiyan Yang*, Stanford University; Michael Mahoney, ; Michael Saunders, Stanford University; Yuekai Sun, University of Michigan

Learning Bound for Parameter Transfer Learning
Wataru Kumagai*, Kanagawa University

Learning under uncertainty: a comparison between R-W and Bayesian approach
He Huang*, LIBR; Martin Paulus, LIBR

Bi-Objective Online Matching and Submodular Allocations
Hossein Esfandiari*, University of Maryland; Nitish Korula, Google Research; Vahab Mirrokni, Google

Quantized Random Projections and Non-Linear Estimation of Cosine Similarity
Ping Li, ; Michael Mitzenmacher, Harvard University; Martin Slawski*,

The non-convex Burer-Monteiro approach works on smooth semidefinite programs
Nicolas Boumal, ; Vlad Voroninski*, MIT; Afonso Bandeira,

Dimensionality Reduction of Massive Sparse Datasets Using Coresets
Dan Feldman, ; Mikhail Volkov*, MIT; Daniela Rus, MIT

Using Social Dynamics to Make Individual Predictions: Variational Inference with Stochastic Kinetic Model
Zhen Xu*, SUNY at Buffalo; Wen Dong, ; Sargur Srihari,

Supervised learning through the lens of compression
Ofir David*, Technion - Israel Institute of Technology; Shay Moran, Technion - Israel Institute of Technology; Amir Yehudayoff, Technion - Israel Institute of Technology

Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
Xinghua Lou*, Vicarious FPC Inc; Ken Kansky, ; Wolfgang Lehrach, ; CC Laan, ; Bhaskara Marthi, ; D. Scott Phoenix, ; Dileep George,

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections
Xiao-Jiao Mao, Nanjing University; Chunhua Shen*, ; Yu-Bin Yang,

Object based Scene Representations using Fisher Scores of Local Subspace Projections
Mandar Dixit*, UC San Diego; Nuno Vasconcelos,

Active Learning with Oracle Epiphany
Tzu-Kuo Huang, Microsoft Research; Lihong Li, Microsoft Research; Ara Vartanian, University of Wisconsin-Madison; Saleema Amershi, Microsoft; Xiaojin Zhu*,

Statistical Inference for Pairwise Graphical Models Using Score Matching
Ming Yu*, The University of Chicago; Mladen Kolar, ; Varun Gupta, University of Chicago

Improved Error Bounds for Tree Representations of Metric Spaces
Samir Chowdhury*, The Ohio State University; Facundo Memoli, ; Zane Smith,

Can Peripheral Representations Improve Clutter Metrics on Complex Scenes?
Arturo Deza*, UCSB; Miguel Eckstein, UCSB

On Multiplicative Integration with Recurrent Neural Networks
Yuhuai Wu*, University of Toronto; Saizheng Zhang, University of Montreal; Ying Zhang, University of Montreal; Yoshua Bengio, U. Montreal; Ruslan Salakhutdinov, University of Toronto

Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices
Kirthevasan Kandasamy*, CMU; Maruan Al-Shedivat, CMU; Eric Xing, Carnegie Mellon University

Regret Bounds for Non-decomposable Metrics with Missing Labels
Nagarajan Natarajan*, Microsoft Research Bangalore; Prateek Jain, Microsoft Research

Robust k-means: a Theoretical Revisit
Alexandros Georgogiannis*, Technical University of Crete

Bayesian optimization for automated model selection
Gustavo Malkomes, Washington University; Charles Schaff, Washington University in St. Louis; Roman Garnett*,

A Probabilistic Model of Social Decision Making based on Reward Maximization
Koosha Khalvati*, University of Washington; Seongmin Park, Cognitive Neuroscience Center; Jean-Claude Dreher, Centre de Neurosciences Cognitives; Rajesh Rao, University of Washington

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition
Ahmed Alaa*, UCLA; Mihaela Van Der Schaar,

Fast and Flexible Monotonic Functions with Ensembles of Lattices
Mahdi Fard, ; Kevin Canini, ; Andy Cotter, ; Jan Pfeifer, Google; Maya Gupta*,

Conditional Generative Moment-Matching Networks
Yong Ren, Tsinghua University; Jun Zhu*, ; Jialian Li, Tsinghua University; Yucen Luo,

Stochastic Gradient MCMC with Stale Gradients
Changyou Chen*, ; Nan Ding, Google; Chunyuan Li, Duke; Yizhe Zhang, Duke University; Lawrence Carin,

Composing graphical models with neural networks for structured representations and fast inference
Matthew Johnson, ; David Duvenaud*, ; Alex Wiltschko, Harvard University and Twitter; Ryan Adams, ; Sandeep Datta, Harvard Medical School

Noise-Tolerant Life-Long Matrix Completion via Adaptive Sampling
Nina Balcan, ; Hongyang Zhang*, CMU

Combinatorial semi-bandit with known covariance
Rémy Degenne*, Université Paris Diderot; Vianney Perchet,

Matrix Completion has No Spurious Local Minimum
Rong Ge, ; Jason Lee, UC Berkeley; Tengyu Ma*, Princeton University

The Multiscale Laplacian Graph Kernel
Risi Kondor*, ; Horace Pan, UChicago

Adaptive Averaging in Accelerated Descent Dynamics
Walid Krichene*, UC Berkeley; Alexandre Bayen, UC Berkeley; Peter Bartlett,

Sub-sampled Newton Methods with Non-uniform Sampling
Peng Xu*, Stanford University; Jiyan Yang, Stanford University; Farbod Roosta-Khorasani, University of California Berkeley; Christopher Re, ; Michael Mahoney,

Stochastic Gradient Geodesic MCMC Methods
Chang Liu*, Tsinghua University; Jun Zhu, ; Yang Song, Stanford University

Variational Bayes on Monte Carlo Steroids
Aditya Grover*, Stanford University; Stefano Ermon,

Showing versus doing: Teaching by demonstration
Mark Ho*, Brown University; Michael L. Littman, ; James MacGlashan, Brown University; Fiery Cushman, Harvard University; Joe Austerweil,

Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation
Jianxu Chen*, University of Notre Dame; Lin Yang, University of Notre Dame; Yizhe Zhang, University of Notre Dame; Mark Alber, University of Notre Dame; Danny Chen, University of Notre Dame

Maximization of Approximately Submodular Functions
Thibaut Horel*, Harvard University; Yaron Singer,

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order
Xiangru Lian, University of Rochester; Huan Zhang, ; Cho-Jui Hsieh, ; Yijun Huang, ; Ji Liu*,

Learning Infinite RBMs with Frank-Wolfe
Wei Ping*, UC Irvine; Qiang Liu, ; Alexander Ihler,

Estimating the Size of a Large Network and its Communities from a Random Sample
Lin Chen*, Yale University; Amin Karbasi, ; Forrest Crawford, Yale University

Learning Sensor Multiplexing Design through Back-propagation
Ayan Chakrabarti*,

On Robustness of Kernel Clustering
Bowei Yan*, University of Texas at Austin; Purnamrita Sarkar, U.C. Berkeley

High resolution neural connectivity from incomplete tracing data using nonnegative spline regression
Kameron Harris*, University of Washington; Stefan Mihalas, Allen Institute for Brain Science; Eric Shea-Brown, University of Washington

MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Gregory Rogez*, Inria; Cordelia Schmid,

A New Liftable Class for First-Order Probabilistic Inference
Seyed Mehran Kazemi*, UBC; Angelika Kimmig, KU Leuven; Guy Van den Broeck, ; David Poole, UBC

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Jian Wu*, Cornell University; Peter I. Frazier,

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Vasilis Syrgkanis*, ; Haipeng Luo, Princeton University; Akshay Krishnamurthy, ; Robert Schapire,

Consistent Estimation of Functions of Data Missing Non-Monotonically and Not at Random
Ilya Shpitser*,

Optimistic Gittins Indices
Eli Gutin*, Massachusetts Institute of Technology; Vivek Farias,

Finite-Dimensional BFRY Priors and Variational Bayesian Inference for Power Law Models
Juho Lee*, POSTECH; Lancelot James, HKUST; Seungjin Choi, POSTECH

Launch and Iterate: Reducing Prediction Churn
Mahdi Fard, ; Quentin Cormier, Google; Kevin Canini, ; Maya Gupta*,

“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation
Wen-Hao Zhang*, Institute of Neuroscience, Chinese Academy of Sciences; He Wang, HKUST; K. Y. Michael Wong, HKUST; Si Wu,

Learning shape correspondence with anisotropic convolutional neural networks
Davide Boscaini*, University of Lugano; Jonathan Masci, ; Emanuele Rodolà, University of Lugano; Michael Bronstein, University of Lugano

Pairwise Choice Markov Chains
Stephen Ragain*, Stanford University; Johan Ugander,

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
Davood Hajinezhad*, Iowa State University; Mingyi Hong, ; Tuo Zhao, Johns Hopkins University; Zhaoran Wang, Princeton University

Clustering with Same-Cluster Queries
Hassan Ashtiani, University of Waterloo; Shrinu Kushagra*, University of Waterloo; Shai Ben-David, U. Waterloo

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami*, Google DeepMind; Nicolas Heess, ; Theophane Weber, ; Yuval Tassa, Google DeepMind; David Szepesvari, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Geoffrey Hinton, Google

Parameter Learning for Log-supermodular Distributions
Tatiana Shpakova*, Inria - ENS Paris; Francis Bach,

Deconvolving Feedback Loops in Recommender Systems
Ayan Sinha*, Purdue; David Gleich, ; Karthik Ramani, Purdue University

Structured Matrix Recovery via the Generalized Dantzig Selector
Sheng Chen*, University of Minnesota; Arindam Banerjee,

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
Himabindu Lakkaraju*, Stanford University; Jure Leskovec,

Automatic Neuron Detection in Calcium Imaging Data Using Convolutional Networks
Noah Apthorpe*, Princeton University; Alexander Riordan, Princeton University; Robert Aguilar, Princeton University; Jan Homann, Princeton University; Yi Gu, Princeton University; David Tank, Princeton University; H. Sebastian Seung, Princeton University

Designing smoothing functions for improved worst-case competitive ratio in online optimization
Reza Eghbali*, University of Washington; Maryam Fazel, University of Washington

Convergence guarantees for kernel-based quadrature rules in misspecified settings
Motonobu Kanagawa*, ; Bharath Sriperumbudur, ; Kenji Fukumizu,

Unsupervised Learning from Noisy Networks with Applications to Hi-C Data
Bo Wang*, Stanford University; Junjie Zhu, Stanford University; Armin Pourshafeie, Stanford University

A non-generative theory for unsupervised learning and efficient improper dictionary learning
Elad Hazan, ; Tengyu Ma*, Princeton University

Equality of Opportunity in Supervised Learning
Moritz Hardt*, ; Eric Price, ; Nathan Srebro,

Scaled Least Squares Estimator for GLMs in Large-Scale Problems
Murat Erdogdu*, Stanford University; Lee Dicker, ; Mohsen Bayati,

Interpretable Nonlinear Dynamic Modeling of Neural Trajectories
Yuan Zhao*, Stony Brook University; Il Memming Park,

Search Improves Label for Active Learning
Alina Beygelzimer, Yahoo Inc; Daniel Hsu, ; John Langford, ; Chicheng Zhang*, UCSD

Higher-Order Factorization Machines
Mathieu Blondel*, NTT; Akinori Fujino, NTT; Naonori Ueda, ; Masakazu Ishihata, Hokkaido University

Exponential expressivity in deep neural networks through transient chaos
Ben Poole*, Stanford University; Subhaneil Lahiri, Stanford University; Maithra Raghu, Cornell University; Jascha Sohl-Dickstein, ; Surya Ganguli, Stanford

Split LBI: An Iterative Regularization Path with Structural Sparsity
Chendi Huang, Peking University; Xinwei Sun, ; Jiechao Xiong, Peking University; Yuan Yao*,

An equivalence between high dimensional Bayes optimal inference and M-estimation
Madhu Advani*, Stanford University; Surya Ganguli, Stanford

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks
Anh Nguyen*, University of Wyoming; Alexey Dosovitskiy, ; Jason Yosinski, Cornell; Thomas Brox, University of Freiburg; Jeff Clune,

Deep Submodular Functions
Brian Dolhansky*, University of Washington; Jeff Bilmes, University of Washington, Seattle

Discriminative Gaifman Models
Mathias Niepert*,

Leveraging Sparsity for Efficient Submodular Data Summarization
Erik Lindgren*, University of Texas at Austin; Shanshan Wu, UT Austin; Alexandros G. Dimakis,

Local Minimax Complexity of Stochastic Convex Optimization
Sabyasachi Chatterjee, University of Chicago; John Duchi, ; John Lafferty, ; Yuancheng Zhu*, University of Chicago

Stochastic Optimization for Large-scale Optimal Transport
Aude Genevay*, Université Paris Dauphine; Marco Cuturi, ; Gabriel Peyré, ; Francis Bach,

On Mixtures of Markov Chains
Rishi Gupta*, Stanford; Ravi Kumar, ; Sergei Vassilvitskii, Google

Linear Contextual Bandits with Knapsacks
Shipra Agrawal*, ; Nikhil Devanur, Microsoft Research

Reconstructing Parameters of Spreading Models from Partial Observations
Andrey Lokhov*, Los Alamos National Laboratory

Spatiotemporal Residual Networks for Video Action Recognition
Christoph Feichtenhofer*, Graz University of Technology; Axel Pinz, Graz University of Technology; Richard Wildes, York University Toronto

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Behnam Neyshabur*, TTI-Chicago; Yuhuai Wu, University of Toronto; Ruslan Salakhutdinov, University of Toronto; Nathan Srebro,

Strategic Attentive Writer for Learning Macro-Actions
Alexander Vezhnevets*, Google DeepMind; Volodymyr Mnih, ; Simon Osindero, Google DeepMind; Alex Graves, ; Oriol Vinyals, ; John Agapiou, ; Koray Kavukcuoglu, Google DeepMind

The Limits of Learning with Missing Data
Brian Bullins*, Princeton University; Elad Hazan, ; Tomer Koren, Technion - Israel Institute of Technology

RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism
Edward Choi*, Georgia Institute of Technology; Mohammad Taha Bahadori, Gatech; Jimeng Sun,

Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers
Yu-Xiang Wang*, Carnegie Mellon University; Veeranjaneyulu Sadhanala, Carnegie Mellon University; Ryan Tibshirani,

Community Detection on Evolving Graphs
Stefano Leonardi*, Sapienza University of Rome; Aris Anagnostopoulos, Sapienza University of Rome; Jakub Łącki, Sapienza University of Rome; Silvio Lattanzi, Google; Mohammad Mahdian, Google Research, New York

Online and Differentially-Private Tensor Decomposition
Yining Wang*, Carnegie Mellon University; Anima Anandkumar, UC Irvine

Dimension-Free Iteration Complexity of Finite Sum Optimization Problems
Yossi Arjevani*, Weizmann Institute of Science; Ohad Shamir, Weizmann Institute of Science

Towards Conceptual Compression
Karol Gregor*, ; Frederic Besse, Google DeepMind; Danilo Jimenez Rezende, ; Ivo Danihelka, ; Daan Wierstra, Google DeepMind

Exact Recovery of Hard Thresholding Pursuit
Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang,

Data Programming: Creating Large Training Sets, Quickly
Alexander Ratner*, Stanford University; Christopher De Sa, Stanford University; Sen Wu, Stanford University; Daniel Selsam, Stanford; Christopher Ré, Stanford University

Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
Vitaly Feldman*,

Dynamic matrix recovery from incomplete observations under an exact low-rank constraint
Liangbei Xu*, Gatech; Mark Davenport,

Fast Distributed Submodular Cover: Public-Private Data Summarization
Baharan Mirzasoleiman*, ETH Zurich; Morteza Zadimoghaddam, ; Amin Karbasi,

Estimating Nonlinear Neural Response Functions using GP Priors and Kronecker Methods
Cristina Savin*, IST Austria; Gašper Tkačik, Institute of Science and Technology Austria

Lifelong Learning with Weighted Majority Votes
Anastasia Pentina*, IST Austria; Ruth Urner, MPI Tuebingen

Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
Jack Rae*, Google DeepMind; Jonathan Hunt, ; Ivo Danihelka, ; Tim Harley, Google DeepMind; Andrew Senior, ; Greg Wayne, ; Alex Graves, ; Timothy Lillicrap, Google DeepMind

Matching Networks for One Shot Learning
Oriol Vinyals*, ; Charles Blundell, DeepMind; Timothy Lillicrap, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Daan Wierstra, Google DeepMind

Tight Complexity Bounds for Optimizing Composite Objectives
Blake Woodworth*, Toyota Technological Institute; Nathan Srebro,

Graphical Time Warping for Joint Alignment of Multiple Curves
Yizhi Wang, Virginia Tech; David Miller, The Pennsylvania State University; Kira Poskanzer, University of California, San Francisco; Yue Wang, Virginia Tech; Lin Tian, The University of California, Davis; Guoqiang Yu*,

Unsupervised Risk Estimation Using Only Conditional Independence Structure
Jacob Steinhardt*, Stanford University; Percy Liang,

MetaGrad: Multiple Learning Rates in Online Learning
Tim Van Erven*, ; Wouter M. Koolen,

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas Kulkarni, MIT; Karthik Narasimhan*, MIT; Ardavan Saeedi, MIT; Joshua Tenenbaum,

High Dimensional Structured Superposition Models
Qilong Gu*, University of Minnesota; Arindam Banerjee,

Joint quantile regression in vector-valued RKHSs
Maxime Sangnier*, LTCI, CNRS, Télécom ParisTech; Olivier Fercoq, ; Florence d’Alché-Buc,

The Forget-me-not Process
Kieran Milan, Google DeepMind; Joel Veness*, ; James Kirkpatrick, Google DeepMind; Michael Bowling, ; Anna Koop, University of Alberta; Demis Hassabis,

Wasserstein Training of Restricted Boltzmann Machines
Gregoire Montavon*, ; Klaus-Robert Muller, ; Marco Cuturi,

Communication-Optimal Distributed Clustering
Jiecao Chen, Indiana University Bloomington; He Sun*, The University of Bristol; David Woodruff, ; Qin Zhang,

Probing the Compositionality of Intuitive Functions
Eric Schulz*, University College London; Joshua Tenenbaum, ; David Duvenaud, ; Maarten Speekenbrink, University College London; Sam Gershman,

Ladder Variational Autoencoders
Casper Kaae Sønderby*, University of Copenhagen; Tapani Raiko, ; Lars Maaløe, Technical University of Denmark; Søren Sønderby, KU; Ole Winther, Technical University of Denmark

The Multiple Quantile Graphical Model
Alnur Ali*, Carnegie Mellon University; Zico Kolter, ; Ryan Tibshirani,

Threshold Learning for Optimal Decision Making
Nathan Lepora*, University of Bristol

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
Aapo Hyvärinen*, ; Hiroshi Morioka, University of Helsinki

Can Active Memory Replace Attention?
Łukasz Kaiser*, ; Samy Bengio,

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning
Taiji Suzuki*, ; Heishiro Kanagawa, ; Hayato Kobayashi, ; Nobuyuki Shimizu, ; Yukihiro Tagami,

The Product Cut
Thomas Laurent*, Loyola Marymount University; James Von Brecht, CSULB; Xavier Bresson, ; Arthur Szlam,

Learning Sparse Gaussian Graphical Models with Overlapping Blocks
Mohammad Javad Hosseini*, University of Washington; Su-In Lee,

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale
Firas Abuzaid*, MIT; Joseph Bradley, Databricks; Feynman Liang, Cambridge University Engineering Department; Andrew Feng, Yahoo!; Lee Yang, Yahoo!; Matei Zaharia, MIT; Ameet Talwalkar,

Average-case hardness of RIP certification
Tengyao Wang, University of Cambridge; Quentin Berthet*, ; Yaniv Plan, University of British Columbia

Forward models at Purkinje synapses facilitate cerebellar anticipatory control
Ivan Herreros-Alonso*, Universitat Pompeu Fabra; Xerxes Arsiwalla, ; Paul Verschure,

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Michaël Defferrard*, EPFL; Xavier Bresson, ; Pierre Vandergheynst, EPFL

Deep Unsupervised Exemplar Learning
Miguel Bautista*, Heidelberg University; Artsiom Sanakoyeu, Heidelberg University; Ekaterina Tikhoncheva, Heidelberg University; Björn Ommer,

Large-Scale Price Optimization via Network Flow
Shinji Ito*, NEC Corporation; Ryohei Fujimaki,

Online Pricing with Strategic and Patient Buyers
Michal Feldman, TAU; Tomer Koren, Technion - Israel Institute of Technology; Roi Livni*, HUJI; Yishay Mansour, Microsoft; Aviv Zohar, HUJI

Global Optimality of Local Search for Low Rank Matrix Recovery
Srinadh Bhojanapalli*, TTI Chicago; Behnam Neyshabur, TTI-Chicago; Nathan Srebro,

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
Daniel Neil*, Institute of Neuroinformatics; Michael Pfeiffer, Institute of Neuroinformatics; Shih-Chii Liu,

Improving PAC Exploration Using the Median of Means
Jason Pazis*, MIT; Ronald Parr, ; Jonathan How, MIT

Infinite Hidden Semi-Markov Modulated Interaction Point Process
Matt Zhang*, NICTA; Peng Lin, Data61; Ting Guo, Data61; Yang Wang, Data61, CSIRO; Fang Chen, Data61, CSIRO

Cooperative Inverse Reinforcement Learning
Dylan Hadfield-Menell*, UC Berkeley; Stuart Russell, UC Berkeley; Pieter Abbeel, ; Anca Dragan,

Spatio-Temporal Hilbert Maps for Continuous Occupancy Representation in Dynamic Environments
Ransalu Senanayake*, The University of Sydney; Lionel Ott, The University of Sydney; Simon O'Callaghan, NICTA; Fabio Ramos, The University of Sydney

Select-and-Sample for Spike-and-Slab Sparse Coding
Abdul-Saboor Sheikh, University of Oldenburg; Jörg Lücke*,

Tractable Operations for Arithmetic Circuits of Probabilistic Models
Yujia Shen*, ; Arthur Choi, ; Adnan Darwiche,

Greedy Feature Construction
Dino Oglic*, University of Bonn; Thomas Gaertner, The University of Nottingham

Mistake Bounds for Binary Matrix Completion
Mark Herbster, ; Stephen Pasteris, UCL; Massimiliano Pontil*,

Data driven estimation of Laplace-Beltrami operator
Frederic Chazal, INRIA; Ilaria Giulini, ; Bertrand Michel*,

Tracking the Best Expert in Non-stationary Stochastic Environments
Chen-Yu Wei*, Academia Sinica; Yi-Te Hong, Academia Sinica; Chi-Jen Lu, Academia Sinica

Learning to learn by gradient descent by gradient descent
Marcin Andrychowicz*, Google DeepMind; Misha Denil, ; Sergio Gomez, Google DeepMind; Matthew Hoffman, Google DeepMind; David Pfau, Google DeepMind; Tom Schaul, ; Nando de Freitas, Google

Kernel Observers: Systems-Theoretic Modeling and Inference of Spatiotemporally Evolving Processes
Harshal Maske, UIUC; Girish Chowdhary*, UIUC; Hassan Kingravi, Pindrop Security Services

Quantum Perceptron Models
Ashish Kapoor*, ; Nathan Wiebe, Microsoft Research; Krysta M. Svore,

Guided Policy Search as Approximate Mirror Descent
William Montgomery*, University of Washington; Sergey Levine, University of Washington

The Power of Optimization from Samples
Eric Balkanski*, Harvard University; Aviad Rubinstein, UC Berkeley; Yaron Singer,

Deep Exploration via Bootstrapped DQN
Ian Osband*, DeepMind; Charles Blundell, DeepMind; Alexander Pritzel, ; Benjamin Van Roy,

A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization
Jingwei Liang*, GREYC, ENSICAEN; Jalal Fadili, ; Gabriel Peyré,

Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages
Yin Cheng Ng*, University College London; Pawel Chilinski, University College London; Ricardo Silva, University College London

Convolutional Neural Fabrics
Shreyas Saxena*, INRIA; Jakob Verbeek,

A Neural Transducer
Navdeep Jaitly*, ; Quoc Le, ; Oriol Vinyals, ; Ilya Sutskever, ; David Sussillo, Google; Samy Bengio,

Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
Aryan Mokhtari*, University of Pennsylvania; Hadi Daneshmand, ETH Zurich; Aurelien Lucchi, ; Thomas Hofmann, ; Alejandro Ribeiro, University of Pennsylvania

A Sparse Interactive Model for Inductive Matrix Completion
Jin Lu, University of Connecticut; Guannan Liang, University of Connecticut; Jiangwen Sun, University of Connecticut; Jinbo Bi*, University of Connecticut

Coresets for Scalable Bayesian Logistic Regression
Jonathan Huggins*, MIT; Trevor Campbell, MIT; Tamara Broderick, MIT

Agnostic Estimation for Misspecified Phase Retrieval Models
Matey Neykov*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Linear Relaxations for Finding Diverse Elements in Metric Spaces
Aditya Bhaskara*, University of Utah; Mehrdad Ghadiri, Sharif University of Technology; Vahab Mirrokni, Google; Ola Svensson, EPFL

Binarized Neural Networks
Itay Hubara*, Technion; Matthieu Courbariaux, Université de Montréal; Daniel Soudry, Columbia University; Ran El-Yaniv, Technion; Yoshua Bengio, Université de Montréal

On Local Maxima in the Population Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences
Chi Jin*, UC Berkeley; Yuchen Zhang, ; Sivaraman Balakrishnan, CMU; Martin Wainwright, UC Berkeley; Michael Jordan,

Memory-Efficient Backpropagation Through Time
Audrunas Gruslys*, Google DeepMind; Remi Munos, Google DeepMind; Ivo Danihelka, ; Marc Lanctot, Google DeepMind; Alex Graves,

Bayesian Optimization with Robust Bayesian Neural Networks
Jost Tobias Springenberg*, University of Freiburg; Aaron Klein, University of Freiburg; Stefan Falkner, University of Freiburg; Frank Hutter, University of Freiburg

Learnable Visual Markers
Oleg Grinchuk, Skolkovo Institute of Science and Technology; Vadim Lebedev, Skolkovo Institute of Science and Technology; Victor Lempitsky*,

Fast Algorithms for Robust PCA via Gradient Descent
Xinyang Yi*, UT Austin; Dohyung Park, University of Texas at Austin; Yudong Chen, ; Constantine Caramanis,

One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities
Michalis K. Titsias*,

Learning Deep Embeddings with Histogram Loss
Evgeniya Ustinova, Skoltech; Victor Lempitsky*,

Spectral Learning of Dynamic Systems from Nonequilibrium Data
Hao Wu*, Free University of Berlin; Frank Noe,

Markov Chain Sampling in Discrete Probabilistic Models with Constraints
Chengtao Li*, MIT; Suvrit Sra, MIT; Stefanie Jegelka, MIT

Mapping Estimation for Discrete Optimal Transport
Michael Perrot*, University of Saint-Etienne, Laboratoire Hubert Curien; Nicolas Courty, ; Rémi Flamary, ; Amaury Habrard, University of Saint-Etienne, Laboratoire Hubert Curien

BBO-DPPs: Batched Bayesian Optimization via Determinantal Point Processes
Tarun Kathuria*, Microsoft Research; Amit Deshpande, ; Pushmeet Kohli,

Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images
Vladimir Golkov*, Technical University of Munich; Marcin Skwark, Vanderbilt University; Antonij Golkov, University of Augsburg; Alexey Dosovitskiy, ; Thomas Brox, University of Freiburg; Jens Meiler, Vanderbilt University; Daniel Cremers, Technical University of Munich

Linear Feature Encoding for Reinforcement Learning
Zhao Song*, Duke University; Ronald Parr, ; Xuejun Liao, Duke University; Lawrence Carin,

A Minimax Approach to Supervised Learning
Farzan Farnia*, Stanford University; David Tse, Stanford University

Edge-Exchangeable Graphs and Sparsity
Diana Cai*, University of Chicago; Trevor Campbell, MIT; Tamara Broderick, MIT

A Locally Adaptive Normal Distribution
Georgios Arvanitidis*, DTU; Lars Kai Hansen, ; Søren Hauberg,

Completely random measures for modelling block-structured sparse networks
Tue Herlau*, ; Mikkel Schmidt, DTU; Morten Mørup, Technical University of Denmark

Sparse Support Recovery with Non-smooth Loss Functions
Kévin Degraux*, Université catholique de Louvain; Gabriel Peyré, ; Jalal Fadili, ; Laurent Jacques, Université catholique de Louvain

Neurons Equipped with Intrinsic Plasticity Learn Stimulus Intensity Statistics
Travis Monk*, University of Oldenburg; Cristina Savin, IST Austria; Jörg Lücke,

Learning values across many orders of magnitude
Hado Van Hasselt*, ; Arthur Guez, ; Matteo Hessel, Google DeepMind; Volodymyr Mnih, ; David Silver,

Adaptive Smoothed Online Multi-Task Learning
Keerthiram Murugesan*, Carnegie Mellon University; Hanxiao Liu, Carnegie Mellon University; Jaime Carbonell, CMU; Yiming Yang, CMU

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
Matteo Turchetta, ETH Zurich; Felix Berkenkamp*, ETH Zurich; Andreas Krause,

Probabilistic Linear Multistep Methods
Onur Teymur*, Imperial College London; Kostas Zygalakis, ; Ben Calderhead,

Stochastic Three-Composite Convex Minimization
Alp Yurtsever*, EPFL; Bang Vu, ; Volkan Cevher,

Using Fast Weights to Attend to the Recent Past
Jimmy Ba*, University of Toronto; Geoffrey Hinton, Google; Volodymyr Mnih, ; Joel Leibo, Google DeepMind; Catalin Ionescu, Google

Maximal Sparsity with Deep Networks?
Bo Xin*, Peking University; Yizhou Wang, Peking University; Wen Gao, Peking University; David Wipf,

Quantifying and Reducing Stereotypes in Word Embeddings
Tolga Bolukbasi*, Boston University; Kai-Wei Chang, ; James Zou, ; Venkatesh Saligrama, ; Adam Kalai, Microsoft Research

beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Valentina Zantedeschi*, UJM Saint-Etienne, France; Rémi Emonet, ; Marc Sebban,

Learning Additive Exponential Family Graphical Models via ℓ2,1-norm Regularized M-Estimation
Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang, ; Qingshan Liu, ; Guangcan Liu, NUIST

Backprop KF: Learning Discriminative Deterministic State Estimators
Tuomas Haarnoja*, UC Berkeley; Anurag Ajay, UC Berkeley; Sergey Levine, University of Washington; Pieter Abbeel,

2-Component Recurrent Neural Networks
Xiang Li*, NJUST; Tao Qin, Microsoft; Jian Yang, ; Xiaolin Hu, ; Tie-Yan Liu, Microsoft Research

Fast recovery from a union of subspaces
Chinmay Hegde, ; Piotr Indyk, MIT; Ludwig Schmidt*, MIT

Incremental Learning for Variational Sparse Gaussian Process Regression
Ching-An Cheng*, Georgia Institute of Technology; Byron Boots,

A Consistent Regularization Approach for Structured Prediction
Carlo Ciliberto*, MIT; Lorenzo Rosasco, ; Alessandro Rudi,

Clustering Signed Networks with the Geometric Mean of Laplacians
Pedro Eduardo Mercado Lopez*, Saarland University; Francesco Tudisco, Saarland University; Matthias Hein, Saarland University

An urn model for majority voting in classification ensembles
Víctor Soto, Columbia University; Alberto Suarez, ; Gonzalo Martínez-Muñoz*,

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction
Jacob Steinhardt*, Stanford University; Gregory Valiant, ; Moses Charikar, Stanford University

Fast and accurate spike sorting of high-channel count probes with KiloSort
Marius Pachitariu*, ; Nick Steinmetz, UCL; Shabnam Kadir, ; Matteo Carandini, UCL; Kenneth Harris, UCL

Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
Wouter M. Koolen*, ; Peter Grunwald, CWI; Tim Van Erven,

Ancestral Causal Inference
Sara Magliacane*, VU University Amsterdam; Tom Claassen, ; Joris Mooij, Radboud University Nijmegen

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
Xinyang Yi, UT Austin; Zhaoran Wang, Princeton University; Zhuoran Yang, Princeton University; Constantine Caramanis, ; Han Liu*,

Tagger: Deep Unsupervised Perceptual Grouping
Klaus Greff*, IDSIA; Antti Rasmus, The Curious AI Company; Mathias Berglund, The Curious AI Company; Tele Hao, The Curious AI Company; Harri Valpola, The Curious AI Company

Efficient Algorithm for Streaming Submodular Cover
Ashkan Norouzi-Fard*, EPFL; Abbas Bazzi, EPFL; Ilija Bogunovic, EPFL Lausanne; Marwa El Halabi, ; Ya-Ping Hsieh, ; Volkan Cevher,

Interaction Networks for Learning about Objects, Relations and Physics
Peter Battaglia*, Google DeepMind; Razvan Pascanu, ; Matthew Lai, Google DeepMind; Danilo Jimenez Rezende, ; Koray Kavukcuoglu, Google DeepMind

Efficient state-space modularization for planning: theory, behavioral and neural signatures
Daniel McNamee*, University of Cambridge; Daniel Wolpert, University of Cambridge; Máté Lengyel, University of Cambridge

Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent
Chi Jin*, UC Berkeley; Sham Kakade, ; Praneeth Netrapalli, Microsoft Research

Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics
Wei-Shou Hsu*, University of Waterloo; Pascal Poupart,

Computing and maximizing influence in linear threshold and triggering models
Justin Khim*, University of Pennsylvania; Varun Jog, ; Po-Ling Loh, Berkeley

Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions
Yichen Wang*, Georgia Tech; Nan Du, ; Rakshit Trivedi, Georgia Institute of Technology; Le Song,

Learning Deep Parsimonious Representations
Renjie Liao*, UofT; Alexander Schwing, ; Rich Zemel, ; Raquel Urtasun,

Optimal Learning for Multi-pass Stochastic Gradient Methods
Junhong Lin*, Istituto Italiano di Tecnologia; Lorenzo Rosasco,

Generative Adversarial Imitation Learning
Jonathan Ho*, Stanford; Stefano Ermon,

An End-to-End Approach for Natural Language to IFTTT Program Translation
Chang Liu*, University of Maryland; Xinyun Chen, Shanghai Jiaotong University; Richard Shin, ; Mingcheng Chen, University of Illinois, Urbana-Champaign; Dawn Song, UC Berkeley

Dual Space Gradient Descent for Online Learning
Trung Le*, University of Pedagogy Ho Chi Minh City; Tu Nguyen, Deakin University; Vu Nguyen, Deakin University; Dinh Phung, Deakin University

Fast stochastic optimization on Riemannian manifolds
Hongyi Zhang*, MIT; Sashank Jakkam Reddi, Carnegie Mellon University; Suvrit Sra, MIT

Professor Forcing: A New Algorithm for Training Recurrent Networks
Alex Lamb, Montreal; Anirudh Goyal*, University of Montreal; Ying Zhang, University of Montreal; Saizheng Zhang, University of Montreal; Aaron Courville, University of Montreal; Yoshua Bengio, U. Montreal

Learning brain regions via large-scale online structured sparse dictionary learning
Elvis Dohmatob*, Inria; Arthur Mensch, Inria; Gaël Varoquaux, ; Bertrand Thirion,

Efficient Neural Codes under Metabolic Constraints
Zhuo Wang*, University of Pennsylvania; Xue-Xin Wei, University of Pennsylvania; Alan Stocker, ; Dan Lee, University of Pennsylvania

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Efficient High-Order Interaction-Aware Feature Selection Based on Conditional Mutual Information
Alexander Shishkin, Yandex; Anastasia Bezzubtseva, Yandex; Alexey Drutsa*, Yandex; Ilia Shishkov, Yandex; Ekaterina Gladkikh, Yandex; Gleb Gusev, Yandex LLC; Pavel Serdyukov, Yandex

Bayesian Intermittent Demand Forecasting for Large Inventories
Matthias Seeger*, Amazon; David Salinas, Amazon; Valentin Flunkert, Amazon

Visual Question Answering with Question Representation Update
Ruiyu Li*, CUHK; Jiaya Jia, CUHK

Learning Parametric Sparse Models for Image Super-Resolution
Yongbo Li, Xidian University; Weisheng Dong*, Xidian University; Guangming Shi, Xidian University; Xuemei Xie, Xidian University; Xin Li, WVU

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Jean-Bastien Grill, Inria Lille - Nord Europe; Michal Valko*, Inria Lille - Nord Europe; Remi Munos, Google DeepMind

Asynchronous Parallel Greedy Coordinate Descent
Yang You, UC Berkeley; Xiangru Lian, University of Rochester; Cho-Jui Hsieh*, ; Ji Liu, ; Hsiang-Fu Yu, University of Texas at Austin; Inderjit Dhillon, ; James Demmel, UC Berkeley

Iterative Refinement of the Approximate Posterior for Directed Belief Networks
Rex Devon Hjelm*, University of New Mexico; Ruslan Salakhutdinov, University of Toronto; Kyunghyun Cho, University of Montreal; Nebojsa Jojic, Microsoft Research; Vince Calhoun, Mind Research Network; Junyoung Chung, University of Montreal

Assortment Optimization Under the Mallows model
Antoine Desir*, Columbia University; Vineet Goyal, ; Srikanth Jagabathula, ; Danny Segev,

Disease Trajectory Maps
Peter Schulam*, Johns Hopkins University; Raman Arora,

Multistage Campaigning in Social Networks
Mehrdad Farajtabar*, Georgia Tech; Xiaojing Ye, Georgia State University; Sahar Harati, Emory University; Le Song, ; Hongyuan Zha, Georgia Institute of Technology

Learning in Games: Robustness of Fast Convergence
Dylan Foster, Cornell University; Zhiyuan Li, Tsinghua University; Thodoris Lykouris*, Cornell University; Karthik Sridharan, Cornell University; Eva Tardos, Cornell University

Improving Variational Autoencoders with Inverse Autoregressive Flow
Diederik Kingma*, ; Tim Salimans,

Algorithms and matching lower bounds for approximately-convex optimization
Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Unified Methods for Exploiting Piecewise Structure in Convex Optimization
Tyler Johnson*, University of Washington; Carlos Guestrin,

Kernel Bayesian Inference with Posterior Regularization
Yang Song*, Stanford University; Jun Zhu, ; Yong Ren, Tsinghua University

* **Neural universal discrete denoiser**

Taesup Moon*, DGIST; Seonwoo Min, ; Byunghan Lee, ; Sungroh Yoon,

* **Optimal Architectures in a Solvable Model of Deep Networks**

Jonathan Kadmon*, Hebrew University; Haim Sompolinsky,

* **Conditional Image Generation with PixelCNN Decoders**

Aaron van den Oord*, Google DeepMind; Nal Kalchbrenner, ; Lasse Espeholt, ; Koray Kavukcuoglu, Google DeepMind; Oriol Vinyals, ; Alex Graves,

* **Supervised Learning with Tensor Networks**

Edwin Stoudenmire*, Univ of California Irvine; David Schwab, Northwestern University

* **Multi-step learning and underlying structure in statistical models**

Maia Fraser*, University of Ottawa

* **Blind Optimal Recovery of Signals**

Dmitry Ostrovsky*, Univ. Grenoble Alpes; Zaid Harchaoui, NYU, Courant Institute; Anatoli Juditsky, ; Arkadi Nemirovski, Georgia Institute of Technology

* **An Architecture for Deep, Hierarchical Generative Models**

Philip Bachman*,

* **Feature selection for classification of functional data using recursive maxima hunting**

José Torrecilla*, Universidad Autónoma de Madrid; Alberto Suarez,

* **Achieving budget-optimality with adaptive schemes in crowdsourcing**

Ashish Khetan, University of Illinois Urbana-Champaign; Sewoong Oh*,

* **Near-Optimal Smoothing of Structured Conditional Probability Matrices**

Moein Falahatgar, UCSD; Mesrob I. Ohannessian*, ; Alon Orlitsky,

* **Supervised Word Mover's Distance**

Gao Huang, ; Chuan Guo*, Cornell University; Matt Kusner, ; Yu Sun, ; Fei Sha, University of Southern California; Kilian Weinberger,

* **Exploiting Tradeoffs for Exact Recovery in Heterogeneous Stochastic Block Models**

Amin Jalali*, University of Washington; Qiyang Han, University of Washington; Ioana Dumitriu, University of Washington; Maryam Fazel, University of Washington

* **Full-Capacity Unitary Recurrent Neural Networks**

Scott Wisdom*, University of Washington; Thomas Powers, ; John Hershey, ; Jonathan LeRoux, ; Les Atlas,

* **Threshold Bandits, With and Without Censored Feedback**

Jacob Abernethy, ; Kareem Amin, ; Ruihao Zhu*, Massachusetts Institute of Technology

* **Understanding the Effective Receptive Field in Deep Convolutional Neural Networks**

Wenjie Luo*, University of Toronto; Yujia Li, University of Toronto; Raquel Urtasun, ; Rich Zemel,

* **Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods**

Lev Bogolubsky, ; Pavel Dvurechensky*, Weierstrass Institute for Applied Analysis and Stochastics; Alexander Gasnikov, ; Gleb Gusev, Yandex LLC; Yurii Nesterov, ; Andrey Raigorodskii, ; Aleksey Tikhonov, ; Maksim Zhukovskii,

* **k^*-Nearest Neighbors: From Global to Local**

Oren Anava, Technion; Kfir Levy*, Technion

* **Normalized Spectral Map Synchronization**

Yanyao Shen*, UT Austin; Qixing Huang, Toyota Technological Institute at Chicago; Nathan Srebro, ; Sujay Sanghavi,

* **Beyond Exchangeability: The Chinese Voting Process**

Moontae Lee*, Cornell University; Seok Hyun Jin, Cornell University; David Mimno, Cornell University

* **A posteriori error bounds for joint matrix decomposition problems**

Nicolo Colombo, Univ of Luxembourg; Nikos Vlassis*, Adobe Research

* **A Bayesian method for reducing bias in neural representational similarity analysis**

Ming Bo Cai*, Princeton University; Nicolas Schuck, Princeton Neuroscience Institute, Princeton University; Jonathan Pillow, ; Yael Niv,

* **Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes**

Chris Junchi Li, Princeton University; Zhaoran Wang*, Princeton University; Han Liu,

* **Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities**

Ruitong Huang*, University of Alberta; Tor Lattimore, ; András György, ; Csaba Szepesvari, U. Alberta

* **SDP Relaxation with Randomized Rounding for Energy Disaggregation**

Kiarash Shaloudegi, ; András György*, ; Csaba Szepesvari, U. Alberta; Wilsun Xu, University of Alberta

* **Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates**

Yuanzhi Li, Princeton University; Yingyu Liang*, ; Andrej Risteski, Princeton University

* **Unsupervised Learning of 3D Structure from Images**

Danilo Jimenez Rezende*, ; S. M. Ali Eslami, Google DeepMind; Shakir Mohamed, Google DeepMind; Peter Battaglia, Google DeepMind; Max Jaderberg, ; Nicolas Heess,

* **Poisson-Gamma dynamical systems**

Aaron Schein*, UMass Amherst; Hanna Wallach, Microsoft Research New England; Mingyuan Zhou,

* **Gaussian Processes for Survival Analysis**

Tamara Fernandez, Oxford; Nicolas Rivera*, King's College London; Yee-Whye Teh,

* **Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain**

Ian En-Hsu Yen*, University of Texas at Austin; Xiangru Huang, University of Texas at Austin; Kai Zhong, University of Texas at Austin; Ruohan Zhang, University of Texas at Austin; Pradeep Ravikumar, ; Inderjit Dhillon,

* **Optimal Binary Classifier Aggregation for General Losses**

Akshay Balsubramani*, UC San Diego; Yoav Freund,

* **Disentangling factors of variation in deep representation using adversarial training**

Michael Mathieu, NYU; Junbo Zhao, NYU; Aditya Ramesh, NYU; Pablo Sprechmann*, ; Yann LeCun, NYU

* **A primal-dual method for constrained consensus optimization**

Necdet Aybat*, Penn State University; Erfan Yazdandoost Hamedani, Penn State University

* **Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing**

Farshad Lahouti*, Caltech; Babak Hassibi, Caltech