# NIPS2016
This project collects the accepted papers for NIPS 2016 and their links to arXiv or GitXiv.

* **Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much**

Bryan He*, Stanford University; Christopher De Sa, Stanford University; Ioannis Mitliagkas, ; Christopher Ré, Stanford University

https://arxiv.org/abs/1606.03432

Abstract:
>Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
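The two scan orders compared in the abstract are easy to see on a toy model. Below is a minimal sketch of both scans on a ferromagnetic Ising chain (the model, sizes, and function name are illustrative, not from the paper):

```python
import numpy as np

def gibbs_ising_chain(n=20, coupling=0.5, sweeps=1000, scan="systematic", seed=0):
    """Gibbs sampler for a toy Ising chain, with either scan order."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n)
    for _ in range(sweeps):
        # systematic scan visits sites in a fixed order; random scan
        # draws n sites uniformly at random per sweep
        sites = range(n) if scan == "systematic" else rng.integers(0, n, size=n)
        for i in sites:
            field = coupling * (x[i - 1] if i > 0 else 0) + \
                    coupling * (x[i + 1] if i < n - 1 else 0)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(x_i = +1 | neighbors)
            x[i] = 1 if rng.random() < p_plus else -1
    return x

print(gibbs_ising_chain(scan="systematic"))
print(gibbs_ising_chain(scan="random"))
```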
* **Deep ADMM-Net for Compressive Sensing MRI**

Yan Yang, Xi'an Jiaotong University; Jian Sun*, Xi'an Jiaotong University; Huibin Li, ; Zongben Xu,

* **A scaled Bregman theorem with applications**

Richard Nock, Data61 and ANU; Aditya Menon*, ; Cheng Soon Ong, Data61

http://arxiv.org/abs/1607.00360

Abstract:
>Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a potentially non-convex generator) may be exactly re-written as a scaled Bregman divergence computed over transformed data. Admissible distortions include geodesic distances on curved manifolds and projections or gauge-normalisation, while admissible data include scalars, vectors and matrices.
>Our theorem allows one to leverage the wealth and convenience of Bregman divergences when analysing algorithms relying on the aforementioned Bregman distortions. We illustrate this with three novel applications of our theorem: a reduction from multi-class density ratio to class-probability estimation, a new adaptive projection free yet norm-enforcing dual norm mirror descent algorithm, and a reduction from clustering on flat manifolds to clustering on curved manifolds. Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

* **Swapout: Learning an ensemble of deep architectures**

Saurabh Singh*, UIUC; Derek Hoiem, UIUC; David Forsyth, UIUC

http://arxiv.org/abs/1605.06465

Abstract:
>We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100, matching state-of-the-art accuracy. Remarkably, our 32 layer wider model performs similarly to a 1001 layer ResNet model.
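The abstract's stochastic unit combines a skip path and a transformed path, each gated by an independent per-unit Bernoulli mask. A minimal sketch of that combination, with p1/p2 and the stand-in layer F chosen only for illustration:

```python
import numpy as np

def swapout(x, F, p1=0.8, p2=0.8, rng=np.random.default_rng(0)):
    """One Swapout-style unit: y = Theta1*x + Theta2*F(x) with independent
    per-unit Bernoulli masks (a sketch of the paper's formulation)."""
    theta1 = rng.random(x.shape) < p1   # keep the skip path per unit
    theta2 = rng.random(x.shape) < p2   # keep the transformed path per unit
    return theta1 * x + theta2 * F(x)

x = np.random.randn(4)
y = swapout(x, F=np.tanh)
# Deterministic masks recover special cases: both masks all-ones gives a
# residual unit; an all-zero skip mask leaves dropout on F(x); sharing one
# per-layer mask behaves like stochastic depth.
print(y)
```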
* **On Regularizing Rademacher Observation Losses**

Richard Nock*, Data61 and ANU

http://users.cecs.anu.edu.au/~rnock/nips2016-n-web.pdf

Abstract:
>It has recently been shown that supervised learning of linear classifiers with two of the most popular losses, the logistic and square loss, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados). It has also been shown that learning over rados brings solutions to two prominent problems for which the state of the art of learning from examples can be comparatively inferior and in fact less convenient: (i) protecting and learning from private examples, (ii) learning from distributed datasets without entity resolution. Bis repetita placent: the two proofs of equivalence are different and rely on specific properties of the corresponding losses, so whether these can be unified and generalized inevitably comes to mind. This is our first contribution: we show how they can be fit into the same theory for the equivalence between example and rado losses. As a second contribution, we show that the generalization unveils a surprising new connection to regularized learning, and in particular a sufficient condition under which regularizing the loss over examples is equivalent to regularizing the rados (i.e. the data) in the equivalent rado loss, in such a way that an efficient algorithm for one regularized rado loss may be as efficient when changing the regularizer. This is our third contribution: we give a formal boosting algorithm for the regularized exponential rado-loss which boosts with any of the ridge, lasso, SLOPE, ℓ1, or elastic net regularizer, using the same master routine for all. Because the regularized exponential rado-loss is the equivalent of the regularized logistic loss over examples, we obtain the first efficient proxy to the minimization of the regularized logistic loss over examples using such a wide spectrum of regularizers. Experiments show that regularization significantly improves rado-based learning and compares favourably with example-based learning.

* **Without-Replacement Sampling for Stochastic Gradient Methods**

Ohad Shamir*, Weizmann Institute of Science

https://arxiv.org/abs/1603.00570

Abstract:
>Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size (up to logarithmic factors). Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.

* **Fast and Provably Good Seedings for k-Means**

Olivier Bachem*, ETH Zurich; Mario Lucic, ETH Zurich; Hamed Hassani, ETH Zurich; Andreas Krause,

Abstract:
>Seeding - the task of finding initial cluster centers - is critical in obtaining high-quality clusterings for k-Means. However, k-means++ seeding, the state-of-the-art algorithm, does not scale well to massive datasets as it is inherently sequential and requires k full passes through the data. It was recently shown that Markov chain Monte Carlo sampling can be used to efficiently approximate the seeding step of k-means++. However, this result requires assumptions on the data generating distribution. We propose a simple yet fast seeding algorithm that produces *provably* good clusterings even *without assumptions* on the data. Our analysis shows that the algorithm allows for a favourable trade-off between solution quality and computational cost, speeding up k-means++ seeding by up to several orders of magnitude. We validate our theoretical results in extensive experiments on a variety of real-world data sets.
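For context, here is a minimal sketch of the sequential k-means++ (D²) seeding that the abstract describes as the scalability bottleneck: each of the k rounds needs a full pass over the data. The paper's own MCMC-based approximation is not reproduced here.

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=np.random.default_rng(0)):
    """Standard k-means++ seeding: sample each new center with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                  # the D^2 distribution
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.random.randn(1000, 2)
print(kmeans_pp_seeding(X, k=5).shape)  # (5, 2)
```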
* **Unsupervised Learning for Physical Interaction through Video Prediction**

Chelsea Finn*, Google, Inc.; Ian Goodfellow, ; Sergey Levine, University of Washington

http://arxiv.org/abs/1605.07157

Abstract:
>A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 50,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a "visual imagination" of different futures based on different courses of action. Our experiments show that our proposed method not only produces more accurate video predictions, but also more accurately predicts object motion, when compared to prior methods.

* **Matrix Completion and Clustering in Self-Expressive Models**

Ehsan Elhamifar*,

* **Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling**

Chengkai Zhang, ; Jiajun Wu*, MIT; Tianfan Xue, ; William Freeman, ; Joshua Tenenbaum,

* **Probabilistic Modeling of Future Frames from a Single Image**

Tianfan Xue*, ; Jiajun Wu, MIT; Katherine Bouman, MIT; William Freeman,

* **Human Decision-Making under Limited Time**

Pedro Ortega*, ; Alan Stocker,

* **Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition**

Shizhong Han*, University of South Carolina; Zibo Meng, University of South Carolina; Ahmed Shehab Khan, University of South Carolina; Yan Tong, University of South Carolina

https://cse.sc.edu/~mengz/papers/NIPS2016.pdf

Abstract:
>Recognizing facial action units (AUs) from spontaneous facial expressions is still a challenging problem. Most recently, CNNs have shown promise on facial AU recognition. However, the learned CNNs are often overfitted and do not generalize well to unseen subjects due to limited AU-coded training images. We proposed a novel Incremental Boosting CNN (IB-CNN) to integrate boosting into the CNN via an incremental boosting layer that selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incremental boosted classifier and individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on two benchmark AU databases have demonstrated that the IB-CNN yields significant improvement over the traditional CNN and the one without incremental learning, as well as outperforming the state-of-the-art CNN-based methods in AU recognition. The improvement is more impressive for the AUs that have the lowest frequencies in the databases.

* **Natural-Parameter Networks: A Class of Probabilistic Neural Networks**

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

* **Tree-Structured Reinforcement Learning for Sequential Object Localization**

Zequn Jie*, National Univ of Singapore; Xiaodan Liang, Sun Yat-sen University; Jiashi Feng, National University of Singapore; Xiaojie Jin, NUS; Wen Feng Lu, National Univ of Singapore; Shuicheng Yan,

* **Unsupervised Domain Adaptation with Residual Transfer Networks**

Mingsheng Long*, Tsinghua University; Han Zhu, Tsinghua University; Jianmin Wang, Tsinghua University; Michael Jordan,

http://arxiv.org/abs/1602.04433

Abstract:
>The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can simultaneously learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into the deep network to explicitly learn the residual function with reference to the target classifier. We embed features of multiple layers into reproducing kernel Hilbert spaces (RKHSs) and match feature distributions for feature adaptation. The adaptation behaviors can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently using standard back-propagation. Empirical evidence shows that the approach outperforms state-of-the-art methods on standard domain adaptation datasets.
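The residual-classifier assumption above (source classifier = target classifier + a small learned residual) is concrete enough to sketch. A minimal forward pass, assuming a linear target classifier and a two-layer residual on top of its logits (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_t = rng.standard_normal((10, 5))           # shared classifier: features -> logits
W_r1 = rng.standard_normal((5, 5))           # residual-block parameters
W_r2 = rng.standard_normal((5, 5))

def f_target(feat):
    return feat @ W_t                        # f_T(x), used on the target domain

def f_source(feat):
    logits = f_target(feat)
    residual = np.tanh(logits @ W_r1) @ W_r2 # learned residual Delta f
    return logits + residual                 # f_S(x) = f_T(x) + Delta f(x)

feat = rng.standard_normal((4, 10))
print(f_source(feat).shape)  # (4, 5)
```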
* **Verification Based Solution for Structured MAB Problems**

Zohar Karnin*,

* **Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games**

Maximilian Balandat*, UC Berkeley; Walid Krichene, UC Berkeley; Claire Tomlin, UC Berkeley; Alexandre Bayen, UC Berkeley

* **Linear dynamical neural population models through nonlinear embeddings**

Yuanjun Gao, Columbia University; Evan Archer*, ; John Cunningham, ; Liam Paninski,

https://arxiv.org/abs/1605.08454

Abstract:
>A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.
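The generative side of the fLDS model described above is easy to simulate: linear latent dynamics, a nonlinear map from latent state to per-neuron firing rates, and Poisson spike counts. A sketch with illustrative dimensions and an arbitrary stand-in nonlinearity (the paper uses a learned network):

```python
import numpy as np

def sample_flds(T=100, dz=2, dn=30, seed=0):
    """Simulate from an fLDS-style generative model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    A = 0.99 * np.eye(dz) + 0.01 * rng.standard_normal((dz, dz))  # dynamics matrix
    C = rng.standard_normal((dn, dz))
    z = np.zeros((T, dz))
    for t in range(1, T):
        z[t] = A @ z[t - 1] + 0.1 * rng.standard_normal(dz)  # linear latent state
    rates = np.exp(np.tanh(z @ C.T))   # smooth nonlinear embedding of the state
    spikes = rng.poisson(rates)        # observed spike counts
    return z, spikes

z, spikes = sample_flds()
print(z.shape, spikes.shape)  # (100, 2) (100, 30)
```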
* **SURGE: Surface Regularized Geometry Estimation from a Single Image**

Peng Wang*, UCLA; Xiaohui Shen, Adobe Research; Bryan Russell, ; Scott Cohen, Adobe Research; Brian Price, ; Alan Yuille,

* **Interpretable Distribution Features with Maximum Testing Power**

Wittawat Jitkrittum*, Gatsby Unit, UCL; Zoltan Szabo, ; Kacper Chwialkowski, Gatsby Unit, UCL; Arthur Gretton,

https://arxiv.org/abs/1605.06796

Abstract:
>Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ locally. An empirical estimate of the test power criterion converges with increasing sample size, ensuring the quality of the returned features. In real-world benchmarks on high-dimensional text and image data, linear-time tests using the proposed semimetrics achieve comparable performance to the state-of-the-art quadratic-time maximum mean discrepancy test, while returning human-interpretable features that explain the test results.

* **Sorting out typicality with the inverse moment matrix SOS polynomial**

Edouard Pauwels*, ; Jean-Bernard Lasserre, LAAS-CNRS

http://arxiv.org/abs/1606.03858

Abstract:
>We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple manner from the inverse of the empirical moment matrix. In fact, this SOS polynomial is directly related to orthogonal polynomials and the Christoffel function. This allows us to generalize and interpret extremality properties of orthogonal polynomials and to provide a mathematical rationale for the observed phenomenon. Among diverse potential applications, we illustrate the relevance of our results on a network intrusion detection task for which we obtain performances similar to existing dedicated methods reported in the literature.
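The distinguished polynomial in this abstract is simple to compute: with v(x) the vector of monomials up to some degree and M the empirical moment matrix, the score is Q(x) = v(x)ᵀ M⁻¹ v(x). A minimal 2-D, degree-2 sketch (sizes and threshold use are illustrative):

```python
import numpy as np

def monomials(X):
    """Degree-2 monomial features in 2-D: v(x) = (1, x, y, x^2, xy, y^2)."""
    x, y = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x), x, y, x**2, x*y, y**2], axis=1)

X = np.random.default_rng(0).standard_normal((500, 2))
V = monomials(X)
M = V.T @ V / len(X)                      # empirical moment matrix
Minv = np.linalg.inv(M)
Q = np.einsum('ij,jk,ik->i', V, Minv, V)  # SOS polynomial v(x)^T M^{-1} v(x)
# Sublevel sets {x : Q(x) <= threshold} trace the shape of the cloud; large
# Q flags atypical points (e.g., a simple intrusion-detection style score).
print(Q.min(), Q.max())
```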
* **Multi-armed Bandits: Competing with Optimal Sequences**

Zohar Karnin*, ; Oren Anava, Technion

* **Multivariate tests of association based on univariate tests**

Ruth Heller*, Tel-Aviv University; Yair Heller,

http://arxiv.org/abs/1603.03418

Abstract:
>For testing two random vectors for independence, we consider testing whether the distance of one vector from a center point is independent from the distance of the other vector from a center point by a univariate test. In this paper we provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may be distribution-free for specific aggregation methods (if the univariate test is distribution-free). We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.

* **Learning What and Where to Draw**

Scott Reed*, University of Michigan; Zeynep Akata, Max Planck Institute for Informatics; Santosh Mohan, University of Michigan; Samuel Tenka, University of Michigan; Bernt Schiele, ; Honglak Lee, University of Michigan

* **The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM**

Damek Davis*, Cornell University; Brent Edmunds, University of California, Los Angeles; Madeleine Udell,

https://arxiv.org/abs/1606.02338

Abstract:
>We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence --- among synchronous or asynchronous methods --- on this problem class. We provide upper bounds on the number of workers for which we can expect to see a linear speedup, which match the best bounds known for less complex problems, and show that in practice SAPALM achieves this linear speedup. We demonstrate state-of-the-art performance on several matrix factorization problems.

* **Integrator Nets**

Hakan Bilen*, University of Oxford; Andrea Vedaldi,

* **Combining Low-Density Separators with CNNs**

Yu-Xiong Wang*, Carnegie Mellon University; Martial Hebert, Carnegie Mellon University

* **CNNpack: Packing Convolutional Neural Networks in the Frequency Domain**

Yunhe Wang*, Peking University; Shan You, ; Dacheng Tao, ; Chao Xu, ; Chang Xu,

* **Cooperative Graphical Models**

Josip Djolonga*, ETH Zurich; Stefanie Jegelka, MIT; Sebastian Tschiatschek, ETH Zurich; Andreas Krause,

* **f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization**

Sebastian Nowozin*, Microsoft Research; Botond Cseke, Microsoft Research; Ryota Tomioka, MSRC

https://arxiv.org/abs/1606.00709

Abstract:
>Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
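The variational objective behind f-GAN has the form F = E_P[g_f(T(x))] - E_Q[f*(g_f(T(x)))], where f* is the convex conjugate of the chosen f and g_f an output activation. A minimal sketch of this value function for the KL divergence, where g_f(v) = v and f*(t) = exp(t - 1); the discriminator outputs here are random placeholders:

```python
import numpy as np

def fgan_objective(T_real, T_fake, divergence="kl"):
    """f-GAN variational bound for the KL divergence: the discriminator
    maximizes this quantity, the generator minimizes it (sketch only)."""
    if divergence != "kl":
        raise NotImplementedError  # other f choices swap in a different g_f and f*
    return T_real.mean() - np.exp(T_fake - 1.0).mean()

# T_real / T_fake stand for discriminator outputs on data and generator samples
print(fgan_objective(np.random.randn(64), np.random.randn(64)))
```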
* **Bayesian Optimization for Probabilistic Programs**

Tom Rainforth*, University of Oxford; Tuan Anh Le, University of Oxford; Jan-Willem van de Meent, University of Oxford; Michael Osborne, ; Frank Wood,

* **Hierarchical Question-Image Co-Attention for Visual Question Answering**

Jiasen Lu*, Virginia Tech; Jianwei Yang, Virginia Tech; Dhruv Batra, ; Devi Parikh, Virginia Tech

https://arxiv.org/abs/1606.00061

Abstract:
>A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question and consequently the image via the co-attention mechanism in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN) model. Our final model outperforms all reported methods, improving the state-of-the-art on the VQA dataset from 60.4% to 62.1%, and from 61.6% to 65.4% on the COCO-QA dataset.

* **Optimal Sparse Linear Encoders and Sparse PCA**

Malik Magdon-Ismail*, Rensselaer; Christos Boutsidis,

* **FPNN: Field Probing Neural Networks for 3D Data**

Yangyan Li*, Stanford University; Soeren Pirk, Stanford University; Hao Su, Stanford University; Charles Qi, Stanford University; Leonidas Guibas, Stanford University

https://arxiv.org/abs/1605.06240

Abstract:
>Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points --- sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space "intelligently", rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3DCNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.
* **CRF-CNN: Modeling Structured Information in Human Pose Estimation**

Xiao Chu*, CUHK; Wanli Ouyang, ; Hongsheng Li, CUHK; Xiaogang Wang, Chinese University of Hong Kong

* **Fairness in Learning: Classic and Contextual Bandits**

Matthew Joseph, University of Pennsylvania; Michael Kearns, ; Jamie Morgenstern*, University of Pennsylvania; Aaron Roth,

https://arxiv.org/abs/1605.07139

Abstract:
>We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types.
>First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case.
>In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
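The "chained" confidence intervals mentioned in the abstract suggest a simple selection rule: treat all arms whose intervals overlap, directly or through a chain, with the top arm as indistinguishable and play uniformly among them. A hedged sketch of one such round, assuming the chain is grown from the arm with the highest upper confidence bound (the paper's exact algorithm may differ):

```python
import numpy as np

def fair_round(means, widths, rng=np.random.default_rng(0)):
    """Pick an arm uniformly from the set chained to the top arm (sketch)."""
    lo, hi = means - widths, means + widths
    linked = {int(np.argmax(hi))}       # start from the highest upper bound
    changed = True
    while changed:                      # grow the chain of overlapping intervals
        changed = False
        for i in range(len(means)):
            if i not in linked and any(lo[j] <= hi[i] and lo[i] <= hi[j]
                                       for j in linked):
                linked.add(i)
                changed = True
    return rng.choice(sorted(linked))   # uniform over the chained set

print(fair_round(np.array([0.5, 0.45, 0.1]), np.array([0.1, 0.1, 0.05])))
```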
* **Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization**

Alexander Kirillov*, TU Dresden; Alexander Shekhovtsov, ; Carsten Rother, ; Bogdan Savchynskyy,

* **Domain Separation Networks**

Dilip Krishnan, Google; George Trigeorgis, Google; Konstantinos Bousmalis*, ; Nathan Silberman, Google; Dumitru Erhan, Google

* **DISCO Nets: DISsimilarity COefficients Networks**

Diane Bouchacourt*, University of Oxford; M. Pawan Kumar, University of Oxford; Sebastian Nowozin,

* **Multimodal Residual Learning for Visual QA**

Jin-Hwa Kim*, Seoul National University; Sang-Woo Lee, Seoul National University; Dong-Hyun Kwak, Seoul National University; Min-Oh Heo, Seoul National University; Jeonghee Kim, Naver Labs; Jung-Woo Ha, Naver Labs; Byoung-Tak Zhang, Seoul National University

* **CMA-ES with Optimal Covariance Update and Storage Complexity**

Dídac Rodríguez Arbonès, University of Copenhagen; Oswin Krause, ; Christian Igel*,

* **R-FCN: Object Detection via Region-based Fully Convolutional Networks**

Jifeng Dai, Microsoft; Yi Li, Tsinghua University; Kaiming He*, Microsoft; Jian Sun, Microsoft

* **GAP Safe Screening Rules for Sparse-Group Lasso**

Eugene Ndiaye, Télécom ParisTech; Olivier Fercoq, ; Alexandre Gramfort, ; Joseph Salmon*,

* **Learning and Forecasting Opinion Dynamics in Social Networks**

Abir De, IIT Kharagpur; Isabel Valera, ; Niloy Ganguly, IIT Kharagpur; Sourangshu Bhattacharya, IIT Kharagpur; Manuel Gomez Rodriguez*, MPI-SWS

* **Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares**

Rong Zhu*, Chinese Academy of Sciences

* **Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks**

Hao Wang*, HKUST; Xingjian Shi, ; Dit-Yan Yeung,

* **Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula**

Jean Barbier, EPFL; Mohamad Dia, EPFL; Florent Krzakala*, ; Thibault Lesieur, IPHT Saclay; Nicolas Macris, EPFL; Lenka Zdeborova,

* **A Unified Approach for Learning the Parameters of Sum-Product Networks**

Han Zhao*, Carnegie Mellon University; Pascal Poupart, ; Geoff Gordon,

* **Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images**

Junhua Mao*, UCLA; Jiajing Xu, ; Kevin Jing, ; Alan Yuille,

* **Stochastic Online AUC Maximization**

Yiming Ying*, ; Longyin Wen, State University of New York at Albany; Siwei Lyu, State University of New York at Albany

* **The Generalized Reparameterization Gradient**

Francisco Ruiz*, Columbia University; Michalis K. Titsias, ; David Blei,

* **Coupled Generative Adversarial Networks**

Ming-Yu Liu*, MERL; Oncel Tuzel, Mitsubishi Electric Research Labs (MERL)

* **Exponential Family Embeddings**

Maja Rudolph*, Columbia University; Francisco J. R. Ruiz, ; Stephan Mandt, Disney Research; David Blei,
* **Variational Information Maximization for Feature Selection**

Shuyang Gao*, ; Greg Ver Steeg, ; Aram Galstyan,

* **Operator Variational Inference**

Rajesh Ranganath*, Princeton University; Dustin Tran, Columbia University; Jaan Altosaar, Princeton University; David Blei,

* **Fast learning rates with heavy-tailed losses**

Vu Dinh*, Fred Hutchinson Cancer Center; Lam Ho, UCLA; Binh Nguyen, University of Science, Vietnam; Duy Nguyen, University of Wisconsin-Madison

* **Budgeted stream-based active learning via adaptive submodular maximization**

Kaito Fujii*, Kyoto University; Hisashi Kashima, Kyoto University

* **Learning feed-forward one-shot learners**

Luca Bertinetto, University of Oxford; Joao Henriques, University of Oxford; Jack Valmadre*, University of Oxford; Philip Torr, ; Andrea Vedaldi,

* **Learning User Perceived Clusters with Feature-Level Supervision**

Ting-Yu Cheng, ; Kuan-Hua Lin, ; Xinyang Gong, Baidu Inc.; Kang-Jun Liu, ; Shan-Hung Wu*, National Tsing Hua University

* **Robust Spectral Detection of Global Structures in the Data by Learning a Regularization**

Pan Zhang*, ITP, CAS

* **Residual Networks are Exponential Ensembles of Relatively Shallow Networks**

Andreas Veit*, Cornell University; Michael Wilber, ; Serge Belongie, Cornell University

* **Adversarial Multiclass Classification: A Risk Minimization Perspective**

Rizal Fathony*, U. of Illinois at Chicago; Anqi Liu, ; Kaiser Asif, ; Brian Ziebart,

* **Solving Random Systems of Quadratic Equations via Truncated Generalized Gradient Flow**

Gang Wang*, University of Minnesota; Georgios Giannakis, University of Minnesota

* **Coin Betting and Parameter-Free Online Learning**

Francesco Orabona*, Yahoo Research; David Pal,

* **Deep Learning without Poor Local Minima**

Kenji Kawaguchi*, MIT

* **Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity**

Eugene Belilovsky*, CentraleSupelec; Gael Varoquaux, ; Matthew Blaschko, KU Leuven

* **A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++**

Dennis Wei*, IBM Research

* **Generating Videos with Scene Dynamics**

Carl Vondrick*, MIT; Hamed Pirsiavash, ; Antonio Torralba,

* **Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs**

Daniel Ritchie*, Stanford University; Anna Thomas, Stanford University; Pat Hanrahan, Stanford University; Noah Goodman,

* **A Powerful Generative Model Using Random Weights for the Deep Image Representation**

Kun He, Huazhong University of Science and Technology; Yan Wang*, Huazhong University of Science and Technology; John Hopcroft, Cornell University

* **Optimizing affinity-based binary hashing using auxiliary coordinates**

Ramin Raziperchikolaei, UC Merced; Miguel Carreira-Perpinan*, UC Merced

* **Double Thompson Sampling for Dueling Bandits**

Huasen Wu*, University of California at Davis; Xin Liu, University of California, Davis

* **Generating Images with Perceptual Similarity Metrics based on Deep Networks**

Alexey Dosovitskiy*, ; Thomas Brox, University of Freiburg

* **Dynamic Filter Networks**

Xu Jia*, KU Leuven; Bert De Brabandere, ; Tinne Tuytelaars, KU Leuven; Luc Van Gool, ETH Zürich

* **A Simple Practical Accelerated Method for Finite Sums**

Aaron Defazio*, Ambiata
* **Barzilai-Borwein Step Size for Stochastic Gradient Descent**

Conghui Tan*, The Chinese University of HK; Shiqian Ma, ; Yu-Hong Dai, ; Yuqiu Qian, The University of Hong Kong

* **On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability**

Guillaume Papa, Télécom ParisTech; Aurélien Bellet*, ; Stephan Clémencon,

* **Optimal spectral transportation with application to music transcription**

Rémi Flamary, ; Cédric Févotte*, CNRS; Nicolas Courty, ; Valentin Emiya, Aix-Marseille University

* **Regularized Nonlinear Acceleration**

Damien Scieur*, INRIA - ENS; Alexandre D'Aspremont, ; Francis Bach,

* **SPALS: Fast Alternating Least Squares via Implicit Leverage Scores Sampling**

Dehua Cheng*, Univ. of Southern California; Richard Peng, ; Yan Liu, ; Ioakeim Perros, Georgia Institute of Technology

* **Single-Image Depth Perception in the Wild**

Weifeng Chen*, University of Michigan; Zhao Fu, University of Michigan; Dawei Yang, University of Michigan; Jia Deng,

* **Computational and Statistical Tradeoffs in Learning to Rank**

Ashish Khetan*, University of Illinois Urbana-Champaign; Sewoong Oh,

* **Learning to Poke by Poking: Experiential Learning of Intuitive Physics**

Pulkit Agrawal*, UC Berkeley; Ashvin Nair, UC Berkeley; Pieter Abbeel, ; Jitendra Malik, ; Sergey Levine, University of Washington

* **Online Convex Optimization with Unconstrained Domains and Losses**

Ashok Cutkosky*, Stanford University; Kwabena Boahen, Stanford University

* **An ensemble diversity approach to supervised binary hashing**

Miguel Carreira-Perpinan*, UC Merced; Ramin Raziperchikolaei, UC Merced

* **Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis**

Weiran Wang*, ; Jialei Wang, University of Chicago; Dan Garber, ; Nathan Srebro,

* **The Power of Adaptivity in Identifying Statistical Alternatives**

Kevin Jamieson*, UC Berkeley; Daniel Haas, ; Ben Recht,

* **On Explore-Then-Commit strategies**

Aurelien Garivier, ; Tor Lattimore, ; Emilie Kaufmann*,

* **Sublinear Time Orthogonal Tensor Decomposition**

Zhao Song*, UT-Austin; David Woodruff, ; Huan Zhang, UC-Davis

* **DECOrrelated feature space partitioning for distributed sparse regression**

Xiangyu Wang*, Duke University; David Dunson, Duke University; Chenlei Leng, University of Warwick

* **Deep Alternative Neural Networks: Exploring Contexts as Early as Possible for Action Recognition**

Jinzhuo Wang*, PKU; Wenmin Wang, Peking University; Xiongtao Chen, Peking University; Ronggang Wang, Peking University; Wen Gao, Peking University

* **Machine Translation Through Learning From a Communication Game**

Di He*, Microsoft; Yingce Xia, USTC; Tao Qin, Microsoft; Liwei Wang, ; Nenghai Yu, USTC; Tie-Yan Liu, Microsoft; Wei-Ying Ma, Microsoft

* **Dialog-based Language Learning**

Jason Weston*,

* **Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition**

Theodore Bluche*, A2iA

* **Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction**

Hsiang-Fu Yu*, University of Texas at Austin; Nikhil Rao, ; Inderjit Dhillon,

* **Active Nearest-Neighbor Learning in Metric Spaces**

Aryeh Kontorovich, ; Sivan Sabato*, Ben-Gurion University of the Negev; Ruth Urner, MPI Tuebingen

* **Proximal Deep Structured Models**

Shenlong Wang*, University of Toronto; Sanja Fidler, ; Raquel Urtasun,
* **Faster Projection-free Convex Optimization over the Spectrahedron**

Dan Garber*,

* **Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach**

Remi Lam*, MIT; Karen Willcox, MIT; David Wolpert,

* **Learning Sound Representations from Unlabeled Video**

Yusuf Aytar, MIT; Carl Vondrick*, MIT; Antonio Torralba,

* **Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks**

Tim Salimans*, ; Diederik Kingma,
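The reparameterization named in this title is w = g · v / ‖v‖, which decouples the direction of each weight vector from its norm so both can be trained by ordinary gradient descent. A minimal sketch of the forward pass (the data and parameter values are illustrative):

```python
import numpy as np

def weightnorm_forward(x, v, g):
    """Linear layer with weight normalization: w = g * v / ||v||."""
    w = g * v / np.linalg.norm(v)
    return x @ w

x = np.random.randn(8, 3)
v = np.random.randn(3)     # direction parameters
g = 2.0                    # scalar scale parameter
print(weightnorm_forward(x, v, g))
```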
* **Efficient Second Order Online Learning by Sketching**

Haipeng Luo*, Princeton University; Alekh Agarwal, Microsoft; Nicolò Cesa-Bianchi, ; John Langford,

* **Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis**

Yoshinobu Kawahara*, Osaka University

* **Distributed Flexible Nonlinear Tensor Factorization**

Shandian Zhe*, Purdue University; Kai Zhang, Lawrence Berkeley Lab; Pengyuan Wang, Yahoo! Research; Kuang-chih Lee, ; Zenglin Xu, ; Alan Qi, ; Zoubin Ghahramani,

* **The Robustness of Estimator Composition**

Pingfan Tang*, University of Utah; Jeff Phillips, University of Utah

* **Efficient and Robust Spiking Neural Circuit for Navigation Inspired by Echolocating Bats**

Bipin Rajendran*, NJIT; Pulkit Tandon, IIT Bombay; Yash Malviya, IIT Bombay

* **PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions**

Michael Figurnov*, Skolkovo Institute of Science and Technology; Aijan Ibraimova, Skolkovo Institute of Science and Technology; Dmitry P. Vetrov, ; Pushmeet Kohli,

* **Differential Privacy without Sensitivity**

Kentaro Minami*, The University of Tokyo; Hitomi Arai, The University of Tokyo; Issei Sato, The University of Tokyo; Hiroshi Nakagawa,

* **Optimal Cluster Recovery in the Labeled Stochastic Block Model**

Se-Young Yun*, Los Alamos National Laboratory; Alexandre Proutiere,

* **Even Faster SVD Decomposition Yet Without Agonizing Pain**

Zeyuan Allen-Zhu*, Princeton University; Yuanzhi Li, Princeton University

* **An algorithm for L1 nearest neighbor search via monotonic embedding**

Xinan Wang*, UCSD; Sanjoy Dasgupta,

* **Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations**

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Junier Oliva, ; Jeff Schneider, CMU; Barnabas Poczos,

* **Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes**

Dan Garber*, ; Ofer Meshi,

* **Efficient Nonparametric Smoothness Estimation**

Shashank Singh*, Carnegie Mellon University; Simon Du, Carnegie Mellon University; Barnabas Poczos,

* **A Theoretically Grounded Application of Dropout in Recurrent Neural Networks**

Yarin Gal*, University of Cambridge; Zoubin Ghahramani,

* **Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation**

George Papamakarios*, University of Edinburgh; Iain Murray, University of Edinburgh

* **Direct Feedback Alignment Provides Learning In Deep Neural Networks**

Arild Nøkland*, None

* **Safe and Efficient Off-Policy Reinforcement Learning**

Remi Munos, Google DeepMind; Thomas Stepleton, Google DeepMind; Anna Harutyunyan, Vrije Universiteit Brussel; Marc Bellemare*, Google DeepMind

* **A Multi-Batch L-BFGS Method for Machine Learning**

Albert Berahas*, Northwestern University; Jorge Nocedal, Northwestern University; Martin Takac, Lehigh University

* **Semiparametric Differential Graph Models**

Pan Xu*, University of Virginia; Quanquan Gu, University of Virginia

* **Rényi Divergence Variational Inference**

Yingzhen Li*, University of Cambridge; Richard E. Turner,

* **Doubly Convolutional Neural Networks**

Shuangfei Zhai*, Binghamton University; Yu Cheng, IBM Research; Zhongfei Zhang, Binghamton University

* **Density Estimation via Discrepancy Based Adaptive Sequential Partition**

Dangna Li*, Stanford University; Kun Yang, Google Inc; Wing Wong, Stanford University

* **How Deep is the Feature Analysis underlying Rapid Visual Categorization?**

Sven Eberhardt*, Brown University; Jonah Cader, Brown University; Thomas Serre,

* **Variational Information Maximizing Exploration**

Rein Houthooft*, Ghent University - iMinds; UC Berkeley; OpenAI; Xi Chen, UC Berkeley; OpenAI; Yan Duan, UC Berkeley; John Schulman, OpenAI; Filip De Turck, Ghent University - iMinds; Pieter Abbeel,

* **Generalized Correspondence-LDA Models (GC-LDA) for Identifying Functional Regions in the Brain**

Timothy Rubin*, Indiana University; Sanmi Koyejo, UIUC; Michael Jones, Indiana University; Tal Yarkoni, University of Texas at Austin

* **Solving Marginal MAP Problems with NP Oracles and Parity Constraints**

Yexiang Xue*, Cornell University; Zhiyuan Li, Tsinghua University; Stefano Ermon, ; Carla Gomes, Cornell University; Bart Selman,

* **Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models**

Tomoharu Iwata*, ; Makoto Yamada,

* **Fast Stochastic Methods for Nonsmooth Nonconvex Optimization**

Sashank Jakkam Reddi*, Carnegie Mellon University; Suvrit Sra, MIT; Barnabas Poczos, ; Alexander J. Smola,

* **Variance Reduction in Stochastic Gradient Langevin Dynamics**

Kumar Dubey*, Carnegie Mellon University; Sashank Jakkam Reddi, Carnegie Mellon University; Sinead Williamson, ; Barnabas Poczos, ; Alexander J. Smola, ; Eric Xing, Carnegie Mellon University
* **Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning**

Mehdi Sajjadi*, University of Utah; Mehran Javanmardi, University of Utah; Tolga Tasdizen, University of Utah

* **Dense Associative Memory for Pattern Recognition**

Dmitry Krotov*, Institute for Advanced Study; John Hopfield, Princeton Neuroscience Institute

* **Causal Bandits: Learning Good Interventions via Causal Inference**

Finnian Lattimore, Australian National University; Tor Lattimore*, ; Mark Reid,

* **Refined Lower Bounds for Adversarial Bandits**

Sébastien Gerchinovitz, ; Tor Lattimore*,

* **Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning**

Gang Niu*, University of Tokyo; Marthinus du Plessis, ; Tomoya Sakai, ; Yao Ma, ; Masashi Sugiyama, RIKEN / University of Tokyo

* **Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ϵ)**

Yi Xu*, The University of Iowa; Yan Yan, University of Technology Sydney; Qihang Lin, ; Tianbao Yang, University of Iowa

* **Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators**

Shashank Singh*, Carnegie Mellon University; Barnabas Poczos,

* **A state-space model of cross-region dynamic connectivity in MEG/EEG**

Ying Yang*, Carnegie Mellon University; Elissa Aminoff, Carnegie Mellon University; Michael Tarr, Carnegie Mellon University; Robert Kass, Carnegie Mellon University

* **What Makes Objects Similar: A Unified Multi-Metric Learning Approach**

Han-Jia Ye, ; De-Chuan Zhan*, ; Xue-Min Si, Nanjing University; Yuan Jiang, Nanjing University; Zhi-Hua Zhou,

* **Adaptive Maximization of Pointwise Submodular Functions With Budget Constraint**

Nguyen Viet Cuong*, National University of Singapore; Huan Xu, NUS

* **Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions**

Siddartha Ramamohan, Indian Institute of Science; Arun Rajkumar, ; Shivani Agarwal*, Radcliffe Institute, Harvard

* **Local Similarity-Aware Deep Feature Embedding**

Chen Huang*, Chinese University of Hong Kong; Chen Change Loy, The Chinese University of HK; Xiaoou Tang, The Chinese University of Hong Kong

* **A Communication-Efficient Parallel Algorithm for Decision Tree**

Qi Meng*, Peking University; Guolin Ke, Microsoft Research; Taifeng Wang, Microsoft Research; Wei Chen, Microsoft Research; Qiwei Ye, Microsoft Research; Zhi-Ming Ma, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Tie-Yan Liu, Microsoft Research

* **Convex Two-Layer Modeling with Latent Structure**

Vignesh Ganapathiraman, University of Illinois at Chicago; Xinhua Zhang*, UIC; Yaoliang Yu, ; Junfeng Wen, UofA

* **Sampling for Bayesian Program Learning**

Kevin Ellis*, MIT; Armando Solar-Lezama, MIT; Joshua Tenenbaum,

* **Learning Kernels with Random Features**

Aman Sinha*, Stanford University; John Duchi,

* **Optimal Tagging with Markov Chain Optimization**

Nir Rosenfeld*, Hebrew University of Jerusalem; Amir Globerson, Tel Aviv University

* **Crowdsourced Clustering: Querying Edges vs Triangles**

Ramya Korlakai Vinayak*, Caltech; Babak Hassibi, Caltech

* **Mixed vine copulas as joint models of spike counts and local field potentials**

Arno Onken*, IIT; Stefano Panzeri, IIT

* **Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation**

Emmanuel Abbe*, ; Colin Sandon,
* **Adaptive Concentration Inequalities for Sequential Decision Problems**

Shengjia Zhao*, Tsinghua University; Enze Zhou, Tsinghua University; Ashish Sabharwal, Allen Institute for AI; Stefano Ermon,

* **Fast mini-batch k-means by nesting**

James Newling*, Idiap Research Institute; Francois Fleuret, Idiap Research Institute

* **Deep Learning Models of the Retinal Response to Natural Scenes**

Lane McIntosh*, Stanford University; Niru Maheswaranathan, Stanford University; Aran Nayebi, Stanford University; Surya Ganguli, Stanford; Stephen Baccus, Stanford University

* **Preference Completion from Partial Rankings**

Suriya Gunasekar*, UT Austin; Sanmi Koyejo, UIUC; Joydeep Ghosh, UT Austin

* **Dynamic Network Surgery for Efficient DNNs**

Yiwen Guo*, Intel Labs China; Anbang Yao, ; Yurong Chen,

* **Learning a Metric Embedding for Face Recognition using the Multibatch Method**

Oren Tadmor, OrCam; Tal Rosenwein, OrCam; Shai Shalev-Shwartz, OrCam; Yonatan Wexler*, OrCam; Amnon Shashua, OrCam

* **A Pseudo-Bayesian Algorithm for Robust PCA**

Tae-Hyun Oh*, KAIST; David Wipf, ; Yasuyuki Matsushita, Osaka University; In So Kweon, KAIST

* **End-to-End Kernel Learning with Supervised Convolutional Kernel Networks**

Julien Mairal*, Inria

* **Stochastic Variance Reduction Methods for Saddle-Point Problems**

P. Balamurugan, ; Francis Bach*,

* **Flexible Models for Microclustering with Applications to Entity Resolution**

Brenda Betancourt, Duke University; Giacomo Zanella, The University of Warwick; Jeffrey Miller, Duke University; Hanna Wallach, Microsoft Research New England; Abbas Zaidi, Duke University; Rebecca C. Steorts*, Duke University

* **Catching heuristics are optimal control policies**

Boris Belousov*, TU Darmstadt; Gerhard Neumann, ; Constantin Rothkopf, ; Jan Peters,

* **Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian**

Victor Picheny, Institut National de la Recherche Agronomique; Robert Gramacy*, ; Stefan Wild, Argonne National Lab; Sebastien Le Digabel, École Polytechnique de Montréal

* **Adaptive Neural Compilation**

Rudy Bunel*, Oxford University; Alban Desmaison, Oxford; M. Pawan Kumar, University of Oxford; Pushmeet Kohli, ; Philip Torr,
* **Synthesis of MCMC and Belief Propagation**

Sung-Soo Ahn*, KAIST; Misha Chertkov, Los Alamos National Laboratory; Jinwoo Shin, KAIST

* **Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables**

Mauro Scanagatta*, IDSIA; Giorgio Corani, IDSIA; Cassio Polpo de Campos, Queen's University Belfast; Marco Zaffalon, IDSIA

* **Unifying Count-Based Exploration and Intrinsic Motivation**

Marc Bellemare*, Google DeepMind; Srinivasan Sriram, ; Georg Ostrovski, Google DeepMind; Tom Schaul, ; David Saxton, Google DeepMind; Remi Munos, Google DeepMind

* **Large Margin Discriminant Dimensionality Reduction in Prediction Space**

Mohammad Saberian*, Netflix; Jose Costa Pereira, UC San Diego; Nuno Vasconcelos, UC San Diego

* **Stochastic Structured Prediction under Bandit Feedback**

Artem Sokolov, Heidelberg University; Julia Kreutzer, Heidelberg University; Stefan Riezler*, Heidelberg University

* **Simple and Efficient Weighted Minwise Hashing**

Anshumali Shrivastava*, Rice University

* **Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation**

Ilija Bogunovic*, EPFL Lausanne; Jonathan Scarlett, ; Andreas Krause, ; Volkan Cevher,

* **Structured Sparse Regression via Greedy Hard Thresholding**

Prateek Jain, Microsoft Research; Nikhil Rao*, ; Inderjit Dhillon,

* **Understanding Probabilistic Sparse Gaussian Process Approximations**

Matthias Bauer*, University of Cambridge; Mark van der Wilk, University of Cambridge; Carl Rasmussen, University of Cambridge

* **SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques**

Elad Richardson*, Technion; Rom Herskovitz, ; Boris Ginsburg, ; Michael Zibulevsky,

* **Long-Term Trajectory Planning Using Hierarchical Memory Networks**

Stephan Zheng*, Caltech; Yisong Yue, ; Patrick Lucey, Stats

* **Learning Tree Structured Potential Games**

Vikas Garg*, MIT; Tommi Jaakkola,

* **Observational-Interventional Priors for Dose-Response Learning**

Ricardo Silva*,

* **Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs**

Shahin Jabbari*, University of Pennsylvania; Ryan Rogers, University of Pennsylvania; Aaron Roth, ; Steven Wu, University of Pennsylvania

* **Identification and Overidentification of Linear Structural Equation Models**

Bryant Chen*, UCLA

* **Adaptive Skills Adaptive Partitions (ASAP)**

Daniel Mankowitz*, Technion; Timothy Mann, Google DeepMind; Shie Mannor, Technion

* **Multiple-Play Bandits in the Position-Based Model**

Paul Lagrée*, Université Paris Sud; Claire Vernade, Université Paris Saclay; Olivier Cappe,

* **Optimal Black-Box Reductions Between Optimization Objectives**

Zeyuan Allen-Zhu*, Princeton University; Elad Hazan,

* **On Valid Optimal Assignment Kernels and Applications to Graph Classification**

Nils Kriege*, TU Dortmund; Pierre-Louis Giscard, University of York; Richard Wilson, University of York

* **Robustness of classifiers: from adversarial to random noise**

Alhussein Fawzi, ; Seyed-Mohsen Moosavi-Dezfooli*, EPFL; Pascal Frossard, EPFL

* **A Non-convex One-Pass Framework for Factorization Machines and Rank-One Matrix Sensing**

Ming Lin*, ; Jieping Ye,

* **Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters**

Zeyuan Allen-Zhu*, Princeton University; Yang Yuan, Cornell University; Karthik Sridharan, University of Pennsylvania
* **Combinatorial Multi-Armed Bandit with General Reward Functions**

Wei Chen*, ; Wei Hu, Princeton University; Fu Li, The University of Texas at Austin; Jian Li, Tsinghua University; Yu Liu, Tsinghua University; Pinyan Lu, Shanghai University of Finance and Economics

* **Boosting with Abstention**

Corinna Cortes, ; Giulia DeSalvo*, ; Mehryar Mohri,

* **Regret of Queueing Bandits**

Subhashini Krishnasamy, The University of Texas at Austin; Rajat Sen, The University of Texas at Austin; Ramesh Johari, ; Sanjay Shakkottai*, The University of Texas at Austin

* **Deep Learning Games**

Dale Schuurmans*, ; Martin Zinkevich, Google

* **Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods**

Antoine Gautier*, Saarland University; Quynh Nguyen, Saarland University; Matthias Hein, Saarland University

* **Learning Volumetric 3D Object Reconstruction from Single-View with Projective Transformations**

Xinchen Yan*, University of Michigan; Jimei Yang, ; Ersin Yumer, Adobe Research; Yijie Guo, University of Michigan; Honglak Lee, University of Michigan

* **A Credit Assignment Compiler for Joint Prediction**

Kai-Wei Chang*, ; He He, University of Maryland; Stephane Ross, Google; Hal Daumé III, ; John Langford,

* **Accelerating Stochastic Composition Optimization**

Mengdi Wang*, ; Ji Liu,

* **Reward Augmented Maximum Likelihood for Neural Structured Prediction**

Mohammad Norouzi*, ; Dale Schuurmans, ; Samy Bengio, ; Zhifeng Chen, ; Navdeep Jaitly, ; Mike Schuster, ; Yonghui Wu,

* **Consistent Kernel Mean Estimation for Functions of Random Variables**

Adam Scibior*, University of Cambridge; Carl-Johann Simon-Gabriel, MPI Tuebingen; Iliya Tolstikhin, ; Bernhard Schoelkopf,

* **Towards Unifying Hamiltonian Monte Carlo and Slice Sampling**

Yizhe Zhang*, Duke University; Xiangyu Wang, Duke University; Changyou Chen, ; Ricardo Henao, ; Kai Fan, Duke University; Lawrence Carin,

* **Scalable Adaptive Stochastic Optimization Using Random Projections**

Gabriel Krummenacher*, ETH Zurich; Brian Mcwilliams, Disney Research; Yannic Kilcher, ETH Zurich; Joachim Buhmann, ETH Zurich; Nicolai Meinshausen,

* **Variational Inference in Mixed Probabilistic Submodular Models**

Josip Djolonga, ETH Zurich; Sebastian Tschiatschek*, ETH Zurich; Andreas Krause,

* **Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated**

Namrata Vaswani*, ; Han Guo, Iowa State University

* **The Multi-fidelity Multi-armed Bandit**

Kirthevasan Kandasamy*, CMU; Gautam Dasarathy, Carnegie Mellon University; Barnabas Poczos, ; Jeff Schneider, CMU

* **Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm**

Kejun Huang*, University of Minnesota; Xiao Fu, University of Minnesota; Nicholas Sidiropoulos, University of Minnesota

* **Bootstrap Model Aggregation for Distributed Statistical Learning**

Jun Han, Dartmouth College; Qiang Liu*,

* **A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification**

Steven Cheng-Xian Li*, UMass Amherst; Benjamin Marlin,

* **A Bandit Framework for Strategic Regression**

Yang Liu*, Harvard University; Yiling Chen,

* **Architectural Complexity Measures of Recurrent Neural Networks**

Saizheng Zhang*, University of Montreal; Yuhuai Wu, University of Toronto; Tong Che, IHES; Zhouhan Lin, University of Montreal; Roland Memisevic, University of Montreal; Ruslan Salakhutdinov, University of Toronto; Yoshua Bengio, U. Montreal
Saizheng Zhang*, University of Montreal; Yuhuai Wu, University of Toronto; Tong Che, IHES; Zhouhan Lin, University of Montreal; Roland Memisevic, University of Montreal; Ruslan Salakhutdinov, University of Toronto; Yoshua Bengio, U. Montreal

Statistical Inference for Cluster Trees
Jisu Kim*, Carnegie Mellon University; Yen-Chi Chen, Carnegie Mellon University; Sivaraman Balakrishnan, Carnegie Mellon University; Alessandro Rinaldo, Carnegie Mellon University; Larry Wasserman, Carnegie Mellon University

Contextual-MDPs for PAC Reinforcement Learning with Rich Observations
Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; John Langford,

Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Kihyuk Sohn*,

Only H is left: Near-tight Episodic PAC RL
Christoph Dann*, Carnegie Mellon University; Emma Brunskill, Carnegie Mellon University

Stacked Approximated Regression Machine: A Simple Deep Learning Approach
Zhangyang Wang*, UIUC; Shiyu Chang, UIUC; Qing Ling, USTC; Shuai Huang, UW; Xia Hu, ; Honghui Shi, UIUC; Thomas Huang, UIUC

Unsupervised Learning of Spoken Language with Visual Context
David Harwath*, MIT CSAIL; Antonio Torralba, MIT CSAIL; James Glass, MIT CSAIL

Low-Rank Regression with Tensor Responses
Guillaume Rabusseau*, Aix-Marseille University; Hachem Kadri,

PAC-Bayesian Theory Meets Bayesian Inference
Pascal Germain*, ; Francis Bach, ; Alexandre Lacoste, ; Simon Lacoste-Julien, INRIA

Data Poisoning Attacks on Factorization-Based Collaborative Filtering
Bo Li*, Vanderbilt University; Yining Wang, Carnegie Mellon University; Aarti Singh, Carnegie Mellon University; Yevgeniy Vorobeychik, Vanderbilt University

Learned Region Sparsity and Diversity Also Predicts Visual Attention
Zijun Wei*, Stony Brook; Hossein Adeli, ; Minh Hoai, ; Gregory Zelinsky, ; Dimitris Samaras,

End-to-End Goal-Driven Web Navigation
Rodrigo Frassetto Nogueira*, New York University; Kyunghyun Cho, University of Montreal

Automated scalable segmentation of neurons from multispectral images
Uygar Sümbül*, Columbia University; Douglas Roossien, University of Michigan, Ann Arbor; Dawen Cai, University of Michigan, Ann Arbor; John Cunningham, Columbia University; Liam Paninski,

Privacy Odometers and Filters: Pay-as-you-Go Composition
Ryan Rogers*, University of Pennsylvania; Salil Vadhan, Harvard University; Aaron Roth, ; Jonathan Robert Ullman,

Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
Iliya Tolstikhin*, ; Bharath Sriperumbudur, ; Bernhard Schoelkopf,

Adaptive optimal training of animal behavior
Ji Hyun Bak*, Princeton University; Jung Yoon Choi, ; Ilana Witten, ; Jonathan Pillow,

Hierarchical Object Representation for Open-Ended Object Category Learning and Recognition
Hamidreza Kasaei*, IEETA, University of Aveiro

Relevant sparse codes with variational information bottleneck
Matthew Chalk*, IST Austria; Olivier Marre, Institut de la vision; Gašper Tkačik, Institute of Science and Technology Austria

Combinatorial Energy Learning for Image Segmentation
Jeremy Maitin-Shepard*, Google; Viren Jain, Google; Michal Januszewski, Google; Peter Li, ; Pieter Abbeel,

Orthogonal Random Features
Felix Xinnan Yu*, ; Ananda Theertha Suresh, ; Krzysztof Choromanski, ; Dan Holtmann-Rice, ; Sanjiv Kumar, Google

Fast Active Set Methods for Online Spike Inference from Calcium Imaging
Johannes Friedrich*, Columbia University; Liam Paninski,

Diffusion-Convolutional Neural Networks
James Atwood*, UMass Amherst

Bayesian latent structure discovery from multi-neuron recordings
Scott Linderman*, ; Ryan Adams, ; Jonathan Pillow,

A Probabilistic Programming Approach To Probabilistic Data Analysis
Feras Saad*, MIT; Vikash Mansinghka, MIT

A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics
William Hoiles*, University of California, Los Angeles; Mihaela Van Der Schaar,

Inference by Reparameterization in Neural Population Codes
Rajkumar Vasudeva Raju, Rice University; Xaq Pitkow*,

Tensor Switching Networks
Chuan-Yung Tsai*, ; Andrew Saxe, ; David Cox,

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
Alain Durmus, Telecom ParisTech; Umut Simsekli*, ; Eric Moulines, Ecole Polytechnique; Roland Badeau, Telecom ParisTech; Gaël Richard, Telecom ParisTech

Coordinate-wise Power Method
Qi Lei*, UT Austin; Kai Zhong, UT Austin; Inderjit Dhillon,

Learning Influence Functions from Incomplete Observations
Xinran He*, USC; Ke Xu, USC; David Kempe, USC; Yan Liu,

Learning Structured Sparsity in Deep Neural Networks
Wei Wen*, University of Pittsburgh; Chunpeng Wu, University of Pittsburgh; Yandan Wang, University of Pittsburgh; Yiran Chen, University of Pittsburgh; Hai Li, University of Pittsburgh

Sample Complexity of Automated Mechanism Design
Nina Balcan, ; Tuomas Sandholm, Carnegie Mellon University; Ellen Vitercik*, Carnegie Mellon University

Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
Sanghamitra Dutta*, Carnegie Mellon University; Viveck Cadambe, Pennsylvania State University; Pulkit Grover, Carnegie Mellon University

Brains on Beats
Umut Güçlü*, Radboud University; Jordy Thielen, Radboud University; Michael Hanke, Otto-von-Guericke University Magdeburg; Marcel Van Gerven, Radboud University

Learning Transferrable Representations for Unsupervised Domain Adaptation
Ozan Sener*, Cornell University; Hyun Oh Song, Google Research; Ashutosh Saxena, Brain of Things; Silvio Savarese, Stanford University

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Stefan Lee*, Indiana University; Senthil Purushwalkam, Carnegie Mellon; Michael Cogswell, Virginia Tech; Viresh Ranjan, Virginia Tech; David Crandall, Indiana University; Dhruv Batra,

Active Learning from Imperfect Labelers
Songbai Yan*, University of California, San Diego; Kamalika Chaudhuri, University of California, San Diego; Tara Javidi, University of California, San Diego

Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Jakob Foerster*, University of Oxford; Yannis Assael, University of Oxford; Nando de Freitas, University of Oxford; Shimon Whiteson,

Value Iteration Networks
Aviv Tamar*, ; Sergey Levine, ; Pieter Abbeel, ; Yi Wu, UC Berkeley; Garrett Thomas, UC Berkeley

Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering
Dogyoon Song*, MIT; Christina Lee, MIT; Yihua Li, MIT; Devavrat Shah,

On the Recursive Teaching Dimension of VC Classes
Bo Tang*, University of Oxford; Xi Chen, Columbia University; Yu Cheng, U of Southern California

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen*, UC Berkeley and OpenAI; Yan Duan, UC Berkeley; Rein Houthooft, Ghent University - iMinds, UC Berkeley and OpenAI; John Schulman, OpenAI; Ilya Sutskever, ; Pieter Abbeel,

Hardness of Online Sleeping Combinatorial Optimization Problems
Satyen Kale*, ; Chansoo Lee, ; David Pal,

Mixed Linear Regression with Multiple Components
Kai Zhong*, UT Austin; Prateek Jain, Microsoft Research; Inderjit Dhillon,

Sequential Neural Models with Stochastic Layers
Marco Fraccaro*, DTU; Søren Sønderby, KU; Ulrich Paquet, ; Ole Winther, DTU

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
Hongseok Namkoong*, Stanford University; John Duchi,

Minimizing Quadratic Functions in Constant Time
Kohei Hayashi*, AIST; Yuichi Yoshida, NII

Improved Techniques for Training GANs
Tim Salimans*, ; Ian Goodfellow, OpenAI; Wojciech Zaremba, OpenAI; Vicki Cheung, OpenAI; Alec Radford, OpenAI; Xi Chen, UC Berkeley and OpenAI

DeepMath - Deep Sequence Models for Premise Selection
Geoffrey Irving*, ; Christian Szegedy, ; Alexander Alemi, Google; Francois Chollet, ; Josef Urban, Czech Technical University in Prague

Learning Multiagent Communication with Backpropagation
Sainbayar Sukhbaatar, NYU; Arthur Szlam, ; Rob Fergus*, New York University

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely*, ; Roy Frostig, Stanford University; Yoram Singer, Google

Learning the Number of Neurons in Deep Networks
Jose Alvarez*, NICTA; Mathieu Salzmann, EPFL

Finding significant combinations of features in the presence of categorical covariates
Laetitia Papaxanthos*, ETH Zurich; Felipe Llinares, ETH Zurich; Dean Bodenham, ETH Zurich; Karsten Borgwardt,

Examples are not Enough, Learn to Criticize! Model Criticism for Interpretable Machine Learning
Been Kim*, ; Rajiv Khanna, UT Austin; Sanmi Koyejo, UIUC

Optimistic Bandit Convex Optimization
Scott Yang*, New York University; Mehryar Mohri,

Safe Policy Improvement by Minimizing Robust Baseline Regret
Mohamad Ghavamzadeh*, ; Marek Petrik, ; Yinlam Chow, Stanford University

Graphons, mergeons, and so on!
Justin Eldridge*, The Ohio State University; Mikhail Belkin, ; Yusu Wang, The Ohio State University

Hierarchical Clustering via Spreading Metrics
Aurko Roy*, Georgia Tech; Sebastian Pokutta, Georgia Tech

Learning Bayesian networks with ancestral constraints
Eunice Yuh-Jie Chen*, UCLA; Yujia Shen, ; Arthur Choi, ; Adnan Darwiche,

Pruning Random Forests for Prediction on a Budget
Feng Nan*, Boston University; Joseph Wang, Boston University; Venkatesh Saligrama,

Clustering with Bregman Divergences: an Asymptotic Analysis
Chaoyue Liu*, The Ohio State University; Mikhail Belkin,

Variational Autoencoder for Deep Learning of Images, Labels and Captions
Yunchen Pu*, Duke University; Zhe Gan, Duke; Ricardo Henao, ; Xin Yuan, Bell Labs; Chunyuan Li, Duke; Andrew Stevens, Duke University; Lawrence Carin,

Encode, Review, and Decode: Reviewer Module for Caption Generation
Zhilin Yang*, Carnegie Mellon University; Ye Yuan, Carnegie Mellon University; Yuexin Wu, Carnegie Mellon University; William Cohen, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto

Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Qiang Liu*, ; Dilin Wang, Dartmouth College

A Bio-inspired Redundant Sensing Architecture
Anh Tuan Nguyen*, University of Minnesota; Jian Xu, University of Minnesota; Zhi Yang, University of Minnesota

Contextual semibandits via supervised learning oracles
Akshay Krishnamurthy*, ; Alekh Agarwal, Microsoft; Miro Dudik,

Blind Attacks on Machine Learners
Alex Beatson*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Universal Correspondence Network
Christopher Choy*, Stanford University; Manmohan Chandraker, NEC Labs America; JunYoung Gwak, Stanford University; Silvio Savarese, Stanford University

Satisfying Real-world Goals with Dataset Constraints
Gabriel Goh*, UC Davis; Andy Cotter, ; Maya Gupta, ; Michael Friedlander, UC Davis

Deep Learning for Predicting Human Strategic Behavior
Jason Hartford*, University of British Columbia; Kevin Leyton-Brown, ; James Wright, University of British Columbia

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games
Sougata Chaudhuri*, University of Michigan; Ambuj Tewari, University of Michigan

Eliciting and Aggregating Categorical Data
Yiling Chen, ; Rafael Frongillo, ; Chien-Ju Ho*,

Measuring the reliability of MCMC inference with Bidirectional Monte Carlo
Roger Grosse, ; Siddharth Ancha, University of Toronto; Daniel Roy*,

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Weihao Gao, UIUC; Sewoong Oh*, ; Pramod Viswanath, UIUC

Selective inference for group-sparse linear models
Fan Yang, University of Chicago; Rina Foygel Barber*, ; Prateek Jain, Microsoft Research; John Lafferty,

Graph Clustering: Block-models and model free results
Yali Wan*, University of Washington; Marina Meila, University of Washington

Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution
Christopher Lynn*, University of Pennsylvania; Dan Lee, University of Pennsylvania

Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Neuroscience
Hao Zhou, University of Wisconsin Madison; Vamsi Ithapu*, University of Wisconsin Madison; Sathya Ravi, University of Wisconsin Madison; Vikas Singh, UW Madison; Grace Wahba, University of Wisconsin Madison; Sterling Johnson, University of Wisconsin Madison

Geometric Dirichlet Means Algorithm for Topic Inference
Mikhail Yurochkin*, University of Michigan; Long Nguyen,

Structured Prediction Theory Based on Factor Graph Complexity
Corinna Cortes, ; Vitaly Kuznetsov*, Courant Institute; Mehryar Mohri, ; Scott Yang, New York University

Improved Dropout for Shallow and Deep Learning
Zhe Li, The University of Iowa; Boqing Gong, University of Central Florida; Tianbao Yang*, University of Iowa

Constraints Based Convex Belief Propagation
Yaniv Tenzer*, The Hebrew University; Alexander Schwing, ; Kevin Gimpel, ; Tamir Hazan,

Error Analysis of Generalized Nyström Kernel Regression
Hong Chen, University of Texas; Haifeng Xia, Huazhong Agricultural University; Heng Huang*, University of Texas Arlington

A Probabilistic Framework for Deep Learning
Ankit Patel, Baylor College of Medicine and Rice University; Tan Nguyen*, Rice University; Richard Baraniuk,

General Tensor Spectral Co-clustering for Higher-Order Data
Tao Wu*, Purdue University; Austin Benson, Stanford University; David Gleich,

Cyclades: Conflict-free Asynchronous Machine Learning
Xinghao Pan*, UC Berkeley; Stephen Tu, UC Berkeley; Maximilian Lam, UC Berkeley; Dimitris Papailiopoulos, ; Ce Zhang, Stanford; Michael Jordan, ; Kannan Ramchandran, ; Christopher Re, ; Ben Recht,

Single Pass PCA of Matrix Products
Shanshan Wu*, UT Austin; Srinadh Bhojanapalli, TTI Chicago; Sujay Sanghavi, ; Alexandros G. Dimakis,

Stochastic Variational Deep Kernel Learning
Andrew Wilson*, Carnegie Mellon University; Zhiting Hu, Carnegie Mellon University; Ruslan Salakhutdinov, University of Toronto; Eric Xing, Carnegie Mellon University

Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models
Marc Vuffray*, Los Alamos National Laboratory; Sidhant Misra, Los Alamos National Laboratory; Andrey Lokhov, Los Alamos National Laboratory; Misha Chertkov, Los Alamos National Laboratory

Long-term Causal Effects via Behavioral Game Theory
Panos Toulis*, University of Chicago; David Parkes, Harvard University

Measuring Neural Net Robustness with Constraints
Osbert Bastani*, Stanford University; Yani Ioannou, University of Cambridge; Leonidas Lampropoulos, University of Pennsylvania; Dimitrios Vytiniotis, Microsoft Research; Aditya Nori, Microsoft Research; Antonio Criminisi,

Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations
Huishuai Zhang*, Syracuse University; Yingbin Liang, Syracuse University

Nearly Isometric Embedding by Relaxation
James McQueen*, University of Washington; Marina Meila, University of Washington; Dominique Joncas, Google

Probabilistic Inference with Generating Functions for Poisson Latent Variable Models
Kevin Winner*, UMass CICS; Daniel Sheldon,

Causal meets Submodular: Subset Selection with Directed Information
Yuxun Zhou*, UC Berkeley; Costas Spanos,

Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
Ayan Chakrabarti*, ; Jingyu Shao, UCLA; Greg Shakhnarovich,

Deep Neural Networks with Inexact Matching for Person Re-Identification
Arulkumar Subramaniam, IIT Madras; Moitreya Chatterjee*, IIT Madras; Anurag Mittal, IIT Madras

Global Analysis of Expectation Maximization for Mixtures of Two Gaussians
Ji Xu, Columbia University; Daniel Hsu*, ; Arian Maleki, Columbia University

Estimating the class prior and posterior from noisy positives and unlabeled data
Shantanu Jain*, Indiana University; Martha White, ; Predrag Radivojac,

Kronecker Determinantal Point Processes
Zelda Mariet*, MIT; Suvrit Sra, MIT

Finite Sample Prediction and Recovery Bounds for Ordinal Embedding
Lalit Jain*, University of Wisconsin-Madison; Kevin Jamieson, UC Berkeley; Robert Nowak, University of Wisconsin Madison

Feature-distributed sparse regression: a screen-and-clean approach
Jiyan Yang*, Stanford University; Michael Mahoney, ; Michael Saunders, Stanford University; Yuekai Sun, University of Michigan

Learning Bound for Parameter Transfer Learning
Wataru Kumagai*, Kanagawa University

Learning under uncertainty: a comparison between R-W and Bayesian approach
He Huang*, LIBR; Martin Paulus, LIBR

Bi-Objective Online Matching and Submodular Allocations
Hossein Esfandiari*, University of Maryland; Nitish Korula, Google Research; Vahab Mirrokni, Google

Quantized Random Projections and Non-Linear Estimation of Cosine Similarity
Ping Li, ; Michael Mitzenmacher, Harvard University; Martin Slawski*,

The non-convex Burer-Monteiro approach works on smooth semidefinite programs
Nicolas Boumal, ; Vlad Voroninski*, MIT; Afonso Bandeira,

Dimensionality Reduction of Massive Sparse Datasets Using Coresets
Dan Feldman, ; Mikhail Volkov*, MIT; Daniela Rus, MIT

Using Social Dynamics to Make Individual Predictions: Variational Inference with Stochastic Kinetic Model
Zhen Xu*, SUNY at Buffalo; Wen Dong, ; Sargur Srihari,

Supervised learning through the lens of compression
Ofir David*, Technion - Israel Institute of Technology; Shay Moran, Technion - Israel Institute of Technology; Amir Yehudayoff, Technion - Israel Institute of Technology

Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
Xinghua Lou*, Vicarious FPC Inc; Ken Kansky, ; Wolfgang Lehrach, ; CC Laan, ; Bhaskara Marthi, ; D. Scott Phoenix, ; Dileep George,

Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections
Xiao-Jiao Mao, Nanjing University; Chunhua Shen*, ; Yu-Bin Yang,

Object based Scene Representations using Fisher Scores of Local Subspace Projections
Mandar Dixit*, UC San Diego; Nuno Vasconcelos,

Active Learning with Oracle Epiphany
Tzu-Kuo Huang, Microsoft Research; Lihong Li, Microsoft Research; Ara Vartanian, University of Wisconsin-Madison; Saleema Amershi, Microsoft; Xiaojin Zhu*,

Statistical Inference for Pairwise Graphical Models Using Score Matching
Ming Yu*, The University of Chicago; Mladen Kolar, ; Varun Gupta, University of Chicago

Improved Error Bounds for Tree Representations of Metric Spaces
Samir Chowdhury*, The Ohio State University; Facundo Memoli, ; Zane Smith,

Can Peripheral Representations Improve Clutter Metrics on Complex Scenes?
Arturo Deza*, UCSB; Miguel Eckstein, UCSB

On Multiplicative Integration with Recurrent Neural Networks
Yuhuai Wu*, University of Toronto; Saizheng Zhang, University of Montreal; Ying Zhang, University of Montreal; Yoshua Bengio, U. Montreal; Ruslan Salakhutdinov, University of Toronto

Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices
Kirthevasan Kandasamy*, CMU; Maruan Al-Shedivat, CMU; Eric Xing, Carnegie Mellon University

Regret Bounds for Non-decomposable Metrics with Missing Labels
Nagarajan Natarajan*, Microsoft Research Bangalore; Prateek Jain, Microsoft Research

Robust k-means: a Theoretical Revisit
Alexandros Georgogiannis*, Technical University of Crete

Bayesian optimization for automated model selection
Gustavo Malkomes, Washington University; Charles Schaff, Washington University in St. Louis; Roman Garnett*,

A Probabilistic Model of Social Decision Making based on Reward Maximization
Koosha Khalvati*, University of Washington; Seongmin Park, Cognitive Neuroscience Center; Jean-Claude Dreher, Centre de Neurosciences Cognitives; Rajesh Rao, University of Washington

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition
Ahmed Alaa*, UCLA; Mihaela Van Der Schaar,

Fast and Flexible Monotonic Functions with Ensembles of Lattices
Mahdi Fard, ; Kevin Canini, ; Andy Cotter, ; Jan Pfeifer, Google; Maya Gupta*,

Conditional Generative Moment-Matching Networks
Yong Ren, Tsinghua University; Jun Zhu*, ; Jialian Li, Tsinghua University; Yucen Luo,

Stochastic Gradient MCMC with Stale Gradients
Changyou Chen*, ; Nan Ding, Google; Chunyuan Li, Duke; Yizhe Zhang, Duke University; Lawrence Carin,

Composing graphical models with neural networks for structured representations and fast inference
Matthew Johnson, ; David Duvenaud*, ; Alex Wiltschko, Harvard University and Twitter; Ryan Adams, ; Sandeep Datta, Harvard Medical School

Noise-Tolerant Life-Long Matrix Completion via Adaptive Sampling
Nina Balcan, ; Hongyang Zhang*, CMU

Combinatorial semi-bandit with known covariance
Rémy Degenne*, Université Paris Diderot; Vianney Perchet,

Matrix Completion has No Spurious Local Minimum
Rong Ge, ; Jason Lee, UC Berkeley; Tengyu Ma*, Princeton University

The Multiscale Laplacian Graph Kernel
Risi Kondor*, ; Horace Pan, UChicago

Adaptive Averaging in Accelerated Descent Dynamics
Walid Krichene*, UC Berkeley; Alexandre Bayen, UC Berkeley; Peter Bartlett,

Sub-sampled Newton Methods with Non-uniform Sampling
Peng Xu*, Stanford University; Jiyan Yang, Stanford University; Farbod Roosta-Khorasani, University of California Berkeley; Christopher Re, ; Michael Mahoney,

Stochastic Gradient Geodesic MCMC Methods
Chang Liu*, Tsinghua University; Jun Zhu, ; Yang Song, Stanford University

Variational Bayes on Monte Carlo Steroids
Aditya Grover*, Stanford University; Stefano Ermon,

Showing versus doing: Teaching by demonstration
Mark Ho*, Brown University; Michael L. Littman, ; James MacGlashan, Brown University; Fiery Cushman, Harvard University; Joe Austerweil,

Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation
Jianxu Chen*, University of Notre Dame; Lin Yang, University of Notre Dame; Yizhe Zhang, University of Notre Dame; Mark Alber, University of Notre Dame; Danny Chen, University of Notre Dame

Maximization of Approximately Submodular Functions
Thibaut Horel*, Harvard University; Yaron Singer,

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order
Xiangru Lian, University of Rochester; Huan Zhang, ; Cho-Jui Hsieh, ; Yijun Huang, ; Ji Liu*,

Learning Infinite RBMs with Frank-Wolfe
Wei Ping*, UC Irvine; Qiang Liu, ; Alexander Ihler,

Estimating the Size of a Large Network and its Communities from a Random Sample
Lin Chen*, Yale University; Amin Karbasi, ; Forrest Crawford, Yale University

Learning Sensor Multiplexing Design through Back-propagation
Ayan Chakrabarti*,

On Robustness of Kernel Clustering
Bowei Yan*, University of Texas at Austin; Purnamrita Sarkar, U.C. Berkeley

High resolution neural connectivity from incomplete tracing data using nonnegative spline regression
Kameron Harris*, University of Washington; Stefan Mihalas, Allen Institute for Brain Science; Eric Shea-Brown, University of Washington

MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Gregory Rogez*, Inria; Cordelia Schmid,

A New Liftable Class for First-Order Probabilistic Inference
Seyed Mehran Kazemi*, UBC; Angelika Kimmig, KU Leuven; Guy Van den Broeck, ; David Poole, UBC

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Jian Wu*, Cornell University; Peter I. Frazier,

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Vasilis Syrgkanis*, ; Haipeng Luo, Princeton University; Akshay Krishnamurthy, ; Robert Schapire,

Consistent Estimation of Functions of Data Missing Non-Monotonically and Not at Random
Ilya Shpitser*,

Optimistic Gittins Indices
Eli Gutin*, Massachusetts Institute of Technology; Vivek Farias,

Finite-Dimensional BFRY Priors and Variational Bayesian Inference for Power Law Models
Juho Lee*, POSTECH; Lancelot James, HKUST; Seungjin Choi, POSTECH

Launch and Iterate: Reducing Prediction Churn
Mahdi Fard, ; Quentin Cormier, Google; Kevin Canini, ; Maya Gupta*,

“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation
Wen-Hao Zhang*, Institute of Neuroscience, Chinese Academy of Sciences; He Wang, HKUST; K. Y. Michael Wong, HKUST; Si Wu,

Learning shape correspondence with anisotropic convolutional neural networks
Davide Boscaini*, University of Lugano; Jonathan Masci, ; Emanuele Rodolà, University of Lugano; Michael Bronstein, University of Lugano

Pairwise Choice Markov Chains
Stephen Ragain*, Stanford University; Johan Ugander,

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
Davood Hajinezhad*, Iowa State University; Mingyi Hong, ; Tuo Zhao, Johns Hopkins University; Zhaoran Wang, Princeton University

Clustering with Same-Cluster Queries
Hassan Ashtiani, University of Waterloo; Shrinu Kushagra*, University of Waterloo; Shai Ben-David, U. Waterloo

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami*, Google DeepMind; Nicolas Heess, ; Theophane Weber, ; Yuval Tassa, Google DeepMind; David Szepesvari, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Geoffrey Hinton, Google

Parameter Learning for Log-supermodular Distributions
Tatiana Shpakova*, Inria - ENS Paris; Francis Bach,

Deconvolving Feedback Loops in Recommender Systems
Ayan Sinha*, Purdue; David Gleich, ; Karthik Ramani, Purdue University

Structured Matrix Recovery via the Generalized Dantzig Selector
Sheng Chen*, University of Minnesota; Arindam Banerjee,

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
Himabindu Lakkaraju*, Stanford University; Jure Leskovec,

Automatic Neuron Detection in Calcium Imaging Data Using Convolutional Networks
Noah Apthorpe*, Princeton University; Alexander Riordan, Princeton University; Robert Aguilar, Princeton University; Jan Homann, Princeton University; Yi Gu, Princeton University; David Tank, Princeton University; H. Sebastian Seung, Princeton University

Designing smoothing functions for improved worst-case competitive ratio in online optimization
Reza Eghbali*, University of Washington; Maryam Fazel, University of Washington

Convergence guarantees for kernel-based quadrature rules in misspecified settings
Motonobu Kanagawa*, ; Bharath Sriperumbudur, ; Kenji Fukumizu,

Unsupervised Learning from Noisy Networks with Applications to Hi-C Data
Bo Wang*, Stanford University; Junjie Zhu, Stanford University; Armin Pourshafeie, Stanford University

A non-generative theory for unsupervised learning and efficient improper dictionary learning
Elad Hazan, ; Tengyu Ma*, Princeton University

Equality of Opportunity in Supervised Learning
Moritz Hardt*, ; Eric Price, ; Nathan Srebro,

Scaled Least Squares Estimator for GLMs in Large-Scale Problems
Murat Erdogdu*, Stanford University; Lee Dicker, ; Mohsen Bayati,

Interpretable Nonlinear Dynamic Modeling of Neural Trajectories
Yuan Zhao*, Stony Brook University; Il Memming Park,

Search Improves Label for Active Learning
Alina Beygelzimer, Yahoo Inc; Daniel Hsu, ; John Langford, ; Chicheng Zhang*, UCSD

Higher-Order Factorization Machines
Mathieu Blondel*, NTT; Akinori Fujino, NTT; Naonori Ueda, ; Masakazu Ishihata, Hokkaido University

Exponential expressivity in deep neural networks through transient chaos
Ben Poole*, Stanford University; Subhaneil Lahiri, Stanford University; Maithra Raghu, Cornell University; Jascha Sohl-Dickstein, ; Surya Ganguli, Stanford

Split LBI: An Iterative Regularization Path with Structural Sparsity
Chendi Huang, Peking University; Xinwei Sun, ; Jiechao Xiong, Peking University; Yuan Yao*,

An equivalence between high dimensional Bayes optimal inference and M-estimation
Madhu Advani*, Stanford University; Surya Ganguli, Stanford

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks
Anh Nguyen*, University of Wyoming; Alexey Dosovitskiy, ; Jason Yosinski, Cornell; Thomas Brox, University of Freiburg; Jeff Clune,

Deep Submodular Functions
Brian Dolhansky*, University of Washington; Jeff Bilmes, University of Washington, Seattle

Discriminative Gaifman Models
Mathias Niepert*,

Leveraging Sparsity for Efficient Submodular Data Summarization
Erik Lindgren*, University of Texas at Austin; Shanshan Wu, UT Austin; Alexandros G. Dimakis,

Local Minimax Complexity of Stochastic Convex Optimization
Sabyasachi Chatterjee, University of Chicago; John Duchi, ; John Lafferty, ; Yuancheng Zhu*, University of Chicago

Stochastic Optimization for Large-scale Optimal Transport
Aude Genevay*, Université Paris Dauphine; Marco Cuturi, ; Gabriel Peyré, ; Francis Bach,

On Mixtures of Markov Chains
Rishi Gupta*, Stanford; Ravi Kumar, ; Sergei Vassilvitskii, Google

Linear Contextual Bandits with Knapsacks
Shipra Agrawal*, ; Nikhil Devanur, Microsoft Research

Reconstructing Parameters of Spreading Models from Partial Observations
Andrey Lokhov*, Los Alamos National Laboratory

Spatiotemporal Residual Networks for Video Action Recognition
Christoph Feichtenhofer*, Graz University of Technology; Axel Pinz, Graz University of Technology; Richard Wildes, York University Toronto

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Behnam Neyshabur*, TTI-Chicago; Yuhuai Wu, University of Toronto; Ruslan Salakhutdinov, University of Toronto; Nathan Srebro,

Strategic Attentive Writer for Learning Macro-Actions
Alexander Vezhnevets*, Google DeepMind; Volodymyr Mnih, ; Simon Osindero, Google DeepMind; Alex Graves, ; Oriol Vinyals, ; John Agapiou, ; Koray Kavukcuoglu, Google DeepMind

The Limits of Learning with Missing Data
Brian Bullins*, Princeton University; Elad Hazan, ; Tomer Koren, Technion - Israel Institute of Technology

RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism
Edward Choi*, Georgia Institute of Technology; Mohammad Taha Bahadori, Gatech; Jimeng Sun,

Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers
Yu-Xiang Wang*, Carnegie Mellon University; Veeranjaneyulu Sadhanala, Carnegie Mellon University; Ryan Tibshirani,

Community Detection on Evolving Graphs
Stefano Leonardi*, Sapienza University of Rome; Aris Anagnostopoulos, Sapienza University of Rome; Jakub Łącki, Sapienza University of Rome; Silvio Lattanzi, Google; Mohammad Mahdian, Google Research, New York

Online and Differentially-Private Tensor Decomposition
Yining Wang*, Carnegie Mellon University; Anima Anandkumar, UC Irvine

Dimension-Free Iteration Complexity of Finite Sum Optimization Problems
Yossi Arjevani*, Weizmann Institute of Science; Ohad Shamir, Weizmann Institute of Science

Towards Conceptual Compression
Karol Gregor*, ; Frederic Besse, Google DeepMind; Danilo Jimenez Rezende, ; Ivo Danihelka, ; Daan Wierstra, Google DeepMind

Exact Recovery of Hard Thresholding Pursuit
Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang,

Data Programming: Creating Large Training Sets, Quickly
Alexander Ratner*, Stanford University; Christopher De Sa, Stanford University; Sen Wu, Stanford University; Daniel Selsam, Stanford; Christopher Ré, Stanford University

Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
Vitaly Feldman*,

Dynamic matrix recovery from incomplete observations under an exact low-rank constraint
Liangbei Xu*, Gatech; Mark Davenport,

Fast Distributed Submodular Cover: Public-Private Data Summarization
Baharan Mirzasoleiman*, ETH Zurich; Morteza Zadimoghaddam, ; Amin Karbasi,

Estimating Nonlinear Neural Response Functions using GP Priors and Kronecker Methods
Cristina Savin*, IST Austria; Gašper Tkačik, Institute of Science and Technology Austria

Lifelong Learning with Weighted Majority Votes
Anastasia Pentina*, IST Austria; Ruth Urner, MPI Tuebingen

Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
Jack Rae*, Google DeepMind; Jonathan Hunt, ; Ivo Danihelka, ; Tim Harley, Google DeepMind; Andrew Senior, ; Greg Wayne, ; Alex Graves, ; Timothy Lillicrap, Google DeepMind

Matching Networks for One Shot Learning
Oriol Vinyals*, ; Charles Blundell, DeepMind; Timothy Lillicrap, Google DeepMind; Koray Kavukcuoglu, Google DeepMind; Daan Wierstra, Google DeepMind

Tight Complexity Bounds for Optimizing Composite Objectives
Blake Woodworth*, Toyota Technological Institute; Nathan Srebro,

Graphical Time Warping for Joint Alignment of Multiple Curves
Yizhi Wang, Virginia Tech; David Miller, The Pennsylvania State University; Kira Poskanzer, University of California, San Francisco; Yue Wang, Virginia Tech; Lin Tian, The University of California, Davis; Guoqiang Yu*,

Unsupervised Risk Estimation Using Only Conditional Independence Structure
Jacob Steinhardt*, Stanford University; Percy Liang,

MetaGrad: Multiple Learning Rates in Online Learning
Tim Van Erven*, ; Wouter M. Koolen,

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas Kulkarni, MIT; Karthik Narasimhan*, MIT; Ardavan Saeedi, MIT; Joshua Tenenbaum,

High Dimensional Structured Superposition Models
Qilong Gu*, University of Minnesota; Arindam Banerjee,

Joint quantile regression in vector-valued RKHSs
Maxime Sangnier*, LTCI, CNRS, Télécom ParisTech; Olivier Fercoq, ; Florence d’Alché-Buc,

The Forget-me-not Process
Kieran Milan, Google DeepMind; Joel Veness*, ; James Kirkpatrick, Google DeepMind; Michael Bowling, ; Anna Koop, University of Alberta; Demis Hassabis,

Wasserstein Training of Restricted Boltzmann Machines
Gregoire Montavon*, ; Klaus-Robert Muller, ; Marco Cuturi,

Communication-Optimal Distributed Clustering
Jiecao Chen, Indiana University Bloomington; He Sun*, The University of Bristol; David Woodruff, ; Qin Zhang,

Probing the Compositionality of Intuitive Functions
Eric Schulz*, University College London; Joshua Tenenbaum, ; David Duvenaud, ; Maarten Speekenbrink, University College London; Sam Gershman,

Ladder Variational Autoencoders
Casper Kaae Sønderby*, University of Copenhagen; Tapani Raiko, ; Lars Maaløe, Technical University of Denmark; Søren Sønderby, KU; Ole Winther, Technical University of Denmark

The Multiple Quantile Graphical Model
Alnur Ali*, Carnegie Mellon University; Zico Kolter, ; Ryan Tibshirani,

Threshold Learning for Optimal Decision Making
Nathan Lepora*, University of Bristol

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
Aapo Hyvärinen*, ; Hiroshi Morioka, University of Helsinki

Can Active Memory Replace Attention?
Łukasz Kaiser*, ; Samy Bengio,

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning
Taiji Suzuki*, ; Heishiro Kanagawa, ; Hayato Kobayashi, ; Nobuyuki Shimizu, ; Yukihiro Tagami,

The Product Cut
Thomas Laurent*, Loyola Marymount University; James Von Brecht, CSULB; Xavier Bresson, ; Arthur Szlam,

Learning Sparse Gaussian Graphical Models with Overlapping Blocks
Mohammad Javad Hosseini*, University of Washington; Su-In Lee,

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale
Firas Abuzaid*, MIT; Joseph Bradley, Databricks; Feynman Liang, Cambridge University Engineering Department; Andrew Feng, Yahoo!; Lee Yang, Yahoo!; Matei Zaharia, MIT; Ameet Talwalkar,

Average-case hardness of RIP certification
Tengyao Wang, University of Cambridge; Quentin Berthet*, ; Yaniv Plan, University of British Columbia

Forward models at Purkinje synapses facilitate cerebellar anticipatory control
Ivan Herreros-Alonso*, Universitat Pompeu Fabra; Xerxes Arsiwalla, ; Paul Verschure,

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Michaël Defferrard*, EPFL; Xavier Bresson, ; Pierre Vandergheynst, EPFL

Deep Unsupervised Exemplar Learning
Miguel Bautista*, Heidelberg University; Artsiom Sanakoyeu, Heidelberg University; Ekaterina Tikhoncheva, Heidelberg University; Björn Ommer,

Large-Scale Price Optimization via Network Flow
Shinji Ito*, NEC Corporation; Ryohei Fujimaki,

Online Pricing with Strategic and Patient Buyers
Michal Feldman, TAU; Tomer Koren, Technion - Israel Institute of Technology; Roi Livni*, HUJI; Yishay Mansour, Microsoft; Aviv Zohar, HUJI

Global Optimality of Local Search for Low Rank Matrix Recovery
Srinadh Bhojanapalli*, TTI Chicago; Behnam Neyshabur, TTI-Chicago; Nathan Srebro,

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
Daniel Neil*, Institute of Neuroinformatics; Michael Pfeiffer, Institute of Neuroinformatics; Shih-Chii Liu,

Improving PAC Exploration Using the Median of Means
Jason Pazis*, MIT; Ronald Parr, ; Jonathan How, MIT

Infinite Hidden Semi-Markov Modulated Interaction Point Process
Matt Zhang*, NICTA; Peng Lin, Data61; Ting Guo, Data61; Yang Wang, Data61, CSIRO; Fang Chen, Data61, CSIRO

Cooperative Inverse Reinforcement Learning
Dylan Hadfield-Menell*, UC Berkeley; Stuart Russell, UC Berkeley; Pieter Abbeel, ; Anca Dragan,

Spatio-Temporal Hilbert Maps for Continuous Occupancy Representation in Dynamic Environments
Ransalu Senanayake*, The University of Sydney; Lionel Ott, The University of Sydney; Simon O'Callaghan, NICTA; Fabio Ramos, The University of Sydney

Select-and-Sample for Spike-and-Slab Sparse Coding
Abdul-Saboor Sheikh, University of Oldenburg; Jörg Lücke*,

Tractable Operations for Arithmetic Circuits of Probabilistic Models
Yujia Shen*, ; Arthur Choi, ; Adnan Darwiche,

Greedy Feature Construction
Dino Oglic*, University of Bonn; Thomas Gaertner, The University of Nottingham

Mistake Bounds for Binary Matrix Completion
Mark Herbster, ; Stephen Pasteris, UCL; Massimiliano Pontil*,

Data driven estimation of Laplace-Beltrami operator
Frederic Chazal, INRIA; Ilaria Giulini, ; Bertrand Michel*,

Tracking the Best Expert in Non-stationary Stochastic Environments
Chen-Yu Wei*, Academia Sinica; Yi-Te Hong, Academia Sinica; Chi-Jen Lu, Academia Sinica

Learning to learn by gradient descent by gradient descent
Marcin Andrychowicz*, Google DeepMind; Misha Denil, ; Sergio Gomez, Google DeepMind; Matthew Hoffman, Google DeepMind; David Pfau, Google DeepMind; Tom Schaul, ; Nando de Freitas, Google

Kernel Observers: Systems-Theoretic Modeling and Inference of Spatiotemporally Evolving Processes
Harshal Maske, UIUC; Girish Chowdhary*, UIUC; Hassan Kingravi, Pindrop Security Services

Quantum Perceptron Models
Ashish Kapoor*, ; Nathan Wiebe, Microsoft Research; Krysta M. Svore,

Guided Policy Search as Approximate Mirror Descent
William Montgomery*, University of Washington; Sergey Levine, University of Washington

The Power of Optimization from Samples
Eric Balkanski*, Harvard University; Aviad Rubinstein, UC Berkeley; Yaron Singer,

Deep Exploration via Bootstrapped DQN
Ian Osband*, DeepMind; Charles Blundell, DeepMind; Alexander Pritzel, ; Benjamin Van Roy,

A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization
Jingwei Liang*, GREYC, ENSICAEN; Jalal Fadili, ; Gabriel Peyré,

Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages
Yin Cheng Ng*, University College London; Pawel Chilinski, University College London; Ricardo Silva, University College London

Convolutional Neural Fabrics
Shreyas Saxena*, INRIA; Jakob Verbeek,

A Neural Transducer
Navdeep Jaitly*, ; Quoc Le, ; Oriol Vinyals, ; Ilya Sutskever, ; David Sussillo, Google; Samy Bengio,

Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
Aryan Mokhtari*, University of Pennsylvania; Hadi Daneshmand, ETH Zurich; Aurelien Lucchi, ; Thomas Hofmann, ; Alejandro Ribeiro, University of Pennsylvania

A Sparse Interactive Model for Inductive Matrix Completion
Jin Lu, University of Connecticut; Guannan Liang, University of Connecticut; Jiangwen Sun, University of Connecticut; Jinbo Bi*, University of Connecticut

Coresets for Scalable Bayesian Logistic Regression
Jonathan Huggins*, MIT; Trevor Campbell, MIT; Tamara Broderick, MIT

Agnostic Estimation for Misspecified Phase Retrieval Models
Matey Neykov*, Princeton University; Zhaoran Wang, Princeton University; Han Liu,

Linear Relaxations for Finding Diverse Elements in Metric Spaces
Aditya Bhaskara*, University of Utah; Mehrdad Ghadiri, Sharif University of Technology; Vahab Mirrokni, Google; Ola Svensson, EPFL

Binarized Neural Networks
Itay Hubara*, Technion; Matthieu Courbariaux, Université de Montréal; Daniel Soudry, Columbia University; Ran El-Yaniv, Technion; Yoshua Bengio, Université de Montréal

On Local Maxima in the Population Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences
Chi Jin*, UC Berkeley; Yuchen Zhang, ; Sivaraman Balakrishnan, CMU; Martin Wainwright, UC Berkeley; Michael Jordan,

Memory-Efficient Backpropagation Through Time
Audrunas Gruslys*, Google DeepMind; Remi Munos, Google DeepMind; Ivo Danihelka, ; Marc Lanctot, Google DeepMind; Alex Graves,

Bayesian Optimization with Robust Bayesian Neural Networks
Jost Tobias Springenberg*, University of Freiburg; Aaron Klein, University of Freiburg; Stefan Falkner, University of Freiburg; Frank Hutter, University of Freiburg

Learnable Visual Markers
Oleg Grinchuk, Skolkovo Institute of Science and Technology; Vadim Lebedev, Skolkovo Institute of Science and Technology; Victor Lempitsky*,

Fast Algorithms for Robust PCA via Gradient Descent
Xinyang Yi*, UT Austin; Dohyung Park, University of Texas at Austin; Yudong Chen, ; Constantine Caramanis,

One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities
Michalis K. Titsias*,

Learning Deep Embeddings with Histogram Loss
Evgeniya Ustinova, Skoltech; Victor Lempitsky*,

Spectral Learning of Dynamic Systems from Nonequilibrium Data
Hao Wu*, Free University of Berlin; Frank Noe,

Markov Chain Sampling in Discrete Probabilistic Models with Constraints
Chengtao Li*, MIT; Suvrit Sra, MIT; Stefanie Jegelka, MIT

Mapping Estimation for Discrete Optimal Transport
Michael Perrot*, University of Saint-Etienne, Laboratoire Hubert Curien; Nicolas Courty, ; Rémi Flamary, ; Amaury Habrard, University of Saint-Etienne, Laboratoire Hubert Curien

BBO-DPPs: Batched Bayesian Optimization via Determinantal Point Processes
Tarun Kathuria*, Microsoft Research; Amit Deshpande, ; Pushmeet Kohli,

Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images
Vladimir Golkov*, Technical University of Munich; Marcin Skwark, Vanderbilt University; Antonij Golkov, University of Augsburg; Alexey Dosovitskiy, ; Thomas Brox, University of Freiburg; Jens Meiler, Vanderbilt University; Daniel Cremers, Technical University of Munich

Linear Feature Encoding for Reinforcement Learning
Zhao Song*, Duke University; Ronald Parr, ; Xuejun Liao, Duke University; Lawrence Carin,

A Minimax Approach to Supervised Learning
Farzan Farnia*, Stanford University; David Tse, Stanford University

Edge-Exchangeable Graphs and Sparsity
Diana Cai*, University of Chicago; Trevor Campbell, MIT; Tamara Broderick, MIT

A Locally Adaptive Normal Distribution
Georgios Arvanitidis*, DTU; Lars Kai Hansen, ; Søren Hauberg,

Completely random measures for modelling block-structured sparse networks
Tue Herlau*, ; Mikkel Schmidt, DTU; Morten Mørup, Technical University of Denmark

Sparse Support Recovery with Non-smooth Loss Functions
Kévin Degraux*, Université catholique de Louvain; Gabriel Peyré, ; Jalal Fadili, ; Laurent Jacques, Université catholique de Louvain

Neurons Equipped with Intrinsic Plasticity Learn Stimulus Intensity Statistics
Travis Monk*, University of Oldenburg; Cristina Savin, IST Austria; Jörg Lücke,

Learning values across many orders of magnitude
Hado Van Hasselt*, ; Arthur Guez, ; Matteo Hessel, Google DeepMind; Volodymyr Mnih, ; David Silver,

Adaptive Smoothed Online Multi-Task Learning
Keerthiram Murugesan*, Carnegie Mellon University; Hanxiao Liu, Carnegie Mellon University; Jaime Carbonell, CMU; Yiming Yang, CMU

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
Matteo Turchetta, ETH Zurich; Felix Berkenkamp*, ETH Zurich; Andreas Krause,

Probabilistic Linear Multistep Methods
Onur Teymur*, Imperial College London; Kostas Zygalakis, ; Ben Calderhead,

Stochastic Three-Composite Convex Minimization
Alp Yurtsever*, EPFL; Bang Vu, ; Volkan Cevher,

Using Fast Weights to Attend to the Recent Past
Jimmy Ba*, University of Toronto; Geoffrey Hinton, Google; Volodymyr Mnih, ; Joel Leibo, Google DeepMind; Catalin Ionescu, Google

Maximal Sparsity with Deep Networks?
Bo Xin*, Peking University; Yizhou Wang, Peking University; Wen Gao, Peking University; David Wipf,

Quantifying and Reducing Stereotypes in Word Embeddings
Tolga Bolukbasi*, Boston University; Kai-Wei Chang, ; James Zou, ; Venkatesh Saligrama, ; Adam Kalai, Microsoft Research

beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Valentina Zantedeschi*, UJM Saint-Etienne, France; Rémi Emonet, ; Marc Sebban,

Learning Additive Exponential Family Graphical Models via ℓ2,1-norm Regularized M-Estimation
Xiaotong Yuan*, Nanjing University of Information Science and Technology; Ping Li, ; Tong Zhang, ; Qingshan Liu, ; Guangcan Liu, NUIST

Backprop KF: Learning Discriminative Deterministic State Estimators
Tuomas Haarnoja*, UC Berkeley; Anurag Ajay, UC Berkeley; Sergey Levine, University of Washington; Pieter Abbeel,

2-Component Recurrent Neural Networks
Xiang Li*, NJUST; Tao Qin, Microsoft; Jian Yang, ; Xiaolin Hu, ; Tie-Yan Liu, Microsoft Research

Fast recovery from a union of subspaces
Chinmay Hegde, ; Piotr Indyk, MIT; Ludwig Schmidt*, MIT

Incremental Learning for Variational Sparse Gaussian Process Regression
Ching-An Cheng*, Georgia Institute of Technology; Byron Boots,

A Consistent Regularization Approach for Structured Prediction
Carlo Ciliberto*, MIT; Lorenzo Rosasco, ; Alessandro Rudi,

Clustering Signed Networks with the Geometric Mean of Laplacians
Pedro Eduardo Mercado Lopez*, Saarland University; Francesco Tudisco, Saarland University; Matthias Hein, Saarland University

An urn model for majority voting in classification ensembles
Víctor Soto, Columbia University; Alberto Suarez, ; Gonzalo Martínez-Muñoz*,

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction
Jacob Steinhardt*, Stanford University; Gregory Valiant, ; Moses Charikar, Stanford University

Fast and accurate spike sorting of high-channel count probes with KiloSort
Marius Pachitariu*, ; Nick Steinmetz, UCL; Shabnam Kadir, ; Matteo Carandini, UCL; Kenneth Harris, UCL

Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
Wouter M. Koolen*, ; Peter Grunwald, CWI; Tim Van Erven,

Ancestral Causal Inference
Sara Magliacane*, VU University Amsterdam; Tom Claassen, ; Joris Mooij, Radboud University Nijmegen

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
Xinyang Yi, UT Austin; Zhaoran Wang, Princeton University; Zhuoran Yang, Princeton University; Constantine Caramanis, ; Han Liu*,

Tagger: Deep Unsupervised Perceptual Grouping
Klaus Greff*, IDSIA; Antti Rasmus, The Curious AI Company; Mathias Berglund, The Curious AI Company; Tele Hao, The Curious AI Company; Harri Valpola, The Curious AI Company

Efficient Algorithm for Streaming Submodular Cover
Ashkan Norouzi-Fard*, EPFL; Abbas Bazzi, EPFL; Ilija Bogunovic, EPFL Lausanne; Marwa El Halabi, ; Ya-Ping Hsieh, ; Volkan Cevher,

Interaction Networks for Learning about Objects, Relations and Physics
Peter Battaglia*, Google DeepMind; Razvan Pascanu, ; Matthew Lai, Google DeepMind; Danilo Jimenez Rezende, ; Koray Kavukcuoglu, Google DeepMind

Efficient state-space modularization for planning: theory, behavioral and neural signatures
Daniel McNamee*, University of Cambridge; Daniel Wolpert, University of Cambridge; Máté Lengyel, University of Cambridge

Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent
Chi Jin*, UC Berkeley; Sham Kakade, ; Praneeth Netrapalli, Microsoft Research

Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics
Wei-Shou Hsu*, University of Waterloo; Pascal Poupart,

Computing and maximizing influence in linear threshold and triggering models
Justin Khim*, University of Pennsylvania; Varun Jog, ; Po-Ling Loh, Berkeley

Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions
Yichen Wang*, Georgia Tech; Nan Du, ; Rakshit Trivedi, Georgia Institute of Technology; Le Song,

Learning Deep Parsimonious Representations
Renjie Liao*, UofT; Alexander Schwing, ; Rich Zemel, ; Raquel Urtasun,

Optimal Learning for Multi-pass Stochastic Gradient Methods
Junhong Lin*, Istituto Italiano di Tecnologia; Lorenzo Rosasco,

Generative Adversarial Imitation Learning
Jonathan Ho*, Stanford; Stefano Ermon,

An End-to-End Approach for Natural Language to IFTTT Program Translation
Chang Liu*, University of Maryland; Xinyun Chen, Shanghai Jiaotong University; Richard Shin, ; Mingcheng Chen, University of Illinois, Urbana-Champaign; Dawn Song, UC Berkeley

Dual Space Gradient Descent for Online Learning
Trung Le*, University of Pedagogy Ho Chi Minh City; Tu Nguyen, Deakin University; Vu Nguyen, Deakin University; Dinh Phung, Deakin University

Fast stochastic optimization on Riemannian manifolds
Hongyi Zhang*, MIT; Sashank Jakkam Reddi, Carnegie Mellon University; Suvrit Sra, MIT

Professor Forcing: A New Algorithm for Training Recurrent Networks
Alex Lamb, Montreal; Anirudh Goyal*, University of Montreal; Ying Zhang, University of Montreal; Saizheng Zhang, University of Montreal; Aaron Courville, University of Montreal; Yoshua Bengio, U. Montreal

Learning brain regions via large-scale online structured sparse dictionary learning
Elvis Dohmatob*, Inria; Arthur Mensch, Inria; Gaël Varoquaux, ; Bertrand Thirion,

Efficient Neural Codes under Metabolic Constraints
Zhuo Wang*, University of Pennsylvania; Xue-Xin Wei, University of Pennsylvania; Alan Stocker, ; Dan Lee, University of Pennsylvania

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Efficient High-Order Interaction-Aware Feature Selection Based on Conditional Mutual Information
Alexander Shishkin, Yandex; Anastasia Bezzubtseva, Yandex; Alexey Drutsa*, Yandex; Ilia Shishkov, Yandex; Ekaterina Gladkikh, Yandex; Gleb Gusev, Yandex LLC; Pavel Serdyukov, Yandex

Bayesian Intermittent Demand Forecasting for Large Inventories
Matthias Seeger*, Amazon; David Salinas, Amazon; Valentin Flunkert, Amazon

Visual Question Answering with Question Representation Update
Ruiyu Li*, CUHK; Jiaya Jia, CUHK

Learning Parametric Sparse Models for Image Super-Resolution
Yongbo Li, Xidian University; Weisheng Dong*, Xidian University; Guangming Shi, Xidian University; Xuemei Xie, Xidian University; Xin Li, WVU

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Jean-Bastien Grill, Inria Lille - Nord Europe; Michal Valko*, Inria Lille - Nord Europe; Remi Munos, Google DeepMind

Asynchronous Parallel Greedy Coordinate Descent
Yang You, UC Berkeley; Xiangru Lian, University of Rochester; Cho-Jui Hsieh*, ; Ji Liu, ; Hsiang-Fu Yu, University of Texas at Austin; Inderjit Dhillon, ; James Demmel, UC Berkeley

Iterative Refinement of the Approximate Posterior for Directed Belief Networks
Rex Devon Hjelm*, University of New Mexico; Ruslan Salakhutdinov, University of Toronto; Kyunghyun Cho, University of Montreal; Nebojsa Jojic, Microsoft Research; Vince Calhoun, Mind Research Network; Junyoung Chung, University of Montreal

Assortment Optimization Under the Mallows model
Antoine Desir*, Columbia University; Vineet Goyal, ; Srikanth Jagabathula, ; Danny Segev,

Disease Trajectory Maps
Peter Schulam*, Johns Hopkins University; Raman Arora,

Multistage Campaigning in Social Networks
Mehrdad Farajtabar*, Georgia Tech; Xiaojing Ye, Georgia State University; Sahar Harati, Emory University; Le Song, ; Hongyuan Zha, Georgia Institute of Technology

Learning in Games: Robustness of Fast Convergence
Dylan Foster, Cornell University; Zhiyuan Li, Tsinghua University; Thodoris Lykouris*, Cornell University; Karthik Sridharan, Cornell University; Eva Tardos, Cornell University

Improving Variational Autoencoders with Inverse Autoregressive Flow
Diederik Kingma*, ; Tim Salimans,

Algorithms and matching lower bounds for approximately-convex optimization
Andrej Risteski*, Princeton University; Yuanzhi Li, Princeton University

Unified Methods for Exploiting Piecewise Structure in Convex Optimization
Tyler Johnson*, University of Washington; Carlos Guestrin,

Kernel Bayesian Inference with Posterior Regularization
Yang Song*, Stanford University; Jun Zhu, ; Yong Ren, Tsinghua University

* **Neural universal discrete denoiser**

Taesup Moon*, DGIST; Seonwoo Min, ; Byunghan Lee, ; Sungroh Yoon,

* **Optimal Architectures in a Solvable Model of Deep Networks**

Jonathan Kadmon*, Hebrew University; Haim Sompolinsky,

* **Conditional Image Generation with PixelCNN Decoders**

Aaron van den Oord*, Google DeepMind; Nal Kalchbrenner, ; Lasse Espeholt, ; Koray Kavukcuoglu, Google DeepMind; Oriol Vinyals, ; Alex Graves,

* **Supervised Learning with Tensor Networks**

Edwin Stoudenmire*, Univ of California Irvine; David Schwab, Northwestern University

* **Multi-step learning and underlying structure in statistical models**

Maia Fraser*, University of Ottawa

* **Blind Optimal Recovery of Signals**

Dmitry Ostrovsky*, Univ. Grenoble Alpes; Zaid Harchaoui, NYU, Courant Institute; Anatoli Juditsky, ; Arkadi Nemirovski, Georgia Institute of Technology

* **An Architecture for Deep, Hierarchical Generative Models**

Philip Bachman*,

* **Feature selection for classification of functional data using recursive maxima hunting**

José Torrecilla*, Universidad Autónoma de Madrid; Alberto Suarez,

* **Achieving budget-optimality with adaptive schemes in crowdsourcing**

Ashish Khetan, University of Illinois Urbana-Champaign; Sewoong Oh*,

* **Near-Optimal Smoothing of Structured Conditional Probability Matrices**

Moein Falahatgar, UCSD; Mesrob I. Ohannessian*, ; Alon Orlitsky,

* **Supervised Word Mover's Distance**

Gao Huang, ; Chuan Guo*, Cornell University; Matt Kusner, ; Yu Sun, ; Fei Sha, University of Southern California; Kilian Weinberger,

* **Exploiting Tradeoffs for Exact Recovery in Heterogeneous Stochastic Block Models**

Amin Jalali*, University of Washington; Qiyang Han, University of Washington; Ioana Dumitriu, University of Washington; Maryam Fazel, University of Washington

* **Full-Capacity Unitary Recurrent Neural Networks**

Scott Wisdom*, University of Washington; Thomas Powers, ; John Hershey, ; Jonathan LeRoux, ; Les Atlas,

* **Threshold Bandits, With and Without Censored Feedback**

Jacob Abernethy, ; Kareem Amin, ; Ruihao Zhu*, Massachusetts Institute of Technology

* **Understanding the Effective Receptive Field in Deep Convolutional Neural Networks**

Wenjie Luo*, University of Toronto; Yujia Li, University of Toronto; Raquel Urtasun, ; Rich Zemel,

* **Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods**

Lev Bogolubsky, ; Pavel Dvurechensky*, Weierstrass Institute for Applied Analysis and Stochastics; Alexander Gasnikov, ; Gleb Gusev, Yandex LLC; Yurii Nesterov, ; Andrey Raigorodskii, ; Aleksey Tikhonov, ; Maksim Zhukovskii,

* **k^*-Nearest Neighbors: From Global to Local**

Oren Anava, Technion; Kfir Levy*, Technion

* **Normalized Spectral Map Synchronization**

Yanyao Shen*, UT Austin; Qixing Huang, Toyota Technological Institute at Chicago; Nathan Srebro, ; Sujay Sanghavi,

* **Beyond Exchangeability: The Chinese Voting Process**

Moontae Lee*, Cornell University; Seok Hyun Jin, Cornell University; David Mimno, Cornell University

* **A posteriori error bounds for joint matrix decomposition problems**

Nicolo Colombo, Univ of Luxembourg; Nikos Vlassis*, Adobe Research

* **A Bayesian method for reducing bias in neural representational similarity analysis**

Ming Bo Cai*, Princeton University; Nicolas Schuck, Princeton Neuroscience Institute, Princeton University; Jonathan Pillow, ; Yael Niv,

* **Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes**

Chris Junchi Li, Princeton University; Zhaoran Wang*, Princeton University; Han Liu,

* **Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities**

Ruitong Huang*, University of Alberta; Tor Lattimore, ; András György, ; Csaba Szepesvari, U. Alberta

* **SDP Relaxation with Randomized Rounding for Energy Disaggregation**

Kiarash Shaloudegi, ; András György*, ; Csaba Szepesvari, U. Alberta; Wilsun Xu, University of Alberta

* **Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates**

Yuanzhi Li, Princeton University; Yingyu Liang*, ; Andrej Risteski, Princeton University

* **Unsupervised Learning of 3D Structure from Images**

Danilo Jimenez Rezende*, ; S. M. Ali Eslami, Google DeepMind; Shakir Mohamed, Google DeepMind; Peter Battaglia, Google DeepMind; Max Jaderberg, ; Nicolas Heess,

* **Poisson-Gamma dynamical systems**

Aaron Schein*, UMass Amherst; Hanna Wallach, Microsoft Research New England; Mingyuan Zhou,

* **Gaussian Processes for Survival Analysis**

Tamara Fernandez, Oxford; Nicolas Rivera*, King's College London; Yee-Whye Teh,

* **Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain**

Ian En-Hsu Yen*, University of Texas at Austin; Xiangru Huang, University of Texas at Austin; Kai Zhong, University of Texas at Austin; Ruohan Zhang, University of Texas at Austin; Pradeep Ravikumar, ; Inderjit Dhillon,

* **Optimal Binary Classifier Aggregation for General Losses**

Akshay Balsubramani*, UC San Diego; Yoav Freund,

* **Disentangling factors of variation in deep representation using adversarial training**

Michael Mathieu, NYU; Junbo Zhao, NYU; Aditya Ramesh, NYU; Pablo Sprechmann*, ; Yann LeCun, NYU

* **A primal-dual method for constrained consensus optimization**

Necdet Aybat*, Penn State University; Erfan Yazdandoost Hamedani, Penn State University

* **Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing**

Farshad Lahouti*, Caltech; Babak Hassibi, Caltech