Abstracts of Papers




Hardness Amplification within NP against Deterministic Algorithms.

Parikshit Gopalan, Venkatesan Guruswami.

We study the average-case hardness of the class NP against deterministic polynomial time algorithms. We prove that there exists some constant $mu > 0$ such that if there is some language in NP for which no deterministic polynomial time algorithm can decide L correctly on a $1- (log n)^{-mu}$ fraction of inputs of length $n$, then there is a language L' in NP for which no deterministic polynomial time algorithm can decide $L'$ correctly on a $3/4 + (log n)^{-mu}$ fraction of inputs of length $n$. In coding theoretic terms, we give a construction of a monotone code that can be uniquely decoded up to error rate $1/4$ by a deterministic local decoder.


List-Decoding Reed-Muller Codes over small fields.

Parikshit Gopalan, Adam R. Klivans, David Zuckerman.

We present the first local list-decoding algorithm for the $r^{th}$ order Reed-Muller code RM(r,m) over GF[2] for $r \geq 2$. Given an oracle for a received word $R: GF[2]^m \rightarrow GF[2]$, our randomized local list-decoding algorithm produces a list containing all degree $r$ polynomials within relative distance $(2^{-r} - \eps)$ from $R$ for any $\eps > 0$ in time $\poly(m^r,\eps^{-r})$. The list size could be exponential in $m$ at radius $2^{-r}$, so our bound is optimal in the local setting. Since RM(r,m) has relative distance $2^{-r}$, such codes are list-decodable beyond the Johnson bound for $r \geq 2$.

In the setting where we are allowed running-time polynomial in the block-length, we show that list-decoding is possible up to even larger radii, beyond the minimum distance. We give a deterministic list-decoder that works at error rate below $J(2^{1-r})$, where $J(\delta)$ denotes the Johnson radius for minimum distance $\delta$. This shows that RM(2,m) codes are list-decodable up to radius $\eta$ for any constant $\eta < 1/2$ in time polynomial in the block-length.

Over small fields GF[q], we present list-decoding algorithms in both the global and local settings that work up to the list-decoding radius. We conjecture that the list-decoding radius approaches the minimum distance (like over GF[2]), and prove that this holds true when the degree is divisible by $q-1$.

Agnostically Learning Decision Trees.

Parikshit Gopalan, Adam Tauman Kalai, Adam R. Klivans.

We give a query algorithm for agnostically learning decision trees with respect to the uniform distribution on inputs. Given black-box access to an arbitrary binary function $f$ on the $n$-dimensional hypercube, our algorithm finds a function that agrees with $f$ on almost (within an $\eps$ fraction) as many inputs as the best size-$t$ decision tree, in time $poly(n,t,1/\eps)$. This is the first polynomial-time algorithm for learning decision trees in a harsh noise model. We also give a proper agnostic learning algorithm for juntas, a sub-class of decision trees, again using membership queries.

Conceptually, the present paper parallels recent work towards agnostic learning of halfspaces; algorithmically, it is significantly more challenging. The core of our learning algorithm is a procedure to implicitly solve a convex optimization problem over the $L_1$ ball in $2^n$ dimensions using an approximate gradient projection method.


Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subsequence.

Anna Gal, Parikshit Gopalan.

We show that any deterministic data-stream algorithm that makes a constant number of passes over the input and gives a constant factor approximation of the length of the longest increasing subsequence in a sequence of length $n$ must use space $\Omega(\sqrt{n})$. This proves a conjecture made by Gopalan, Jayram, Krauthgamer and Kumar who proved a matching upper bound. Our results yield asymptotically tight lower bounds for all approximation factors, thus resolving the main open problem from their paper. Our proof is based on analyzing a related communication problem and proving a direct sum type property for it.


Hardness of Reconstructing Multivariate Polynomials over Finite Fields.

Parikshit Gopalan, Subhash Khot, Rishi Saket.

We study the polynomial reconstruction problem for low-degree multivariate polynomials over finite fields. In the GF[2] version of this problem, we are given a set of points on the hypercube and target values $f(x)$ for each of these points, with the promise that there is a polynomial over GF[2] of degree at most $d$ that agrees with $f$ at $1 -eps$ fraction of the points. Our goal is to find a degree $d$ polynomial that has good agreement with $f$. We show that it is NP-hard to find a polynomial that agrees with $f$ on more than $1 -2^{-d}$ fraction of the points. This holds even with the stronger promise that the polynomial that fits the data is in fact linear, whereas the algorithm is allowed to find a polynomial of degree $d$. Previously the only known hardness of approximation (or even NP-completeness) was for the case when $d =1$, which follows from a celebrated result of Hastad.

In the setting of Computational Learning, our result shows the hardness of (non-proper)agnostic learning of parities, where the learner is allowed a low-degree polynomial as a hypothesis. This is the first non-proper hardness result for this central problem in computational learning. Our result suggests limitations to learning circuit classes via polynomial reconstruction. Our results can be extended to multivariate polynomial reconstruction over any finite field.


Estimating the Sortedness of a Data Stream.

Parikshit Gopalan, T. S. Jayram, Ravi Kumar, Robert Krauthgamer.

The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one-pass data stream model and present the first sub-linear space algorithms for both problems.

We first present O(sqrt{n})-space deterministic algorithms that approximate the distance to monotonicity and the LIS to within a factor that is arbitrarily close to 1. We also show a linear lower bound on the space required by any randomized algorithm to compute the LIS (or alternatively the distance from monotonicity) exactly, demonstrating that approximation is necessary for sub-linear space computation; this bound improves upon the existing lower bound of \sqrt{n} [LNVZ06].

Our main result is a randomized algorithm that uses only O(log^2 n) space and approximates the distance to monotonicity to within a factor that is arbitrarily close to 4. In contrast, we believe that any significant reduction in the space complexity for approximating the length of the LIS is considerably hard. We conjecture that any deterministic (1 + eps) approximation algorithm for LIS requires sqrt{n} space. We prove the conjecture for a restricted yet natural class of deterministic algorithms.


New Results for Learning Noisy Parities and Halfspaces

Vitaly Feldman, Parikshit Gopalan, Subhash Khot and Ashok K. Ponnuswami

We address well-studied problems concerning the learnability of parities and halfspaces in the presence of noise.

Learning of parities under the uniform distribution with random classification noise, also called the noisy parity problem is a famous open problem in computational learning. We reduce a number of basic problems regarding learning under the uniform distribution to learning of noisy parities, thus highlighting the central role of this problem for learning under the uniform distribution. We show that under the uniform distribution, learning parities with adversarial classification noise reduces to learning parities with random classification noise. Together with the parity learning algorithm of Blum et al., this gives the first nontrivial algorithm for learning parities with adversarial noise. We show that learning of DNF expressions reduces to learning noisy parities of just logarithmic number of variables. We show that learning of k-juntas reduces to learning noisy parities of k variables. These reductions work even in the presence of random classification noise in the original DNF or junta.

We consider the problem of finding a halfspace that maximizes the agreement rate with a given set of examples over R^n. The problem is motivated by learning of halfspaces in the presence of adversarial noise and by learning of halfspaces in the agnostic framework of Haussler. We show that even if there is a halfspace that correctly classifies 1 - c fraction of the given examples, it is hard to find a halfspace that is correct on a 1/2 + c fraction for any c > 0 assuming P \neq NP. This gives an essentially optimal inapproximability factor of 2- c, improving the factor of 85/84 due to Bshouty and Burroughs. Finally, we prove that majorities of halfspaces are hard to PAC-learn using any representation, based on the cryptographic assumption underlying the security of the Ajtai-Dwork cryptosystem. We show that this result implies that learning halfspaces with high levels of adversarial noise is hard, independent of the representation of the hypothesis.


The Connectivity of Boolean Satisfiability: Computational and Structural Dichotomies.

Parikshit Gopalan, Phokion Kolaitis, Elitza Maneva, Christos Papadimitriou.

Given a Boolean formula, do its solutions form a connected subgraph of the hypercube? Considerations of solution-space connectivity underlie several recent algorithmic ideas regarding the satisfiability problem and Bayesian nets. We obtain a complete classification of the complexity of the connectivity problem for Boolean formulas in Schaefer's framework. This classification has the form of a trichotomy theorem: the connectivity problem is polynomial precisely when the satisfiability problem is; it is coNP-complete for a further broad class of formulas, and PSPACE-complete for all remaining cases. For the endpoint-specific connectivity problem, we establish a dichotomy theorem: the polynomial and coNP-complete cases of the connectivity problem are now polynomial, while all remaining cases are PSPACE-complete. Furthermore, in the PSPACE-complete cases, the diameter of the connected components can be exponential, but in all other cases it is linear. Thus, small diameter and tractability of the endpoint-specific connectivity problem are remarkably aligned.


Constructing Ramsey graphs from Boolean function representations.

Parikshit Gopalan.

Explicit construction of Ramsey graphs or graphs with no large clique or independent set has remained a challenging open problem for a long time. While Erdos's probabilistic argument shows the existence of graphs on 2n vertices with no clique or independent set of size 2n, the best explicit constructions with 2n vertices achieve only a bound of c\sqrt{n log n}. Constructing Ramsey graphs is closely related to polynomial representations of Boolean functions; Grolumusz showed that a low degree representation for OR function can be used to explicitly construct Ramsey graphs.

We generalize the above relation by proposing a new framework. We propose a new definition of OR representations: a pair of polynomials represent the OR function on if the union of their zero sets contains all points in {0, 1}n except the origin. We give a simple construction of a Ramsey graph using such polynomials. Furthermore, we show that all the best known constructions, ones to due to Frankl-Wilson, Grolmusz and Alon are captured by this framework; they can all be derived from various OR representations of degree \sqrt{n} based on symmetric polynomials. Thus the barrier to better Ramsey constructions through current techniques appears to be the construction of lower degree representations. Using new algebraic techniques, we show that the above Ramsey constructions cannot be improved using symmetric polynomials.


Algorithms for Modular Counting of Roots of Multivariate Polynomials

Parikshit Gopalan, Venkatesan Guruswami and Richard J. Lipton.

Given a multivariate polynomial P(X1, ..., Xn) over a finite field Fq, let N(P) denote the number of roots over Fqn. The modular root counting problem is given a modulus r, to determine N_r(P) = N(P) mod r. We study the complexity of computing N_r(P), when the polynomial is given as a sum of monomials. We give an efficient algorithm to compute N_r(P) when the modulus r is a power of the characteristic of the field. We show that for all other moduli, the problem of computing N_r(P) is NP-hard. We present some hardness results which imply that that our algorithm is essentially optimal for prime fields. We show an equivalence between maximum-likelihood decoding for Reed-Solomon codes and a root-finding problem for symmetric polynomials.


Query-Efficient Algorithms for Polynomial Interpolation over Composites.

Parikshit Gopalan.

The problem of polynomial interpolation is to reconstruct a polynomial based on its valuations on a set of inputs I. We consider the problem over Z_m when m is composite. We ask the question: Given I \subseteq Z_m, how many evaluations of a polynomial at points in I are required to compute its value at every point in I?. Surprisingly for composite m, this number can vary exponentially between log |I| and |I| in contrast to the prime case where |I| evaluations are necessary. While this minimization problem is NP-complete, we give an efficient algorithm of query complexity within a factor t of the optimum where t is the number of prime factors of m. We use our interpolation algorithm to design algorithms for zero-testing and distributional learning of polynomials over Z_m. In some cases, we get an exponential improvement over known algorithms in query complexity and running time.

Our main technical contribution is the notion of an interpolating set for I which is a subset S of I such that a polynomial which is 0 over S must be 0 at every point in I. Any interpolation algorithm needs to query an interpolating set for I. Our query-efficient algorithms are obtained by constructing interpolating sets whose size is close to optimal.


Polynomials that Sign Represent Parity and Descartes Rule of Signs

Saugata Basu, Nayantara Bhatnagar, Parikshit Gopalan, Richard J.Lipton.

We study the sparsity of real polynomials that sign represent parity on n variables, each of which takes values from some finite subset A of integers. While the degree of such polynomials has been well studied, relatively little is known about their sparsity. We show that sign representing parity over {0,1 ..., m-1}n with the degree in each variable at most m-1 requires sparsity at least mn. We show a bound of (m-1)n for weak representations. We show that a tradeoff exists between sparsity and degree, and that in some cases, the difference in sparsities is exponential. We show a lower bound of n(m -2) + 1 on the sparsity of polynomials of any degree representing parity over {0, 1,..., m-1}n. We prove exact bounds on the sparsity of such polynomials for any two element subset A. We show that for depth-two AND-OR-NOT circuits with a Threshold Gate at the top, the minimum circuit size for a function f equals the minimum sparsity of a polynomial sign representing f over a certain basis. We use this to give a lower bound of (3/2)n for Parity, and a tight lower bound of 2n for Inner Product, improving on previous bounds.


The Degree of Threshold mod 6 and Diophantine Equations

Nayantara Bhatnagar, Parikshit Gopalan, Richard J.Lipton

We continue the study of the degree of polynomials representing threshold functions modulo 6 initiated by Barrington, Beigel and Rudich. We use the framework established by the authors relating representations by symmetric polynomials to simultaneous protocols [BGL03]. We show that proving bounds on the degree of Threshold functions is equivalent to counting the number of solutions to certain Diophantine equations. Using this connection, we prove nearly tight upper and lower bounds on the degree of Threshold-k function (T_k) for all k.

When k is a fixed constant, we show that the degree of the T_k is O(n0.5 + µ )for any µ > 0. The proof uses a result of Filaseta on factors of numbers of the form N(N + d). We show an upper bound of O(n1/t+µ ) when m has t distinct prime factors using a theorem due to Granville, which generalizes Filaseta's result but which assumes the abc-conjecture. We show that for t=2, the abc-conjecture implies that the degree of T_k is O((nk)0.5 + µ). We show a lower bound of \Omega(n1/tk(t-1)/t) for strong representations of T_k over Z_m. This improves the previous bound of \Omega(max(k, n1/t)). When t =2, it nearly matches the upper bound of O((nk)0.5 + µ). Further, when t=2, we also show a similar weak lower bound. These lower bounds are proved by constructing solutions to the equations in question using a pigeonhole argument.


Symmetric Polynomials over Zm and Simultaneous Communication Protocols.

Nayantara Bhatnagar, Parikshit Gopalan, Richard J.Lipton.

We study the problem of representing symmetric Boolean functions as symmetric polynomials over composites. We show an equivalence between such representations and simultaneous communication protocols. Computing a Boolean function f with a polynomial of degree d modulo pq is equivalent to a two player simultaneous protocol for computing f where one player is given the first log d digits of the weight in base p and the other is given the first log d digits of the weight in base q. This reduces the problem of proving bounds on the degree of symmetric polynomials to proving bounds on simultaneous communication protocols.

We use this equivalence to show linear lower bounds on the degree of symmetric polynomials weakly representing classes of Mod and Threshold functions. Previously, the best known lower bound for symmetric polynomials weakly representing any  functions. Previously function mod 6 was \sqrt(n). We show there exist symmetric polynomials of degree o(n) representing Threshold-c for c constant, using the fact that the number of solutions of certain exponential Diophantine equations are finite. Conversely, the fact that the degree is o(n) implies that some classes of Diophantine equations can have only finitely many solutions. Our results give simplifications of many previously known results and show that polynomial representations are closely related to certain questions in number theory.


Randomized Time-Space Tradeoffs for Directed Graph Connectivity.

Parikshit Gopalan, Richard J. Lipton, Aranyak Mehta.

We present a spectrum of time space tradeoffs for solving directed graph connectivity in small space. We use a startegy parametrized by a number k that uses k pebbles to perform several small random walks. Our approach allows us to look at Savitch's algorithm and the random walk algorithm as two extremes of the same basic divide and conquer strategy. It also implies the tightness of some lower bounds for restrcited computational models for a large range of parameters.


Error-Correction against Computationally Bounded Adversaries.

Yan Ding, Parikshit Gopalan, Richard J. Lipton.

We initiate a study of error correcting codes against computationally bounded adversaries. We introduce a generic method called code scrambling that uses cryptographic techniques to improve the performance of error correcting codes in the private-key model, where the noisy channel between the sender and the receiver is viewed as an adversary, and the sender and receiver share a private key for the scrambling and unscrambling of an encoded message. Code scrambling efficiently achieves equivalence between the worst-case performance of the scrambled code against a computationally bounded adversary, and the average-case performance of the original code against a q-ary symmetric channel.

Applying code scrambling to known explicit binary codes which achieves capacity against a binary symmetric channel, we obtain explicit capacity-achieving binary codes against computationally bounded adversaries. We also apply code scrambling to the Reed-Solomon codes and show that the resulting scrambled codes can decode uniquely beyond the classical half-the-distance error correction bound with high probability.


Caching with Expiration Times for Internet Applications

Parikshit Gopalan, Howard Karloff, Milena Mihail, Aranyak Mehta, Nisheeth Vishnoi.

Caching data together with expiration times beyond which the data is no longer valid is a standard method for promoting information coherence in distributed systems, including the Internet, the World Wide Web and Peer-to-Peer networks. We use the framework of competitive analysis of online algorithms, and study upper and lower bounds for page eviction strategies in the case where data has expiration times. We show that minimal adaptations of marking algorithms achieve performance similar to the well studied case of caching without the expiration time constraint. Marking algorithms include the widely used Least Recently Used (LRU) eviction policy. In practice, when data have expiration times, the LRU eviction policy is used widely, often without any consideration of expiration times. Our results explain and justify this standard practice.