Santa Fe, New Mexico, USA | August 31-September 4, 2009

We propose a new distributed algorithm for sparse variants of the network alignment problem that occurs in a variety of data mining areas including systems biology, database matching, and computer vision. For this problem, the goal is to find an approximate isomorphism between two large labeled graphs with over 200,000 vertices. Our alignment objective is an edge-preserving mapping with high similarity between labels of mapped vertices.

Our algorithm uses a belief propagation heuristic and provides near optimal solutions for an NP-hard combinatorial optimization problem. We show that our algorithm is faster and outperforms or nearly ties existing algorithms on synthetic problems, a problem in bioinformatics, and a problem in ontology matching. We also provide a framework for analyzing these algorithms.

Joint with M. Gerritsen, D. Gleich, J. H. Kim, A. Saberi, and Y. Wang.

slides (pdf)

A spin-flipping process that underlies the performance of the random edge simplex algorithm will be discussed. In this stochastic process, which takes place on a one-dimensional lattice whose sites may be either occupied or vacant, occupied sites become vacant at a constant rate and simultaneously cause all sites to the right to change their state. This random process exhibits rich phenomenology. First, there is a front, defined by the position of the leftmost occupied site, that propagates at a nontrivial velocity. Second, the front involves a depletion zone with an excess of vacant sites. The total excess Δ increases logarithmically, Δ ~ ln k, with the distance k from the front. Third, the front exhibits aging: young fronts are vigorous but old fronts are sluggish. We investigate these phenomena using a quasi-static approximation, direct solutions of small systems, and numerical simulations.
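The process lends itself to direct simulation. Below is a minimal Python sketch under simplifying assumptions (finite lattice, discrete steps, uniform choice among occupied sites in place of continuous-time rates); it records the front position, which is provably nondecreasing in this dynamics.

```python
import random

def simulate_front(n_sites=200, steps=2000, seed=0):
    """Toy version of the spin-flipping process: at each step a uniformly
    random occupied site becomes vacant and every site to its right changes
    state.  Returns the successive positions of the front (the leftmost
    occupied site)."""
    rng = random.Random(seed)
    lattice = [True] * n_sites            # start fully occupied
    fronts = []
    while len(fronts) < steps:
        occupied = [i for i, s in enumerate(lattice) if s]
        if not occupied:
            break
        i = rng.choice(occupied)          # constant rate -> uniform choice
        lattice[i] = False                # the chosen site becomes vacant ...
        for j in range(i + 1, n_sites):   # ... and flips all sites to its right
            lattice[j] = not lattice[j]
        occupied_now = [k for k, s in enumerate(lattice) if s]
        fronts.append(occupied_now[0] if occupied_now else n_sites)
    return fronts
```

Since no event can re-occupy a site to the left of the front, the recorded front positions form a nondecreasing sequence, which the simulation confirms.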

slides (pdf)

After a description of the Extremal Optimization heuristic (EO), some aspects of its dynamics are discussed. In its local search of configuration space, EO exhibits power-law distributed return times for the updates of individual variables and a stretched-exponential decay in the auto-correlations, attesting to a thorough yet rapidly progressing exploration. Finally, many new EO results for spin glasses and graph bi-partitioning are presented.

slides (pdf)

Recent research in applied mathematics has shown that images and other signals can be reconstructed from many fewer data than previously thought possible. In this newly-emerged field of compressive sensing, the inherent compressibility (or sparsity) of real-world signals is exploited to allow reconstruction by means of convex optimization. In this talk, I will show that using a nonconvex approach instead has many advantages. Chief among these is the ability to reconstruct from still fewer data than the usual compressive sensing methods. Others include increased robustness to noise and signal nonsparsity, making the methods more widely applicable. By being able to reconstruct images and other signals from fewer data, we can reduce data collection time and cost, and ease the burden of data transmission or storage.

The apparent challenge of solving a nonconvex optimization problem is the enormous number of local minima to be avoided. Surprisingly, simple and efficient algorithms are able in practice to find the global minimum reliably. I will present a particular algorithmic framework that is especially fast, and scales well to large problems. It is best applied in cases of Fourier-domain measurements, as in the case of MRI, synthetic-aperture imaging (radar, sonar, or lidar), and others. The result is that we get the improved reconstruction performance of nonconvex compressive sensing, while maintaining the speed and reliability of convex optimization.
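One standard route to such nonconvex reconstructions — offered here as an illustrative sketch, not necessarily the speaker's exact algorithm — is iteratively reweighted least squares (IRLS) for the ℓp quasi-norm with p < 1, annealing a smoothing parameter toward zero:

```python
import numpy as np

def irls_lp(A, y, p=0.5, iters=80):
    """IRLS sketch for the nonconvex problem  min ||x||_p^p  s.t.  Ax = y,
    with p < 1.  Each iteration solves a weighted least-norm problem with
    weights w_i ~ |x_i|^(2-p); the smoothing parameter eps is halved each
    iteration so the weights approach the true nonconvex penalty."""
    x = A.T @ np.linalg.solve(A @ A.T, y)      # least-norm starting point
    eps = 1.0
    for _ in range(iters):
        w = (x**2 + eps) ** (1 - p / 2)        # ~ |x_i|^{2-p} for small eps
        Aw = A * w                             # A @ diag(w)
        x = w * (A.T @ np.linalg.solve(Aw @ A.T, y))
        eps = max(eps / 2, 1e-12)
    return x
```

Each update is the closed-form minimizer of the weighted quadratic surrogate subject to Ax = y, so the constraint holds at every iteration; the annealed eps is what lets the iteration slip past the shallow local minima of the ℓp penalty in practice.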

slides (pdf)

We describe a rich family of statistical mechanics models with binary variables on planar graphs which are equivalent to Gaussian Grassmann graphical models (free fermions). Calculation of the partition function (weighted counting) in these models is easy (of polynomial complexity), as it reduces to the evaluation of determinants of matrices linear in the number of variables. In particular, this family of models covers the Holographic Algorithms of Valiant [1–3] and extends the Gauge Transformations discussed in our previous works [4–8]. The planar part is based on [9].

We further extend our approach to the general case of surface graphs and demonstrate that, similar to the case of the dimer model considered in [10, 11], the partition function is given by an alternating sum of 2^{2g} determinants that correspond to 2^{2g} spinor structures on the embedding Riemann surface of genus g. This is achieved by considering the Z_2-self-intersection invariant of immersions, and relating the spinor structures to the equivalence classes of Kasteleyn orientations on the so-called extended graph, associated with the original graph.

abstract (pdf) | slides (pdf)

In this talk, we present an overview of methods to identify and enumerate low weight pseudo-codewords for the linear programming (LP) decoding of low-density parity-check (LDPC) codes. We describe the instanton search algorithm and demonstrate how the resulting instanton statistics can be used for predicting the frame error rate (FER) performance of a given code. We elaborate on the relationship between failures of iterative decoders and the LP decoder with emphasis on the binary symmetric channel (BSC). We then present combinatorial characterization of trapping sets for the Gallager A algorithm and remark how such considerations can be useful in guiding the discovery of low weight pseudo-codewords.

Joint work with Michael Chertkov, Mikhail Stepanov and Bane Vasic.

slides (pdf)

The graph isomorphism (GI) problem plays a central role in the theory of computational complexity and has importance in physics and chemistry as well. While no general efficient algorithm for solving GI is known, it is unlikely to be NP-complete; in this regard it is similar to the factoring problem, for which Shor has developed an efficient quantum algorithm.

In this talk I will discuss our investigations of quantum particles walking on graphs and their implications for possible algorithms for GI. We find that single-particle quantum random walks fail to distinguish some nonequivalent graphs that can be distinguished by random walks with two interacting particles. The implications of these observations for classical and quantum algorithms for GI will be discussed.

slides (ppt)

Large-scale distributed inference problems require simple localized message-passing algorithms. For the simple problem of computing a linear function, gossip algorithms have recently received significant attention. While standard nearest-neighbor gossiping is diffusive in nature and requires large communication for convergence, faster algorithms can be developed exploiting lifting ideas. I will survey this area and present the analysis techniques based on mixing times of Markov chains and some recent open problems.
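The diffusive baseline that lifting ideas improve on is easy to sketch: nearest-neighbor pairwise gossip, here with a hypothetical graph and random edge activations. On a cycle of n nodes, the number of activations needed for consensus grows roughly like n³ — the diffusive slowdown the abstract refers to.

```python
import random

def gossip_average(values, edges, rounds, seed=1):
    """Standard nearest-neighbor gossip: at each step a random edge (i, j)
    is activated and both endpoints replace their values by the pairwise
    average.  The global mean is conserved (up to rounding) while the
    values slowly contract toward it."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x
```

Running this on a small cycle shows both invariants: the mean is preserved and the spread shrinks toward consensus.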

slides (ppt)

Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in important large-scale applications.

Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization.

We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models.

Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
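A minimal iterative soft-thresholding loop is sketched below (a generic version; the algorithms in the talk add an inexpensive Onsager-style correction term to the residual, inspired by belief propagation, which is the "costless modification"):

```python
import numpy as np

def soft(v, t):
    """Soft thresholding: the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.05, iters=300):
    """Plain iterative soft thresholding for the LASSO objective
        0.5 * ||y - A x||^2 + lam * ||x||_1.
    With step size 1/L (L = Lipschitz constant of the gradient) the
    objective decreases monotonically.  AMP-style variants keep the same
    per-iteration cost but add an Onsager correction to the residual."""
    L = np.linalg.norm(A, 2) ** 2          # squared spectral norm of A
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x + A.T @ (y - A @ x) / L, lam / L)
    return x
```

The monotone decrease of the objective from the zero initialization is the basic sanity check for the step-size choice.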

Joint work with Arian Maleki (EE, Stanford) and Andrea Montanari (Statistics and EE, Stanford).

slides (pdf)

Demographic noise has profound effects on evolutionary and population dynamics, as well as on chemical reaction systems and models of epidemiology. Such noise is intrinsic and due to the discreteness of the dynamics in finite populations. We here show that similar noise-sustained trajectories arise in game-dynamical learning, where the stochasticity has a different origin: agents sample a finite number of moves of their opponents in between adaptation events. The limit of infinite batches results in deterministic modified replicator equations, whereas finite sampling leads to a stochastic dynamics. The characteristics of these fluctuations can be computed analytically using methods from statistical physics, and such noise can affect the attractors significantly, leading to noise-sustained cycling or removing periodic orbits of the standard replicator dynamics.

We establish the existence of free energy limits for several diluted spin glass models corresponding to certain combinatorial models on the Erdos-Renyi graph G(N, c/N). For a variety of models, including independent sets, MAX-CUT, coloring and K-SAT, we prove that the free energy, both at positive and zero temperature, appropriately rescaled, converges to a limit as the size of the underlying graph diverges to infinity. In the zero temperature case, this is interpreted as the existence of the scaling limit for the corresponding combinatorial optimization problem. For example, as a special case we prove that the size of a largest independent set in these graphs, normalized by the number of nodes, converges to a limit w.h.p., thus resolving an open problem.

Our approach is based on developing a combinatorial version of Guerra's interpolation method. Among other things, Guerra's interpolation method was used to prove the existence of free energy limits for the Viana-Bray and random K-SAT models. Our simpler combinatorial approach allows us to work with the zero temperature case (optimization) directly and to extend the approach to many other models. Additionally, using this approach we establish the large deviations principle for the satisfiability property for constraint satisfaction problems such as coloring, K-SAT and NAE-K-SAT.

Joint work with Mohsen Bayati (Stanford) and Prasad Tetali (Georgia Tech).

slides (ppt)

Finding the most likely assignment in graphical models is a key problem in many applications. Although a hard problem, it can be approximated using LP relaxations, often resulting in good empirical performance. In many cases one is interested not only in the most likely assignment, but in the k assignments with maximum probability. Standard LP approximations cannot be used to solve this problem. Here I will introduce an LP formulation of the k-best problem, defining the relevant polytopes and providing their exact characterization. For the k=2 case, I will show that the standard LP approximation differs from the k-best one by only one constraint for tree-structured graphs. I will demonstrate the empirical performance of the method, showing that it finds exact k-best solutions for complex models.

slides (pdf)

We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006) allows one to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.

slides (pdf) | abstract (pdf)

slides (pdf)

Everyone knows that the eigenvalue gap for the transition matrix of a Markov chain controls its convergence rate. And the second eigenvector, which dies off most slowly, can be very useful (if one knows it) for proving lower bounds on convergence rate.

What is less well known is that, for some interesting special classes of Markov chains, there are other, simpler, matrices whose spectra can also tell us a lot about the convergence rate. In particular, for many Glauber dynamics (single-site update dynamics for a spin system) on an underlying graph G, one can learn a lot by knowing the maximum eigenvalue of the adjacency matrix of G. More generally, there is an "influence matrix," which is often easy to compute explicitly, whose maximum eigenvalue can give even better bounds on the convergence rate. This is all intimately related to the dual "sufficient conditions for uniqueness of Gibbs measure" due to Dobrushin and to Dobrushin & Shlosman from the 1970s, and sheds a little more light on the close connection between them.

As one application, we will describe a very short proof that the Glauber dynamics for proper colorings of planar graphs converges fast when the number of colors is (1+o(1)) times the maximum degree of the graph.
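The two spectra in question are easy to compute side by side on a toy graph (a sketch for intuition only, not the influence-matrix machinery of the talk); for a d-regular graph the maximum adjacency eigenvalue is exactly d:

```python
import numpy as np

def walk_gap_and_adjacency(edges, n):
    """Return (spectral gap of the lazy random walk on the graph,
    maximum adjacency eigenvalue).  The first controls the walk's
    convergence rate; the second is the simpler quantity that, for many
    Glauber dynamics, already yields useful mixing bounds."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    P = 0.5 * np.eye(n) + 0.5 * A / A.sum(axis=1)[:, None]   # lazy walk
    eigs = np.sort(np.linalg.eigvals(P).real)
    return 1.0 - eigs[-2], np.linalg.eigvalsh(A)[-1]
```

On the 6-cycle, for instance, the lazy-walk eigenvalues are 1/2 + cos(2πk/6)/2, giving a gap of exactly 1/4, while the adjacency maximum is the degree, 2.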

We study a thermodynamic glass transition in a class of statistical-mechanical lattice models by using Monte Carlo simulations. Some lattice models on a random graph exhibit two transitions, a dynamical and a thermodynamic glass transition, as temperature and/or chemical potential are varied. At the dynamical transition, phase space is split into an extremely large number of subsets and equilibration is not achieved for large systems even in a slow annealing limit. This is also often observed in constraint-satisfaction problems. It is then quite difficult to access the glass transition beyond the dynamical one by Monte Carlo simulations based on local updates. In order to reveal the equilibrium properties behind the dynamical transition, we use an exchange Monte Carlo method. Some numerical evidence for a glass transition is found in a two-dimensional lattice gas model. We also discuss the commutativity between the slow annealing limit and the large-size limit in standard annealing MC and exchange MC.
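The exchange (parallel tempering) idea can be sketched on a toy Ising chain — a deliberately minimal illustration, far simpler than the lattice-gas simulations in the talk: replicas at several inverse temperatures evolve by Metropolis updates, and occasional replica-swap moves let low-temperature replicas escape local traps.

```python
import math
import random

def energy(spins):
    """Nearest-neighbor Ising chain energy, free boundaries."""
    return -sum(s * t for s, t in zip(spins, spins[1:]))

def exchange_mc(n=10, betas=(0.2, 0.5, 1.0), sweeps=200, seed=3):
    """Exchange Monte Carlo sketch: single-spin Metropolis sweeps at each
    inverse temperature, then swap moves between adjacent temperatures,
    accepted with probability min(1, exp((b2 - b1) * (E2 - E1)))."""
    rng = random.Random(seed)
    reps = [[rng.choice((-1, 1)) for _ in range(n)] for _ in betas]
    for _ in range(sweeps):
        for r, beta in zip(reps, betas):        # Metropolis sweep
            for _ in range(n):
                i = rng.randrange(n)
                old = energy(r)
                r[i] = -r[i]
                dE = energy(r) - old
                if dE > 0 and rng.random() >= math.exp(-beta * dE):
                    r[i] = -r[i]                # reject the flip
        for k in range(len(betas) - 1):         # replica exchange
            d = (betas[k + 1] - betas[k]) * (energy(reps[k + 1]) - energy(reps[k]))
            if d >= 0 or rng.random() < math.exp(d):
                reps[k], reps[k + 1] = reps[k + 1], reps[k]
    return reps
```

The swap acceptance rule above is exactly what preserves detailed balance for the joint product distribution over all replicas.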

slides (pdf)

There are many approximation techniques in statistical physics, and mean-field approximations are among them. In this talk, we review some of the mean-field methods from an information-geometrical viewpoint. In particular, we focus on the Bethe approximation, which is known as (loopy) belief propagation in machine learning. We investigate its convergence and approximation accuracy from the information-geometrical viewpoint.

slides (pdf)

We present a new view of Gaussian belief propagation (GaBP) based on a representation of the determinant as a product over orbits of a graph. We show that the GaBP determinant estimate captures totally-backtracking orbits of the graph and consider how to correct this estimate. We show that the missing orbits may be grouped into equivalence classes corresponding to backtrackless orbits and the contribution of each equivalence class is easily determined from the GaBP solution. Furthermore, we demonstrate that this multiplicative correction factor can be interpreted as the determinant of a backtrackless adjacency matrix of the graph with edge weights based on GaBP. Finally, an efficient method is proposed to compute a truncated correction factor including all backtrackless orbits up to a specified length.

Joint Work with Vladimir Chernyak and Michael Chertkov.

slides (pdf)

We propose a metaheuristic method that addresses the problem of resource allocation for future use, optimizing the mean expected costs under a combination of possible uncertainty factors. We assume that costs are linearly associated with resource storage, use, and penalties for the shortage of resources. Uncertainties associated with the events that create the need for resources, and with possible reduction of resources, are given in the form of distribution functions. We consider the problem in a simple form where probability characteristics are represented by given discrete distributions and the possible reduction of the quantity of a particular item is proportional to the quantity stored. We also assume no losses in the transportation process. The basic algorithm consists of three major parts: scenario generation, network flow cost optimization, and nonlinear optimization that includes the repeated solution of a minimum-cost maximum-flow problem as its function evaluation. The simulated annealing nonlinear optimization method gave the best results in terms of accuracy and processing time. The data structures created in the preprocessing step allow for the efficient reuse of consecutively found solutions of the network flow problem. Algorithms based on minimum-cost approximation and a Monte Carlo approach are discussed as well. The entire problem has also been reformulated in linear programming terms, which enables us to obtain an exact solution of small problems. Comparison of the exact and approximate solutions shows that the nonlinear approach and approximation algorithms come within 5% accuracy in most test cases and significantly improve on processing time for the larger problems. Preliminary results of this research are very promising.
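The simulated-annealing component can be sketched generically. The toy cost below (per-unit holding cost plus a shortage penalty against hypothetical demands) stands in for the paper's network-flow function evaluation; all the numbers are illustrative assumptions.

```python
import math
import random

def anneal(cost, x0, neighbor, steps=5000, t0=5.0, seed=7):
    """Generic simulated annealing: accept downhill moves always, uphill
    moves with Boltzmann probability, cool geometrically, and track the
    best allocation seen so far."""
    rng = random.Random(seed)
    x, cx = x0, cost(x0)
    best, cbest = x, cx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        cy = cost(y)
        if cy <= cx or rng.random() < math.exp((cx - cy) / t):
            x, cx = y, cy
            if cx < cbest:
                best, cbest = x, cx
        t *= 0.999
    return best, cbest

# Toy allocation: stock levels for 3 items, holding cost 2 per unit stored,
# penalty 10 per unit of unmet demand (hypothetical demands below).
DEMAND = (4, 7, 2)

def toy_cost(v):
    return sum(2 * vi + 10 * max(0, d - vi) for vi, d in zip(v, DEMAND))

def perturb(v, rng):
    i = rng.randrange(len(v))
    w = list(v)
    w[i] = max(0, w[i] + rng.choice((-1, 1)))
    return tuple(w)
```

Because the best-so-far state is tracked separately from the random walk, the returned cost can never exceed the starting cost, even though individual moves may go uphill.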

slides (ppt)

Low-rank models are frequently used in machine learning and statistics. An important domain of application is provided by collaborative filtering where one of the problems is to reconstruct a low rank matrix from a sparse subset of entries. Given M, an nα × n matrix of rank r << n, assume that a uniformly random subset E of its entries are observed. We prove that a simple spectral algorithm reconstructs M from |E| = O(r n) observed entries with arbitrarily small relative root mean square error. Further, we describe an efficient algorithm that reconstructs M exactly from |E| = O(n log n) entries (for bounded r) under appropriate technical conditions. We then present results to show robustness of the algorithm when the entries observed are corrupted by additive noise. In this context, we prove an upper bound on the root mean square error that is order-optimal in a number of circumstances.
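The first step of such a spectral reconstruction fits in a few lines — a sketch only: the full algorithm also trims over-represented rows and columns and refines the estimate by local optimization, which this version omits.

```python
import numpy as np

def spectral_complete(M_obs, mask, r):
    """Spectral sketch of low-rank matrix completion: take the zero-filled
    observed matrix, rescale by the inverse sampling rate so entries are
    unbiased, then truncate to rank r via the SVD."""
    p = mask.mean()                              # fraction of observed entries
    U, s, Vt = np.linalg.svd(M_obs / p, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

Even without trimming or refinement, this already beats the trivial all-zeros estimate by a wide margin at moderate sampling rates.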

Joint work with Andrea Montanari and Sewoong Oh.

abstract (pdf) | slides (pdf)

In this talk, I will present a new powerful Markov-chain algorithm for classical hard spheres, the "event-chain". This algorithm is rejection-free and non-local, and its most powerful version breaks detailed balance. I'll then discuss a physical phenomenon common to two-dimensional classical hard-sphere liquids and quantum gases, namely the Kosterlitz-Thouless transition.

slides (pdf)

We show that an important and hard-to-compute solution-space feature of a Constraint Satisfaction Problem (CSP), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to counting the number of solutions. This cluster-counting approach can be naturally written in terms of a factor graph, and using a variant of the Belief Propagation inference framework, we can accurately approximate cluster counts in random CSP problems. We illustrate the algorithm on large random satisfiability and graph coloring instances. Moreover, we supply a methodology to calculate the number of clusters exactly using advanced techniques from the knowledge compilation domain, which also work very well on instances coming from industrial domains.
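For intuition, clusters can be defined — and, for tiny instances, counted by brute force — as connected components of the solution graph under single-variable flips:

```python
from itertools import product

def count_solutions_and_clusters(n_vars, satisfies):
    """Enumerate all solutions of a tiny CSP and count clusters: connected
    components of the graph whose vertices are the solutions and whose
    edges join solutions at Hamming distance 1."""
    sols = {a for a in product((0, 1), repeat=n_vars) if satisfies(a)}
    seen, n_clusters = set(), 0
    for s in sols:
        if s in seen:
            continue
        n_clusters += 1                   # new cluster: flood-fill from s
        stack = [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            for i in range(n_vars):       # all Hamming-1 neighbors
                v = u[:i] + (1 - u[i],) + u[i + 1:]
                if v in sols and v not in seen:
                    stack.append(v)
    return len(sols), n_clusters
```

For example, the XOR constraint x1 ≠ x2 has two solutions at Hamming distance 2, hence two clusters, while the unconstrained hypercube is a single cluster; the BP-based method in the talk estimates the same quantity without enumeration.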

slides (pdf)

For a large class of large random problems in statistical physics (e.g., mean field spin glasses) and optimization (e.g., random coloring or random XORSAT), we show that one can easily generate large instances of the problem together with a representative configuration (i.e., an equilibrated set of spins or a typical solution of the optimization problem). This has many possible applications, and we shall discuss two of them: 1) One can generate instances of very hard problems together with a solution which would be impossible to find without knowing it a priori; this can serve as a generator of hard benchmark instances as well as a cryptographic device. 2) One can also generate large equilibrated instances of some difficult models (which could not possibly be thermalized by any known algorithm) and then use them as a starting point to simulate their equilibrium dynamics, or the effect of quantum perturbations; this turns out to be very useful for confirming the validity of some theories of complex mean field models in optimization, glass and spin glass physics.

Ref: Krzakala and Zdeborova, Phys. Rev. Lett. 102, 238701 (2009) and http://arxiv.org/abs/0902.4185

slides (pdf)

I intend to present some examples of extensively studied optimization problems where techniques from enumeration and connections with statistical physics have recently led to new insights.

slides (pdf)

Though work at the interface between physics and computational complexity has centered largely on NP-hard problems, there are also many interesting questions to be addressed within the complexity class P of tractable problems. These questions are brought into relief by considering problems from the point of view of parallel computation. The talk will begin with a review of parallel computational complexity: the PRAM and Boolean circuit family models of computation, the classes P and NC, and the notion of P-completeness. I will then discuss several physical models from this perspective. The last part of the talk will be about recent work with Cris Moore and Stephan Mertens on phase transitions in the circuit value problem.

slides (pdf)

Compressed sensing allows perfect recovery of sparse signals (or signals sparse in some basis) using only a small number of random measurements. Existing results in compressed sensing literature have focused on characterizing the achievable performance by bounding the number of samples required for a given level of signal sparsity. However, using these bounds to minimize the number of samples requires a-priori knowledge of the sparsity of the unknown signal, or the decay structure for near-sparse signals. Furthermore, there are some popular recovery methods for which no such bounds are known.

In this talk, we describe an alternative scenario where observations are available in sequence. For any recovery method, this means that there is now a sequence of candidate reconstructions. We propose a method to estimate the reconstruction error directly from the samples themselves, for every candidate in this sequence. This estimate is universal in the sense that it is based only on the measurement ensemble, and not on the recovery method or any assumed level of sparsity of the unknown signal. With these estimates, one can now stop observations as soon as there is reasonable certainty of either exact or sufficiently accurate reconstruction. They also provide a way to obtain "run-time" guarantees for recovery methods that otherwise lack a-priori performance bounds.

We investigate both continuous (e.g. Gaussian) and discrete (e.g. Bernoulli or Fourier) random measurement ensembles, both for exactly sparse and general near-sparse signals, and with both noisy and noiseless measurements.
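The elementary identity behind such sample-based estimates, sketched here for the Gaussian noiseless case: if a fresh measurement row a has i.i.d. N(0,1) entries, then E[(a·x − a·x̂)²] = ‖x − x̂‖², so averaging squared residuals on held-out measurements estimates the reconstruction error without ever knowing x.

```python
import numpy as np

def sq_error_estimate(x_hat, A_fresh, y_fresh):
    """Unbiased estimate of ||x - x_hat||^2 from fresh noiseless
    measurements y_fresh = A_fresh @ x, where A_fresh has i.i.d. N(0, 1)
    entries: each squared residual has expectation ||x - x_hat||^2."""
    return float(np.mean((y_fresh - A_fresh @ x_hat) ** 2))
```

Note the estimate depends only on the measurement ensemble, not on how the candidate x̂ was produced — the universality property the abstract emphasizes.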

slides (pdf)

The talk focuses on several aspects of the interplay between statistical mechanics and information measures, like the relative entropy and the mutual information, which are among the most fundamental notions in information theory. After a brief background, the talk will be divided into two parts. In the first, it will be shown that the information inequality and the data processing theorem of the mutual information, which set the stage for most, if not all, fundamental limits of information theory, are intimately related to the second law of thermodynamics – one of the fundamental limits of Nature. The second part of the talk will be devoted to the use of statistical-mechanical tools for analyzing the mutual information in coded communication systems. In particular, threshold effects and irregularities in the behavior of these systems will be shown to have intimate relations to phase transitions in the corresponding physical systems.

abstract (pdf) | slides (pdf)

While many optimization problems of physical interest are NP-hard, a number are not, and systems with glassy dynamics can correspond to problems that can be solved in polynomial time. Existing algorithms have been applied, and new algorithms developed, to study pinned interfaces and spin glasses in two dimensions. Results are presented for the dynamics of pinned interfaces ("avalanches") to test analytical theories, and for heuristically generated memory effects in spin glasses. A novel application of sampling techniques is used to generate spin glass samples in large 2D systems.

Quantum k-SAT asks whether there is an n-qubit state which is orthogonal to a set of vectors, each of which lies in the Hilbert space of k qubits. Equivalently, it asks whether a particular type of local Hamiltonian has a ground state with zero energy. We consider random quantum k-SAT formulas with n variables and m = αn clauses, and ask at what value of α these formulas cease to be satisfiable. We show that the threshold for random quantum 3-SAT is at most 3.594. For comparison, convincing arguments from statistical physics suggest that the classical 3-SAT threshold is α_c ≈ 4.267. For larger k, we show that the quantum threshold is a constant factor smaller than the classical one.

Our bounds work by determining the generic rank of the satisfying subspace for certain gadgets, and then using the technique of differential equations to analyze various algorithms that partition the hypergraph into a collection of these gadgets. This is the first time that differential equations have been used to establish upper bounds on a satisfiability threshold, and our techniques may apply to various classical problems as well.

This is joint work with Sergey Bravyi (IBM) and Alex Russell (Connecticut).

slides (pdf)

The phase structure of mincut graph bisection displays certain familiar properties when considered over sparse random graphs, but also some surprises. It is known that when the mean degree is below the critical value of 2 log 2, the cutsize is zero with high probability. We study how the minimum cutsize increases with mean degree above this critical threshold, finding a new analytical upper bound that improves considerably upon previous bounds. Combined with recent results on expander graphs, our bound suggests the unusual scenario that random graph bisection is replica symmetric up to and beyond the critical threshold, with a replica symmetry breaking transition possibly taking place above the threshold. An intriguing algorithmic consequence is that although the problem is NP-hard, we can conceivably find near-optimal cutsizes (whose ratio to the optimal value approaches 1 asymptotically) in polynomial time for typical instances near the phase transition.

Joint work with Gabriel Istrate, Bruno Goncalves, Robert Sumi and Stefan Boettcher.

slides (pdf)

We have developed a replica cluster variational method, which improves over mean-field and Bethe approximations for random models defined on regular lattices (graphs with many short loops). We have applied it to spin glasses, in particular to the Edwards-Anderson model in 2, 3 and 4 dimensions, using the plaquette as the largest region. On the 2d square lattice, we find that the paramagnetic phase is stable for any positive temperature, thus confirming the scenario with Tc = 0. Moreover we have developed a message-passing algorithm which converges to a fixed point for any temperature and provides very good marginal probabilities. Using these marginals and the decimation algorithm we have been able to find ground state configurations.

slides (pdf)

Domain walls in disordered systems like spin glasses or random manifolds play an important role in understanding the stability of the ordered phase, the energetics of large-scale excitations, the asymptotic dynamics in and out of equilibrium, as well as the sensitivity to changes of external parameters. Using exact combinatorial optimization algorithms, we study domain walls and chaos at zero temperature in the solid-on-solid (SOS) model on a disordered substrate, which is a numerically convenient representation of a 2d random elastic medium. We focus on three questions: 1) are domain walls in this model described by Schramm-Loewner evolution (SLE)? 2) what is the relation between size and energy of optimal excitations (droplets)? 3) does disorder chaos exist in the ground state? We compare the results with the corresponding findings for 2d spin glasses.

Joint work with K. Schwarz, A. Karrenbauer and G. Schehr.

abstract (pdf) | slides (ppt)

Model counting is the classical problem of computing the number of solutions of a propositional formula. It generalizes the NP-complete task of determining the satisfiability of a formula and has a wide range of applications, but at a considerably higher computational cost. We present an approach to model counting that is based on adding randomly chosen XOR or parity constraints on the problem variables. We show that this technique significantly outperforms existing methods for model counting. We also show how the method can be used for solution sampling. Finally, we discuss results on counting clusters of solutions using a novel derivation of the survey propagation equations.
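The core idea can be demonstrated by brute force on a tiny formula (the clause below is an assumed toy example, not one of the paper's benchmarks): each random XOR constraint keeps a given solution with probability 1/2, so the number of surviving solutions times 2^s is an unbiased estimate of the model count.

```python
import random
from itertools import product

def xor_count_estimate(n_vars, satisfies, s=2, trials=300, seed=5):
    """Estimate the model count of a small formula by adding s random XOR
    (parity) constraints over the variables.  A solution survives one
    random XOR with probability 1/2, so (#survivors * 2^s) is unbiased;
    we average over independent draws of the constraints."""
    rng = random.Random(seed)
    sols = [a for a in product((0, 1), repeat=n_vars) if satisfies(a)]
    total = 0.0
    for _ in range(trials):
        xors = [([rng.randrange(2) for _ in range(n_vars)], rng.randrange(2))
                for _ in range(s)]
        survivors = sum(
            all(sum(c * v for c, v in zip(coeffs, sol)) % 2 == parity
                for coeffs, parity in xors)
            for sol in sols)
        total += survivors * 2 ** s
    return len(sols), total / trials
```

In practice one does not enumerate solutions, of course — a SAT solver checks satisfiability of the streamlined formula — but the counting identity is exactly this one.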

This is joint work with Lukas Kroc and Ashish Sabharwal.

main slides (ppt) | extra slides on MCMC (ppt)

extra slides on counting solution clusters (ppt)

Laboratory for Information and Decision Systems, Department of EECS, MIT

We consider the problem of recovering a function on the space of permutations of n elements, denoted by Sn, using various forms of partial information. This problem naturally arises in various applications such as ranked elections, ranking teams in a sports league, recommendation systems, etc. To begin with, we formulate the problem by classifying various forms of partial information through a natural relation to partitions of the integer n. This is inspired by the representations of the group Sn, and hence the resulting formal problem can be restated as that of recovering a function from partially available Fourier coefficients.

As the main result of the paper, we obtain a general condition under which it is possible to recover a function from its partial information. Under a natural random model, we quantify the recoverability conditions in terms of the sparsity (cardinality of the support of the function). We propose a novel, iterative algorithm to recover the function for each form of partial information. We further establish that under the general recoverability conditions, the function to be recovered has the sparsest support that is consistent with the given data. That is, it solves an appropriate ℓ0 optimization problem, and we establish that it is the unique solution. Effectively, under our recoverability conditions our simple algorithm ends up solving the ℓ0 optimization problem, which is hard in general. We discuss fundamental limitations of this recovery problem using the classical information-theoretic framework.

abstract (pdf)

slides (pdf)

We consider the #P-complete problem of counting the number of independent sets in a given graph. Our interest is in understanding the effectiveness of the popular Belief Propagation (BP) heuristic. BP is a simple, iterative algorithm that is known to have at least one fixed point. Each fixed point corresponds to a stationary point of the Bethe free energy (introduced by Yedidia, Freeman and Weiss (2004) in recognition of Hans Bethe’s earlier work (1935)). The evaluation of the Bethe free energy at such a stationary point (or BP fixed point) leads to the Bethe approximation to the number of independent sets of the given graph. In general, BP is not known to converge, nor is an efficient, convergent procedure for finding stationary points of the Bethe free energy known. Further, the effectiveness of the Bethe approximation is not well understood.

As the first result of this paper, we propose a BP-like algorithm that always converges to a BP fixed point for any graph. Further, it finds an ε-approximate fixed point in O(n^2 d^4 2^d ε^−4 log^3(nε^−1)) iterations for a graph of n nodes with max-degree d. As the next step, we study the quality of this approximation. Using the recently developed ‘loop series’ approach of Chertkov and Chernyak, we establish that for any graph of n nodes with max-degree d and girth larger than 8d log_2 n, the multiplicative error decays as 1 + O(n^−γ) for some γ > 0. This provides a deterministic counting algorithm that leads to strictly different results compared to a recent result of Weitz (2006).
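A minimal sketch of the standard BP recursion for counting independent sets (the hardcore model at fugacity λ = 1) may be helpful. This is plain damped BP, not the convergent algorithm proposed in the talk; the damping factor, iteration count, and toy graph are arbitrary choices. The message r(u→v) is the occupied/unoccupied odds for u in the graph with v removed, and on a tree the resulting marginals are exact.

```python
def bp_independent_set(adj, iters=200, damp=0.5):
    # message r[(u, v)]: odds that u is occupied, computed as if v were removed
    r = {(u, v): 1.0 for u in adj for v in adj[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in r:
            prod = 1.0
            for w in adj[u]:
                if w != v:
                    prod *= 1.0 / (1.0 + r[(w, u)])
            # damped update: r(u->v) <- prod over w != v of 1 / (1 + r(w->u))
            new[(u, v)] = damp * r[(u, v)] + (1 - damp) * prod
        r = new
    marg = {}
    for v in adj:
        odds = 1.0
        for u in adj[v]:
            odds *= 1.0 / (1.0 + r[(u, v)])
        marg[v] = odds / (1.0 + odds)  # BP estimate of P(v occupied)
    return marg

# Path graph 1-2-3 has 5 independent sets; exactly 1 contains vertex 2
# and 2 contain vertex 1, so the exact marginals are 1/5 and 2/5.
adj = {1: [2], 2: [1, 3], 3: [2]}
marg = bp_independent_set(adj)
print(marg)  # on a tree BP is exact: marg[2] = 0.2, marg[1] = marg[3] = 0.4
```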

Finally, as a consequence of our results, we prove that the Bethe approximation is exceedingly good for a random 3-regular graph, conditioned on the Shortest Cycle Cover Conjecture of Alon and Tarsi (1985) being true.

abstract (pdf)

We analyze the mixing time of a natural local Markov chain (the Glauber dynamics) on configurations of the solid-on-solid model of statistical physics. This model has been proposed, among other things, as an idealization of the behavior of contours in the Ising model at low temperatures. Our main result is an upper bound on the mixing time of Õ(n^{3.5}), which is tight within a factor of Õ(sqrt{n}).
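For concreteness, a single-site heat-bath (Glauber) sweep for a one-dimensional solid-on-solid interface can be sketched as follows. This is only an illustration of the dynamics being analyzed; the lattice size, height cap, inverse temperature, and pinned boundary conditions are illustrative choices, not those of the paper.

```python
import math
import random

random.seed(0)
L, K, beta = 20, 10, 1.0   # interior sites, max height, inverse temperature
h = [0] * (L + 2)          # interface heights; endpoints pinned at 0

def heat_bath_update(i):
    # Resample h[i] from its conditional distribution given its neighbors:
    # pi(x) proportional to exp(-beta * (|x - h[i-1]| + |x - h[i+1]|)).
    weights = [math.exp(-beta * (abs(x - h[i - 1]) + abs(x - h[i + 1])))
               for x in range(K + 1)]
    total = sum(weights)
    u, acc = random.random() * total, 0.0
    for x, w in enumerate(weights):
        acc += w
        if u <= acc:
            h[i] = x
            return

for _ in range(500):            # one Glauber sweep = one update per site
    for i in range(1, L + 1):
        heat_bath_update(i)
print(h)
```

Each update is reversible with respect to the SOS Gibbs measure; the question studied in the talk is how many such updates are needed before the interface is close to equilibrium.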

In the talk I will explain how existing analysis techniques are apparently not capable of obtaining a similarly tight result. Thus we are led to introduce several novel analytical ideas that we conjecture will have other applications. In addition, our analysis gives insight into the actual evolution of the contours.

This is joint work with Fabio Martinelli.

abstract (pdf)

Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing, to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. In this talk, I will discuss recent work on using linear programming relaxations to perform approximate inference. By solving the LP relaxations in the dual, we obtain efficient message-passing algorithms that, when the relaxations are tight, can provably find the most likely (MAP) configuration. Our algorithms succeed at finding the MAP configuration in protein side-chain placement, protein design, and stereo problems. Finally, many interesting questions arise when we attempt to do learning using approximate inference algorithms. I will discuss recent results on learning for structured prediction using LP relaxations.

Joint work with Tommi Jaakkola, Amir Globerson, Yair Weiss, and Talya Meltzer.

slides (pdf)

The sum-product algorithm works well if the graph is locally tree-like and a strong mixing condition (correlation decay) holds. Such strong mixing conditions imply that the Gibbs measure is unique. In this talk we examine what happens when the Gibbs measure is not unique.

Reversibility (detailed balance) is a sufficient but not necessary condition for a Markov Chain Monte Carlo algorithm to converge to a given distribution. In this talk we will discuss possible advantages of irreversible algorithms. We will first show that irreversible Markov chains can experience faster convergence for certain classes of probability distributions. Second, we will present a generic strategy for constructing irreversible Markov chains from well-known algorithms. We will conclude by presenting results of our numerical experiments and discussing future research directions.
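A classic toy example of this speedup is the lifted random walk on a cycle (in the spirit of Diaconis, Holmes, and Neal), sketched below under illustrative parameter choices. The lifted chain carries a direction bit, moves persistently, and occasionally flips direction; it violates detailed balance but keeps the uniform distribution stationary, and it spreads over the cycle much faster than the reversible walk.

```python
n = 51            # odd cycle, so the simple walk is aperiodic
alpha = 1.0 / n   # direction-flip probability of the lifted walk
T = 150

def step_reversible(p):
    # reversible random walk: move +/-1 with probability 1/2 each
    return [0.5 * p[(i - 1) % n] + 0.5 * p[(i + 1) % n] for i in range(n)]

def step_lifted(p):
    # lifted states (x, s) with s in {0: +1, 1: -1}: move one step in
    # direction s with prob 1-alpha, otherwise flip direction and stay;
    # this breaks detailed balance but the uniform distribution remains
    # stationary (the transition matrix is doubly stochastic)
    q = [[0.0] * n for _ in range(2)]
    for s, d in ((0, 1), (1, -1)):
        for x in range(n):
            q[s][(x + d) % n] += (1 - alpha) * p[s][x]
            q[1 - s][x] += alpha * p[s][x]
    return q

def tv_from_uniform(dist):
    # total variation distance to the uniform distribution
    return 0.5 * sum(abs(w - 1.0 / len(dist)) for w in dist)

p_rev = [1.0] + [0.0] * (n - 1)
p_lift = [[1.0] + [0.0] * (n - 1), [0.0] * n]   # start at x = 0, direction +1
for _ in range(T):
    p_rev = step_reversible(p_rev)
    p_lift = step_lifted(p_lift)
positions = [p_lift[0][x] + p_lift[1][x] for x in range(n)]
print(tv_from_uniform(p_rev), tv_from_uniform(positions))
```

The reversible walk needs on the order of n^2 steps to mix, while the lifted walk needs on the order of n, so after T = 150 steps the lifted chain is markedly closer to uniform.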

slides (pdf)

slides (pdf)

A well-known result by Yedidia, Freeman, and Weiss (IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282-2312, July 2005) says that fixed points of the sum-product algorithm correspond to stationary points of the Bethe variational free energy.

This result can be given a reinterpretation in terms of graph covers. Namely, when the sum-product algorithm is run on a Tanner graph, a fixed point corresponds to a certain pseudo-codeword of that Tanner graph: this pseudo-codeword, after a biasing channel-output-dependent term is properly taken into account, has locally an extremal number of pre-images in all M-covers as M goes to infinity.

This observation can be suitably extended to the transient part of the sum-product algorithm by expressing the sum-product algorithm in terms of a graph-dynamical system involving graph covers, twisted graph covers, and valid configurations therein. This yields new insights into the sum-product algorithm and its behavior.

slides (pdf)

The problem of graphical model selection is to determine the structure of an unknown graph given samples from the underlying Markov random field. It is of interest in various applications in which samples may be obtained but the graph is unknown (e.g., gene networks, social networks, etc.). In this talk, we consider the sample complexity of this problem, meaning the number of samples $n$ that is required, as a function of the number of nodes $p$ and other structural parameters, to recover the graph with high probability.

We consider two ensembles of graphical models: the class $\mathcal{G}_{p,d}$ of all graphs with maximum degree $d$, and the class $\mathcal{G}_{p,k}$ of all graphs with at most $k$ edges. Using information-theoretic methods, for the class $\mathcal{G}_{p,d}$, we prove that no method can succeed with probability greater than $1/2$ using fewer than $O(d^2 \log p)$ samples, and we provide two different methods (one polynomial-time, one exponential-time) that succeed w.h.p. using $n = \Omega(d^3 \log p)$ samples. For the class $\mathcal{G}_{p,k}$, we show that no method can succeed with fewer than $O(k \log p)$ samples and we exhibit a method that succeeds w.h.p. using $n = \Omega(k^2 \log p)$ samples. Our analysis reveals some subtle structural aspects of the graphical model selection problem in high dimensions.

slides (pdf)

It has been recognized for a long time that methods of statistical physics are applicable to the study of computational optimization problems like the traveling salesman problem. In the random link model, where each pair of cities receives an independent "distance", several features of the optimum solution have been predicted based on replica symmetry.

I will present a rigorous approach, based on a two-person game, that essentially proves that the replica symmetric assumption is correct for the TSP and minimum matching in a class of random link models. A paper is available at arXiv:0908.1920.

slides (pdf)

The true variational free energy is approximated by the Bethe variational free energy, which is exact if the underlying graph is a tree. Depending on the graph topology, the Bethe variational free energy may be non-convex and have a complicated structure. In this short talk, we give a new formula that exhibits a connection between the graph zeta function and the Bethe free energy. From this formula, we can observe how graph topologies affect the Bethe variational free energy. Some applications of this formula to the behavior of the belief propagation algorithm are also discussed.

slides (pdf)

Finding the largest independent set in a graph is a notoriously difficult NP-complete combinatorial optimization problem. Unlike many other NP-complete problems, it does not admit a constant-factor approximation algorithm for general graphs. Furthermore, even for graphs with maximum degree 3, no polynomial-time algorithm can achieve a 1.0071-factor approximation guarantee (unless P = NP).

We consider the problem of finding the maximum weight independent set in a bounded-degree graph, when the node weights are generated i.i.d. from a common distribution. For instance, we construct a PTAS (Polynomial-Time Approximation Scheme) for the case of exponentially distributed weights and arbitrary graphs with degree at most 3. We generalize the analysis to phase-type distributions (which are dense in the space of all distributions), and provide partial converse results, showing that even under a random cost assumption, it can be NP-hard to compute the MWIS of a graph with sufficiently large degree. We also show how the method can be used to derive lower bounds on the expected size of the maximum independent set in large random regular graphs.
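To fix the setting, here is the problem in miniature: brute-force MWIS with i.i.d. exponential node weights on a small 3-regular graph. The graph (the 3-cube) and the weight scale are illustrative choices; this exhaustive search is only for intuition and does not reflect the cavity expansion algorithm.

```python
import itertools
import random

random.seed(0)
# A small 3-regular graph: the 3-cube Q3 (8 vertices, each of degree 3);
# vertices are adjacent iff their labels differ in exactly one bit.
nodes = list(range(8))
edges = [(u, v) for u in nodes for v in nodes
         if u < v and bin(u ^ v).count("1") == 1]
# i.i.d. exponentially distributed node weights, as in the random model
w = {v: random.expovariate(1.0) for v in nodes}

def independent(s):
    # no edge may have both endpoints inside s
    return all(u not in s or v not in s for u, v in edges)

subsets = (set(c) for r in range(len(nodes) + 1)
           for c in itertools.combinations(nodes, r))
best = max((s for s in subsets if independent(s)),
           key=lambda s: sum(w[v] for v in s))
print(sorted(best), sum(w[v] for v in best))
```

Exhaustive search is exponential in the number of vertices; the point of the talk is how random weights make near-optimal solutions tractable on bounded-degree graphs.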

Our algorithm, the cavity expansion, is based on a combination of several ideas, including recent deterministic approximation algorithms for counting on graphs and local weak convergence/correlation decay methods.

abstract (pdf)slides (pdf)

The "Divide and Concur" (D&C) algorithm, recently introduced by Gravel and Elser, can be considered a competitor to the famous "Belief Propagation" (BP) algorithm, in that both algorithms can be applied to constraint satisfaction, optimization, and probabilistic inference problems. We will show that D&C can be interpreted as a message-passing algorithm on a constraint graph, which makes the comparison with BP clearer. The D&C algorithm has some advantages compared with BP, in that it more naturally deals with problems involving continuous-valued variables, or variables that lack local evidence. Another advantage of D&C is that its "difference-map" dynamics enables it to avoid "traps."

We will introduce two new decoders for low-density parity-check (LDPC) codes based on these ideas. The first new decoder is based directly on D&C, while the second decoder borrows the important "difference-map" concept from the D&C algorithm and translates it into a BP-like decoder. This "difference-map belief propagation" (DMBP) decoder has particularly good performance, and can dramatically improve error-floor performance, with a computational complexity comparable to that of standard BP decoders. We will present simulation results for LDPC codes, comparing the D&C and DMBP decoders with other decoders based on BP, linear programming, and mixed-integer linear programming.
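The difference-map idea can be sketched on a toy two-set feasibility problem. This is only an illustration with β = 1; the constraint sets (a circle and a line, intersecting at (±4, 3)) are illustrative, and the LDPC decoders in the talk are far more elaborate.

```python
import math

# Projections onto two constraint sets: the circle of radius 5 about the
# origin, and the horizontal line y = 3. Feasible points: (+-4, 3).
def proj_circle(p):
    r = math.hypot(p[0], p[1])
    return (5 * p[0] / r, 5 * p[1] / r)

def proj_line(p):
    return (p[0], 3.0)

# Difference map with beta = 1:  x <- x + P_A(2 P_B(x) - x) - P_B(x).
# At a fixed point the two inner projections agree on a feasible point;
# the overshoot 2 P_B(x) - x is what lets the iteration escape traps.
x = (2.0, 1.0)
for _ in range(500):
    y = proj_line(x)                                  # P_B(x)
    z = proj_circle((2 * y[0] - x[0], 2 * y[1] - x[1]))  # P_A(2 P_B(x) - x)
    x = (x[0] + z[0] - y[0], x[1] + z[1] - y[1])

solution = proj_line(x)   # at convergence this lies in both sets
print(solution)
```

Simple alternating projections can stall at non-solutions for non-convex sets; the difference map's update direction vanishes only when a genuinely feasible point has been found.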

Joint work with Yige Wang and Stark Draper.

slides (pdf)

Many important random optimization problems exhibit behavior identical to that of mean-field models of glasses; these include graph coloring, K-satisfiability, sets of linear Boolean constraints, and independent sets. The glassy behavior of these problems is at the root of their average-case algorithmic hardness. In this talk I will present a new method, a generalization of the cavity method, that allows us to study analytically the asymptotic behavior of certain classes of algorithms, including infinitely slow simulated annealing. New insights following from the results of this method will be discussed.

slides (pdf)