DescriptionWe have developed [Reza et al. SC18] a highly scalable algorithmic pipeline for pattern matching in labeled graphs and demonstrated it on trillion-edge graphs. This pipeline: (i) supports arbitrary search patterns, (ii) identifies all the vertices and edges that participate in matches - offering 100% precision and recall, and (iii) supports realistic data analytics scenarios. This pipeline is based on graph pruning: it decomposes the search template into individual constraints and uses them to repeatedly prune the graph to a final solution.
Our current solution, however, makes a number of ad-hoc intuition-based decisions with impact on performance. In a nutshell these relate to (i) constraint selection - which constraints to generate? (ii) constraint ordering - in which order to use them? and (iii) individual constraint generation - how to best verify them? This position paper makes the observation that by estimating the runtime cost and likelihood of success of a constrained walk in a labeled graph one can inform these optimization decisions. We propose a preliminary solution to make these estimates, and demonstrate - using a prototype shared-memory implementation - that this: (i) is feasible with low overheads, and (ii) offers accurate enough information to optimize our pruning pipeline by a significant margin.