Graph data (e.g., communication data, financial transaction networks, data describing biological systems, collaboration networks, organization hierarchies, and social media) is ubiquitous. While this observational data is useful, it is usually noisy, often only partially observed, and only hints at the actual underlying social, scientific, or technological structures that gave rise to the interactions. One of the challenges in big data analytics lies in being able to reason collectively about this kind of extremely large, heterogeneous, incomplete, and noisy interlinked data.
In this talk, I will describe some common inference patterns needed for graph data, including collective classification (predicting missing labels for nodes), link prediction (predicting edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will then outline some key capabilities required to solve these problems, and finally I will introduce probabilistic soft logic (PSL), a highly scalable open-source probabilistic programming language being developed within my group to address these challenges.