The data should be used only for non-commercial research purposes. Georgia Institute of Technology and Google make no representations or warranties regarding the Data, including but not limited to warranties of non-infringement or fitness for a particular purpose. All URLs and annotations were crawled from the public internet. Please note that this data is a subset of the data used in the web-scale experiments in our paper. At this time we release only the queries from Wikipedia/WordNet (about 66K), for confidentiality reasons.