Natural Language Processing Engineer

Company: Diffbot

General Information
  • Job Type: Full-time
  • Location: Stanford, CA
  • Educational Requirements: Masters Degree

About Diffbot
We believe in a world where information can be seamlessly queried from your devices, services, and applications, and where you’re never directed to examine a webpage to get an answer to your question. This requires building a new kind of search, one that sees the world as structured information rather than as a collection of documents.
At Diffbot, we’re building the world’s largest index of structured data by applying computer vision and NLP techniques to the web. Located a block from the Stanford campus, Diffbot is the first startup incubated by Stanford University and is funded by Sun Microsystems co-founder Andy Bechtolsheim and EarthLink founder Sky Dayton. We’re a small but growing team of world-class machine learning, natural language processing, and web search pioneers. Our APIs currently power many of the world’s largest internet sites.
Quick Facts

  • Team of 8, with a mix of recent grads, serial entrepreneurs, and web veterans
  • Machine learning at web-scale: it’s not just a part of what we do, it *is* what we do
  • Massive datasets (both labeled and unlabeled) and real-time loads, with many classifiers that perform above human-level accuracy
  • Many proprietary and exotic technologies for visual rendering, statistical modeling, and web search
  • Sustainable revenue and growth plan
  • Well-funded with excellent pay and benefits
  • Beautiful environment within walking distance of the Stanford campus and restaurants

Natural Language Processing Engineer

The web is perhaps the largest corpus of natural language ever created by human beings, and the many languages used on it make it an exciting place to test theories at large scale and to leverage big data to build generalizable models. Previous NLP efforts on the web have had to compromise by treating the collective text of a web document as a noisy stream of tokens, suitable only for shallow analysis. Diffbot’s unique visual classification, however, breaks the page apart into meaningful components, creating the first clean testbed for NLP on the web.

  • Work with the web corpus and multi-language modeling
  • Go beyond sentiment analysis and binary classification: model tagged language deeply and extract valuable, accurate knowledge
  • Automatically build lexicons and ontologies from the web
  • Morphological analysis: multi-lingual postal address extraction and date parsing


About Internships
  • We have a limited number of intern versions of the above roles available for Spring and Summer
  • For research-oriented projects, the opportunity for journal publication
  • Interns do the same work as permanent staff, but with scope-bounded projects


How to Apply: To apply, introduce yourself on our team alias