Machine Vision Engineer

Company: Diffbot

General Information
  • Job Type: Full-time
  • Location: Stanford, CA
  • Educational Requirements: Master's degree
Contact Information

About Diffbot
We believe in a world where information can be seamlessly queried from your devices, services, and applications, and where you’re never directed to examine a webpage to get an answer to your question. This requires building a new kind of search, one that sees the world as structured information rather than as a collection of documents.

At Diffbot, we’re building the world’s largest index of structured data by applying computer vision and NLP techniques to the web. Located a block from the Stanford campus, Diffbot is the first startup incubated by Stanford University and is funded by Sun Microsystems co-founder Andy Bechtolsheim and EarthLink founder Sky Dayton. We’re a small but growing team of world-class machine learning, natural language processing, and web search pioneers. Our APIs currently power many of the world’s largest internet sites.

Quick Facts

  • Team of 8, with a mix of recent grads, serial entrepreneurs, and web veterans
  • Machine learning at web-scale: it’s not just a part of what we do, it *is* what we do
  • Massive datasets (both supervised and unsupervised) and real-time loads, with many classifiers that perform above human-level accuracy
  • Many proprietary and exotic technologies for visual rendering, statistical modeling, and web search
  • Sustainable revenue and growth plan
  • Well-funded with excellent pay and benefits
  • Beautiful environment within walking distance of the Stanford campus and restaurants

Machine Vision Engineer

Machine vision engineers at Diffbot are a resourceful bunch, always looking to squeeze every drop of signal out of a dataset. Unlike machine learning roles at other companies, our goal is to extract the unequivocal truth from a source document, not a subjective ranking, sentiment, or score. Because of this higher standard for accuracy, we’ve had to create new systems for handling training data and invent novel, performance-optimized algorithms.

  • Mix of object classification, scene understanding, and document analysis in a novel setting
  • Derive features by combining signals from disparate sources
  • Invent and test new ML techniques that can generalize to the web
  • Leverage near-infinite amounts of unsupervised training data on fast machines (40 cores, 120 GB RAM, SSD, GPU)

About Internships

  • A limited number of intern versions of the above roles are available for Spring and Summer
  • Research-oriented projects offer an opportunity for journal publication
  • Interns do the same work as permanent staff, but on scope-bounded projects


How to Apply: To apply, introduce yourself on our team alias