Company: Diffbot

General Information
  • Job Type: Full-time
  • Location: Stanford, CA
  • Educational Requirements: Masters Degree

About Diffbot
We believe in a world where information can be seamlessly queried from your devices, services, and applications, and where you’re never directed to examine a webpage to get an answer to your question. This requires building a new kind of search, one that sees the world as structured information rather than as a collection of documents.
At Diffbot, we’re building the world’s largest index of structured data by applying computer vision and NLP techniques to the web. Located a block from the Stanford campus, Diffbot is the first startup incubated by Stanford University and is funded by Sun Microsystems co-founder Andy Bechtolsheim and EarthLink founder Sky Dayton. We’re a small but growing team of world-class machine learning, natural language processing, and web search pioneers. Our APIs currently power many of the world’s largest internet sites.
Quick Facts

  • Team of 8, with a mix of recent grads, serial entrepreneurs, and web veterans
  • Machine learning at web-scale: it’s not just a part of what we do, it *is* what we do
  • Massive datasets (both supervised and unsupervised) and real-time loads, with many classifiers that perform above human-level accuracy
  • Many proprietary and exotic technologies for visual rendering, statistical modeling, and web search
  • Sustainable revenue and growth plan
  • Well-funded with excellent pay and benefits
  • Beautiful environment within walking distance of the Stanford campus and restaurants

Scalability Engineer
Diffbot processes hundreds of millions of URLs per month, converting each page into structured data that powers many of the world’s largest internet sites. And we’re just getting started in terms of scale. The scalability role at Diffbot focuses on making the machine learning algorithms faster (lowering latency), efficiently utilizing clusters of machines (increasing throughput), and increasing reliability (lowering downtime variance).

  • At large companies, much of the infrastructure has already been determined, but here you will be part of a small team and directly shape the systems that will allow us to grow. You’ll learn some machine learning along the way, too.
  • Unique combination of a small company with big infrastructure challenges: load balancing, latency, and reliability
  • Opportunities to use systems knowledge to make machine learning algorithms faster
  • Experience with configuration management systems, load distribution strategies, monitoring and instrumentation
  • Hybrid cloud architecture: our own datacenter augmented by cloud services



About Internships
  • We have a limited number of intern versions of the above roles available for Spring and Summer
  • For research-oriented projects, opportunity for journal publication
  • Interns do the same work as permanent staff, but with scope-bounded projects


How to Apply: Introduce yourself on our team alias