Lessons from Amazon

Written by Bhat Dittakavi

Coverage by Bhat Dittakavi of Variance.AI on “Lessons from Amazon” by Rajeev Rastogi of Amazon at NASSCOM AI Summit @Hyderabad on 23rd June 2017

Rajeev Rastogi, Director ML @Amazon

8 areas of focus:

Retail, Customer, Seller and Catalog

Text, Images, AWS and Speech

Let me tackle them one by one.

Problem: Product Demand Forecasting

Forecast demand based on past sales in a region. Over-forecasting means additional warehousing costs; under-forecasting means failing to delight customers.

Challenges

-Scale (100s of millions of products in our catalog)

-New Products (No past demand!)

-Sparsity: Huge skew; many products sell very few items (the long-tail phenomenon)

-Seasonal: Demand for some products exhibits seasonal patterns

-Demand Spikes: Variation due to external events

-Distributions: The future is uncertain (point predictions give only average demand; we need probability distributions of the demand)

Products are stocked at the 90th percentile of demand, not at average demand, because we are customer obsessed.
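To make the difference between stocking at the average and stocking at the 90th percentile concrete, here is a minimal sketch. It assumes we already have a set of demand samples (for example, draws from a forecast distribution); the function name `stocking_level` and the simulated numbers are hypothetical and are not from the talk.

```python
import random
import statistics

def stocking_level(demand_samples, percentile=90):
    """Return the demand value at the given percentile of the samples."""
    # With n=100, statistics.quantiles returns the 99 cut points
    # separating the 100 percentile buckets.
    cut_points = statistics.quantiles(demand_samples, n=100, method="inclusive")
    return cut_points[percentile - 1]

# Hypothetical weekly demand samples for one product (illustrative only).
random.seed(0)
samples = [max(0.0, random.gauss(100, 20)) for _ in range(1000)]

mean_demand = statistics.mean(samples)
p90_demand = stocking_level(samples, percentile=90)

print(f"average demand:  {mean_demand:.1f}")
print(f"90th percentile: {p90_demand:.1f}")
```

Stocking at the 90th percentile covers demand in roughly 9 out of 10 scenarios, at the cost of holding more inventory than the average would suggest.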

Problem: Given a partial user query, find the relevant products to display

Challenges

-Scale: (100s of millions of products in our catalog)

-Real-time prediction: Low latency

-Query analysis: Understand semantics (Apple the brand or apple the fruit), identify phrases, classify into product category

-Intent Detection: Window shopping or buying?

-Knowledge graph

Problem: Given a product description from a seller, map it to the appropriate leaf node in product taxonomy (17000 nodes)

Challenges:

-Scale (multi-class classification problem with 1000s of classes)

-Products vs Accessories: Hard to distinguish

-Noisy Data

-Training Data Quality (Wrongly labeled, some classes may have very few examples)

Problem: Given product information (title, description, price), find duplicate product listings in the Amazon catalog

Challenges

-High Precision Requirement: Incorrect matching leads to poor user experience (Pricing: Matching an iPhone with its adapter’s price is dangerous)

-Variations: Some variations (e.g. color) are insignificant while others are not

-Diverse Formats

-Noisy Data
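As a rough illustration of why precision matters here, the sketch below compares two listings by token overlap (Jaccard similarity) on their titles and adds a price sanity check, so that a phone is not matched with its much cheaper accessory. This is a toy baseline, not Amazon's method; the thresholds `min_similarity` and `max_price_ratio` and the sample listings are assumptions made for the example.

```python
import re

def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(title_a, title_b):
    """Jaccard similarity between the token sets of two titles."""
    ta, tb = tokens(title_a), tokens(title_b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def likely_duplicates(p, q, min_similarity=0.8, max_price_ratio=1.5):
    """Flag a pair as duplicates only if titles AND prices both agree.

    Both thresholds are hypothetical; a high similarity threshold
    favours precision over recall.
    """
    similar = jaccard(p["title"], q["title"]) >= min_similarity
    low, high = sorted([p["price"], q["price"]])
    price_ok = low > 0 and high / low <= max_price_ratio
    return similar and price_ok

# Illustrative listings (made up for this example).
phone_a = {"title": "Apple iPhone 7 32GB Black", "price": 649.0}
phone_b = {"title": "Apple iPhone 7 (32GB, Black)", "price": 639.0}
adapter = {"title": "Apple iPhone 7 Power Adapter", "price": 19.0}

print(likely_duplicates(phone_a, phone_b))
print(likely_duplicates(phone_a, adapter))
```

The two phone listings differ only in punctuation, so they match; the adapter shares the brand and model tokens but fails both the similarity and the price check.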

Problem: Generate review summaries and extract product attribute ratings from reviews

Abstractive summarization is not easy. Huge amounts of training data are needed. Reviews can be coarse-grained rather than fine-grained.

Challenges

-Diverse attributes

-Synonyms (Sound, audio)

-Informal style

-Stylistic variation (Linguistic style varies between users)

-Sentiment analysis (Gauging sentiment may require deep parsing of sentences)
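The synonym challenge above can be sketched with a toy example: map different words for the same attribute ("sound", "audio") to one canonical name, then add up a crude word-level sentiment score per sentence. The synonym table and sentiment lexicon below are hypothetical, and this deliberately ignores the deep parsing that real sentiment analysis may need.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: surface word -> canonical attribute.
ATTRIBUTE_SYNONYMS = {
    "sound": "sound",
    "audio": "sound",
    "battery": "battery",
    "charge": "battery",
}

# Hypothetical, tiny sentiment lexicon.
SENTIMENT_WORDS = {
    "great": 1, "good": 1, "excellent": 1,
    "poor": -1, "bad": -1, "terrible": -1,
}

def attribute_scores(reviews):
    """Sum a naive sentiment score per canonical attribute."""
    totals = defaultdict(int)
    for review in reviews:
        # Score each sentence separately so sentiment stays near its attribute.
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = re.findall(r"[a-z]+", sentence)
            score = sum(SENTIMENT_WORDS.get(w, 0) for w in words)
            attributes = {ATTRIBUTE_SYNONYMS[w] for w in words if w in ATTRIBUTE_SYNONYMS}
            for attribute in attributes:
                totals[attribute] += score
    return dict(totals)

reviews = ["Great sound. Battery is poor.", "The audio is excellent!"]
print(attribute_scores(reviews))
```

A word-count approach like this breaks on negation ("not good") and on informal or sarcastic writing, which is exactly where the challenges listed above come in.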

Problem: Recommend the right product at the right time at the right place to the right customer

Problem: Predict good competitor prices

Problem: Identify substitute products for a given product in Amazon catalog (Co-viewed products). When we are out of stock, we show substitutes.
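One simple way to picture the co-viewed idea is to count how often two products appear in the same browsing session and, when a product is out of stock, show the in-stock products most often viewed alongside it. The sketch below assumes sessions are available as lists of product IDs; the function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_co_view_counts(sessions):
    """Count, for each product, how often every other product shares a session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # Deduplicate within a session so repeat views are not double counted.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def substitutes(product, counts, in_stock, top_k=3):
    """Return up to top_k in-stock products most often co-viewed with product."""
    ranked = counts.get(product, Counter()).most_common()
    return [candidate for candidate, _ in ranked if candidate in in_stock][:top_k]

# Illustrative browsing sessions (made-up product IDs).
sessions = [
    ["kettle_a", "kettle_b", "kettle_c"],
    ["kettle_a", "kettle_b"],
    ["kettle_a", "toaster_x"],
]
counts = build_co_view_counts(sessions)

print(substitutes("kettle_a", counts, in_stock={"kettle_b", "kettle_c"}, top_k=1))
```

Raw co-view counts also pick up complementary products (the toaster here), so in practice they would be only a rough proxy for substitutes.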

Problem: Retrieve images from the Amazon catalog for visual similarity

Problem: Provide a voice interface to shop for products, perform tasks, answer questions

Q) How confident are you in deep learning?

The ImageNet image classification error rate came down from 25% to 5% with deep learning. Voice recognition is deep learning. Machine translation is deep learning. NLP too. Deep learning is good in some specific use cases.

Q) How do you handle fake product reviews?

We look for signals that indicate fake reviews, such as patterns and IP addresses.
