Coverage by Bhat Dittakavi on Utsah talk by Jaggu @IIIT Hyderabad on 4th March 17
Building large scale matching systems
Jaggu operates ML systems at scale. His systems serve more than 200 million users on a monthly basis. Jaggu can be reached at firstname.lastname@example.org
We started it out of doctoral research.
Matching systems: Retail, gaming and job portal systems want to give customers the best experience.
What products should we show to customers?
An intelligent sales system is needed behind the scenes to answer the question above.
Intelligent Virtual Salesman
Match customer preferences with product preferences (whom the product is made for).
Objective: Make a sale.
Use case 2: Recommend a movie
Recommend a movie. Content is made targeting a customer segment. Users have their own preferences.
Objective: Engage the user with relevant content.
Massively Multi-player Game (MMO RPG)
Keep people engaged in the game for a longer time. These games sell virtual goods such as swords and pets. The objective is to make the player stay and purchase virtual goods.
Every partile in the world map is split into major and minor partiles.
Use case 3: Recommending a Startup to invest in
There are 220M registered users, 40M of them active, and they consume 4TB of data every day. Matching is a huge problem here.
Can you match the relevant investors with business preferences? Put startups in front of relevant investors.
Objective: Put a startup in front of a relevant investor
Use case 4: Best candidate for the job
Understand candidates versus the jobs. Match job preference with candidate preferences.
Objective: Reduce time for hiring.
Image search and text search: image search is a matching problem. You do relevance ranking to answer the query, which is a kind of matching problem.
Matching between object 1 and object 2 through interaction between object 1 and object 2 (relevance). Assumption: Object 1 and Object 2 are independent.
Can we build a probabilistic relevance model that utilizes all the available information?
There was a paper back in 1981: the problem of a unified model in IR (Stephen Robertson, 1981).
Conflict in the probability distribution.
I developed a bidirectional unified model in my Ph.D. research paper.
Model object 1 and object 2 separately and independently. This is how I solved the problem.
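To make the idea of modeling the two objects independently concrete, here is a minimal sketch. It is not the speaker's model: the function names, the sample data and the scoring rule (the probability that a feature drawn from each model coincides) are all assumptions chosen for illustration.

```python
from collections import Counter


def feature_distribution(observations):
    """Turn raw feature observations into a probability distribution."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def match_score(model_a, model_b):
    """Probability that one feature drawn from each model is the same.

    This scoring rule is an illustrative assumption, not the talk's method.
    """
    shared = set(model_a) & set(model_b)
    return sum(model_a[f] * model_b[f] for f in shared)


# Hypothetical data: each object is modeled on its own, from its own data.
user = feature_distribution(["sci-fi", "sci-fi", "drama", "space"])
film_a = feature_distribution(["sci-fi", "space", "space"])
film_b = feature_distribution(["romance", "drama"])

# Matching only happens at the end, when the two models meet.
candidates = {"film_a": film_a, "film_b": film_b}
ranked = sorted(candidates,
                key=lambda name: match_score(user, candidates[name]),
                reverse=True)
print(ranked)
```

Because each model is built without looking at the other side, either one can be rebuilt or updated without touching the other.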
Some of our clients
TV problem: an audience of 12 million. We recommend brands (TV series) to customers daily.
We use loads of product and customer data, and we need to understand it in order to match.
Match capital (match Investor with Startups)
Octer (finding similar images)
We use AWS ECS for APIs and Spark.
BARB data is extrapolated from 5,000 devices to millions.
-Don’t solve the problem. Think first. Approach each problem on its own terms.
-Write down the fundamental ideas behind all solutions.
-Write down the complete solution. Worry about engineering later.
Log all game data:
1) Every move in the game
2) Every script that is run (e.g. a sword action)
3) Whom the player fought
We use the above captured features for
Understand the relevance first. If you are watching Netflix, it is easy to match the profile, as a viewer can only watch about 10 episodes on average on any day.
Given any model with an independent user representation and an independent product representation, plus a model to arrive at preferences, we are in the game.
Algolia (a popular search service)
We fetch data from the clients automatically using AWS. Once we have the data, the scheduler kicks off the relevant Spark cluster: image processing switches on a GPU cluster, text a CPU cluster. Once the cluster has learned the model, it puts the result back in S3.
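The routing step described here can be sketched in a few lines. This is only an illustration: `pick_cluster`, `run_job`, the `train` callback and the plain dict standing in for S3 are hypothetical names, not the speaker's scheduler or any AWS API.

```python
def pick_cluster(data_kind):
    """Choose a cluster type from the kind of data (image -> GPU, text -> CPU)."""
    routes = {"image": "gpu", "text": "cpu"}
    if data_kind not in routes:
        raise ValueError(f"no cluster configured for data kind: {data_kind!r}")
    return routes[data_kind]


def run_job(dataset_name, data_kind, train, store):
    """Train on the chosen cluster, then write the result back to storage.

    `train` is a hypothetical callback; `store` is a dict standing in for S3.
    """
    cluster = pick_cluster(data_kind)
    model = train(dataset_name, cluster)
    store[f"models/{dataset_name}"] = model
    return cluster


def fake_train(dataset_name, cluster):
    """Stand-in for real training so the sketch runs on its own."""
    return f"model for {dataset_name} trained on {cluster}"


storage = {}
run_job("catalogue-images", "image", fake_train, storage)
run_job("job-descriptions", "text", fake_train, storage)
print(storage)
```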
Elastic AWS infrastructure allows us to switch machines with zero downtime.
When the user is interacting, recommendation is real time. We deploy things on the client side and then on our side.
We create the user model and update it on the fly with every activity the user performs. In the case of a recommendation model, every keyword in the user model has a probability. The same goes for the product model.
Every user has a feature space.
Feature array: each feature is a combination of clicks, rating and time spent with each product. Each product is a feature. We assign a probability to see whether a user is explained away by this feature. Based on the user's click, we update all the other probabilities!
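A minimal sketch of such an on-the-fly update follows. It assumes the simplest possible scheme: each feature carries a weight, an interaction adds to one weight, and probabilities come from renormalising, so a single click shifts every other probability too. The class name, the prior and the `strength` parameter are hypothetical, and the talk does not say this is the update rule actually used.

```python
class UserModel:
    """Per-user feature weights that are renormalised into probabilities."""

    def __init__(self, features, prior=1.0):
        # Every feature starts with the same prior weight (an assumption).
        self.weights = {feature: prior for feature in features}

    def observe(self, feature, strength=1.0):
        """Record one interaction; `strength` could blend click, rating and time spent."""
        self.weights[feature] = self.weights.get(feature, 0.0) + strength

    def probabilities(self):
        """Normalise weights so they sum to 1 across all features."""
        total = sum(self.weights.values())
        return {feature: w / total for feature, w in self.weights.items()}


model = UserModel(["sword", "pet"])
before = model.probabilities()   # both features equally likely
model.observe("sword", strength=2.0)
after = model.probabilities()    # "sword" rises, so "pet" falls
print(before, after)
```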
The number of possible sentences could be infinite. A language model on a tech vocabulary of, say, 20 million words makes it a finite problem. Using only cosine similarity means we can’t bring in relevance. Relevance is key.
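The point about cosine similarity can be illustrated with a small sketch. Cosine similarity only measures term overlap, so two jobs that share one term each with a query score the same. The `term_weights` table below is a made-up stand-in for a learned relevance signal, not the speaker's relevance model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_overlap(a, b, term_weights):
    """Overlap where each shared term counts according to its weight."""
    return sum(term_weights.get(t, 0.0) * a[t] * b[t] for t in a if t in b)


query = {"spark": 1, "engineer": 1}
job_a = {"spark": 1, "manager": 1}
job_b = {"engineer": 1, "manager": 1}

# Hypothetical weights: "spark" says far more about this user than "engineer".
term_weights = {"spark": 0.9, "engineer": 0.2, "manager": 0.1}

print(cosine(query, job_a), cosine(query, job_b))  # the two scores are equal
print(weighted_overlap(query, job_a, term_weights),
      weighted_overlap(query, job_b, term_weights))
```

With cosine alone the two jobs tie; once term importance is brought in, the first job ranks higher.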
Look at Strata 2014 presentation.
MatchCapital has handled 9 million in investments using our technology.
Q) How do you price?
In the case of catalogue processing, we charge per month and per product. If you give us any image, we find similar images. People take images from different angles and at different resolutions.
Q) How do you do matching?
We identify the image and assign a category. Standard classification means loads of features (thousands) and big trouble. We don’t use classification algorithms, but we get pretty close to getting things right. We do this by matching, in one second as against 6 seconds. Matching is about finding similarity. We use user intelligence here. We use a CNN for image recognition.
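Assigning a category by matching rather than by a classifier can be sketched as a nearest-neighbour lookup. The example assumes each image has already been turned into a feature vector (for instance by a CNN); the tiny three-dimensional vectors, the catalogue entries and the function names are invented for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query_vector, catalogue, k=1):
    """Return the k catalogue items closest to the query vector."""
    ranked = sorted(catalogue,
                    key=lambda item: euclidean(query_vector, item["vector"]))
    return ranked[:k]


# Hypothetical catalogue: vectors would come from an image model in practice.
catalogue = [
    {"id": "shoe-1", "category": "shoes", "vector": [0.9, 0.1, 0.0]},
    {"id": "bag-1", "category": "bags", "vector": [0.1, 0.8, 0.1]},
    {"id": "watch-1", "category": "watches", "vector": [0.0, 0.2, 0.9]},
]

query = [0.85, 0.15, 0.05]  # the same shoe shot from another angle, say
best = most_similar(query, catalogue)[0]
print(best["id"], best["category"])  # the category is inherited from the match
```

A linear scan like this is fine for a sketch; a large catalogue would need an index built for similarity search.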
Vision search got challenging.
The number of requests you get is very low. The other requests are challenging ones where people are just having fun. When companies don’t see interest in volume, they shut down.