Submitted by AutoModerator t3_zp1q0s in MachineLearning
pmac_red t1_j2cmqgg wrote
I've got a lot of experience writing software, specifically web services, but the AI/ML stuff is new to me. I'm reading a lot and can wrap my head around the code/frameworks side of it but the math and algorithm stuff is Greek.
I'm currently playing with AWS SageMaker (it seems easy enough, and I already have an AWS account). My goal is to experiment with a problem I have at work:
Full context:
We are a SaaS API product that customers integrate with.
Customer onboarding is a big focus right now. Integration payloads (JSON) can be pretty large, e.g. up to a couple hundred properties, so it can be a little tricky for developers on the customer side to map from their internal system's data format to ours. Product is approaching this as an education problem: customer documentation, examples, etc. to help teach the customer how to integrate. I think the real problem is that the system-to-system mapping is just over the edge of being too big to hold as a complete mental model, so there's a lot of lookup and reference. If some sort of ML model could be trained on existing customer data, then a new customer could just present a payload and we could do most of the heavy lifting automatically, drastically reducing the complexity of the integration.
TL;DR
We have a target JSON document, customers have a source document. I'd like to predict a set of mappings from source to target, e.g. addr1 -> streetAddress.
Is this a common problem? Is there a known algorithm/model, or a family of them, that I should look at?
I'd appreciate any fingers pointed in the right direction.
trnka t1_j2d4wt7 wrote
There must be a name for this but I don't know it. It's a common problem when merging data sources.
If you have a good amount of data on existing mappings, you could learn to predict that mapping for each input field. The simplest thing that comes to mind is to use character ngrams of the source field name and predict the correct target field name (or predict that there's no match).
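To make the character n-gram idea concrete, here is a minimal sketch. It is not a trained model: it swaps in a nearest-neighbour lookup that compares n-gram counts by cosine similarity against previously seen mappings, which is about the simplest stand-in for "learn from existing mappings". The example mappings, the `predict_target` name, and the 0.3 threshold are all made up for illustration.

```python
from collections import Counter
from math import sqrt

def char_ngrams(name, n_min=2, n_max=3):
    """Count character n-grams of a lowercased field name padded with ^ and $."""
    padded = f"^{name.lower()}$"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical historical mappings: (customer field name, our field name).
KNOWN_MAPPINGS = [
    ("addr1", "streetAddress"),
    ("address_line_1", "streetAddress"),
    ("street", "streetAddress"),
    ("zip", "postalCode"),
    ("postcode", "postalCode"),
    ("fname", "firstName"),
    ("first_name", "firstName"),
]

def predict_target(source_field, known=KNOWN_MAPPINGS, threshold=0.3):
    """Return (target, score) for the most similar known source field.

    The target is None when nothing clears the threshold, i.e. "no match".
    The threshold value is an arbitrary assumption and would need tuning.
    """
    query = char_ngrams(source_field)
    best_score, best_target = 0.0, None
    for known_source, target in known:
        score = cosine(query, char_ngrams(known_source))
        if score > best_score:
            best_score, best_target = score, target
    if best_score < threshold:
        return None, best_score
    return best_target, best_score

if __name__ == "__main__":
    for field in ["address1", "postal_code", "qqq"]:
        print(field, "->", predict_target(field))
```

With enough real mappings you would probably replace the lookup with a proper classifier over the same n-gram features, but the feature extraction stays the same.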
If you also have a sample of data from the customer, you could use properties of the data in each field as input as well -- the data type, range of numeric values, ngrams for string fields, string length properties, etc.
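A rough sketch of what those per-field data properties could look like, assuming you have a list of sample values for one source field. The function name `profile_field` and the feature names are hypothetical, and the choice of features is only a starting point.

```python
from statistics import mean

def profile_field(values):
    """Summarise sample values from one source field as a flat feature dict."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numbers = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    strings = [v for v in present if isinstance(v, str)]

    features = {
        "frac_missing": 1 - len(present) / len(values) if values else 1.0,
        "frac_numeric": len(numbers) / len(present) if present else 0.0,
        "frac_string": len(strings) / len(present) if present else 0.0,
    }
    if numbers:
        features["num_min"] = min(numbers)
        features["num_max"] = max(numbers)
    if strings:
        lengths = [len(s) for s in strings]
        features["str_len_min"] = min(lengths)
        features["str_len_max"] = max(lengths)
        features["str_len_mean"] = mean(lengths)
        # Strings made only of digits hint at postal codes, phone numbers, IDs.
        features["frac_all_digits"] = sum(s.isdigit() for s in strings) / len(strings)
    return features

if __name__ == "__main__":
    print(profile_field(["90210", "10001", None, "30301"]))
    print(profile_field([1, 5, 10]))
```

These features can be fed to a model alongside the name n-grams, so that a field called `f17` holding five-digit strings can still be matched to a postal code.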
As for the business problem, even with automated mapping you probably need to force customers to review and correct the mappings, or else you might end up with complaints from customers who didn't review.
All this isn't quite my area of expertise, hope this helps!