Submitted by AutoModerator t3_yntyhz in MachineLearning
dwightsrus t1_iwu9b2p wrote
I am a noob to ML. How do you suggest I go about converting PDFs of restaurant menus and pricing into structured data in JSON format? Are there ready-to-use models, websites, or services?
InitialWalrus t1_iwuetbz wrote
https://pypi.org/project/PyPDF2/ This Python library will let you convert the PDF to a string, assuming the PDF has a readable text layer. If it doesn't (for example, it is a scanned image), you'll need to look into OCR (optical character recognition).
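A minimal sketch of that text-extraction route, assuming a recent PyPDF2 release that provides `PdfReader`. The helper names (`pdf_to_text`, `parse_menu`, `menu_pdf_to_json`) and the regex are hypothetical, and the regex assumes each menu line ends with a price such as `12.50` or `$12.50`, which will not hold for every menu:

```python
import json
import re

# Assumed line layout: item name, optional spaces or dot leaders, then a price
# with two decimals at the end of the line. Real menus may need other patterns.
PRICE_LINE = re.compile(r"^(?P<name>.+?)[\s.]*\$?(?P<price>\d+\.\d{2})\s*$")


def pdf_to_text(path):
    """Return the text of every page in a text-readable PDF, joined by newlines."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(path)
    # extract_text() can come back empty for image-only pages (those need OCR)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_menu(text):
    """Turn lines that match the assumed 'name ... price' layout into dicts."""
    items = []
    for line in text.splitlines():
        match = PRICE_LINE.match(line.strip())
        if match:
            items.append({
                "name": match.group("name").strip(),
                "price": float(match.group("price")),
            })
    return items  # lines without a trailing price (e.g. section headings) are skipped


def menu_pdf_to_json(path):
    """Extract text from the PDF at `path` and serialize the parsed items as JSON."""
    return json.dumps(parse_menu(pdf_to_text(path)), indent=2)
```

Calling `menu_pdf_to_json("menu.pdf")` (a placeholder file name) would return a JSON array of name/price objects. This only works as a baseline when the menus share a simple line layout.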
dwightsrus t1_iwuq4um wrote
Thanks for the suggestion. My challenge is that the PDFs are not all structured the same way. I would love to run a bunch of them through a trained ML model that outputs the data in the format I need.
IntelligenXia t1_iwz2u1e wrote
Check out the Donut model for document recognition
https://huggingface.co/docs/transformers/model_doc/donut
You should do some manual text annotation, fine-tune the model on those annotations, and then run inference with Donut; it can output key:value pairs.
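To make the annotation step concrete, here is a rough sketch of how a hand-written key:value annotation can be flattened into the tagged target string that Donut-style fine-tuning learns to generate. It is a simplified illustration modeled on the flattening in the Donut reference code, not the library's API; the function name `to_target_sequence` and the menu field names are assumptions:

```python
def to_target_sequence(obj):
    """Flatten nested annotation data into a tagged string (simplified sketch)."""
    if isinstance(obj, dict):
        # Each key becomes an opening and closing tag around its flattened value
        return "".join(
            f"<s_{key}>{to_target_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List entries are joined with a separator tag
        return "<sep/>".join(to_target_sequence(item) for item in obj)
    return str(obj)


# Hypothetical annotation for one menu page; the field names are illustrative.
annotation = {
    "menu": [
        {"name": "Margherita Pizza", "price": "12.50"},
        {"name": "Soup of the day", "price": "4.00"},
    ]
}

target = to_target_sequence(annotation)
print(target)
```

Each annotated page image would be paired with a string like this during fine-tuning, and at inference time the generated string is parsed back into key:value pairs.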