Update for anyone interested: after trying a few of the packages suggested in the comments, I settled on an inelegant but functional solution. I automate opening each PDF in Microsoft Word, save it as a Word file, then use a library to extract only the body text from the Word file.
Definitely not ideal, since this won't work on Linux and runs only as fast as Microsoft Word can open, convert, and save each file. But it works.
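For anyone who wants to try the same route, here is a rough sketch of what that pipeline could look like. It is not my exact script. The function names `pdf_to_docx` and `docx_body_text` are made up for illustration, the Word step assumes Windows with Word 2013 or later and the pywin32 package installed, and the extraction step swaps in the standard library (`zipfile` plus `xml.etree`) in place of a dedicated docx library. Reading only `word/document.xml` is what leaves out headers and footers, since a .docx stores those in separate parts.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
WD_FORMAT_DOCX = 16  # Word constant wdFormatDocumentDefault (.docx)


def pdf_to_docx(pdf_path, docx_path):
    """Open a PDF in Word and save it as .docx (Windows + Word only).

    Assumption: Word 2013+ is installed, since older versions cannot open PDFs.
    Word may still show a PDF conversion prompt the first time.
    """
    import win32com.client  # imported lazily so the rest works on any OS

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # wdAlertsNone
    try:
        # Positional args: FileName, ConfirmConversions, ReadOnly
        doc = word.Documents.Open(str(Path(pdf_path).resolve()), False, True)
        doc.SaveAs2(str(Path(docx_path).resolve()), FileFormat=WD_FORMAT_DOCX)
        doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()


def docx_body_text(docx_path):
    """Return the non-empty paragraphs of the document body as strings.

    Headers and footers live in other parts of the archive, so they are
    skipped. Text inside tables is included; text boxes nested inside a
    paragraph may appear twice.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Typical use would be `pdf_to_docx("paper.pdf", "paper.docx")` followed by `docx_body_text("paper.docx")`, looped over a folder. Starting and quitting Word once per file is the slow part, so reusing one Word instance across a batch would probably help.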
cm_34978 OP t1_j2nsi8g wrote
Reply to comment by ypanagis in [D] Data cleaning techniques for PDF documents with semantically meaningful parts by cm_34978
Definitely. On Windows you get the advantage of the win32com library, whereas on macOS you need to play with AppleScript, which (in my hands) can be brittle and finicky.