GPT-4 is out, and I think data engineering is going to be out the door soon. I saw this post on Medium recently: https://medium.com/@nschairer/gpt-4-data-pipelines-transform-json-to-sql-schema-instantly-dfd62f6d1024

I was pretty amazed at how well GPT-4 can generate a SQL schema from raw JSON data, and I had to wonder whether we are wasting our time with NLP models for extracting information from raw text. For example, you could use bs4 to pull all the inner text out of certain web forms and have GPT-4 extract meaningful information from it (say, SEC filings with pseudo-standard fields). Anyone agree?
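To make the idea concrete, here is a rough sketch of the scraping half, under a few assumptions. It uses Python's built-in `html.parser` as a stand-in so it runs without extra installs; with bs4, `BeautifulSoup(html, "html.parser").get_text()` would play the same role. The names `TextExtractor`, `extract_text`, and `build_prompt`, the sample HTML, and the field names are all made up for illustration, and no actual GPT-4 call is made: the code only builds the prompt text you would send.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the inner text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(page_text: str, fields: list[str]) -> str:
    """Build an extraction prompt (hypothetical wording) for an LLM."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the filing text below and "
        f"return them as a JSON object with exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"Filing text:\n{page_text}"
    )


# Made-up form fragment for illustration, not a real filing.
SAMPLE_HTML = """
<html><body>
  <script>var tracking = 1;</script>
  <table>
    <tr><td>Company Name</td><td>Example Corp</td></tr>
    <tr><td>Form Type</td><td>10-K</td></tr>
  </table>
</body></html>
"""

text = extract_text(SAMPLE_HTML)
prompt = build_prompt(text, ["company_name", "form_type"])
print(prompt)
```

Whatever the model sends back would still need to be parsed and checked (for example with `json.loads` and a comparison against the expected keys) before anything gets loaded downstream.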

Comments

abriec t1_jdb30kf wrote on March 23, 2023 at 2:47 AM

I can definitely see it speeding up manual efforts along the data processing pipeline, but data engineering is more than schema generation, just as NLP is more than feature extraction from structured documents.

Very interested in what new tools and workflows would emerge from this though!

breadbrix t1_jdbdx1h wrote on March 23, 2023 at 4:23 AM

Are you willing to bet your job/career on data pipelines created by GPT-4? In a PII/PHI/PCI-compliant environment? Where fines start at $10K per occurrence?

Unless the answer is a resounding "Yes," then no, data engineering is not out the door.

RemarkableGuidance44 t1_jdc2k75 wrote on March 23, 2023 at 9:53 AM

haha exactly, the guy has never worked with data. Just imagine getting an audit and not knowing if your data is right or not. It could of messed up big time and cost hundreds of thousands.

of_patrol_bot t1_jdc2kt9 wrote on March 23, 2023 at 9:53 AM

Hello, it looks like you've made a mistake.

It's supposed to be could've, should've, would've (short for could have, would have, should have), never could of, would of, should of.

Or you misspelled something, I ain't checking everything.

Beep boop - yes, I am a bot, don't botcriminate me.

Mental-Egg-2078 OP t1_jdcn0j7 wrote on March 23, 2023 at 1:20 PM

Fair point, but at what point are these things accepted as providing reasonable assurance (for things like audits)?

I get that the idea of data not being right is scary, but once governing bodies let the tools in, there is no going back. I do agree that in situations where reasonable assurance is not acceptable, you don't want a predictive machine making critical choices.

MysteryInc152 t1_jd9vmd4 wrote on March 22, 2023 at 9:37 PM

Traditional NLP is out the door, yes. There isn't anything bespoke models can do that large enough LLMs can't do better.