Submitted by Mental-Egg-2078 t3_11ytoh1 in MachineLearning
GPT-4 is out, and I think data engineering is going to be out the door soon. I saw this post on Medium recently: https://medium.com/@nschairer/gpt-4-data-pipelines-transform-json-to-sql-schema-instantly-dfd62f6d1024
I was pretty amazed at how well GPT-4 can generate a SQL schema from raw JSON data, and it made me wonder whether we are wasting our time with NLP models for extracting information from raw text. For example, you could use bs4 to pull all the inner text out of certain web forms and have GPT-4 extract meaningful information from it (say, SEC filings with pseudo-standard fields). Anyone agree?
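To make the idea concrete, here is a rough sketch of the "pull the form text, then hand it to the model" step. Everything in it is an assumption for illustration: it uses Python's standard-library `html.parser` instead of bs4 so it runs without extra installs, the `FormTextExtractor`, `extract_form_text`, and `build_extraction_prompt` names are hypothetical, the sample HTML and field names are made up, and the actual GPT-4 call is left out entirely. It only builds the prompt text you would send.

```python
from html.parser import HTMLParser


class FormTextExtractor(HTMLParser):
    """Collects visible text that appears inside <form> elements."""

    def __init__(self):
        super().__init__()
        self._form_depth = 0  # > 0 while the parser is inside a <form>
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._form_depth += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._form_depth > 0:
            self._form_depth -= 1
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is inside a form and not script/style.
        if text and self._form_depth > 0 and self._skip_depth == 0:
            self.chunks.append(text)


def extract_form_text(html):
    """Return the inner text of all forms in `html`, one chunk per line."""
    parser = FormTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_extraction_prompt(form_text, fields):
    """Build a prompt asking a model to return the given fields as JSON."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the text below and reply with "
        f"a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"TEXT:\n{form_text}"
    )


# Made-up sample page, only for demonstration.
SAMPLE_HTML = """
<html><body>
<p>Site navigation</p>
<form>
  <label>Company name</label> <span>Example Corp</span>
  <label>Fiscal year end</label> <span>12/31</span>
  <script>var x = 1;</script>
</form>
</body></html>
"""

if __name__ == "__main__":
    text = extract_form_text(SAMPLE_HTML)
    print(build_extraction_prompt(text, ["company_name", "fiscal_year_end"]))
```

With bs4 the extraction part would likely shrink to a few lines around `get_text()`. Either way, the model's reply would still need to be parsed and validated before it goes anywhere near a database.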
abriec t1_jdb30kf wrote
I can definitely see it speeding up manual efforts along the data processing pipeline, but data engineering is more than schema generation, just as NLP is more than feature extraction from structured documents.
Very interested in what new tools and workflows will emerge from this, though!