
HighTechPipefitter t1_j6x0ki5 wrote

>This was discussed over on r/datascience too. We’d love it if it worked out of the box, but the knowledge requirements needed to tell the tool what tables do and what each of their columns mean require a level of documentation that most companies don’t reliably have

If your tables and columns are named explicitly, you can get away with just feeding the AI your database schema, and it will figure out what you are talking about.
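A minimal sketch of that idea using the OpenAI Python client (the model name and the `customers`/`orders` schema are illustrative assumptions, not anything from this thread):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The schema is pasted straight into the prompt; the model infers what
# the tables and columns mean from their names alone.
schema = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, signup_date TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER, created_at TEXT);
"""

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model should work here
    messages=[
        {"role": "system",
         "content": f"Translate questions into SQLite SQL. Schema:\n{schema}"},
        {"role": "user",
         "content": "Total order revenue per customer in January 2023"},
    ],
)
print(resp.choices[0].message.content)  # the generated SQL
```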

If not, you can create views to make it clearer what each table and column means and feed it those instead.
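For instance, a view can wrap a legacy table with opaque column names, and only the view's readable definition needs to go into the prompt. A sketch with SQLite and a made-up legacy table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB just for the sketch

# Hypothetical legacy table with opaque names.
conn.execute("CREATE TABLE tbl_cst_01 (c1 INTEGER, c2 TEXT, c3 TEXT)")

# The view exposes human-readable names the AI can reason about.
conn.execute("""
    CREATE VIEW customer_signups AS
    SELECT c1 AS customer_id,
           c2 AS full_name,
           c3 AS signup_date
    FROM tbl_cst_01
""")
```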

You can also give it special rules to keep in mind. For example, if in your DB a "man" is identified as "1" and a woman as "2", you can add this instruction to your prompt and the AI will understand that whenever you are looking for a man it needs to check for the value "1".
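Concretely, such rules can sit in the system prompt right next to the schema; a sketch (the `people` table and its encoding are just the hypothetical from above):

```python
schema = "CREATE TABLE people (id INTEGER PRIMARY KEY, gender INTEGER, name TEXT);"

rules = (
    "Rules:\n"
    "- In `people`, the `gender` column is an integer: 1 = man, 2 = woman.\n"
)

system_prompt = f"Translate questions into SQLite SQL.\nSchema:\n{schema}\n{rules}"
# A question like "how many men are there?" should now come back as
#   SELECT COUNT(*) FROM people WHERE gender = 1;
```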

I expect text-to-SQL to become standard practice pretty soon. It's just way too powerful.

1

dgrsmith t1_j6xa0mw wrote

That’s the thing though: you need to know what you’re looking for in the database before it can give you the data you want. AI can guess, sure, but you won’t be able to trust the results unless you’re familiar with the database and make sure the AI is too. I agree it’s not a dealbreaker; it’s just a case of considerable resource reallocation.

Your example also rests on the implicit assumption that gender is an easy construct to define, which may not be the case. Are we talking sex at birth? Sex at the point of observation? Self-identified gender? Constructs require a lot of data understanding and finessing that end users won’t be able to manage unless a human directs the AI somehow by providing data availability and documentation. Once those human data-prep processes are done, yes, you want your end users to be able to ask questions of the data readily. But that requires a fair bit of human anticipation of what should be available to the AI given end-user business needs.

1

HighTechPipefitter t1_j6xe1ge wrote

There's definitely a learning curve for users to learn to express themselves properly. But there are also different strategies you can use to help them.

A library of common prompt examples is a first one.

A UI with predefined query fragments that users assemble is another.

You could also use embeddings to detect ambiguity and ask your user for clarification (see the sketch below).

You also don't need to expose your whole schema right away; you can do it gradually. Start with the most common requests and build from there. That way you don't need to invest a huge amount of resources from the beginning.
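Here's a rough sketch of the embeddings idea: embed the question and a few competing interpretations of an ambiguous term, and ask for clarification when two readings score about the same (the names, model, and threshold are all assumptions):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Hypothetical competing readings of an ambiguous business term.
candidates = {
    "sex_at_birth": "the person's sex as recorded at birth",
    "gender_identity": "the person's self-identified gender",
}

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

q_vec = embed(["How many men are in the study?"])[0]
c_vecs = embed(list(candidates.values()))

# Cosine similarity between the question and each interpretation.
sims = c_vecs @ q_vec / (np.linalg.norm(c_vecs, axis=1) * np.linalg.norm(q_vec))

# If the top two readings are this close, treat the question as ambiguous
# and ask the user rather than silently picking one.
top = np.sort(sims)[::-1]
if top[0] - top[1] < 0.05:  # threshold is a guess; tune on real traffic
    print("Did you mean:", " or ".join(candidates), "?")
```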

We are barely scratching the surface of how to use it. This will be common practice pretty soon.

If you're in a position where you have access to a database at work, I strongly suggest you give it a try. It's surprisingly good.

1

dgrsmith t1_j6xffed wrote

>If you're in a position where you have access to a database at work, I strongly suggest you give it a try. It's surprisingly good.

I'll give it a try with synthetic data! Maybe I'll be surprised at the amount of finessing it doesn't take. I assume it's gonna take quite a bit to make it work, but I'll give it a shot!

1

HighTechPipefitter t1_j6xjlgw wrote

Fun starts here: https://platform.openai.com/examples/default-sql-translate

Then "all you need" is to create an API call with python to get the query from OpenAI and send that query to your database through another API call.

Start small; there are a lot of little quirks, but the potential is definitely there.

I expect that in the coming years you will start to see a bunch of articles about best practices for integrating an AI with a database.

Good luck.

2