Cantareus t1_je6vn79 wrote on March 29, 2023 at 8:57 PM

Reply to comment by Im_Unlucky in [D] The best way to train an LLM on company data by jaxolingo

>shove data into vector db
>
>embed incoming query
>
>return topk matches from vector db

Can these steps be done through the OpenAI API?
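As a rough sketch of the three quoted steps, the pure-Python example below keeps document vectors in a list, embeds the incoming query, and returns the top-k matches by cosine similarity. Everything named here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `VectorStore` with its `top_k` method stands in for an actual vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Minimal in-memory stand-in for a vector database (hypothetical)."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        # Step 1: shove data into the store along with its embedding.
        self.items.append((text, embed(text)))

    def top_k(self, query, k=2):
        # Step 2: embed the incoming query.
        q = embed(query)
        # Step 3: rank stored texts by similarity and return the best k.
        scored = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in scored[:k]]


store = VectorStore()
store.add("Apple stock returned 4 percent in February")
store.add("The office kitchen is closed on Fridays")
store.add("The technology sector returned 2 percent in February")

matches = store.top_k("How did Apple stock do versus the technology sector?", k=2)
print(matches)
```

With a real setup, only `embed` and the storage behind `add` and `top_k` would change; the three steps stay the same.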

You only reply using JSON. Write 5 queries that will return useful data to aid you in answering the user's question "What was Apple’s return compared to its sector last month?" Return a JSON array with no explanations. Each query must run independently. Use the return format [{"Reason":"Why the query is useful","query":"The sql query"}]. The schema is:

(Imagine a valid schema is here)

GPT-4 Response:

[
  {
    "Reason": "Retrieve Apple's stock return last month",
    "query": "SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
  },
  {
    "Reason": "Retrieve sector for Apple",
    "query": "SELECT sector FROM companies WHERE company_name = 'Apple';"
  },
  {
    "Reason": "Retrieve companies in Apple's sector",
    "query": "SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple');"
  },
  {
    "Reason": "Calculate average return for Apple's sector last month",
    "query": "SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE();"
  },
  {
    "Reason": "Compare Apple's return to the average return of its sector last month",
    "query": "SELECT (SELECT stock_return FROM stocks WHERE company = 'Apple' AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) - (SELECT AVG(stock_return) as sector_return FROM stocks WHERE company IN (SELECT company_name FROM companies WHERE sector = (SELECT sector FROM companies WHERE company_name = 'Apple')) AND date >= DATEADD(month, -1, GETDATE()) AND date < GETDATE()) as comparison;"
  }
]
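A minimal sketch of consuming a response in this format: parse the JSON array, then run each query on its own so that one failing query does not block the others. The table, data, response string, and the `run_queries` helper below are made up for illustration, and SQLite is used only to keep the example self-contained (it does not provide functions such as DATEADD or GETDATE, so the queries above would need adapting).

```python
import json
import sqlite3


def run_queries(conn, response_text):
    """Parse a model response and run each query independently."""
    results = []
    for item in json.loads(response_text):
        reason, query = item["Reason"], item["query"]
        try:
            rows = conn.execute(query).fetchall()
            results.append({"reason": reason, "rows": rows, "error": None})
        except sqlite3.Error as exc:
            # Keep going: each query is meant to stand on its own.
            results.append({"reason": reason, "rows": [], "error": str(exc)})
    return results


# Hypothetical data, just enough to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (company_name TEXT, sector TEXT)")
conn.execute("INSERT INTO companies VALUES ('Apple', 'Technology')")

# Hypothetical model response in the requested format.
response_text = json.dumps([
    {"Reason": "Retrieve sector for Apple",
     "query": "SELECT sector FROM companies WHERE company_name = 'Apple';"},
    {"Reason": "Query against a table that does not exist",
     "query": "SELECT * FROM missing_table;"},
])

results = run_queries(conn, response_text)
for r in results:
    print(r["reason"], "->", r["rows"] if r["error"] is None else r["error"])
```

Since the SQL comes from a model, it is probably worth running it over a read-only connection in anything beyond a demo.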

jaxolingo OP t1_je7c63z wrote on March 29, 2023 at 10:54 PM

Cool, this is exactly the approach I had in mind for the model to take.

gstatton t1_je7mwmv wrote on March 30, 2023 at 12:15 AM

I’ve done something similar to this, but I created a view in Snowflake, dumped the output of ‘describe table <table-name>’ to CSV, used that CSV descriptor at the head of the prompt, and asked the LLM to generate a SQL query from the user’s question using the provided table schema. I then take the model’s output and pass it directly to Snowflake, get the result back as a data frame, and send that back to the LLM to get a text summary of the query results.
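The flow described above could be wired up roughly as follows. This is only a sketch under assumptions: `llm` is a hypothetical callable that takes a prompt string and returns text (a stub is used here so the example runs offline), SQLite stands in for Snowflake, plain rows stand in for the data frame, and the prompt wording and function names are illustrative rather than what the commenter actually used.

```python
import csv
import io
import sqlite3


def schema_csv(conn, table):
    """Dump column names and types to CSV, similar in spirit to a 'describe table' dump."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "type"])
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for row in conn.execute(f"PRAGMA table_info({table})"):
        writer.writerow([row[1], row[2]])
    return out.getvalue()


def answer(conn, table, question, llm):
    # Put the schema descriptor at the head of the prompt, then the question.
    sql_prompt = (
        f"Table schema (CSV):\n{schema_csv(conn, table)}\n"
        f"Write one SQL query over table {table} that answers: {question}\n"
        "Reply with SQL only."
    )
    sql = llm(sql_prompt)
    # Run the generated SQL and collect the rows.
    rows = conn.execute(sql).fetchall()
    # Send the results back to the model for a text summary.
    summary_prompt = f"Question: {question}\nQuery results: {rows}\nSummarize in one sentence."
    return sql, rows, llm(summary_prompt)


def stub_llm(prompt):
    """Stand-in for a real model call so the sketch runs without network access."""
    if prompt.startswith("Table schema"):
        return "SELECT AVG(stock_return) FROM stock_returns WHERE company = 'Apple'"
    return "Summary of: " + prompt.splitlines()[1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_returns (company TEXT, stock_return REAL)")
conn.executemany("INSERT INTO stock_returns VALUES (?, ?)", [("Apple", 0.5), ("Apple", 1.5)])

sql, rows, summary = answer(conn, "stock_returns", "What was Apple's average return?", stub_llm)
print(sql)
print(rows)
print(summary)
```

Swapping `stub_llm` for a real model call and the SQLite connection for a Snowflake one would leave the shape of `answer` unchanged.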

You can also use LangChain to set up the chain to do this automatically.