bubudumbdumb t1_jdu90gu wrote

Friends working at rev.ng told me that it's actually very difficult to decompile back to the high-level structures used in the original source code. C may have only a few ways to write a loop, but C++ has many, and recovering them from assembly is very hard to achieve with rule-based systems.

5

bubudumbdumb t1_jbgxlt0 wrote

In my experience, NLP models are released as public science when they are trained on datasets scraped from the web.

Things like "models that solve this problem in finance" or "datasets of annotated football matches" or "medical records of millions of people" are not likely to follow the publication patterns of open science.

If you have a model like the one you asked for, you likely have a way to profit from it, and you are unlikely to publish it.

10

bubudumbdumb t1_ja414tz wrote

My sweet summer child, MRI data is medical data: the only way to get it is to have patients (i.e., to be a clinic or a hospital) and to make sure they are OK with you labeling the data and using it to train models. Medical data is very sensitive and heavily protected; you probably won't be able to use third-party labeling services, as you might be required to keep the data on your own infrastructure. Of course, all of this depends on jurisdiction, and you should consult lawyers.

1

bubudumbdumb t1_j8wtq42 wrote

https://www.investopedia.com/terms/b/backtesting.asp

https://en.m.wikipedia.org/wiki/Modern_portfolio_theory

With extreme brevity:

Markets are not stationary environments, so you have to expect and mitigate drift. This has implications for the evaluation methodology and for the choice of time-series models, favoring ones that can be calibrated with fewer data points.
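
As a side note, here is a minimal sketch of what drift-aware evaluation can look like: a walk-forward backtest that refits on a sliding window of recent points and tests only on the block that follows. The function names and window sizes are made up:

```python
import numpy as np

def walk_forward_errors(series, fit_fn, predict_fn, train_len=250, test_len=20):
    """Rolling-origin backtest: refit on a recent window, evaluate on the
    block immediately after it, then slide forward. Never test on the past."""
    errors = []
    for start in range(0, len(series) - train_len - test_len + 1, test_len):
        train = series[start : start + train_len]
        test = series[start + train_len : start + train_len + test_len]
        model = fit_fn(train)                 # calibrated on recent data only
        preds = predict_fn(model, len(test))  # forecast the next test_len points
        errors.append(np.mean(np.abs(preds - test)))
    return np.array(errors)  # errors growing over time are a symptom of drift
```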

A strategy to make money in the markets allocates capital across multiple financial instruments using multiple signals, so the value of a signal is the predictive advantage it provides when stacked on top of other commonly used signals. If the predictive capability of news sentiment is easily replicated by a linear combination of cheaply available signals, then it's not worth much.
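
A hedged sketch of how you could check that: orthogonalize the sentiment signal against the cheap signals and see whether the residual still predicts returns. All the arrays here are placeholders:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def incremental_signal_value(cheap_signals, sentiment, future_returns):
    """Regress sentiment on the cheap signals; if the residual barely
    correlates with future returns, the signal adds little on top of them."""
    reg = LinearRegression().fit(cheap_signals, sentiment)
    residual = sentiment - reg.predict(cheap_signals)
    return np.corrcoef(residual, future_returns)[0, 1]
```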

1

bubudumbdumb t1_j84w7r2 wrote

Reply to comment by lmtog in [D] Transformers for poker bot by lmtog

Correct, but the goal is not to train but to infer. I am not saying it wouldn't work, just that I don't see why the priors of a transformer model would work better than RNNs or LSTMs at modeling the rewards of each play. Maybe there is something I don't get about poker that maps the game to graphs that can be learned through self-attention.

2

bubudumbdumb t1_j84mygn wrote

The strength of transformers lies in transferring representations learned over large corpora of text or images. Those are unlikely to bring capabilities that generalize to poker, so traditional RL and Monte Carlo approaches are likely to have the upper hand. Poker's challenges are not linguistic or visual-perception challenges.

1

bubudumbdumb t1_j6uux46 wrote

I would expect a lot of work around regulation. For example, formal qualification requirements will probably emerge around who can tell a jury how to interpret the behavior of ML models and the practices of those who develop them. In other words, there will be DL lawyers. Lawyers might get themselves automated out of courtrooms: if that's the case, humans will be involved only in DL trials, and LLMs will settle everything else from tax fraud to parking tickets. Do you want to appeal the verdict of an LLM? You need a DL lawyer.

Coding might be automated, but it's really a question of how much good code there is out there to learn from.

Books, movies, music, and VR experiences will be prompted. Maybe even psychoactive substances could be generated and synthesized from prompts (if a DL lawyer signs off on the ML for it). The values around writing will change too: if words are cheap and attention is scarce, writing in short form is what's valuable.

The real question is who we are going to be to each other, and even more importantly to kids up to age 6.

−2

bubudumbdumb t1_j4n54nk wrote

The key here is that with keypoint detection you don't need to detect the corners per se: you detect at least a dozen points from the pattern on the card, and then, assuming the card is a rectangle on a plane, you can identify the corners.

In other words, this can be very robust to occlusions: you might see less than half of the card and still be able to identify where the corners are.
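
To make that concrete, a minimal sketch with OpenCV, assuming you already have matched keypoints between a reference image of the card and the camera frame (the point arrays and the card size are placeholders):

```python
import cv2
import numpy as np

# ref_pts / cam_pts: matched keypoint coordinates, shape (N, 1, 2), N >= 4
H, inliers = cv2.findHomography(ref_pts, cam_pts, cv2.RANSAC, 5.0)

# Corners of the card in the reference image (w, h: card size in pixels).
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)

# Even if the physical corners are occluded in the camera frame,
# the homography tells you where they would be.
projected_corners = cv2.perspectiveTransform(corners, H)
```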

5

bubudumbdumb t1_j4n08pi wrote

So basically you are printing the cards? Or do you have a JPG of the cards, or can you scan them?

If yes, then what you can do is apply SIFT, or the even faster ORB, to the pictures of the cards to detect and describe the salient points. Then build a nearest-neighbors index over the keypoint feature space.

(Optionally) you can then scale the coordinates of the keypoints to match the intended dimensions in centimeters (or inches, if that's your favorite).

Then do the same with the images from your camera: run the keypoints you detect in the camera frames through the NN index to match each one to the most similar keypoint from the cards. You are going to get a lot of false positives, but don't worry: you can use a RANSAC approach to filter out the matches that don't produce a consistent geometry.

The RANSAC procedure will return a homography (the card is planar, so you don't need a full fundamental matrix) that you can use to project the rectangle of the card into the image space captured by the camera.

All the algorithms I mentioned are available in OpenCV (the NN index too, though I dislike it, since there are more modern alternatives). There are also tutorials on how to use and visualize all of this.

If this is geometric gibberish to you, check out the ORB paper. Figures 1, 9, and 12 should confirm whether this is the kind of matching you are looking for.

https://scholar.google.com/scholar?q=ORB:+An+efficient+alternative+to+SIFT+or+SURF&hl=en&as_sdt=0&as_vis=1&oi=scholart#d=gs_qabs&t=1673904812693&u=%23p%3DWG1iNbDq0boJ
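
For reference, here is a sketch of the whole pipeline in OpenCV, with a brute-force Hamming matcher standing in for the NN index. File names and thresholds are assumptions:

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)

# 1. Detect and describe the salient points of the reference card.
card = cv2.imread("card.jpg", cv2.IMREAD_GRAYSCALE)
kp_card, des_card = orb.detectAndCompute(card, None)

# 2. Do the same for the camera frame.
frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
kp_frame, des_frame = orb.detectAndCompute(frame, None)

# 3. Match card keypoints to frame keypoints (Hamming distance, since ORB
#    descriptors are binary); a ratio test drops the grossest false positives.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_card, des_frame, k=2)
good = [p[0] for p in matches
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

# 4. RANSAC keeps only the matches consistent with a single planar geometry
#    (needs at least 4 good matches).
src = np.float32([kp_card[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_frame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 5. Project the card rectangle into the camera image.
h, w = card.shape
rect = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
card_in_frame = cv2.perspectiveTransform(rect, H)
```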

6

bubudumbdumb t1_j34fk6v wrote

In the last month I came across a blog post about vector databases. The post argued that there are a few basic types of distance (L1, L2, cosine) and that you will have better luck using a vector database that supports those than searching with your own heuristic and hybrid solutions. So my suggestion would be to represent faces in some space that you can search over with a vector database or with some nearest-neighbors index.
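
A minimal sketch of the nearest-neighbors option, with scikit-learn standing in for a vector database. The `embed` function (whatever face-embedding model you use) is assumed to exist:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# embed(image) -> 1-D feature vector is assumed (any face-embedding model).
gallery = np.stack([embed(img) for img in known_faces])   # (N, d) matrix

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(gallery)
dist, idx = index.kneighbors(embed(query_face).reshape(1, -1))
# idx[0]: the 5 most similar known faces; dist[0]: their cosine distances.
```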

2

bubudumbdumb t1_j33xh4d wrote

SIFT is good if you want to match images of the same building or cereal box seen from another point of view or under different lighting.

If you want to match images that have dogs or cars or Bavarian houses, you might need some sort of convolutional autoencoder as a featuriser.

If you have a lot of GPUs available, you can use ViT, a transformer-based architecture, to compute features.

Once you have features, you might use a nearest-neighbors library to find close representations.
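
For the ViT route, here is a hedged sketch with torchvision; this is one reasonable setup, not the only one:

```python
import torch
from torchvision import models

# Pretrained ViT; swapping the classification head for Identity turns it
# into a pure featuriser that returns the final embedding.
weights = models.ViT_B_16_Weights.DEFAULT
vit = models.vit_b_16(weights=weights)
vit.heads = torch.nn.Identity()
vit.eval()

preprocess = weights.transforms()  # resize/normalize to match the weights

with torch.no_grad():
    # `image` is assumed to be a PIL image.
    feats = vit(preprocess(image).unsqueeze(0))  # shape (1, 768)
```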

3

bubudumbdumb t1_j28j898 wrote

TIL: Nesterov momentum is an extension of momentum that involves calculating the decaying moving average of the gradients of projected positions in the search space rather than the actual positions themselves.

I took a course on control theory, and the ingredients of Nesterov momentum (moving averages and decay) seem to be common building blocks of linear control systems. PID control is the industrial application of linear control theory.
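
A minimal sketch of the update, with the projected ("lookahead") position made explicit; the learning rate and momentum values are arbitrary:

```python
def nesterov_step(theta, v, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov-momentum step: evaluate the gradient at the projected
    position theta + mu * v, not at theta itself."""
    g = grad_fn(theta + mu * v)   # gradient at the lookahead position
    v = mu * v - lr * g           # decaying moving average of those gradients
    return theta + v, v
```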

2

bubudumbdumb t1_j1ypwvd wrote

I think the contract and the end-user licence agreement are your best bet in terms of IP.

Some time ago I read some research from Bocconi that concluded that very few industries (pharma among them) are happy with how IP protects their competitive advantage. So my suggestion is to think about how to protect competitive advantage, not intellectual property.

Even if you don't deploy the model on your customers' hardware, you still run the risk that a competitor uses the model "as a service" to create a synthetic dataset (this is one of the risks my team is worried about).

4