Submitted by AutoModerator t3_z07o4c in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

The thread will stay alive until the next one, so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

23

Comments


koiRitwikHai t1_ix93rg6 wrote

Why does ReLU perform better than other activation functions when it is neither differentiable everywhere nor zero-centered?

6

ElectronicScar8055 t1_ixv9hiv wrote

Hi all, software engineer here with no ML background.

I'm thinking of building a support bot to help engineering teams cut down the time they spend on issues.

I was thinking of applying some ML and having the model learn over time, defaulting to "create a support ticket" if it isn't confident enough in an answer (or if the user rated it as not helpful).

At first, the bot will just be creating tickets, but over time I was thinking of having it learn from the different resolutions the engineers give (i.e., link to documentation, grant access to a system, etc.).

Is this even a possibility, starting with an untrained machine learning model and having it learn over time? Any other suggestions?

I could have some initial data, but I'm not interested in super old data, as the systems and documentation may have changed.

3

deepshiftlabs t1_iyb4iuf wrote

I am very new to this and have more entrepreneurial rather than machine-learning experience. I have 27K conversations in Gmail - the tech support channel for the SaaS system. I have this idea of building something that will suggest an answer to a new question a customer posts based on previous answers given to similar questions or provide a few suggested answers.

Extracting the first thread email and the subsequent response is not a problem. Here is a typical request-response.

Questions:

- Is this project doable, and what libraries/tools would you use?

- Should I concentrate on extracting meaningful sentences from requests and responses, as marked (1) and (2) in my example? I want to make this service generic, and this seems like a non-obvious problem in ML.

Thank you

3

BegalBoi t1_ix40w8x wrote

How can I balance an independent variable for a K-Nearest Neighbours model (or any regression model)?
So I have a dataset of the electricity consumption of a city for a year, which consists of 7 independent variables, of which the windspeed column has values ranging from 3 to 570 (units). I am getting an accuracy of only 3%, no matter which model I use.
Can anyone suggest how I would balance my dataset to predict electricity consumption?

2

I-am_Sleepy t1_ix8etfd wrote

  • Have you scaled your data? If one signal's magnitude is too large, it can dominate the others. If you haven't, try StandardScaler or PCA decomposition (see the sketch below)
  • Why use kNN? Why not other models? If you are somewhat lazy, there is Pycaret you can try (it automagically preprocesses the data and compares a lot of models for you)
  • Also, is it time-series data?
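
For the scaling point, a minimal scikit-learn sketch (the feature matrix here is made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 100 samples, 7 features,
# one column with a much larger scale than the rest
X = np.random.rand(100, 7)
X[:, 3] *= 570  # e.g. a windspeed-like column

X_scaled = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
```
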
1

sanman t1_ix41yew wrote

What are the latest cutting-edge applications of generative modeling?

Like everyone else, I've been playing with the new release of Stable Diffusion recently, and marveling at its output. I want to know what else is out there that makes use of Generative Modeling. What are the newest and most exciting things in development? I really want to know.

I can already see Generative Modeling being used for music. But beyond just artwork, what are other big fields or practical applications? What about CAD, for example? If a Machine Learning model was trained on enough CAD files of various types, could it learn how to design machinery, equipment, vehicles, buildings, etc? If a Machine Learning model was trained on lots of DNA samples categorically labeled according to their phenotypes, then could it learn how to make living things?

1

MLisdabomb t1_ix43577 wrote

Does anyone know of any services or companies that let you sell your GPU cycles into a shared cloud deep learning pool? Kind of like crypto mining, but for deep learning. Is anyone aware of anything like that?

1

Segmaster01 t1_ix4lnlz wrote

I would like to restore/upscale some old VHS footage as a gift for my mother this Christmas. Does anyone have a suggestion for a commercial service/company that provides this service, ideally incorporating AI/ML and not just filters or traditional methods?

I realize there are a number of software products that can be used for this, but I'd rather someone experienced handle it for me since I'm rather new to it.

Thanks for any suggestions!

1

Secure-Blackberry-45 t1_ix5hoes wrote

Hi everyone! First of all, I'm new to machine learning "inside" mobile applications, so please be understanding 🙂 I want to deploy a machine learning model via Firebase for a mobile app (iOS, Android) built on React JS. But the model size limit in Firebase is 40 MB, and my model is 150+ MB. That size would be way too big for people to download with the app. What are the solutions for hosting a 150 MB+ machine learning model for a mobile application? Is there a workaround to use Firebase with my model? Please advise.

1

vidret t1_ix5z4zb wrote

Have you tried making the model smaller by converting it to 16-bit floats instead of 32-bit floats? If it's already 16-bit, you could try 8-bit ints and see if the performance drop is acceptable. I think TensorFlow and Torch both have these options available.

A less simple option is changing the architecture to make it even smaller; there's a variety of methods. Before doing that, I'd have a look around to see what sort of tricks everyone else with the same goals as you is using.
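
As a rough illustration of the 8-bit route, post-training dynamic quantization in PyTorch looks something like this (the toy model stands in for the real 150 MB one, and which layer types to quantize is a choice you'd tune):

```python
import torch
import torch.nn as nn

# Hypothetical float32 model standing in for the real one
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Weights are stored as int8; activations are quantized on the fly
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

torch.save(quantized.state_dict(), "model_int8.pt")  # noticeably smaller file
```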

1

pormflakes-o_o t1_ix5ko3w wrote

I'm looking for an algorithm that will do the following: the user chooses some parameters, and the algorithm then looks for the remaining parameters that minimize some value that depends on all of the parameters.
I'm thinking of genetic algorithms, but I have no idea which would be appropriate.
I'm open to any suggestions! I'm new to ML, if it wasn't obvious ;)

1

I-am_Sleepy t1_ix8d95u wrote

Genetic or gradient-based is okay, but if you really don't want to do anything and have only a few parameters, you can use HyperOpt (it's usually used to optimize hyperparameters, because it treats the objective as a black box)
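
A minimal HyperOpt sketch, with a made-up objective and a single free parameter x (the user-fixed parameters are folded in as a constant):

```python
from hyperopt import fmin, hp, tpe

USER_FIXED = 2.0  # stands in for the parameters the user has already chosen

def objective(params):
    # Toy black-box cost; replace with the value you want to minimize
    return (params["x"] - 3) ** 2 + USER_FIXED

space = {"x": hp.uniform("x", -10, 10)}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print(best)  # e.g. {'x': 2.99...}
```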

1

observerrr t1_ix5lfap wrote

Can I change some values in a config.yaml file in a GitHub repository and run the code to see what happens with my changes? I'm a bit confused, as there are many files that contain the same content, so if I were to change some values in a config file in order to obtain some overall change, do I need to make the same changes in the other files that have the same content?

1

vidret t1_ix5zr4h wrote

Depends on how the code is written, but yes, that is the idea.

You can always see what sort of changes the yaml file makes by looking through the script/parts of the code that load and make use of it. But if a config file is there, that's probably where you should configure things.

3

DevilsPrada007 t1_ix6qi24 wrote

How can machine learning be applied to newswires to provide real-time insights to customers?

Is a move to the cloud needed?

1

I-am_Sleepy t1_ix8g3q5 wrote

I'm guessing you are trying to do sentiment analysis (NLP) on a newswire data source. If there is a public API, you can query the data directly. If not, you would need to write your own crawler. Then you can save the data locally, or upload it to the cloud, e.g., BigQuery. For a lazy solution, you can then connect your BigQuery dataset to AutoML.

But if you want to train your own model, you can try picking one from HuggingFace, or follow paper trails from paperswithcode
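
For example, a minimal HuggingFace sketch (the default English sentiment model is an assumption; you would likely pick one tuned for news or finance):

```python
from transformers import pipeline

# Downloads a default English sentiment model on first use
classifier = pipeline("sentiment-analysis")

headlines = [
    "Company X beats quarterly earnings expectations",
    "Regulators open an investigation into Company Y",
]
print(classifier(headlines))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}, {'label': 'NEGATIVE', ...}]
```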

1

Still-Barracuda5245 t1_ix7t8bg wrote

What is the preferred distribution for the target variable in a regression task? If my target variable does not follow such a distribution, how can I fix that? Is there a problem in regression equivalent to class imbalance in classification?

1

jon-chin t1_ix93k8o wrote

Please bear with me since I'm pretty new:

I'm doing topic modeling on a set of tweets using GSDMM. To do that, I need to tokenize and stem them. I can get the clusters, their document sizes, and their stem counts.

However, I'd like to pull in metadata, namely the timestamps of the tweets. Is there a way to do this easily? Right now, I'm doing a second pass after the modeling is done and guessing which cluster each of the original tweets belongs to. Is there a better way to have GSDMM aggregate this metadata while it does the modeling?

1

trnka t1_ixew7z9 wrote

It's hacky, but you could transform the timestamps into words. I've used that trick a few times successfully.

Something like TweetTimestampRangeA, TweetTimestampRangeB, ... One downside is that you'd need to commit to a strategy for the time ranges (either chop the data into N ranges, or use tokens for month, year, etc.)
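
A sketch of that trick, assuming you bucket by calendar month:

```python
from datetime import datetime

def timestamp_token(ts: float) -> str:
    """Map a Unix timestamp to a coarse token, e.g. 'TweetTime_2022_11'."""
    dt = datetime.utcfromtimestamp(ts)
    return f"TweetTime_{dt.year}_{dt.month:02d}"

# Append the token to the tweet text before tokenizing/stemming,
# so the topic model sees time as just another word
tweet = "big news today"
print(tweet + " " + timestamp_token(1669000000.0))
# big news today TweetTime_2022_11
```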

1

pretty19 t1_ix9gbuu wrote

I am doing machine learning modelling on Black Friday sales prediction data, in which all independent variables are categorical and the dependent variable is continuous and needs to be predicted. For such data (when all independent variables are categorical), is linear regression suitable? Thanks.

1

trnka t1_ixevsqx wrote

Linear regression is a good place to start -- it trains quickly and works well with small amounts of data. Categorical inputs aren't a problem; one-hot encoding will learn weights for each value.

That said, linear regression isn't always best, and it depends on your data.
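
A minimal scikit-learn sketch of that setup (the column names are made up):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: categorical inputs, continuous target
df = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "product_category": ["1", "2", "2", "3"],
    "purchase": [8370.0, 15200.0, 1422.0, 5254.0],
})

pre = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"),
     ["city", "product_category"]),
])
model = Pipeline([("pre", pre), ("reg", LinearRegression())])
model.fit(df[["city", "product_category"]], df["purchase"])
```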

2

bankCC t1_ix9o1ln wrote

Which approach would be best for classifying text into 2 categories, where my dataset is really small and imbalanced (4000 vs. 250 examples), with each text containing around 200-300 words?

And most of the time, just one or two words will lead to the classification. I could just do a keyword search, but misspelled words might slip through, and the dictionary would be pretty big and computationally expensive to compare against each file. So I thought ML would be a better idea.

Maybe a CNN, but the dataset seems to be way too small to accomplish acceptable results.

Any hints are welcome tyvm

1

Gazorpazzor t1_ixc37ng wrote

Hello,

  1. Extract features using TF-IDF (if the classification is likely driven by a few specific words)
  2. Train an SVM classifier (in your case, with few data samples, I would train different classifiers with different hyperparameters and keep the best model; NN architectures like GRUs and LSTMs give decent results, but unfortunately they might need more data to do well)
  3. Increase your iterations/epochs to compensate for the really small dataset size (keep an eye on the evaluation-set loss to prevent overfitting)

As for the data imbalance problem, I would try undersampling the 4000-sample class to 250 samples first, then try to improve results later on with data augmentation or cost-sensitive algorithms (cost-sensitive SVM, weighted cross-entropy, ...)
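
A minimal sketch of steps 1 and 2, using class weighting (one of the cost-sensitive options above) rather than undersampling; the toy texts are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Stand-ins for the ~4250 labelled documents (labels heavily imbalanced)
texts = ["refund please", "great product", "refund requested again"]
labels = [1, 0, 1]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # class_weight='balanced' upweights the minority class
    ("svm", LinearSVC(class_weight="balanced")),
])
clf.fit(texts, labels)
print(clf.predict(["please refund me"]))
```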

2

bankCC t1_ixc6lk0 wrote

Thank you very much for the answer! I highly appreciate it. You gave me a really good base to start from. Huge thanks.

2

BBAAQQDDD t1_ixcvhcy wrote

Maybe a stupid question, but I've always wondered how backpropagation works. I do not understand how we actually know how z changes with respect to x (where y would be the output) and x a node in some layer. My intuition would be that you know the weight w from x to z, so you could just say that y = activationfunc(w*x) (of course with a load of other inputs and weights). So how do you know the amount by which z changes if x changes?

1

give_me_the_truth t1_ixcwr6u wrote

It is not clear what z is.

However, I think gradient descent can also be thought of as backpropagation in its simplest sense, where the independent variable is updated based on the change in the dependent variable.

1

danman966 t1_ixgzwfh wrote

Backpropagation is essentially applying the chain rule a bunch of times. Since neural nets and other functions are just basic functions applied many times on top of a variable x to get some output z, e.g. z = f(g(h(x))), the derivative of z with respect to the parameters of f, g, and h is going to be the chain rule applied three times. Since PyTorch/TensorFlow store the derivatives of all their functions, e.g. activation functions or linear layers in a neural network, it is easy for the software to compute each gradient.

We need the gradient of course because that is how we update our parameter values, with gradient descent or something similar.
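
A tiny PyTorch illustration (the three toy functions are chosen arbitrarily):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# z = f(g(h(x))) with h(x) = 3x, g(u) = u**2, f(v) = sin(v)
z = torch.sin((3 * x) ** 2)
z.backward()  # autograd applies the chain rule through all three functions

# By hand: dz/dx = cos(9x^2) * 18x = 36 * cos(36) at x = 2
print(x.grad, 36 * torch.cos(torch.tensor(36.0)))  # the two values match
```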

1

Laughingspinchain t1_ixcx7qd wrote

Hello everyone!

So I have some academic knowledge of ML thanks to a course I did at university, but I want to expand my skills in this subject.

I have already done projects with logistic regression, linear regression, straightforward neural networks, convolutional NNs, recursive NNs, and not much more.

Do you have any advice on some advanced books/courses or the like that I could explore? You can go heavy on the math side if required :)

1

New_Pie4277 t1_ixdbd27 wrote

I'm completing my first-ever data science project. It has real data, and the goal is to make a prediction model trained on a given dataset that I have to clean first. Are there any programs (Udemy), books, or YouTube series that walk you through projects OR have you complete a data science project? I need some experience before I tackle the real thing. I'm a math and CS undergrad.

1

DeepArdent t1_ixdlbg1 wrote

Is there a JavaScript npm package that returns the similarity of two sentences using ML? Here, similarity means how close the sentences are in meaning, not how close their character or word counts are.

My ultimate aim is to find which sentence (string) among a set is most similar to a given sentence, in a NextJS app.

1

I-am_Sleepy t1_ixfrnr0 wrote

Using tfjs? The sentence embedding vectors can then be compared using cosine similarity (which is relatively easy to implement in JavaScript; better yet, the project page already implements dotProduct, and the vectors are already normalized)

1

Lmzssgy4745 t1_ixe6gtd wrote

Hi, what would be the best architecture to predict Fourier spectra? I've got one spectrum from one measurement and want to predict the spectrum of another measurement.

1

SwabianStargazer t1_ixe8q19 wrote

Hi. I am a software engineer working on mostly backend stuff but now need to dip into ML territory for the first time. I have zero experience and need some pointers to identify the right topics to research for my use case.

We have test data for machines that do the same task over and over again for a long period of time during a stress-testing run. Let's say we have a sampling rate of 30 Hz for features like temperature, motor RPM, and motor voltage during this time. So the result after a test run is, e.g., 10 hours of data that contain the same procedure 10,000 times.

I now want to analyze the data for outliers to identify problems during the test. For example, I want to identify the test cycles that had abnormally high temperature, etc. The result should be something like a timestamp and a label, so that I can see which of the 10,000 cycles should be inspected further by a human.

Another thing I am interested in is a way to automatically split the data into the 10,000 separate cycles, so we can see when each cycle started and ended.

What would be the base approach to achieve these things? Which methods and models should I look into and research?

Thanks in advance for all pointers and help!

1

trnka t1_ixeuv51 wrote

You might be able to try outlier detection to identify unusual test cycles. Though I've heard that it's often better if you're able to label even a small amount of data for whether it's anomalous or not, because an outlier detection method doesn't know which features are important or not, and labeled data can teach ML which features are important.

Feature representation might be tricky but a simple way to start is min, max, avg, stddev of each sensor.
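
A rough sketch of that, assuming the data is already split into per-cycle DataFrames, with IsolationForest as one possible unsupervised outlier scorer (sensor names are made up):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def cycle_features(cycle: pd.DataFrame) -> np.ndarray:
    """min/max/mean/std of each sensor, flattened into one feature row."""
    return cycle.agg(["min", "max", "mean", "std"]).to_numpy().ravel()

# Stand-ins for the real per-cycle slices of the 30 Hz recording
cycles = [pd.DataFrame(np.random.randn(300, 3),
                       columns=["temp", "rpm", "voltage"])
          for _ in range(1000)]

X = np.stack([cycle_features(c) for c in cycles])
scores = IsolationForest(random_state=0).fit(X).score_samples(X)
suspect = np.argsort(scores)[:20]  # lowest scores = most anomalous cycles
```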

To segment test cases, you could make it into a machine learning problem by predicting whether time T is the start of a cycle, trained from some labeled data. I imagine that getting good results will depend on how you represent the features of "before time T" and "after time T"

Not my area of expertise but I hope this helps!

1

Evoke_App t1_ixfxymq wrote

Is there an open-source video recognition AI, like YOLO is for image recognition?

1

Gazorpazzor t1_ixq5oa5 wrote

Usually, image recognition models are the ones used for video recognition too. YOLO models are often used for video recognition thanks to their near real-time inference.

I invite you to check the YOLOv7 GitHub page; they also have a script implementation of their model for video recognition on the main page.

1

danman966 t1_ixh02pl wrote

Is there any way to output the parameters (or weights) of an SVM model fit in sklearn? I can't find anything online, nor can I find anything by digging into the libSVM/sklearn code, and I can only find the intercept by inspecting the fitted model in Python.

I also made a stackoverflow post which got no replies. This seems to be way harder than it needs to be!

1

PunsbyMann t1_ixhdvyf wrote

Hey guys! I am applying for an MS in CS for Fall '23. Do you know of any strong MS programs for AI/ML other than the top institutions? I am interested in graph ML, CV, and core deep learning theory. Also, GRE-waived ones please; I bummed my verbal section :/

1

Hornball72 t1_ixhlvn6 wrote

Hey there, collective of knowledge! I'm looking into using ML to analyze telemetry data and determine a state from data over time. It does not need to be a predictive model; it just needs to learn the "signs", so to speak, to be able to judge what state it is in (and at what confidence it thinks it is correct).

The data is *nearly* good enough to have programming logic be able to determine the current state, but not 100% reliable.

I was thinking that the CSV data I have from telemetry (as well as new telemetry) can be marked up with what state it is in at the time of recording (rows are basically samples at a 60 Hz rate). It is pretty easy to mark up from a human perspective, since state changes normally take place at 1-2 minute intervals (if that), with a few states lasting some 20-30 seconds. I surmise that this data could be used for the training phase; I am specifically looking to find the state **changes** when they happen.

I can easily create realistic sample data with markup, which I assume is step 1.

The target is to be of use in Apple's ecosystem, but I have very little idea of what kind of ML model training is best for a practice such as this. I suspect that the model would need a sample window, time-wise, of say 60 seconds to compare with real-time live data.

Any help, pointers, advice, links, resources and such is appreciated!

1

LeN3rd t1_ixjhvd2 wrote

What is the best way to install CUDA/cuDNN without selling your soul? I have tried every way possible, and nothing is as smooth as I hoped it would be. I need something that detects already-installed libraries, does not break already-installed ones (looking at you, conda), and makes it easy to switch between different versions. Give me something that is not just "installed it once, never touching that s.. again", please.

1

SeaResponsibility176 t1_ixjmfot wrote

Hello community! I am about to start a project where I'll be using vision transformers for next-frame prediction in video. I would like to know if there is a good way to get started with vision transformers.
I am not familiar with Keras, TensorFlow, etc. What is the best way to get started? Should I jump straight into ViT? I know the theory; I just need to get the code running!
Thank you very much. Any additional resources are appreciated.

1

grchelp2018 t1_ixmmpoh wrote

Software dev with no ml background here. I'm trying to implement semantic search. User enters a query and I should return the top 3 closest results. Right now, I'm basically splitting all my text into sentences and storing the embedding of each sentence. Is this scalable? Is there a better way? Are there pre-trained models that can generate embeddings for paragraphs and larger bodies of text?

1

QuantumML_CO t1_ixnv7t4 wrote

I’m using WEKA and the UNSW-NB15 dataset for a dissertation on XAI. I’m looking for a way to extract weights for the attributes used to generate an arbitrary result from the model (likely either decision tree or random forest). Any thoughts would be helpful. Thank you in advance.

1

Fantastic-East9797 t1_ixomrhm wrote

Hi All,

I hope everyone is having a great holiday.

I am an undergrad student with one year left until graduation; I know a few languages but don't have any experience in the front end or back end. I know Python and took a beginner ML course.

I want to get into ML/AI, but I don't know the right path to learn. I don't know what field I want to be in.

I am using Codecademy to learn ML now. Is this a good resource? Are there better resources?

What can I do in the one year I have left to break into the field of ML/ AI?

Do I need to learn full-stack to be an ML AI engineer?

Are ML/AI engineers considered data scientists? Are they SWEs?

Thanks

1

zombie_ie_ie t1_ixpbnmg wrote

  1. Get your fundamentals (including the math) strong. You can use various sources like YouTube, Codecademy, Udemy, Coursera, etc. I'd really recommend Andrew Ng.
  2. Do Kaggle competitions and try to get into the top 100. The higher, the better.
  3. Make some cool and interesting projects and post them to your GitHub. Try solving some real-world problems.
  4. Apply for internships. But if you're looking to become a professional data scientist, then SQL along with cloud and/or big data is also essential.

>Do I need to learn full-stack to be an ML AI engineer?

No

>Are ML/AI engineers considered data scientists? are they SWE?

Certainly not SWE. ML/AI/data science involve more or less the same things and skills. Many companies use the terms interchangeably, but they don't mean exactly the same thing.

1

yungboi337 t1_ixpcdxd wrote

Hello all, looking for some guidance on building a formula or system based around some statistics for sports matchups. Not sure if this is the right forum, just trying. I don't have any ML or statistics background whatsoever. Just trying to save myself a lot of time.

What I would like to do is build some sort of formula or program that simply identifies favorable betting lines based on current-season performance.

For example, I would want the formula to give me a result if I "asked" it: show me betting lines in which a single player has met a certain statistic (pts/reb/ast) AT LEAST 50% of the time this season AND is facing a bottom-5 team against that position for the specific metric. All of this data is readily available, and I find the betting lines I'm looking for on my own "manually", but I'm just seeing if there is some way I can automate it to save myself some time. Totally aware this could be something I would hire someone for. Just hoping someone can point me in the right direction.

I'm sorry if I am not presenting the question in the proper terms for you all. Thanks in advance.

1

ustainbolt t1_ixrqhoy wrote

Could anyone help me by suggesting an architecture that might work for my problem? I've tried many (nn and otherwise) but haven't made much progress.

My data consists of groups of 10 people; each person has attached to them a number x_1 (identifying which person they are, approx. ~1000 possibilities) and an integer x_2 (which can take one of ~150 different values). The group of 10 people then attempts to complete a task; if they are successful, the data is labelled 1, else it is labelled 0.

You could think of the task as playing a football match (5v5) and x_2 as the position that they choose to play on the field.

Does this remind anyone of a particular class of problem?

1

I-am_Sleepy t1_iy3x7p2 wrote

So what is your task, exactly? If it is a prediction problem, i.e., given 10 people, calculate the probability of the label being 1, then a basic binary classifier should do the trick. If the problem is maximizing the probability of the label being 1, that is closer to reinforcement learning. You can go a few ways here, but for me, I would implement it using a genetic algorithm.

1

isbtegsm t1_ixz5gei wrote

Hello, I have a class of optimization problems (not a neural net) that I want to solve via gradient descent. What is the best library for figuring out the best learning parameters (step size, batch size, etc.) given a fixed limit on the number of steps?

1

Afghan_ t1_iy3m6xz wrote

Hey everyone,

I was wondering where I could find good books on diffusion models, books which also aim to describe the mathematics behind the models.

1

Throwaway00000000028 t1_iy42ker wrote

Since they are relatively new, I don't know of any good books on diffusion models, but there are some great resources online.

Lilian Weng's Blog: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

Yang Song's Blog: https://yang-song.net/blog/2021/score/

YouTube videos: https://www.youtube.com/watch?v=fbLgFrlTnGU

Seminal papers:

- Denoising Diffusion Probabilistic Models: https://arxiv.org/abs/2006.11239

- Improved Techniques for Training Score-based Generative Models: https://arxiv.org/abs/2006.09011

- Hierarchical Text-Conditional Image Generation with CLIP Latents: https://arxiv.org/abs/2204.06125

Review papers:

- Understanding Diffusion Models: https://arxiv.org/pdf/2208.11970.pdf

And so on...

2

Afghan_ t1_iy460l4 wrote

Thanks for the links! Greatly appreciated :-)

1

SuitDistinct t1_iy45v1t wrote

How was pruning done in Keras, particularly before the introduction of the model_optimization add-on?

I've seen older papers that use the module but can't find their implementations. I'm just looking to do pruning surgery.

1

IntelligenXia t1_iy465pn wrote

Hi learners,

What are some alternatives to pandas (the Python package for dataframe manipulation) that you have used for dataframe operations on GPUs?

1

PulPol_2000 t1_iy4dnp4 wrote

Hi, I'm currently doing research on how accurate ARCore or Google's ML Kit is in terms of object recognition. One of our requirements is to use hardware like a Raspberry Pi. Is there a way I can integrate the ML Kit with the RPi? Sorry for the newb question, but thank you in advance!

I know that ARCore only supports Android, and my research aims to use this in an Android panel for vehicles, with a camera, both of which will be connected to the Raspberry Pi.

1

scarbchaser t1_iy4htr4 wrote

I'm new to this, so any help is appreciated. I've been looking for resources, but maybe I'm using the wrong keywords.

What's the best way to approach building a dataset of similar technologies, like synonyms in the English language but for other things?

Example: Java, JDK, Android, and JDK7 can all be "Java"-related, as well as "programming", "tech", etc.

Where would one start setting this up, almost like tags? Are there already existing datasets?

What if I wanted to do calculations later, or build some type of inference, on Java, but have it apply to all the related terms?

Thanks, and sorry if this is ambiguous; I'm not sure where to begin.

1

ProfessionalShame900 t1_iy4nuj9 wrote

I am new to ML. I am doing research on clustering in high-dimensional space. I have the following challenges; I am wondering if you can enlighten me with some pointers (pun intended) and resources.

There are conditional cases in the theory for grouping a parameter, i.e., if a > 0 and b > 1, then it is in cluster 1. How do I add those to the clustering algorithm? Can vectorization work?

How do I visualize the clusters in high-dimensional space?

There are parameters that only vary in a small range (say 0.9 to 1.5) and have some large anomaly cases (over 40). Should I apply a function to exaggerate the small variation and a log transform to tame the large anomalies? But will that create artificial clusters?

1

C0hentheBarbarian t1_iy73sx4 wrote

> How to visualize the cluster in high-dimensional space?

t-SNE could work for this
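
A minimal scikit-learn sketch (the data and the perplexity value are placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(500, 50)                 # stand-in for high-dim points
labels = np.random.randint(0, 3, size=500)  # stand-in cluster assignments

X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=5)
plt.show()
```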

1

unholy_sanchit t1_iy4uykc wrote

Does AAAI not accept appendices at all? Why did they ask for them in the review phase?

1

Pomdapi113 t1_iy612tc wrote

I have been asked to develop a classifier that can map vectors according to their class. I was told we basically must implement this formula; I will be using Python. I have watched many videos on Bayes classifiers, but I am still struggling with this formula. Can someone please explain it to me, and the prior steps to implement it, knowing that I have a training dataset and a test dataset? The formula was titled "log likelihood". I believe it is for calculating the error rate of the classifier one implemented, so please let me know how I should actually implement the classifier from Bayes' theorem.

picture of the formula

1

I-am_Sleepy t1_iy7xfo0 wrote

The basic idea of log likelihood is:

  1. Assume the data is generated from a parameterized distribution: x ~ p(x|z)
  2. Let X be a set {x1, x2, ..., xn} drawn from p(x|z). Because each item is generated independently, the probability of generating this dataset becomes p(X|z) = p(x1|z) * p(x2|z) * ... * p(xn|z)
  3. The best-fit z maximizes the above formula, but because the long product can cause numerical inaccuracy, we apply a monotonic function, which doesn't change the optimum point. Thus we get log p(X|z) = sum_i log p(xi|z)
  4. In the traditional optimization paradigm we minimize a cost function, so we multiply the formula by -1. We arrive at the negative log likelihood, i.e., we optimize -log p(X|z)

Your formula models the distribution p as a Gaussian, which is parameterized by mu and sigma, usually initialized as a zero vector and an identity matrix.

Using standard autograd, you can then optimize those parameters iteratively. Other optimization methods are also possible depending on your preference, such as genetic algorithms or Bayesian optimization.
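
For instance, a minimal autograd sketch for a 1-D Gaussian (the data, step count, and learning rate are arbitrary):

```python
import torch

x = torch.randn(1000) * 2.5 + 4.0  # toy data with true mu=4.0, sigma=2.5

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)  # log-space keeps sigma > 0
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    nll = -dist.log_prob(x).sum()  # negative log likelihood
    nll.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())  # roughly 4.0 and 2.5
```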

For the Bayesian view: if your likelihood is normal, its conjugate prior is also normal. For the multivariate case it is a bit trickier; depending on your setting (likelihood distribution), you can look it up here. You need to look at the "Interpretation of hyperparameters" column to understand it better, and/or maybe here too.

1

nwatab t1_iy75dep wrote

I was training on a 10 GB dataset on an AWS EC2 instance (AMI: Deep Learning AMI GPU TensorFlow 2.10.0 (Amazon Linux 2) 20221116). After half an epoch, the instance becomes very slow due to lack of memory. Does anyone know why? I don't understand why it gets slow after about half an epoch (around less than 10 minutes) rather than at the beginning of training.

1

I-am_Sleepy t1_iy7dqu4 wrote

I am not sure, but maybe the read data is cached? Try disabling that first, or maybe there is memory-leaking code somewhere.

If your data is a single large file, it will try to read the entire tensor before loading it into memory. So if it is too large, try implementing your dataset as a generator (with batching), or speed up preprocessing by saving the processed input as protobuf files.

But a single-large-file dataset shouldn't slow down at half an epoch, so that is up for debate, I guess.
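
A rough sketch of a generator-style tf.data pipeline (paths and image size are made up), with the .cache() call deliberately left out:

```python
import tensorflow as tf

paths = tf.data.Dataset.list_files("images/*.jpg")

def load(path):
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, (224, 224)) / 255.0

ds = (paths
      .map(load, num_parallel_calls=tf.data.AUTOTUNE)
      # .cache() here would pin every decoded image in RAM; skip it for 10 GB
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))
```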

1

nwatab t1_iy7yfr8 wrote

Thanks. My data is one CSV and a lot of JPGs. I'm using tf.data input pipelines. .cache() could be causing the problem, according to your insights. I'll check.

1

nwatab t1_iy8bssy wrote

Yes, it was the cache that caused the problem. Now it works well. Somehow that didn't occur to me. Thanks!

1

Hgat t1_iy7sqjs wrote

Are there any Mechanical Turk alternatives for data collection?

1

Ashkiiiii t1_iy82d3x wrote

How can I train a single LSTM model with multiple datasets?

I have 1000 datasets from many devices, e.g., device1.csv ... deviceN.csv. I cannot merge them together because of their varying values and time components, although they share the same features.

Each dataset has device voltage with respect to its age. I want to train one LSTM model on all the datasets. Should I train it in a for loop?
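
One common pattern is to keep a single model and call fit once per file, so the weights persist across datasets; a hedged Keras sketch (the window size, column names, and windowing helper are all assumptions):

```python
import glob
import numpy as np
import pandas as pd
import tensorflow as tf

WINDOW = 50  # timesteps per training sample (assumption)

def make_windows(df: pd.DataFrame):
    """Slide a window over one device's series: X = past rows, y = next voltage."""
    values = df[["voltage", "age"]].to_numpy()  # hypothetical column names
    X = np.stack([values[i:i + WINDOW] for i in range(len(values) - WINDOW)])
    y = values[WINDOW:, 0]
    return X, y

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(WINDOW, 2)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for epoch in range(10):
    for path in glob.glob("devices/device*.csv"):
        X, y = make_windows(pd.read_csv(path))
        model.fit(X, y, epochs=1, verbose=0)  # weights carry over between files
```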

1

Hckerman-18 t1_iyade9o wrote

Thoughts on parameter optimisation.

I'm currently trying to build a Pac-Man agent using an MDP. I've built all the functions, but I'm struggling to get a consistently good score (above 2000). Adjusting my point system is the only way to get a difference in scores. The parameters/variables I'm referring to include capsules, food, ghosts (scared or not), and the distance from the ghost, so something like food = 10 points, capsules = 20 points, etc.

Instead of mindlessly going back and forth changing parameters and testing them out, is there a way I can use machine learning to find the best combination of parameters? I've looked into using the gym package as a start, but I was wondering if anyone had other ideas to suggest.

1