ForceBru t1_jae3ugb wrote

Basically, k-means is an algorithm. You give it data and the number of clusters you want to find in the data. It finds these clusters and returns their centers (known as centroids) and possibly assigns each data point to a cluster. "Optimizing" a k-means algorithm doesn't make much sense, IMO. What you probably want to say is that you ran the algorithm and got some centroids.

If you run k-means with new data but tell it to use particular centroids (that you got from a previous run of k-means), then it'll use these centroids as starting points and update them to match the new data.

  1. Feed the algorithm some data.
  2. It internally chooses initial centroids. How to choose these very first centroids isn't a simple problem. They're usually chosen "randomly". For example, you can pick K distinct points from your dataset.
  3. K-means then does its thing and adjusts these initial centroids to best fit your data. This happens in several iterations.
  4. Finally, these adjusted centroids are returned.
  5. Now you put in new data and the centroids from the previous step.
    1. If the number of iterations is zero, there's nothing to be done, so the centroids remain unchanged.
    2. If the number of iterations is greater than zero, K-means performs these iterations and adjusts these centroids to better fit the new data.
  6. The new, potentially adjusted centroids are returned.

Basically, k-means will adjust the centroids you give it in such a way that these centroids define clusters that describe your data well enough. When you run k-means with no centroids, it'll generate random centroids for you.
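The loop described above can be sketched in a few lines of NumPy. This is a toy illustration of Lloyd's algorithm with an optional warm start, not a production implementation; in practice you'd reach for `sklearn.cluster.KMeans`, whose `init` parameter also accepts an array of starting centroids.

```python
import numpy as np

def kmeans(X, k, init_centroids=None, max_iter=100, seed=0):
    """Plain Lloyd's algorithm. If init_centroids is given, it's used as the
    starting point (a 'warm start'); otherwise K distinct points are sampled
    from the data, as described in step 2 above."""
    rng = np.random.default_rng(seed)
    if init_centroids is None:
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels
```

Passing `max_iter=0` reproduces case 5.1 above: the loop body never runs, so the centroids you passed in are returned unchanged.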

3

ForceBru t1_jaduboq wrote

What do you mean by "trained k-means algorithm"? K-means is an algorithm; there's nothing to "train" there. I guess you could fine-tune the number of iterations and the number of clusters somehow. Is that what you mean?

What do you mean by "training seeds"? Are these cluster centroids obtained after clustering training data?

2

ForceBru t1_j3gfgvo wrote

> Read up on boosting.

What could a good reading list look like? I read the original papers which introduced functional gradient descent (the theoretical underpinning of boosting), but I can't say they shed much light on these techniques for me.

Is there more recommended reading to study boosting? Anything more recent, maybe? Any textbook treatments?

1

ForceBru t1_iyxs8wm wrote

You should probably start with basic time-series models like ARIMA, its seasonal version (seasonality should be particularly important for electricity forecasting) and maybe exponential smoothing.

When looking for research on time-series forecasting, I fairly often stumble upon claims that these basic methods perform well for electricity forecasting. I can't cite any particular papers since electricity forecasting is not my area of research, but I do feel like these methods are often discussed in the context of electricity forecasting specifically. I'm not sure whether this is a general trend, though.

Anyway, in time-series analysis, it's often beneficial to try the traditional models first and only then reach for machine learning. It looks like ARIMA-like models perform fairly well in many cases, so there may be no need for complicated ML.
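To make "exponential smoothing" concrete, here's a minimal sketch of *simple* exponential smoothing in plain Python. It's the no-trend, no-seasonality special case; for real work you'd use something like statsmodels' `ExponentialSmoothing` (Holt-Winters) or `SARIMAX`, which handle trend and seasonal components.

```python
def exp_smooth(y, alpha):
    """Simple exponential smoothing: the smoothed level is a weighted
    average of the newest observation and the previous level.
    alpha in (0, 1] controls how quickly old observations are forgotten."""
    level = y[0]              # initialize the level at the first observation
    fitted = [level]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
        fitted.append(level)
    return fitted             # fitted[-1] is the one-step-ahead forecast
```

A sanity check on the behavior: a constant series is smoothed to itself, and with `alpha = 0.5` each new level sits halfway between the old level and the new observation.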

3

ForceBru t1_iyccg38 wrote

This is very nice, but the page seems to load all papers at once which makes loading super slow for me. I had to wait for a full minute for the page to unfreeze. Maybe this could be improved by loading papers in batches, like social networks do with their infinite feeds.

3

ForceBru t1_iso72mh wrote

You don't even need a GPU to get into machine learning. You can do clustering, SVM, basic neural networks and a lot of other stuff on a CPU and be completely fine.
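As an illustration, here's a tiny logistic-regression "network" trained with full-batch gradient descent in NumPy. The data and hyperparameters are made up for the example; the point is that this trains in a fraction of a second on any CPU.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """One-layer 'neural network' (logistic regression) trained with
    full-batch gradient descent on the log-loss. CPU-only, no GPU needed."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        grad = p - y                            # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    # Classify by the sign of the logit
    return (X @ w + b > 0).astype(int)
```

The same goes for k-means, SVMs (e.g. via scikit-learn's `SVC`), and most classical models: the datasets that fit in memory on a laptop are more than enough to learn the fundamentals.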

6