Comments

ForceBru t1_jaduboq wrote

What do you mean by "trained k-means algorithm"? K-means is an algorithm, there's nothing to "train" there. I guess you could fine-tune the number of iterations and the number of clusters somehow. Is this what you mean?

What do you mean by "training seeds"? Are these cluster centroids obtained after clustering training data?

2

_throw_hawaii OP t1_jae19kp wrote

Yes, sorry, you're right. I meant that k-means was originally applied to (and tuned on) an initial dataset. Those data have since been updated, but the structure of the model has to stay the same (except for some parameters in the code).

1

ForceBru t1_jae3ugb wrote

Basically, k-means is an algorithm. You give it data and the number of clusters you want to find in the data. It finds these clusters and returns their centers (known as centroids) and possibly assigns each data point to a cluster. "Optimizing" a k-means algorithm doesn't make much sense, IMO. What you probably want to say is that you ran the algorithm and got some centroids.

If you run k-means with new data but tell it to use particular centroids (that you got from a previous run of k-means), then it'll use these centroids as starting points and update them to match the new data.

  1. Feed the algorithm some data.
  2. It internally chooses initial centroids. How to choose these very first centroids isn't a simple problem. They're usually chosen "randomly". For example, you can pick K distinct points from your dataset.
  3. K-means then does its thing and adjusts these initial centroids to best fit your data. This happens in several iterations.
  4. Finally, these adjusted centroids are returned.
  5. Now you put in new data and the centroids from the previous step.
    1. If the number of iterations is zero, there's nothing to be done, so the centroids remain unchanged.
    2. If the number of iterations is greater than zero, K-means performs these iterations and adjusts these centroids to better fit the new data.
  6. The new, potentially adjusted centroids are returned.

Basically, k-means will adjust the centroids you give it in such a way that these centroids define clusters that describe your data well enough. When you run k-means with no centroids, it'll generate random centroids for you.
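The steps above can be sketched in plain Python. This is a minimal illustration of standard Lloyd-style k-means, not the code of any particular library: the names `kmeans`, `assign`, `init_centroids`, `max_iter`, and `seed` are hypothetical, and the toy datasets are made up for the example.

```python
import random

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, init_centroids=None, max_iter=10, seed=0):
    """Hypothetical helper: run at most max_iter k-means iterations."""
    rng = random.Random(seed)
    if init_centroids is None:
        # Step 2: no centroids given, so pick K points from the data at random.
        centroids = [tuple(p) for p in rng.sample(points, k)]
    else:
        # Step 5: start from the centroids of a previous run.
        centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):  # max_iter == 0 skips the loop entirely
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged, further iterations would change nothing
        centroids = new_centroids
    return centroids

old_data = [(0, 0), (0, 1), (10, 10), (10, 11)]
new_data = [(1, 1), (1, 2), (11, 11), (11, 12)]

first = kmeans(old_data, k=2)                                       # steps 1-4
frozen = kmeans(new_data, k=2, init_centroids=first, max_iter=0)    # step 5.1
updated = kmeans(new_data, k=2, init_centroids=first, max_iter=5)   # step 5.2
```

Here `frozen` is identical to `first`, because zero iterations leave the supplied centroids untouched, while `updated` has moved toward the new data.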

3

_throw_hawaii OP t1_jae9zxu wrote

Thank you so much, so clear and helpful!!!🙏🙏🙏

1

Donno_Nemore t1_jaepw86 wrote

Using known centroids should be more stable. K-means can be stochastic in how it picks the starting centroids, so the result for the same data can vary from run to run.

1

PredictorX1 t1_jadudqt wrote

The result of k-means clustering is a set of cluster centers. Usually, I would think "running" it over new data would mean assigning each observation in the new set to one of those clusters. I'm not sure what the rest of your question is getting at.

1

_throw_hawaii OP t1_jae20rx wrote

Yes, exactly. The maximum number of iterations is a parameter that can usually be set in k-means functions (in most programming languages). When I had to apply the k-means model to new data, I was told to set that number to zero.

1

Donno_Nemore t1_jaeq7oy wrote

This sounds like you are being asked to assign the new data to a cluster. Assignment is as simple as computing the distance between each point and each centroid; the centroid with the minimum distance is the point's cluster.
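As a rough sketch of that assignment step, assuming Euclidean distance and centroids saved from an earlier run (the function name `assign_to_clusters` and the sample values are hypothetical):

```python
import math

def assign_to_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Distance from this point to every centroid.
        distances = [math.dist(p, c) for c in centroids]
        # The smallest distance decides the cluster.
        labels.append(distances.index(min(distances)))
    return labels

# Centroids from a previous k-means run (made-up values).
centroids = [(0.0, 0.5), (10.0, 10.5)]
new_points = [(1, 1), (9, 12), (4, 4)]
labels = assign_to_clusters(new_points, centroids)
```

The centroids are only read, never updated, which matches what running k-means with zero iterations would amount to.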

1