问题描述:

I read the paper k-means++: The Advantages of Careful Seeding and didn't quite understand the algorithm provided which is:

"Let D(x) denote the shortest distance from a data point x to the closest center we have already chosen.

1a. Choose an initial center c1 uniformly at random

from X.

1b. Choose the next center ci, selecting ci = x' ∈ X with probability (D(x')^2) / Sum_of(D(x)^2)

1c. Repeat Step 1b until we have chosen a total of k centers.

2-4. Proceed as with the standard k-means algorithm"

(Better look at the algorithm in the link above)

Especially step 1b. What do they mean by "selecting ci = x' ∈ X with probability (D(x')^2) / Sumof(D(x)^2)". Do they mean selecting the element that has the largest proportion ? And how can perform such computation can lead to select the optimal centroids ?

The function **D(x)** is defined for all points x ∈ X.

In step 1b, we must choose some point to be a new center. We will choose randomly between all points (that are not already centers). But we will not give an equal chance to every point; we will assign different probabilities to different points before we choose. These probabilities must add up to 1.

Consider D(x)^2. We can evaluate this at every point, and add up the values: Sum_of(D(x)^2).

Then we can assign each point x' a probability equal to D(x')^2 / Sum_of(D(x)^2). These probabilities add up to 1, and give a better chance to points far away from all existing centers.

http://en.wikipedia.org/wiki/K-means%2B%2B

```
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)2.
4. Repeat Steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
```

您可能感兴趣的文章：

- c# - Visual Studio Modifications to a Project not Updating the .exe File in \bin
- asp.net - Control margins and page width in Orchard CMS
- html - IE8 Select Dropdown list won't display if longer than screen
- android - capture photo using front camera
- windows 8 - How to create a nice add button to list view like in the Weather app?
- php - Reading a specific line from a text file
- clr - Any implementation of an Unrolled Linked List in C#?
- Finding Hudson Log Files
- Forward to a payment-gateway together with POST data using cURL (or any other PHP server side solution)
- WCF in Winforms app - is it always single-threaded?

随机阅读：

**推荐内容**-

**热点内容**-
- php - Reading a specific line from a text file
- clr - Any implementation of an Unrolled Linked List in C#?
- Finding Hudson Log Files
- Forward to a payment-gateway together with POST data using cURL (or any other PHP server side solution)
- WCF in Winforms app - is it always single-threaded?
- git svn - git svn fetch does not fetch a Subversion commit message modified after initial clone
- java me - Why I am getting the bad length exception when I am running this application?
- java - How to get string.format to complain at compile time
- ruby on rails - Trigger observer of parent class on change
- python - Issue with URL pattern in Django with webmonkey tutorial