apache · fjiang6 · Jan 22, 2015 · Jan 22, 2015 · Jan 23, 2015 · Jan 23, 2015
diff --git a/docs/img/PIClusteringFiveCirclesInputsAndOutputs.png b/docs/img/PIClusteringFiveCirclesInputsAndOutputs.png
diff --git a/docs/mllib-clustering-pic.md b/docs/mllib-clustering-pic.md
@@ -0,0 +1,30 @@
+---
+layout: global
+title: Clustering - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Power Iteration Clustering
+---
+
+* Table of contents
+{:toc}
+
+
+## Power Iteration Clustering
+
+Power iteration clustering (PIC) is a scalable and efficient algorithm for clustering points given their pairwise affinity values. Internally, the algorithm:
+
+* computes the Gaussian similarity between all pairs of points and stores these values in an affinity matrix
+* calculates a normalized affinity matrix
+* calculates the principal eigenvalue and eigenvector
+* clusters the input points according to their components in the principal eigenvector
+
+Details of this algorithm are found in [Power Iteration Clustering, Lin and Cohen](http://www.icml2010.org/papers/387.pdf).
+
+The figure below shows the inputs and outputs of our implementation on a dataset inspired by the paper, but with five clusters instead of three:
+
+<p style="text-align: center;">
+  <img src="img/PIClusteringFiveCirclesInputsAndOutputs.png"
+       title="Power iteration clustering: five circles inputs and outputs"
+       alt="Power iteration clustering: five circles inputs and outputs"
+       width="50%" />
+  <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
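
Reviewer note: the four steps listed in the new `mllib-clustering-pic.md` page can be illustrated with a small, self-contained sketch. This is not the Spark implementation in this PR; it is a plain-Python approximation under stated assumptions. All function names (`gaussian_affinity`, `row_normalize`, `power_iterate`, `kmeans_1d`, `pic`) are hypothetical, the normalization is assumed to be row normalization, the start vector is assumed to be the normalized degree vector as in the Lin and Cohen paper, the iteration count is fixed rather than chosen by a convergence test, and the final step uses a simple deterministic 1-D k-means.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian similarity; the diagonal is left at zero."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
            a[i][j] = a[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return a


def row_normalize(a):
    """Divide each row by its sum (assumed normalization: W = D^-1 A)."""
    w = []
    for row in a:
        s = sum(row) or 1.0  # guard against an isolated point
        w.append([x / s for x in row])
    return w


def power_iterate(w, v0, num_iter=10):
    """Repeatedly apply W and rescale to unit L1 norm (fixed step count)."""
    v = list(v0)
    for _ in range(num_iter):
        v = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]
        norm = sum(abs(x) for x in v) or 1.0
        v = [x / norm for x in v]
    return v


def kmeans_1d(values, k, num_iter=20):
    """Deterministic 1-D k-means: centers start evenly spaced in [min, max]."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / max(k - 1, 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(num_iter):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def pic(points, k, sigma=1.0, num_iter=10):
    """Affinity -> normalization -> power iteration -> cluster the vector."""
    a = gaussian_affinity(points, sigma)
    degrees = [sum(row) for row in a]
    total = sum(degrees) or 1.0
    v0 = [d / total for d in degrees]  # assumed degree-based start vector
    v = power_iterate(row_normalize(a), v0, num_iter)
    return kmeans_1d(v, k)


# Two well-separated groups: a loose group of three and a tight group of four.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 10.5), (10.5, 10), (10.5, 10.5)]
labels = pic(points, k=2)
```

On this toy input the first three points receive one label and the last four the other. The two groups deliberately differ in size and spacing: with a degree-based start vector, two identical groups would produce identical vector components and could not be told apart, which is one reason a real implementation may prefer a different initialization.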
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
@@ -34,6 +34,9 @@ a given dataset, the algorithm returns the best clustering result).
 * *initializationSteps* determines the number of steps in the k-means\|\| algorithm.
 * *epsilon* determines the distance threshold within which we consider k-means to have converged. 
 
+[Power Iteration Clustering](mllib-clustering-pic.html) uses the power iteration method combined with k-means clustering to
+cluster points based on a Gaussian measure of the pairwise similarity of the input data.
+
 ### Examples
 
 <div class="codetabs">

diff --git a/mllib/pom.xml b/mllib/pom.xml
@@ -50,6 +50,11 @@
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-graphx_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.jblas</groupId>
       <artifactId>jblas</artifactId>
@@ -103,6 +108,13 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+<!--    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-graphx_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <type>test-jar</type>
+      <scope>test</scope>
+    </dependency> -->
   </dependencies>
   <profiles>
     <profile>