What is K-means?
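K-means splits n points into K groups. It starts from K initial centroids and then alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The function below is a minimal NumPy sketch of that loop (often called Lloyd's algorithm), written only for illustration. The name `kmeans_sketch` and the sample points are hypothetical. The program later in this post uses SciPy's `scipy.cluster.vq.kmeans` instead, which differs in details such as its stopping rule.

```python
import numpy as np


def _nearest(points, centroids):
    """Return the index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = _nearest(points, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    # Final assignment against the final centroids.
    return centroids, _nearest(points, centroids)


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of three points.
    pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
    centroids, labels = kmeans_sketch(pts, k=2)
    print(labels)
    print(centroids)
```

Because the result depends on the initial centroids, real implementations usually run several random starts and keep the best one.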
What is CSV?
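CSV (comma-separated values) is a plain-text format for storing tables: one row per line, with fields separated by commas. It can be exported from Excel, Google Forms, and many other tools. Because the syntax is so simple, it is easy to read and write from almost any programming language.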
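Simple as it is, CSV still has quoting rules (a field containing a comma is wrapped in double quotes), so it is safer to parse it with Python's built-in `csv` module than with `str.split(',')`. The sketch below reads a small, made-up table in the layout the program in this post expects: an item name in the first column followed by numeric columns, with no header row. The helper `read_table` and the `SAMPLE` data are hypothetical.

```python
import csv
import io

# Hypothetical sample: name first, then numeric columns, no header row.
# The last name contains a comma, so it is quoted.
SAMPLE = """pancakes,350,12,8
salad,120,3,4
"fish, chips",800,30,45
"""


def read_table(f):
    """Split each CSV row into a name and a list of floats."""
    names, rows = [], []
    for row in csv.reader(f):
        if not row:  # skip blank lines
            continue
        names.append(row[0])
        rows.append([float(x) for x in row[1:]])
    return names, rows


if __name__ == "__main__":
    names, rows = read_table(io.StringIO(SAMPLE))
    print(names)  # ['pancakes', 'salad', 'fish, chips']
    print(rows)
```

With a real file, pass `open('meals2.csv', newline='')` instead of the `StringIO` object; the `newline=''` argument is what the `csv` module documentation recommends for files opened in Python 3.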
What Am I Writing Today?
I am writing a program that reads a table from a CSV file (which may be exported from Excel or a Google Drive form) and applies the k-means algorithm to it. The first column of each row holds the item name, and the remaining columns hold numeric values. Finally, the program prints the resulting clusters, listing the items in each one, and draws the points on the screen.
```python
#!/usr/bin/env python3
# This program reads data from a CSV file, applies k-means,
# then prints the clusters and plots the points.
import csv

import matplotlib.pyplot as plt
from numpy import vstack
from scipy.cluster.vq import kmeans, vq, whiten

if __name__ == "__main__":
    # number of clusters
    K = 3

    data_arr = []
    meal_name_arr = []
    # Each row: meal name in the first column, numeric values after it.
    # (Assumes the file has no header row.)
    with open('meals2.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            data_arr.append([float(x) for x in row[1:]])
            meal_name_arr.append([row[0]])

    data = vstack(data_arr)
    meal_name = vstack(meal_name_arr)

    # normalization: scale each column to unit variance
    data = whiten(data)

    # computing K-Means with K clusters
    centroids, distortion = kmeans(data, K)
    print("distortion = " + str(distortion))

    # assign each sample to a cluster
    idx, _ = vq(data, centroids)

    # some plotting using numpy's logical indexing (first two columns only)
    styles = ['ob', 'or', 'og']
    for i in range(K):
        plt.plot(data[idx == i, 0], data[idx == i, 1], styles[i % len(styles)])

    print(meal_name)
    print(data)

    # print the meal names that fall into each cluster
    for i in range(K):
        result_names = meal_name[idx == i, 0]
        print("=================================")
        print("Cluster " + str(i + 1))
        for name in result_names:
            print(name)

    # centroids drawn as larger squares
    plt.plot(centroids[:, 0], centroids[:, 1], 'sg', markersize=8)
    plt.show()
```
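The script relies on two SciPy helpers, `whiten` and `vq`, without showing what they do. The sketch below checks both on a small, made-up array: `whiten` divides each column by its standard deviation, so features measured on very different scales contribute comparably to distances, and `vq` labels each observation with its nearest code (centroid) and also returns the Euclidean distance to it. The sample `data` and `codes` arrays are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import vq, whiten

# Hypothetical data: two columns on very different scales.
data = np.array([[100.0, 1.0],
                 [200.0, 2.0],
                 [300.0, 4.0]])

# whiten divides each column by its standard deviation,
# so every column ends up with unit variance.
white = whiten(data)
print(np.allclose(white, data / data.std(axis=0)))  # True
print(white.std(axis=0))

# vq assigns each observation to its nearest code (centroid)
# and returns the Euclidean distance to that code.
codes = np.array([[1.0, 1.0],
                  [3.0, 3.0]])
labels, dists = vq(white, codes)

# The same result computed by hand: all pairwise distances, then argmin/min.
manual = np.linalg.norm(white[:, None, :] - codes[None, :, :], axis=2)
print(np.array_equal(labels, manual.argmin(axis=1)))  # True
print(np.allclose(dists, manual.min(axis=1)))  # True
```

Note that the centroids returned by `kmeans` live in the whitened space, which is why the script plots the whitened `data` rather than the raw values.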