Thursday, April 18, 2013

Why VIM?

" Vim is not "a hard-to-use editor"; you just don't know it well enough "


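  1. Command mode: every key becomes a command, so you can make the edit you want in very little time, for example "add a period at the end of the line (2 keys)", "toggle upper and lower case (1 key)", "copy the current line nine times (4 keys)", ... To repeat an action, you only need to prefix it with a count, and you can even write a whole string of commands.
  2. Customization: by writing the .vimrc file, you can shape Vim to your own taste, for example "set a shortcut for compile and run", "map the commands needed for Android development to shortcuts", ...
  3. Shell integration: Vim makes it easy to run external system commands from inside the editor, and makes it easy for external programs to open documents in Vim for editing. Together, these let you move quickly between the system and the text editor while programming.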



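In an ordinary editor:

  1. Press the left arrow repeatedly to reach the start of the sentence (or click carefully with the mouse)
  2. Press delete to remove the original lowercase character
  3. Type the uppercase character
  4. Press the right arrow repeatedly to reach the end of the sentence (or move your hand back to the mouse)
  5. Press '.' to type the period

In Vim:

  1. Type "^~A." and you are done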


Saturday, April 6, 2013

Web Image Grabbing Robot Using Google Image API

This time, I need to grab a matching image for each keyword in a list. Most of the time, we do a Google Image Search and pick out the image we want. However, this becomes inefficient when there are a few hundred keywords.

Therefore, the program I wrote here automatically reads keywords from the list and calls the Google Image API to find the first three images related to each keyword. The images are shown as URLs. I show the first three because Google does not always return the one we want as the first result. I've also written a website in PHP that lets people select one of these three images and automatically downloads the selected one to the server. As a result, you only have to check which image you want to keep; you don't have to do any searching or downloading yourself.

Python Program:
import urllib2
import simplejson

keyword = raw_input("Image Search >> ")
keyword_encoded = urllib2.quote(keyword, '')

url = ('' +
       'v=1.0&q=' + keyword_encoded + '&userip=IP-INSERT-HERE-HERE')

request  = urllib2.Request(url, None, {'Referer': ''})
response = urllib2.urlopen(request)

results = simplejson.load(response)

# print the URLs of the first three results (fewer if the search returned fewer)
for result in results['responseData']['results'][:3]:
    print result['url']
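
The script above handles a single keyword typed at the prompt. The batch step described earlier, reading keywords from a list and keeping the first three URLs for each, might be sketched as follows. This is a minimal Python 3 sketch rather than the original program: `load_keywords` and `first_urls` are hypothetical helper names, the keyword list and URLs are made up, and the response is assumed to be already-parsed JSON with the same `responseData` / `results` / `url` layout that the script above indexes into. No network call is made.

```python
import io


def load_keywords(lines):
    """Return the non-empty, stripped keywords from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def first_urls(response, limit=3):
    """Return up to `limit` image URLs from a parsed response dict.

    Assumes the layout used by the script above:
    response['responseData']['results'][i]['url'].
    """
    data = response.get('responseData') or {}
    results = data.get('results') or []
    return [item['url'] for item in results[:limit]]


# Stand-in for a keyword file (hypothetical contents).
keyword_file = io.StringIO("apple\n\n  banana  \n")
keywords = load_keywords(keyword_file)

# Stand-in for a parsed API response (made-up URLs).
fake_response = {
    'responseData': {
        'results': [
            {'url': 'http://example.com/1.jpg'},
            {'url': 'http://example.com/2.jpg'},
        ]
    }
}

for keyword in keywords:
    print(keyword, first_urls(fake_response))
```

Slicing with `[:limit]` instead of indexing `0..2` means a keyword with fewer than three results does not raise an error.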

Wednesday, April 3, 2013

Applying K-means to CSV Files Using Python

What is K-means?

K-means is a simple way to cluster data whose features are already known. We call the input data entities "observations" and the output groups "clusters". K-means assigns each of the n observations to one of k clusters.
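
To make the idea concrete, here is a minimal pure-Python sketch of the usual two-step loop (assign each observation to its nearest centroid, then move each centroid to the mean of its members). It is written for Python 3.8+ and is only an illustration: `nearest` and `kmeans_sketch` are hypothetical names, the sample points are made up, and this is not the SciPy routine used later in this post.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans_sketch(points, k, iterations=20, seed=0):
    """Naive k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k observations picked at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: label each observation with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Two made-up, well-separated groups of 2-D observations.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans_sketch(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which observations end up sharing a label.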

What is CSV?

CSV is a plain-text data storage format. It can be generated from Excel, Google Forms, etc. It is also easy to use from different languages because of its simple syntax.
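
As a small illustration, the sketch below parses CSV text with Python's standard `csv` module. It is written for Python 3, and the rows are made up: it assumes a label in the first column followed by numeric features, which is the layout the program later in this post reads with `row[1:]`.

```python
import csv
import io

# Made-up CSV text: a name in the first column, numeric features after it.
text = "rice,3.0,1.5\nnoodles,2.5,4.0\n"

names, features = [], []
for row in csv.reader(io.StringIO(text)):
    names.append(row[0])                          # first column: the label
    features.append([float(x) for x in row[1:]])  # remaining columns: numbers

print(names)
print(features)
```

With a real file, `io.StringIO(text)` would be replaced by a file object opened with `open(path, newline='')`.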

What Am I Writing Today?

I am writing a program that reads a table from a CSV file, which may be generated by Excel or a Google Drive form, and applies the k-means algorithm. Finally, it prints the items in each cluster and draws the points on the screen.


# This program reads data from a CSV file,
# applies k-means, and then outputs the result.

from pylab            import plot,show
from numpy            import vstack,array
from numpy.random     import rand
from scipy.cluster.vq import kmeans, vq, whiten

import csv

if __name__ == "__main__":

    # clusters
    K = 3

    data_arr = []
    meal_name_arr = []

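    with open('meals2.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # the first column is assumed to hold the meal name,
            # the remaining columns hold the numeric features
            meal_name_arr.append(row[0])
            data_arr.append([float(x) for x in row[1:]])

    data = vstack( data_arr )
    meal_name = vstack(meal_name_arr)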

    # normalization
    data = whiten(data)

    # computing K-Means with K (clusters)
    centroids, distortion = kmeans(data, K)
    print "distortion = " + str(distortion)

    # assign each sample to a cluster
    idx,_ = vq(data,centroids)

    # some plotting using numpy's logical indexing
    plot(data[idx==0,0], data[idx==0,1],'ob',
         data[idx==1,0], data[idx==1,1],'or',
         data[idx==2,0], data[idx==2,1],'og')

    print meal_name
    print data

    for i in range(K):
        result_names = meal_name[idx==i, 0]
        print "================================="
        print "Cluster " + str(i+1)
        for name in result_names:
            print name

    # display the scatter plot (blocks until the window is closed)
    show()
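
The normalization step in the program above relies on SciPy's `whiten`, which divides each feature (column) by its standard deviation across all observations so that no single feature dominates the distance calculation. A minimal pure-Python sketch of that rescaling, written for Python 3, might look like this; `whiten_sketch` is a hypothetical name, the rows are made up, and the sketch assumes no column is constant (a zero standard deviation would cause a division by zero here).

```python
import statistics


def whiten_sketch(rows):
    """Divide every column by its population standard deviation."""
    columns = list(zip(*rows))
    scales = [statistics.pstdev(col) for col in columns]
    return [[value / scale for value, scale in zip(row, scales)]
            for row in rows]


# Made-up data: the second feature is 100 times larger than the first.
rows = [[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]]
scaled = whiten_sketch(rows)

for row in scaled:
    print(row)
```

After rescaling, both columns have unit standard deviation, so the two features contribute on the same scale when distances to the centroids are computed.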