Thursday, April 18, 2013

Why VIM?


" Vim is not a 'hard-to-use editor'; you just don't know it well enough yet "

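Features Vim has that ordinary text editors lack

  1. Command mode: every key becomes a command, so you can make the edit you want in very little time. For example: "add a period at the end of the line (2 keys)", "toggle upper/lower case (1 key)", "copy the current line nine times (4 keys)", ... To repeat an action, just add a number to say how many times, and you can even write a whole string of commands.
  2. Customization: by writing the .vimrc file you can shape Vim to your own taste. For example: "set a shortcut for compile and run", "map the commands needed for Android development to shortcuts", ...
  3. Shell integration: Vim makes it easy to run external system commands from inside the editor, and just as easy for external programs to call Vim to edit a file. Together, these let you move quickly between the system and the text editor while programming.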

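Many people have already written about the reasons to use Vim (as above). I personally like to give a simple example:

Q: On the line under the cursor, change the first character of the sentence from lowercase to uppercase, then add a period at the end of the sentence (a situation we run into all the time).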

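The "ordinary text editor" way:
  1. Press the left arrow repeatedly to reach the start of the sentence (or click carefully with the mouse)
  2. Press delete to remove the original lowercase character
  3. Type the uppercase character
  4. Press the right arrow repeatedly to reach the end of the sentence (or move your hand back to the mouse)
  5. Press '.' to type the period
The "Vim" way:
  1. Type "^~A." and you are done (^ jumps to the first non-blank character, ~ toggles its case, A appends at the end of the line, and . is the period itself)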

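Finally, although Vim has to be learned and will not be any faster at the beginning, once you have learned it you can be much, much, much faster....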

Saturday, April 6, 2013

Web Image Grabbing Robot Using Google Image API

This time, I faced the problem of grabbing matching images for a list of keywords. Most of the time, we run a Google Image Search and pick out the image we want. However, this becomes inefficient when there are a few hundred keywords.

Therefore, the program I wrote automatically reads keywords from the list and calls the Google Image API to find the first three images related to each keyword. The images are shown as URLs. I show the first three because Google does not always return the image we want as the first result. I've also written a website in PHP that lets people select one of these three images and automatically downloads the selected one to the server. As a result, you only have to choose which image to store; you don't have to do any searching or downloading yourself.

Python Program:
#!/usr/bin/python
import urllib2
import simplejson

keyword = raw_input("Image Search >> ")
keyword_encoded = urllib2.quote(keyword, '')

url = ('https://ajax.googleapis.com/ajax/services/search/images?' +
       'v=1.0&q=' + keyword_encoded + '&userip=IP-INSERT-HERE-HERE')

request  = urllib2.Request(url, None, {'Referer': 'plate.nctucs.net'})
response = urllib2.urlopen(request)

results = simplejson.load(response)

# print the URLs of the first three results
for i in range(3):
    print results['responseData']['results'][i]['url']
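
The listing above prompts for a single keyword. Here is a minimal Python 3 sketch of the keyword-list handling described earlier, kept free of network calls so it can be run anywhere. The helper names (read_keywords, build_search_url, first_image_urls), the 'IP-INSERT-HERE' placeholder, and the sample response are my own illustrations, not part of the original program; the response shape is only assumed to match what the listing indexes into.

```python
from urllib.parse import quote

API_BASE = 'https://ajax.googleapis.com/ajax/services/search/images'


def read_keywords(lines):
    """Strip whitespace and skip blank lines in a keyword list."""
    return [line.strip() for line in lines if line.strip()]


def build_search_url(keyword, user_ip='IP-INSERT-HERE'):
    """Build the query URL for one keyword (user_ip is a placeholder)."""
    return (API_BASE + '?v=1.0&q=' + quote(keyword, safe='') +
            '&userip=' + user_ip)


def first_image_urls(results, count=3):
    """Return up to `count` image URLs from a parsed JSON response.

    Unlike a fixed range(3) loop, this does not fail when the
    response holds fewer than `count` results or none at all.
    """
    response_data = results.get('responseData') or {}
    hits = response_data.get('results') or []
    return [hit['url'] for hit in hits[:count]]


# Hand-written stand-in for a parsed response (not real API output).
sample_response = {'responseData': {'results': [
    {'url': 'http://example.com/a.jpg'},
    {'url': 'http://example.com/b.jpg'},
]}}

keywords = read_keywords(['red apple\n', '\n', 'green tea\n'])
search_urls = [build_search_url(k) for k in keywords]
image_urls = first_image_urls(sample_response)
```

In a real run, each URL in search_urls would be fetched and the parsed JSON passed to first_image_urls in place of sample_response.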

Wednesday, April 3, 2013

Applying K-means to CSV Files Using Python

What is K-means?

K-means is a simple way to cluster data whose features are already known. We call the input data entities "observations" and the output groups "clusters". K-means labels each of the n observations with one of k clusters, placing every observation in the cluster whose mean (centroid) is nearest to it.
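
To make the idea concrete, here is a small pure-Python sketch of the usual k-means loop: assign every observation to its nearest centroid, then move each centroid to the mean of its members. It is only an illustration written for Python 3.8+, not what scipy does internally. The function name kmeans_sketch is made up, and seeding the centroids with the first k observations is a simplifying assumption (real implementations usually pick starting points at random).

```python
import math


def kmeans_sketch(points, k, iterations=10):
    """Cluster `points` (equal-length tuples) into k groups."""
    # Assumption: seed the centroids with the first k observations.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    # labels come from the last assignment step
    return centroids, labels


# Two obvious groups: three points near (0, 0), three near (10, 10).
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans_sketch(observations, 2)
```

With these six points the loop settles after two passes, with one centroid in the middle of each group.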

What is CSV?

CSV is a plain-text data storage format. It can be generated from Excel, Google Forms, etc. It is also easy to use from different languages, since its syntax is simple.
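
As a quick illustration, here is a Python 3 sketch that parses a tiny CSV table with the standard csv module. The layout (a name in the first column, numbers in the rest) is the one my program expects; the function name load_table and the two sample rows are made up for this example and are not taken from my real data file.

```python
import csv
import io


def load_table(csv_file):
    """Split each row into a name (first column) and float features."""
    names, rows = [], []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        names.append(row[0])
        rows.append([float(x) for x in row[1:]])
    return names, rows


# Made-up sample data standing in for a file on disk.
sample = io.StringIO("salad,120,3.5\nburger,540,29\n")
names, rows = load_table(sample)
```

For a real file in Python 3, open it with open(path, newline='') and pass the file object to load_table.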

What Am I Writing Today?

I am writing a program that reads a table from a CSV file, which may be generated by Excel or a Google Drive form, and applies the k-means algorithm to it. At the end, it prints the items in each cluster and also draws the points on the screen.


#!/usr/bin/python

# This program reads data from a CSV file,
# applies k-means, and then outputs the result.

from pylab            import plot,show
from numpy            import vstack
from scipy.cluster.vq import kmeans, vq, whiten

import csv

if __name__ == "__main__":

    # number of clusters
    K = 3

    data_arr = []
    meal_name_arr = []

    with open('meals2.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            data_arr.append([float(x) for x in row[1:]])
            meal_name_arr.append([row[0]])

    data = vstack( data_arr )
    meal_name = vstack(meal_name_arr)

    # normalization
    data = whiten(data)

    # compute k-means with K clusters
    centroids, distortion = kmeans(data, K)
    print "distortion = " + str(distortion)

    # assign each sample to a cluster
    idx,_ = vq(data,centroids)

    # plot each cluster in its own color using numpy's logical indexing
    colors = ['ob', 'or', 'og']
    for i in range(K):
        plot(data[idx==i,0], data[idx==i,1], colors[i % len(colors)])

    print meal_name
    print data

    for i in range(K):
        result_names = meal_name[idx==i, 0]
        print "================================="
        print "Cluster " + str(i+1)
        for name in result_names:
            print name

    plot(centroids[:,0],
         centroids[:,1],
         'sg',markersize=8)

    show()
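
The normalization step in the program relies on scipy's whiten, which divides each feature column by its standard deviation so that no single column dominates the distance calculation. Here is a pure-Python sketch of that rescaling for Python 3. The name whiten_sketch and the sample numbers are made up, and the sketch assumes every column actually varies (a constant column would cause a division by zero here).

```python
import statistics


def whiten_sketch(rows):
    """Divide every column by its population standard deviation."""
    columns = list(zip(*rows))
    # Assumption: no column is constant, so no scale is zero.
    scales = [statistics.pstdev(column) for column in columns]
    return [[value / scale for value, scale in zip(row, scales)]
            for row in rows]


# Made-up data: the second column is ten times larger than the first.
raw = [[1.0, 10.0], [3.0, 30.0]]
scaled = whiten_sketch(raw)
```

After rescaling, both columns of this sample hold the same values, so each one contributes equally to the distances that k-means compares.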