
In my experience, the more I learn, the more I notice. Less abstractly: the more I take in as experience, the more I see it come up in daily life, which makes sense given simple probability. Let me give you an example of a nice coincidence that happened to me today.

One of my favorite Japanese movies is Onmyouji. It’s the telling of a small portion of the life of Abe no Seimei, a court magician/astrologer/priest guy. I put this movie on today because in a past post I was complaining about my lack of commitment to Japanese. As I put the movie on, I remembered a nice little verb in the Japanese language that is used to help people not give up on their current objective. It’s nothing fancy like a parable; it’s just “don’t give up.” Anywho, I started looking for printable images to hang up on the wall to keep me motivated and came across a nice one that said “don’t give up, don’t rush, and don’t overdo it” (my poor translation). I thought it was great and looked very Japanese, so I printed it out. As I was printing it, a scene came on in the movie that perfectly depicted this exact saying.

Abe no Seimei was summoned to release a demon out of a gourd in the courtyard. The gourd was implied to have been put there by a curse, or maybe it was an omen. After Abe no Seimei had materialized the curse/omen into a snake, he set it down and let it start slithering away. It slithered very slowly, and Abe no Seimei knew it would take a while to follow it to wherever it was going, but he didn’t give up, he didn’t get impatient, and he didn’t try to hurry the snake along. There was dialog during this portion with a peer of his, but I thought just the walk itself was a powerful scene, and the actor playing Abe no Seimei (野村 萬斎 Nomura Mansai) displayed how those three qualities were inherent in the character.

Here’s the movie: http://www.imdb.com/title/tt0355857/

Here’s some info on Abe no Seimei: http://en.wikipedia.org/wiki/Abe_no_Seimei

The verb form for “don’t give up” is あきらめない (akiramenai).

The link to the printable image is: http://userdisk.webry.biglobe.ne.jp/006/173/05/N000/000/000/P3110152.jpg

Python 2 ships with a default encoding of ASCII, which causes a lot of headaches when you need to work with CJK characters. Furthermore, with languages that don’t put spaces between words, like Chinese and Japanese, working with strings can be a pain for the budding Pythonista (like myself). Here, I go over how to set up Python 2.7 for UTF-8 on Ubuntu (11.10 64-bit), and then a crude way to break a spaceless Japanese sentence up and toss it into a list.

First things first, we need to edit our site.py file to change the default encoding to UTF-8. On my machine this file lives in /usr/lib/python2.7/, and the change that needs to be made is in setencoding(), where the encoding goes from ascii to utf-8:

def setencoding():
    """Set the string encoding used by the Unicode implementation.  The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "utf-8" # Default value set by _PyUnicode_Init()

Now if we check our getdefaultencoding after firing up Python, we should see this:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
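
As a side sketch, not part of the setup above: if editing site.py isn't an option, another route is to leave the default alone and decode explicitly wherever bytes come in. The helper name to_text is made up for illustration, and it assumes the incoming bytes really are UTF-8. It is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys


def to_text(value, encoding="utf-8"):
    """Return text (unicode on Python 2, str on Python 3) for bytes or text.

    Hypothetical helper for illustration; assumes byte input is UTF-8.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# 'ascii' on a stock Python 2; 'utf-8' on Python 3 or after the site.py edit.
print(sys.getdefaultencoding())

raw = u"あきらめない".encode("utf-8")  # bytes, e.g. as read from a file
text = to_text(raw)

print(len(raw))   # 18: six hiragana at three bytes each
print(len(text))  # 6: six characters
```

The difference between the two lengths is the whole story: the byte string counts bytes, the decoded text counts characters.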

Great, now let’s start working with some Japanese. Let’s grab some Grade 1 Kanji from http://www.saiga-jp.com/language/kanji_list.html and see if we can change the string of Kanji into a list.

>>> grade1 = '一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石赤千川先早草足村大男竹中虫町天田土二日入年白八百文木本名目立力林六'

Now let’s decode it from UTF-8 into a unicode string and space out each Kanji with the handy join function!

>>> grade1_dec = grade1.decode('utf-8')
>>> print grade1_dec
一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石赤千川先早草足村大男竹中虫町天田土二日入年白八百文木本名目立力林六

>>> grade1_spaced = ' '.join(grade1_dec)
>>> print grade1_spaced
一 右 雨 円 王 音 下 火 花 貝 学 気 九 休 玉 金 空 月 犬 見 五 口 校 左 三 山 子 四 糸 字 耳 七 車 手 十 出 女 小 上 森 人 水 正 生 青 夕 石 赤 千 川 先 早 草 足 村 大 男 竹 中 虫 町 天 田 土 二 日 入 年 白 八 百 文 木 本 名 目 立 力 林 六
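
The join trick works because iterating over a decoded string yields one character at a time. Here is a rough sketch of the same idea wrapped in a function. The name space_out is hypothetical, byte input is assumed to be UTF-8, and only the first five kanji are used to keep it short. It should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-


def space_out(text, sep=u" ", encoding="utf-8"):
    """Put `sep` between every character of `text`.

    Hypothetical helper; byte strings are assumed to be UTF-8 and are
    decoded first so that we split on characters, not on raw bytes.
    """
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return sep.join(text)


sample = u"一右雨円王"
spaced = space_out(sample)                # u"一 右 雨 円 王"
same = space_out(sample.encode("utf-8"))  # bytes in, same text out

print(spaced == same)  # True
print(len(spaced))     # 9: five kanji plus four spaces
```

In Python 2, joining the undecoded byte string instead would put a space between every byte and mangle each three-byte kanji, which is why the decode comes first.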

Whoa, this is getting quite nifty, and it’s barely taken any effort. This is why I love Python! Let’s now make a list out of this bad boy with the split function and check a few indexes.

>>> grade1_list = grade1_spaced.split(' ')
>>> print grade1_list
[u'\u4e00', u'\u53f3', u'\u96e8', u'\u5186', u'\u738b', u'\u97f3', u'\u4e0b', u'\u706b', u'\u82b1', u'\u8c9d', u'\u5b66', u'\u6c17', u'\u4e5d', u'\u4f11', u'\u7389', u'\u91d1', u'\u7a7a', u'\u6708', u'\u72ac', u'\u898b', u'\u4e94', u'\u53e3', u'\u6821', u'\u5de6', u'\u4e09', u'\u5c71', u'\u5b50', u'\u56db', u'\u7cf8', u'\u5b57', u'\u8033', u'\u4e03', u'\u8eca', u'\u624b', u'\u5341', u'\u51fa', u'\u5973', u'\u5c0f', u'\u4e0a', u'\u68ee', u'\u4eba', u'\u6c34', u'\u6b63', u'\u751f', u'\u9752', u'\u5915', u'\u77f3', u'\u8d64', u'\u5343', u'\u5ddd', u'\u5148', u'\u65e9', u'\u8349', u'\u8db3', u'\u6751', u'\u5927', u'\u7537', u'\u7af9', u'\u4e2d', u'\u866b', u'\u753a', u'\u5929', u'\u7530', u'\u571f', u'\u4e8c', u'\u65e5', u'\u5165', u'\u5e74', u'\u767d', u'\u516b', u'\u767e', u'\u6587', u'\u6728', u'\u672c', u'\u540d', u'\u76ee', u'\u7acb', u'\u529b', u'\u6797', u'\u516d']
>>> print grade1_list[3]
円
>>> print grade1_list[60]
町
>>> print grade1_list[79]
六
>>> for i in grade1_list[34:37]:
...     print i
...
十
出
女

Now that we’ve got a nice list going, let’s give this a final hurrah by checking whether certain Kanji are in the list.

>>> '見' in grade1_list
True
>>> '絵' in grade1_list
False
>>> '花' in grade1_list
True
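
Building on those membership checks, here is a hedged sketch of two shortcuts: list() splits a decoded string into characters without the join/split round trip, and a set keeps repeated lookups cheap. The sample sentence and the helper name known_kanji are made up for illustration, and only the first ten Grade 1 kanji are used to keep it short. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

# First ten Grade 1 kanji from the string used above (shortened on purpose).
grade1 = u"一右雨円王音下火花貝"

grade1_list = list(grade1)  # one element per character, no join/split needed
grade1_set = set(grade1)    # set lookups do not scan the whole list


def known_kanji(sentence, known):
    """Return the characters of `sentence` found in `known`, in order.

    Hypothetical helper for illustration only.
    """
    return [ch for ch in sentence if ch in known]


sentence = u"雨の下で花を見る"  # made-up sample sentence
found = known_kanji(sentence, grade1_set)

print(len(grade1_list))     # 10
print(u"花" in grade1_set)  # True
print(u"絵" in grade1_set)  # False
print(len(found))           # 3: 雨, 下 and 花
```

Note that 見 is not picked up here only because the shortened string stops before it; with the full Grade 1 string it would be found as well.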

The possibilities are endless from here, and I hope this excites people enough to want to start messing around with other languages in Python (or in programming in general). There is a certain stigma associated with encodings, so I hope this clears up some of the Python confusion.
