Big data, Japan and education

I like data. And I like data when it is big.

The Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan announced that it will promote the use of big data. According to a source quoted in an article in today’s Japan News, only 6.8 percent of 1,100 companies surveyed said they utilise big data. And 40 percent of the companies that do use big data see developing human resources for it as an issue.

Japan lags behind other countries in utilising big data, even though it is an ideal country for it, being one of the most connected countries in the world.

Why should we care about literacy?

While you are reading this you should think about how effortlessly you are doing so. And by being able to do so you have (I hope) learnt something valuable. At the very least we, as human beings, have connected.

According to Derrida, writing is marked by absence. What he means by this is that the containers we call words do not really have a stable and full meaning. Saussure pointed to the arbitrary relation between the signifier and the signified to make much the same point.

Nonetheless words have meaning. Otherwise, all that we say, all that we try to convey with words, would be useless and empty gestures.

Writing serves as memory. Before things were committed to paper (or vellum, or whatever other material) we learnt things by rote; things were committed to memory. The Buddha, Jesus and Socrates all left nothing in writing, but those who followed did write about them, for better or worse. Either way, writing is important. And printing perfected reiterability. Without writing the internet would perhaps rely on images alone. And while a picture may be worth a thousand words, I wouldn’t trade a single one of these words here for it.

No, literacy empowers. How else would I have a chance to know about the history of China or the philosophy of Kant and Wittgenstein, read the latest news, know that the dinner is in the microwave oven, or know that today is International Literacy Day?

In the reference corpus, we trust

People will always ask (and rightly so) how we can trust a corpus to be representative of the language we are studying. The answer is we can’t. But we can make it as unbiased as possible by carefully setting criteria that ensure it is at least reproducible and somewhat representative.

Take the British National Corpus (BNC), for example.

It was compiled in the early 1990s, largely from texts of the 1980s and early 1990s. It is 100 million tokens (words) in size, 90 million of those tokens written and the remaining 10 million spoken language. The samples were taken from as wide a variety of sources as possible. In my opinion it is a representative sample for almost all the words we want to investigate. It would be impossible to say it is representative of all words. The words which are not well represented are small in number as well as low in frequency.

And perhaps it is because of their low frequency that they readily become unrepresentative. A word which does not occur often (less than once per million tokens) will necessarily not be spread across all genres. Also, a small change in its count will make it stand out in a way that the same change would not for a higher-frequency word (where many more occurrences are needed to shift its relative frequency). So these unrepresentative low-frequency words really do not affect the overall corpus as much as people sometimes think.
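The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the function names and the word counts are hypothetical, and the only figure taken from the text is the 100-million-token size of the BNC.

```python
BNC_TOKENS = 100_000_000  # corpus size given in the text


def per_million(raw_count: int, corpus_size: int = BNC_TOKENS) -> float:
    """Normalise a raw count to occurrences per million tokens."""
    return raw_count * 1_000_000 / corpus_size


def relative_change(old_count: int, new_count: int) -> float:
    """Change in count expressed as a fraction of the old count."""
    return (new_count - old_count) / old_count


# Hypothetical counts: a rare word and a common word each gain 10 occurrences.
rare_old, rare_new = 50, 60
common_old, common_new = 50_000, 50_010

print(per_million(rare_old))                      # 0.5 per million: below the 1-per-million line
print(relative_change(rare_old, rare_new))        # 0.2, i.e. a 20% jump
print(relative_change(common_old, common_new))    # 0.0002, i.e. a 0.02% change
```

The same ten extra occurrences move the rare word by a fifth of its frequency but barely register for the common one, which is why low-frequency words look unstable without changing the corpus much overall.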

Who is the next Natsume Soseki? Scholarships and the Japanese people

This year is the 100th anniversary of the death of the Japanese novelist, Natsume Soseki.

Until recently he was featured on the Japanese one-thousand yen banknote (about USD 10). He studied in London on a government scholarship for two years from 1900. The Soseki Museum in London, privately run by the scholar Ikuo Tsunematsu, will close this month on September 28th.

In 1999 I came to Japan on a Japanese government scholarship to study Japanese literature. There is nothing better than being given the opportunity to learn. I have always wondered why the Japanese government spends so much money on foreign exchange students like me but gives next to nothing to its own citizens. It should be giving out scholarships as it did during the Meiji Period (1868-1912) to people like Soseki. It should be making the next Soseki, Kafu and Ogai instead of doling out money to others who may not stay in Japan, but it doesn’t. Japan really has a lot to lose by not nurturing its own talents.

International Phonetic Alphabet symbols online keyboard

Did you know you can type phonetic symbols without needing to install fonts onto your computer’s operating system or word processor? Simply go to ipa.typeit.org and enter the symbols you cannot get normally. Have fun. ;)

Talk about search engine dominance

Just how dominant is Google as a search engine portal?

Let’s just say that of the 221,721 search referrals to my blog over a nine-year period, 95% are from Google, 2% from Bing, and 1% from Yahoo!. The remaining two percent is made up of various smaller search engines.

While the true market shares of the search engines are 72% for Google, 10% for Bing and 7% for Yahoo!, it just goes to show that if you are small fry like me it still pays to be better known in the main search engine.
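For anyone who wants to check the gap between my referrals and the market, here is a rough Python sketch. It uses only the percentages and the total quoted in this post; the percentages are rounded, so the referral counts it produces are approximations, and the function name is my own invention.

```python
TOTAL_REFERRALS = 221_721  # total search referrals quoted in the post

# Percentages as quoted in the post (rounded values).
referral_share = {"Google": 95, "Bing": 2, "Yahoo!": 1}
market_share = {"Google": 72, "Bing": 10, "Yahoo!": 7}


def summarise(engine: str) -> tuple[int, float]:
    """Return (approximate referral count, referral share / market share)."""
    approx_referrals = round(TOTAL_REFERRALS * referral_share[engine] / 100)
    ratio = referral_share[engine] / market_share[engine]
    return approx_referrals, ratio


for engine in referral_share:
    count, ratio = summarise(engine)
    # A ratio above 1 means the engine sends more than its market share would suggest.
    print(f"{engine}: about {count} referrals, ratio {ratio:.2f}")
```

A ratio above 1 for Google and well below 1 for Bing and Yahoo! is the pattern described above: the blog leans on the main search engine even more heavily than the market as a whole does.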

Either way Google still is dominant beyond belief.

What I have learnt from linguistics

There isn’t a day that each and every one of us doesn’t use language in some way. We need it to communicate and interact with people. Unless you live by yourself in a remote forest or on an island, you will use language.

Languages are not made equal. What I mean by this is that languages, like everything else, follow patterns. Some language patterns are more common than others. SOV (subject-object-verb) and SVO (subject-verb-object) are the two most common sentence patterns across languages. Together they make up about 90 percent of all language types. The remaining four possible patterns (OVS, OSV, VSO and VOS) make up the other 10 percent.

Having the subject come first makes sense since it is the most important part of the sentence – what the sentence is about. The verb – what the subject is doing – then should come next. I stress should because SOV is actually the slightly more common type. Enclosing the object between the subject and the verb may be just as effective, then.

Frequency is everything

Within the mind we tend to think of things as universal or generic without relating them to the wider world. We say things like, “the sun rises in the east”, without seeing it in the context in which it occurs. We probably even have in our heads a perfect, literally unclouded image of a singular sunrise that represents all sunrises.

But the sun rises in the east with a frequency and regularity that is often not taken into account when it should be. It rises once a day. Or, to be more precise, the earth, covered in a protective “lubricating” atmosphere, turns once a day to give the illusion of the sun rising. We are so easily duped, and we’re duped on a daily basis by all kinds of illusions.

The reliability of this event like all other events is what gives us our understanding and our rhythm. We often choose to have a rhythm in order to have a regularity to help us through the day. So in this sense frequency is something important. It may be everything.

As I get older things are no longer singular mental objects but repeated objects with a certain frequency. Understanding that frequency is what gives sense to the world. Otherwise there are only perfect mental objects, and that is not how things truly are.

Yes, frequency is everything.

A quick introduction to Japanese syntax and particles

The Japanese language is considered syntactically a Subject-Object-Verb or SOV language, in contrast to English, which is considered a Subject-Verb-Object or SVO language, as these two example sentences show.

(1) Ken wa (S) tama wo (O) uchimashita (V).
(2) Ken (S) hit (V) the ball (O).

While it is not possible to move the syntactic elements around in English without changing the meaning, it is possible in Japanese. This is due partly to particles (助詞). A particle marks the syntactic role of the word or phrase before it. This means the entire phrase, including the particle, can move to another position within the sentence without losing its marked role.

The ‘wa’ and ‘wo’ in (1) are particles.

The English syntactic elements, however, are not marked by particles at all (particles of this kind do not exist in English). They show their syntactic role only through their position relative to the other elements within the sentence. The sentence is therefore the unit. The rearranged sentence (3) below, in contrast to (2), now has a completely different meaning because the subject (S) and object (O) have changed positions.

(3) The ball (S) hit (V) Ken (O).

So Japanese is considered an SOV language because its elements most often follow this order, not because the order is fixed by position as it is in English. But English-speaking learners of Japanese can safely assume this structure for learning purposes.
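The difference between marking roles by particle and marking them by position can be sketched in Python. This is a toy illustration only, not a parser: it handles just the pattern of sentences (1) to (3), the function names are hypothetical, and treating 'wa' simply as a subject marker follows the labelling in (1) rather than a full account of the particle.

```python
# Simplified assumption, following example (1): 'wa' marks S and 'wo' marks O.
PARTICLE_ROLES = {"wa": "S", "wo": "O"}


def japanese_roles(sentence: str) -> dict[str, str]:
    """Read roles off the particles; the final word is taken to be the verb."""
    words = sentence.rstrip(".").split()
    roles = {"V": words[-1]}
    # Each noun is immediately followed by the particle that marks its role.
    for noun, particle in zip(words[:-1:2], words[1:-1:2]):
        roles[PARTICLE_ROLES[particle]] = noun
    return roles


def english_roles(phrases: list[str]) -> dict[str, str]:
    """Read roles off position alone: first S, then V, then O."""
    subject, verb, obj = phrases
    return {"S": subject, "V": verb, "O": obj}


# Moving the particle-marked phrases leaves the roles unchanged.
print(japanese_roles("Ken wa tama wo uchimashita."))
print(japanese_roles("tama wo Ken wa uchimashita."))

# Moving the English phrases swaps subject and object, as in (2) and (3).
print(english_roles(["Ken", "hit", "the ball"]))
print(english_roles(["The ball", "hit", "Ken"]))
```

Both Japanese calls give the same roles because each phrase carries its own label, while the two English calls differ because position is the only label there is.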

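A minimum vocabulary of grammatical terms

In my view grammatical terms are important, yet I do not think they are taught well enough. Or, when they are taught, students are given only the Japanese translation. That cannot be called teaching.

It is not necessary to know all of them. I think a handful of useful terms is enough. Which ones are useful? And how many are needed? For me, just 13 are enough. They fall into two types, with 8 terms in one and 5 in the other. Here they are. The first type is

  • noun
  • verb
  • adjective
  • adverb
  • pronoun
  • article
  • preposition
  • conjunction

and the second type is

  • subject
  • verb
  • complement
  • object
  • adverbial phrase

What I want you to notice is that "verb" appears in both types. The reason is that, as a term, "verb" has at least two meanings. Once you understand why the terms are divided into two types in this way, you will remember the two meanings of "verb" as well.

The first type can be thought of as the "how words change form" type. The other can be thought of as the "role within the sentence" type.

When you look a word up in a dictionary, only terms of the first type appear. This is because a noun is the same noun whether it sits in the subject position or the object position (regardless of its role in the sentence). Keep this one point in mind.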