Big data, Japan and education

I like data. And I like data when it is big.

The Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan has announced that it will promote the use of big data. According to a source quoted in an article in today’s Japan News, only 6.8 percent of 1,100 companies surveyed said they utilise big data. And 40 percent of the companies that do use big data see developing human resources for it as an issue.

Japan lags behind other countries in utilising big data even though, as one of the most connected countries in the world, it is ideally placed to do so.

Why should we care about literacy?

As you read this, think about how effortlessly you are doing so. By being able to read, you have (I hope) learnt something valuable. At the very least we, as human beings, have connected.

According to Derrida, writing is marked by absence. What he means by this is that the containers we call words do not really have a stable and full meaning. Saussure's point about the arbitrary relationship between the signifier and the signified implies as much.

Nonetheless, words have meaning. Otherwise, all that we say and all that we try to convey with words would be useless and empty gestures.

Writing serves as memory. Before things were committed to paper (vellum or whatever other material), they were learnt by rote and committed to memory. The Buddha, Jesus and Socrates left nothing in writing, but those who followed them did write about them, for better or worse. Either way, writing is important. And printing perfected reiterability. Without writing, the internet would perhaps rely on images alone. And while a picture may be worth a thousand words, I wouldn’t trade a single one of the words here for it.

No, literacy empowers. How else would I have a chance to learn about the history of China or the philosophy of Kant and Wittgenstein, read the latest news, know that dinner is in the microwave, or know that today is International Literacy Day?

In the reference corpus we trust

People will always ask (and rightly so) how we can trust a corpus to be representative of the language we are studying. The answer is that we can’t. But we can make it as unbiased as possible by carefully setting criteria which ensure that it is at least reproducible and reasonably representative.

Take the British National Corpus (BNC), for example.

Its texts date largely from the 1980s and early 1990s. It is 100 million tokens (words) in size: 90 million tokens of written language and the remaining 10 million of spoken language. The samples were taken from as wide a variety of sources as possible. In my opinion it is a representative sample for almost all the words we want to investigate, though it would be impossible to say it is representative of all words. The words for which it is not representative are few in number as well as low in frequency.

And perhaps it is precisely because of their low frequency that they readily become unrepresentative. A word which occurs rarely (less than once per million words, i.e. fewer than 100 times in the 100-million-word BNC) is unlikely to be spread across all genres. Also, small changes in their frequency make them stand out as different from higher-frequency words, which need many more additional occurrences before their counts shift noticeably. So these unrepresentative low-frequency words really do not affect the overall corpus as much as people sometimes think.
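To make the arithmetic concrete, here is a minimal sketch that normalises raw counts to occurrences per million tokens and compares how the same small absolute change affects a rare word and a common one. The corpus size follows the 100-million-token figure given above; the word counts are purely hypothetical and chosen for illustration, not taken from the BNC.

```python
# Corpus size taken from the BNC figure above (100 million tokens).
BNC_SIZE = 100_000_000


def per_million(count: int, corpus_size: int = BNC_SIZE) -> float:
    """Normalise a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def relative_change(old: int, new: int) -> float:
    """Proportional change in a count (0.5 means a 50 percent increase)."""
    return (new - old) / old


# Hypothetical counts: both words gain the same 25 extra occurrences.
rare_old, rare_new = 50, 75
common_old, common_new = 50_000, 50_025

print(per_million(rare_old))                        # 0.5 per million
print(relative_change(rare_old, rare_new))          # 0.5, i.e. +50 percent
print(relative_change(common_old, common_new))      # 0.0005, i.e. +0.05 percent
```

The same 25 extra occurrences increase the rare word's frequency by half but barely move the common word, which is why rare words stand out so readily when frequencies are compared.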

Who is the next Natsume Soseki? Scholarships and the Japanese people

This year is the 100th anniversary of the death of the Japanese novelist Natsume Soseki.

Until recently he was featured on the Japanese one-thousand yen banknote (about USD10). He studied in London on a government scholarship for two years from 1900. The Soseki Museum in London, privately run by the scholar Ikuo Tsunematsu, will close this month on September 28th.

In 1999 I came to Japan on a Japanese government scholarship to study Japanese literature. There is nothing better than being given the opportunity to learn. I had always wondered why the Japanese government spent so much money on foreign exchange students like me but gave next to nothing to its own citizens. They should be giving out scholarships to people like Soseki, as they did during the Meiji period (1868-1912). They should be nurturing the next Soseki, Kafu and Ogai instead of doling out money to others who may not stay in Japan, but they don’t. Japan really has a lot to lose by not nurturing its own talent.