I like data. And I like data when it is big.
The Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan announced that it will promote the use of big data. According to a source quoted in an article in today’s Japan News, only 6.8 percent of 1,100 companies surveyed said they utilise big data. And 40 percent of the companies that do use big data see developing human resources for it as an issue.
Japan lags behind other countries in utilising big data, even though it is an ideal country for it, being one of the most connected countries in the world.
While you are reading this you should think about how effortlessly you are doing so. And by being able to, you have (I hope) learnt something valuable. At least we, as human beings, have connected.
According to Derrida, writing is marked by absence. What he means by this is that the containers we call words do not really have a stable and full meaning. Saussure pointed to the arbitrary relation between the signifier and the signified to say as much.
Nonetheless words have meaning. Otherwise, all that we say, all that we try to convey with words, would be useless and empty gestures.
Writing serves as memory. Before things were committed to paper (vellum or whatever other material) we learnt things by rote; they were committed to memory. The Buddha, Jesus and Socrates all left nothing in writing, but those who followed did write about them, for better or worse. Either way, writing is important. And printing perfected reiterability. Without writing, the internet would perhaps be relying on images alone. And while a picture may be worth a thousand words, I wouldn’t trade a single one of these words here for one.
No, literacy empowers. How else would I have a chance to know about the history of China, the philosophy of Kant and Wittgenstein, read the latest news, know that the dinner is in the microwave oven, or that today is International Literacy Day?
People will always ask (and rightly so) how we can trust a corpus to be representative of the language we are studying. The answer is we can’t. But we can make sure it is as unbiased as possible by carefully setting criteria which will ensure that it is at least reproducible and somewhat representative.
Take the British National Corpus (BNC), for example.
It was by and large built in the early 1990s. It is 100 million tokens (words) in size, 90 million of those tokens written and the remaining 10 million spoken language. The samples were taken from as wide a variety of sources as possible. In my opinion it is a representative sample for almost all the words we want to investigate. It would be impossible to say it is representative of all words. The words which are not representative are small in number as well as low in frequency.
And perhaps because of their low frequency they readily become unrepresentative. A word which does not occur often (less than once per million words) will necessarily not be spread across all genres. Also, small changes in its count will make it stand out in a way that a higher frequency word would not, since far more occurrences are needed to shift the relative frequency of a common word. So these unrepresentative low frequency words really do not affect the overall corpus as much as people sometimes think.
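To put rough numbers on this, here is a minimal sketch in Python. The corpus size is the BNC's 100 million tokens mentioned above; the two word counts (50 and 50,000) and the function names are made up purely for illustration and are not taken from the BNC.

```python
CORPUS_SIZE = 100_000_000  # tokens in the BNC, as stated above


def per_million(count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def relative_change(count: int, extra: int) -> float:
    """Proportional change in a word's frequency if `extra` more hits turn up."""
    return extra / count


# Hypothetical counts, chosen only to illustrate the point.
rare_count = 50        # a word below one occurrence per million
common_count = 50_000  # a much more frequent word

print(per_million(rare_count))    # 0.5
print(per_million(common_count))  # 500.0

# Ten extra occurrences change the rare word's frequency by 20 percent,
# but the common word's by only 0.02 percent.
print(relative_change(rare_count, 10))    # 0.2
print(relative_change(common_count, 10))  # 0.0002
```

The same ten extra hits are a big jump for the rare word and barely register for the common one, which is why low frequency words look unstable while hardly affecting the corpus as a whole.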
This year is the 100th anniversary of the death of the Japanese novelist, Natsume Soseki.
Until recently he was featured on the Japanese one-thousand yen banknote (about USD10). He studied in London on a government scholarship for two years from 1900. The Soseki Museum in London, privately run by the scholar Ikuo Tsunematsu, will close this month on September 28th.
In 1999 I came on a Japanese government scholarship to Japan to study Japanese Literature. There is nothing better than being given the opportunity to learn. I had always wondered why the Japanese government spent so much money on foreign exchange students like me but gave next to nothing to its own citizens. They should be giving out scholarships like they did during the Meiji Period (1868-1912) to people like Soseki. They should be making the next Soseki, Kafu and Ogai instead of doling out to others who may not stay in Japan, but they don’t. Japan really has a lot to lose by not nurturing its own talents.
Did you know you can type phonetic symbols without needing to install fonts onto your computer’s operating system or word processor? Simply go to ipa.typeit.org and enter the symbols you cannot get normally. Have fun. ;)
Just how dominant is Google as a search engine portal?
Let’s just say that of the 221,721 search referrals to my blog over a nine-year period, 95% are from Google, 2% from Bing, and 1% from Yahoo!. The remaining two percent is made up of various smaller search engines.
While the true market shares of the search engines are 72% for Google, 10% for Bing and 7% for Yahoo!, it just goes to show that if you are small fry like me it still pays to be better known in the main search engine.
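For the curious, the arithmetic can be sketched in a few lines of Python. All the figures are the ones quoted above; the variable and function names are my own, and the referral counts are only approximate because they are worked back from rounded percentages.

```python
TOTAL_REFERRALS = 221_721

# Percentages as quoted in the post.
blog_share = {"Google": 95, "Bing": 2, "Yahoo!": 1}
market_share = {"Google": 72, "Bing": 10, "Yahoo!": 7}


def approx_referrals(total: int, percent: float) -> int:
    """Approximate referral count implied by a rounded percentage."""
    return round(total * percent / 100)


def share_ratio(blog_percent: float, market_percent: float) -> float:
    """How over- (>1) or under- (<1) represented an engine is on the blog."""
    return blog_percent / market_percent


for engine, percent in blog_share.items():
    count = approx_referrals(TOTAL_REFERRALS, percent)
    ratio = share_ratio(percent, market_share[engine])
    print(f"{engine}: about {count:,} referrals, {ratio:.2f}x its market share")
```

A ratio above 1 means the engine sends the blog more traffic than its market share would suggest. Only Google comes out above 1 here.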
Either way Google still is dominant beyond belief.
There isn’t a day that each and every one of us doesn’t use language in some way. We need it to communicate and interact with people. Unless you live by yourself in a remote forest or on an island, you will use language.
Languages are not made equal. What I mean by this is that languages, like everything else, follow patterns. Some language patterns are more common than others. SOV (subject-object-verb) and SVO (subject-verb-object) are the two most common sentence patterns across languages. Together they make up about 90 percent of all language types. The remaining four possible patterns (OVS, OSV, VSO and VOS) make up the other 10 percent.
Having the subject come first makes sense, since it is the most important part of the sentence: it is what the sentence is about. The verb, what the subject is doing, then should come next. I stress should because SOV is actually the slightly more common type. Enclosing the object between the subject and the verb may be just as effective, then.
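The six possible orders mentioned above are simply the permutations of the three elements S, O and V. A minimal sketch in Python, with variable names of my own choosing:

```python
from itertools import permutations

# Every possible ordering of subject (S), object (O) and verb (V).
orders = ["".join(p) for p in permutations("SOV")]
print(orders)  # ['SOV', 'SVO', 'OSV', 'OVS', 'VSO', 'VOS']

# The two subject-first orders are the common ones discussed above.
subject_first = [order for order in orders if order.startswith("S")]
print(subject_first)  # ['SOV', 'SVO']
```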
Within the mind we tend to think of things as universal or generic without relating them to the wider world. We say things like, “the sun rises from the east”, without seeing it in the context in which it occurs. We probably even have a perfect, literally unclouded image of a singular sunrise that represents all sunrises in our heads.
But the sun rises from the east with a frequency and regularity that is often not taken into account when it should be. It rises once a day. Or to be more precise, the earth, covered in a protective “lubricating” atmosphere, turns once a day to give the illusion of the sun rising. We are so easily duped, and we’re duped on a daily basis by all kinds of illusions.
The reliability of this event like all other events is what gives us our understanding and our rhythm. We often choose to have a rhythm in order to have a regularity to help us through the day. So in this sense frequency is something important. It may be everything.
As I get older things are no longer a singular mental object but repeated objects with a certain frequency. Understanding that frequency is what gives sense to the world. Otherwise there are only perfect mental objects, which is not true at all.
Yes, frequency is everything.
The Japanese language is considered syntactically a Subject-Object-Verb or SOV language, in contrast to English, which is considered a Subject-Verb-Object or SVO language, as these two example sentences will show.
(1) Ken wa (S) tama wo (O) uchimashita (V).
(2) Ken (S) hit (V) the ball (O).
While it is not possible to move the syntactical elements around in English without changing the meaning, it is possible in Japanese. This is due partly to particles (助詞). A particle marks the syntactic role of the word or phrase before it. This means the entire phrase, including the particle, can move to another position within the sentence without losing its marked role.
The ‘wa’ and ‘wo’ in (1) are particles.
The English syntactic elements, however, are not marked by particles at all (English has no particles of this kind) and show their syntactic roles only by their position relative to each other within the sentence. The sentence is therefore the unit. The rearranged elements in (3) below, in contrast to (2), now have a completely different meaning because the positions of the subject (S) and object (O) have changed.
(3) The ball (S) hit (V) Ken (O).
So Japanese is considered an SOV language because its elements most often follow this order, not because the order is fixed by position as in English. But English-speaking learners of Japanese can safely assume this structure for learning purposes.
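The contrast can be made concrete with a toy sketch in Python. It is only an illustration under simplifying assumptions, not a real parser: the function names are invented, the Japanese is romanised as in (1), and ‘wa’ is treated as a subject marker to match the labelling above, although strictly speaking it marks the topic.

```python
# Particle-to-role mapping, following the labels used in example (1).
PARTICLE_ROLES = {"wa": "S", "wo": "O"}


def roles_from_particles(sentence: str) -> dict:
    """Assign roles by the particle that follows each word (Japanese-style)."""
    words = sentence.rstrip(".").split()
    roles = {}
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i + 1] in PARTICLE_ROLES:
            roles[PARTICLE_ROLES[words[i + 1]]] = words[i]
            i += 2  # skip the word and its particle
        else:
            roles["V"] = words[i]  # an unmarked word is taken to be the verb
            i += 1
    return roles


def roles_from_position(chunks: list) -> dict:
    """Assign roles purely by position (English-style SVO)."""
    subject, verb, obj = chunks
    return {"S": subject, "V": verb, "O": obj}


# Moving the marked phrases around leaves the roles untouched.
print(roles_from_particles("Ken wa tama wo uchimashita."))
print(roles_from_particles("tama wo Ken wa uchimashita."))

# Swapping positions in English swaps the roles, as in (2) and (3).
print(roles_from_position(["Ken", "hit", "the ball"]))
print(roles_from_position(["The ball", "hit", "Ken"]))
```

The two Japanese word orders give the same set of roles, while the two English ones do not.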