Big data, Japan and education

I like data. And I like data when it is big.

The Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Japan announced that it will promote the use of big data. According to a source quoted in an article in today’s Japan News, only 6.8 percent of 1,100 companies surveyed said they utilise big data. And 40 percent of the companies that do use big data see developing human resources for it as an issue.

Japan lags behind other countries in utilising big data, even though it is an ideal country for it, being one of the most connected countries in the world.

Talk about search engine dominance

Just how dominant is Google as a search engine portal?

Let’s just say that of the 221,721 search referrals to my blog over a nine-year period, 95% are from Google, 2% from Bing and 1% from Yahoo!. The remaining 2% is made up of various smaller search engines.

While the true market shares of the search engines are 72% for Google, 10% for Bing and 7% for Yahoo!, it just goes to show that if you are small fry like me, it still pays to be better known in the main search engine.

Either way, Google is still dominant beyond belief.

How many apps have I downloaded?

Apple reports that downloads of iOS apps from the App Store have now totalled 100 billion. Considering that 1 billion devices have been sold, that is on average 100 apps per device. And considering that there are 1.5 million unique apps on the Store, that is an average of 66,667 downloads per app.

Personally, I have downloaded at least a thousand apps across the two devices I own. That is about 500 per device, so I guess I am downloading five times more than the average person. Yikes.
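For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The figures are the rounded ones quoted above, and the variable names are my own, chosen only for illustration.

```python
# Rounded figures as quoted in this post.
total_downloads = 100_000_000_000  # 100 billion App Store downloads
devices_sold = 1_000_000_000       # 1 billion devices sold
unique_apps = 1_500_000            # 1.5 million unique apps

apps_per_device = total_downloads / devices_sold
downloads_per_app = total_downloads / unique_apps

# My own rough numbers: about 1,000 apps spread over two devices.
my_apps_per_device = 1_000 / 2
ratio_to_average = my_apps_per_device / apps_per_device

print(f"Average apps per device: {apps_per_device:.0f}")
print(f"Average downloads per app: {downloads_per_app:,.0f}")
print(f"My rate compared to the average: {ratio_to_average:.0f}x")
```

These are simple averages, so they say nothing about how unevenly downloads are spread across apps or devices.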

Keywords List – AntConc

The keywords list in AntConc is, as the name suggests, a tool to create a list of keywords. To do this, your target corpus is compared to a reference corpus. The target and reference corpora do not need to be the same size. The comparison is then done statistically. The statistic AntConc uses for this task is either chi-squared or log-likelihood.
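To give a feel for what such a comparison involves, here is a minimal sketch of one common textbook formulation of the log-likelihood keyness score for a single word. It is an illustration only and not necessarily the exact calculation AntConc performs; the function name and the example counts are hypothetical.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood (G2) keyness score for one word.

    freq_* are the word's counts and size_* the total token counts
    in the target and reference corpora.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally frequent in both corpora,
    # scaled by corpus size (so the corpora may differ in size).
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

# Hypothetical counts: a word seen 120 times in a 10,000-word target
# corpus and 300 times in a 100,000-word reference corpus.
print(log_likelihood(120, 10_000, 300, 100_000))
```

Because the expected counts are scaled by corpus size, a word with the same relative frequency in both corpora scores zero, and the score grows as the relative frequencies diverge.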

In AntConc, load your corpus or corpora. Go to the Wordlist tab, then click Start.

Select the Tool Preferences menu.

Statistical terms – measurement

Generally, there are four data types in statistics: nominal, ordinal, interval and ratio.

Nominal data, as the name suggests, characterizes data by name. For example, categorizing someone as male or female produces nominal data. There is no order or rank among nominal categories, only difference.

Ordinal data is data which can be ordered. For example, student class levels are ordinal in the sense that second year students are above first year students, and third year students are above second year students. The levels follow a logical order, but they say nothing about how good the students are or how different they are. Third year students, say, are not twice as good as first year students just because they are two levels higher.

Interval data has order and also equal-sized intervals between values. Temperature is an example of interval data. The difference of ten degrees between 20 and 30 is equal to the ten degree difference between 30 and 40. The values are meaningful relative to each other.

Ratio data has an order, equal intervals and (in Sarah Boslaugh’s word) a “natural” zero. Temperature, the previous example of interval data, has no such zero: the scale does not end at zero degrees, and temperatures can drop below it (and they often do). Weight, height and money are good examples of ratio data in that you cannot be -10 pounds or -10cm, or hold -$10 (well, you can be in debt, but you can’t show me -$10). Because of the natural zero, ratios make sense: you can say your friend has twice as much money as you.
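One way to see the difference between interval and ratio data is to change units and check whether a ratio survives. The short Python sketch below uses made-up values and two helper functions of my own; it is an illustration, not a formal test of measurement level.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

def cm_to_inches(cm):
    return cm / 2.54

# Interval data: the "twice as hot" ratio depends on the unit chosen.
cool, warm = 20, 40  # degrees Celsius
ratio_celsius = warm / cool
ratio_fahrenheit = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)
print(ratio_celsius, ratio_fahrenheit)

# Ratio data: "twice as tall" holds whatever the unit.
child, adult = 90, 180  # height in centimetres
ratio_cm = adult / child
ratio_inches = cm_to_inches(adult) / cm_to_inches(child)
print(ratio_cm, ratio_inches)
```

The temperature ratio changes when the unit changes because zero degrees is an arbitrary point on the scale, while the height ratio stays at two because zero height really means no height.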

Just remember, measurement types make the most sense when contrasted with each other rather than talked about in isolation. When in doubt, try to fit your data into the definitions above to see which one it matches.