Wednesday, December 31, 2008

Language and the Global Internet

One of my favorite topics. An article in the NYT by Daniel Sorid, "Writing the Web’s Future in Numerous Languages" (note the URL, that's amusing), begins,

The next chapter of the World Wide Web will not be written in English alone. Asia already has twice as many Internet users as North America, and by 2012 it will have three times as many. Already, more than half of the search queries on Google come from outside the United States.

Ouch! Conflating language and geography! A lot of people outside North America use English: the UK, Australia, New Zealand, Singapore, South Africa. The article then discusses an Indian software entrepreneur.

Mr. Ram Prakash said Western technology companies have misunderstood the linguistic landscape of India, where English is spoken proficiently by only about a tenth of the population and even many college-educated Indians prefer the contours of their native tongues for everyday speech.

That 1/10th was probably a good initial target market. The Internet is not always about everyday speech. For forms like instant messaging, blog posts, and social sites, yes, but not for most news or business sites. The article mostly dances around the problems that come with India having 22 languages (22 according to the article).

Even among the largely English-speaking base of around 50 million Web users in India today, nearly three-quarters prefer to read in a local language, according to a survey by JuxtConsult, an Indian market research company. Many cannot find the content they are seeking. “There is a huge shortage of local language content,” said Sanjay Tiwari, the chief executive of JuxtConsult.

50 million seems like a good base from which you can build out. If content is king, then the users are the king-makers. It's very easy to be in the business world and to forget that most of the content out there is actually created by users. Back in the pre-dot-com days and the days of BBSes, almost all content was user-generated. The point is that the web is an excellent platform for people to create content, especially in local languages. Create! Why aren't they, then?

The article continually glosses over the differences between geography and language. India is discussed as one market, but if we are to look at markets defined by language, then "India" is useless as a category. With 22 languages and (according to the article) 420 million Hindi speakers, there simply is no "Indian language".

If there are 50 million web users in India, and for simplicity let's say 25 languages, then on average that will be 2 million per language. But, AFAIK, Hindi is the main language in terms of number of speakers, so taking the average is a useless approach. Wikipedia does have a page about the (official) languages in India and numbers of speakers. There is another Wikipedia page about languages in India, which says there are 29 languages with more than 1 million speakers and 122 with more than 10,000 speakers.

The Internet may be cool for a variety of reasons, but it is not the be-all and end-all. Food, housing, jobs, and health care are a bit more important. Communication can help with these, and the Internet makes communication easier, but it is not the only way to communicate.

And the article seemed so promising.