
Case-sensitive and multiple language search in Wikipedia


Post by thorsten » Fri Feb 15, 2013 11:33 am

Hi,

I'm a big fan of GoldenDict, but two things seriously limit its usefulness:

1. Searches in Wikipedia dictionaries seem to be case-sensitive. For example, the search for "abaco islands" in the Wikipedia group returns "No translation for abaco islands was found in group Wikipedia."

This was already discussed in 2011: "That is what Wikipedia returns. We have no control on how it performs the search. If it can't search case-insensitively, then so can't we."[1].

Nevertheless it's obvious that Wikipedia does NOT search case-sensitively. If I type "w abaco islands" in the address bar of my browser (Opera) and press enter, I get to the article for Abaco Islands. The "w" search is linked to "http://en.wikipedia.org/wiki/Special:Search?search=%s".

The search in the Wikipedia website itself is also not case-sensitive.

2. If I search for a word in the Wikipedia group, it is found in multiple languages only when the title of the article is *exactly* the same in the other languages. Example: "Icelandic alphabet" is found only in the English Wikipedia and not, for instance, in the French Wikipedia. The article on the Wikipedia website[2] shows in the language list on the left that there is a French version of the article (called "Alphabet islandais").

The effect of these two issues is that I always have to check on the Wikipedia website whether the GoldenDict search results are correct. So why not use Wikipedia directly in a browser and drop GoldenDict for Wikipedia searches?!

Thorsten
[1] viewtopic.php?f=4&t=1202&p=5213&hilit=case+sensitive#p5213
[2] http://en.wikipedia.org/wiki/Icelandic_alphabet

Re: Case-sensitive and multiple language search in Wikipedia

Post by chulai » Fri Feb 15, 2013 5:20 pm

I agree with you. Those are 2 interesting requests. And for the same reason I sometimes use Wikipedia in the web browser.

About 2): The built-in wiki support does not have this feature. But you can always add Wikipedia as an online dictionary via Edit > Dictionaries > Sources > Websites.
BTW, this feature was requested before: viewtopic.php?f=4&t=999

Feel free to create a ticket at https://github.com/goldendict/goldendict/issues

Re: Case-sensitive and multiple language search in Wikipedia

Post by Abs62 » Fri Feb 15, 2013 5:58 pm

thorsten wrote:
1. Searches in Wikipedia dictionaries seem to be case-sensitive. For example, the search for "abaco islands" in the Wikipedia group returns "No translation for abaco islands was found in group Wikipedia."

This is a limitation of the MediaWiki API: its search is case-sensitive (except for the first letter).

thorsten wrote:
Nevertheless it's obvious that Wikipedia does NOT search case-sensitively. If I type "w abaco islands" in the address bar of my browser (Opera) and press enter, I get to the article for Abaco Islands. The "w" search is linked to "http://en.wikipedia.org/wiki/Special:Search?search=%s".

That is a different API. You can add the appropriate link on the Websites tab instead of the Wikipedia tab, and then a search for "abaco islands" will find the article. But in this mode Wikipedia entries do not appear in the word list.
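
For illustration, the Websites-tab approach amounts to substituting the looked-up word into a URL template. The sketch below assumes that %GDWORD% is the placeholder GoldenDict replaces with the word; the helper name expand_website_url is hypothetical and this is not GoldenDict's own code.

```python
from urllib.parse import quote

# URL template as it might be entered under Edit > Dictionaries > Sources > Websites.
# Assumption: %GDWORD% is the placeholder that GoldenDict replaces with the search word.
SEARCH_TEMPLATE = "http://en.wikipedia.org/wiki/Special:Search?search=%GDWORD%"


def expand_website_url(template, word):
    """Substitute the percent-encoded word into the template (hypothetical helper)."""
    return template.replace("%GDWORD%", quote(word, safe=""))
```

Calling expand_website_url(SEARCH_TEMPLATE, "abaco islands") yields the same Special:Search URL that the browser shortcut opens, so the lower-case query is handled by Wikipedia's own search page.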

Re: Case-sensitive and multiple language search in Wikipedia

Post by chulai » Fri Feb 15, 2013 7:14 pm


I had a quick look at the MediaWiki API at http://en.wikipedia.org/w/api.php

It seems GoldenDict is using http://en.wikipedia.org/w/api.php?actio ... %20Islands
but I also found the OpenSearch action, which allows case-insensitive searches: http://en.wikipedia.org/w/api.php?actio ... bAcO%20isL

Wouldn't it be possible to use that alternative in GoldenDict to implement a case-insensitive search?
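
As a rough sketch of what such a request could look like from a script (not how GoldenDict does it), the snippet below builds an action=opensearch URL and, optionally, fetches the suggested titles. The function names and the User-Agent string are made up for the example; action, format, limit and search are the parameters already visible in the links above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def opensearch_url(term, limit=40):
    """Build an action=opensearch request URL for a search term."""
    params = {
        "action": "opensearch",
        "format": "json",
        "limit": limit,
        "search": term,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_suggestions(term, limit=40):
    """Fetch suggested titles over the network (needs internet access)."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(opensearch_url(term, limit),
                      headers={"User-Agent": "opensearch-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        data = json.load(response)
    # An opensearch reply is a JSON array: [query, [titles], ...].
    return data[1]
```

Only opensearch_url is needed to reproduce the links above; fetch_suggestions is there to try mixed-case input such as "abAcO isL" by hand.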

Re: Case-sensitive and multiple language search in Wikipedia

Post by Abs62 » Sat Feb 16, 2013 8:59 am

The "opensearch" API gives some strange results. For example, the query "action=opensearch&format=jsonfm&limit=40&search=ab" returns:
[
  "ab",
  [
    "Abraham Lincoln",
    "ABC News",
    "Aberdeen",
    "About.com",
    "Abortion",
    "Abolitionism",
    "Abu Dhabi",
    "Abbasid Caliphate",
    "Aberdeen F.C.",
    "Above mean sea level",
    "ABS-CBN",
    "Abbot",
    "Aberdeenshire",
    "ABBA",
    "AbsolutePunk"
  ]
]

Compare this with the results of "action=query" (currently used in GD). The opensearch query doesn't even find the existing page "Ab".
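
The point can be checked mechanically. The sketch below takes the reply quoted above (as plain JSON rather than jsonfm) and looks for a title that equals the query when case is ignored; the helper name exact_title_match is invented for this example.

```python
import json

# The opensearch reply quoted above, as a plain JSON string.
SAMPLE_REPLY = json.dumps([
    "ab",
    [
        "Abraham Lincoln", "ABC News", "Aberdeen", "About.com", "Abortion",
        "Abolitionism", "Abu Dhabi", "Abbasid Caliphate", "Aberdeen F.C.",
        "Above mean sea level", "ABS-CBN", "Abbot", "Aberdeenshire", "ABBA",
        "AbsolutePunk",
    ],
])


def exact_title_match(body, term):
    """Return the suggested title equal to term ignoring case, or None."""
    _query, titles = json.loads(body)[:2]
    wanted = term.casefold()
    for title in titles:
        if title.casefold() == wanted:
            return title
    return None
```

For this reply, exact_title_match(SAMPLE_REPLY, "ab") finds nothing: every suggestion merely starts with "ab", and neither "Ab" nor "AB" is among them.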

Re: Case-sensitive and multiple language search in Wikipedia

Post by chulai » Sun Feb 17, 2013 10:51 pm


I don't think these results are strange; action=opensearch is EXACTLY how Wikipedia search works when it gives the user suggested articles:

[Screenshot: open search in Wikipedia.PNG]


Compare results with equivalent API call: http://en.wikipedia.org/w/api.php?actio ... &search=ab

Now, if the user clicks on "containing... ab" in the above picture, Wikipedia performs a full-text search:

[Screenshot: full-text search in Wikipedia.PNG]


Compare results with equivalent API call: http://en.wikipedia.org/w/api.php?actio ... rsearch=ab

If the user instead presses Enter or clicks the Search button, Wikipedia loads the article for AB: http://en.wikipedia.org/wiki/AB

[Screenshot: wikipedia article page if user presses search without opting for one of the suggested articles.PNG]


My guess is that the equivalent API call is: http://en.wikipedia.org/w/api.php?actio ... text|revid

So I think we can either combine the results from both action=opensearch and action=query&list=allpages, or just use action=opensearch for the word lists in GoldenDict. Then, if the user presses Enter or selects one of the suggestions, GD calls action=parse&page={WORD_ENTERED_OR_PICKED_FROM_LIST}.
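
A minimal sketch of that flow, outside of GoldenDict, could look like the following. All function names are invented for the example, and the apprefix/aplimit parameters on list=allpages are my choice here; the query GoldenDict currently sends may use different ones.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def api_url(**params):
    """Build a MediaWiki API URL from keyword parameters."""
    return API_ENDPOINT + "?" + urlencode(params)


def word_list_urls(prefix, limit=40):
    """URLs for the two word-list sources: suggestions and the page index."""
    return [
        api_url(action="opensearch", format="json", limit=limit, search=prefix),
        # Assumption: apprefix/aplimit are used to list pages by title prefix.
        api_url(action="query", list="allpages", format="json",
                aplimit=limit, apprefix=prefix),
    ]


def merge_titles(*title_lists):
    """Merge title lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for titles in title_lists:
        for title in titles:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


def article_url(title):
    """URL that asks action=parse for the rendered text of one page."""
    return api_url(action="parse", format="json", page=title, prop="text|revid")
```

The word list would be merge_titles(allpages_titles, opensearch_titles), so exact index entries such as "Ab" come first and the case-insensitive suggestions follow; article_url is then called for whichever title the user enters or picks.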

Regarding request #2 from @thorsten, we can also implement multi-language navigation with the MediaWiki API. For example, these are the interlanguage links shown for the article "AB":

Code:
<h3>Languages</h3>
   <div class="body">
      <ul>
         <li class="interwiki-az"><a href="//az.wikipedia.org/wiki/AB_(d%C9%99qiql%C9%99%C5%9Fdirm%C9%99)" title="AB (dəqiqləşdirmə)" lang="az" hreflang="az">Azərbaycanca</a></li>
         <li class="interwiki-br"><a href="//br.wikipedia.org/wiki/Ab" title="Ab" lang="br" hreflang="br">Brezhoneg</a></li>
         <li class="interwiki-cs"><a href="//cs.wikipedia.org/wiki/AB" title="AB" lang="cs" hreflang="cs">Česky</a></li>
         <li class="interwiki-da"><a href="//da.wikipedia.org/wiki/AB" title="AB" lang="da" hreflang="da">Dansk</a></li>
         <li class="interwiki-de"><a href="//de.wikipedia.org/wiki/AB" title="AB" lang="de" hreflang="de">Deutsch</a></li>
         <li class="interwiki-el"><a href="//el.wikipedia.org/wiki/AB" title="AB" lang="el" hreflang="el">Ελληνικά</a></li>
         <li class="interwiki-es"><a href="//es.wikipedia.org/wiki/AB" title="AB" lang="es" hreflang="es">Español</a></li>
         <li class="interwiki-eo"><a href="//eo.wikipedia.org/wiki/Ab" title="Ab" lang="eo" hreflang="eo">Esperanto</a></li>
         <li class="interwiki-fa"><a href="//fa.wikipedia.org/wiki/AB" title="AB" lang="fa" hreflang="fa">فارسی</a></li>
         <li class="interwiki-fr"><a href="//fr.wikipedia.org/wiki/AB" title="AB" lang="fr" hreflang="fr">Français</a></li>
         <li class="interwiki-ko"><a href="//ko.wikipedia.org/wiki/AB" title="AB" lang="ko" hreflang="ko">한국어</a></li>
         <li class="interwiki-id"><a href="//id.wikipedia.org/wiki/AB" title="AB" lang="id" hreflang="id">Bahasa Indonesia</a></li>
         <li class="interwiki-it"><a href="//it.wikipedia.org/wiki/AB" title="AB" lang="it" hreflang="it">Italiano</a></li>
         <li class="interwiki-sw"><a href="//sw.wikipedia.org/wiki/AB" title="AB" lang="sw" hreflang="sw">Kiswahili</a></li>
         <li class="interwiki-lt"><a href="//lt.wikipedia.org/wiki/AB_(reik%C5%A1m%C4%97s)" title="AB (reikšmės)" lang="lt" hreflang="lt">Lietuvių</a></li>
         <li class="interwiki-nl"><a href="//nl.wikipedia.org/wiki/AB" title="AB" lang="nl" hreflang="nl">Nederlands</a></li>
         <li class="interwiki-ja"><a href="//ja.wikipedia.org/wiki/AB" title="AB" lang="ja" hreflang="ja">日本語</a></li>
         <li class="interwiki-no"><a href="//no.wikipedia.org/wiki/AB" title="AB" lang="no" hreflang="no">Norsk (bokmål)‎</a></li>
         <li class="interwiki-nn"><a href="//nn.wikipedia.org/wiki/Ab" title="Ab" lang="nn" hreflang="nn">Norsk (nynorsk)‎</a></li>
         <li class="interwiki-pl"><a href="//pl.wikipedia.org/wiki/AB" title="AB" lang="pl" hreflang="pl">Polski</a></li>
         <li class="interwiki-pt"><a href="//pt.wikipedia.org/wiki/AB" title="AB" lang="pt" hreflang="pt">Português</a></li>
         <li class="interwiki-ro"><a href="//ro.wikipedia.org/wiki/AB" title="AB" lang="ro" hreflang="ro">Română</a></li>
         <li class="interwiki-ru"><a href="//ru.wikipedia.org/wiki/AB" title="AB" lang="ru" hreflang="ru">Русский</a></li>
         <li class="interwiki-sk"><a href="//sk.wikipedia.org/wiki/Ab" title="Ab" lang="sk" hreflang="sk">Slovenčina</a></li>
         <li class="interwiki-sl"><a href="//sl.wikipedia.org/wiki/AB" title="AB" lang="sl" hreflang="sl">Slovenščina</a></li>
         <li class="interwiki-fi"><a href="//fi.wikipedia.org/wiki/Ab" title="Ab" lang="fi" hreflang="fi">Suomi</a></li>
         <li class="interwiki-sv"><a href="//sv.wikipedia.org/wiki/AB" title="AB" lang="sv" hreflang="sv">Svenska</a></li>
         <li class="interwiki-tr"><a href="//tr.wikipedia.org/wiki/AB" title="AB" lang="tr" hreflang="tr">Türkçe</a></li>
         <li class="interwiki-zh"><a href="//zh.wikipedia.org/wiki/AB" title="AB" lang="zh" hreflang="zh">中文</a></li>
         <li class="wbc-editpage"><a href="//www.wikidata.org/wiki/Special:ItemByTitle/enwiki/AB" title="Edit interlanguage links">Edit links</a></li>
      </ul>
   </div>


The equivalent API call would be: http://en.wikipedia.org/w/api.php?actio ... &titles=AB
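
To give an idea of how the reply could be consumed, here is a sketch that builds a prop=langlinks query and turns the JSON answer into a language-to-title mapping. The function names are invented, the page id in the sample is a dummy value, and the sample reply is hand-written from the list above under the assumption that it follows the usual query/pages/langlinks layout.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def langlinks_url(title, limit=500):
    """Build a query for the interlanguage links of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "format": "json",
        "lllimit": limit,
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_langlinks(body):
    """Return {language code: article title} from a prop=langlinks reply."""
    pages = json.loads(body).get("query", {}).get("pages", {})
    # Pages may be keyed by page id (dict) or returned as a list.
    page_iter = pages.values() if isinstance(pages, dict) else pages
    links = {}
    for page in page_iter:
        for link in page.get("langlinks", []):
            # The title is stored under "*" in the older layout, "title" otherwise.
            links[link["lang"]] = link.get("*", link.get("title"))
    return links


# Hand-written sample with three of the links listed above; "1" is a dummy page id.
SAMPLE_REPLY = json.dumps({"query": {"pages": {"1": {
    "title": "AB",
    "langlinks": [
        {"lang": "az", "*": "AB (dəqiqləşdirmə)"},
        {"lang": "br", "*": "Ab"},
        {"lang": "fr", "*": "AB"},
    ],
}}}})
```

With such a mapping, a lookup of "Icelandic alphabet" in the English Wikipedia could be followed by a request to the French Wikipedia for whatever title is stored under "fr", instead of repeating the English title there.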

Finally, there is a sandbox page to test the MediaWiki API. It's very easy to use and more or less self-documented.
And the MediaWiki API documentation is very useful too.

I think both requests are worth implementing to improve wiki support.

Regards,
Chulai

