2075: Wrong URL encoding when using getURLContentAsync


TIV73
Posts: 229
Joined: Sat Nov 12, 2011 1:31 pm

Post by TIV73 »

Hey there,
I believe I found a (possible) issue when using app.utils.web.getURLContentAsync. I'm currently working on a plugin for the new autotag framework for vgmdb.net and noticed that lookups with getURLContentAsync consistently fail if the URL contains certain Japanese characters, e.g. 瀬 or 々. For example, if I run the following code:

Code:

var headers = newStringList();
headers.add('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36');
// Unencoded URL containing a Japanese character; this reproduces the problem
var requestURL = 'http://vgmdb.info/search/albums?format=json&q=瀬';
app.utils.web.getURLContentAsync(requestURL, {headers: headers}).then(function (content) {
    console.log(content);
});

An object containing no results is returned, even though there should be close to 140 albums containing this character. To verify, just open the same URL in a browser, and the correct results will be returned.

To rule out that the browser was just handling the string differently or doing some other magic, I opened the PowerShell ISE and ran the following statement:

Code:

$response = Invoke-WebRequest -Uri "http://vgmdb.info/search/albums?format=json&q=瀬"
$response.Content

Again, the correct results were returned. To narrow down the issue, I had a closer look at the returned object and noticed that it contains the performed query in its link property. When performing the query with PowerShell or in the browser, the link property contained "search/albums/%E7%80%AC", while the same query in MediaMonkey returned "search/albums/%E7%C2%3F%AC", so for some reason the character was encoded incorrectly.
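
For reference, the expected encoding can be checked in any JavaScript console. This is only a minimal sketch, assuming a standard JavaScript engine; the variable names are made up for illustration and nothing here is MediaMonkey-specific. It shows that the UTF-8 percent-encoding of 瀬 matches what PowerShell and the browser sent:

```javascript
// Expected UTF-8 percent-encoding of the two characters mentioned in the report.
// Variable names are illustrative only.
var encodedSe = encodeURIComponent('瀬');   // U+702C
var encodedNoma = encodeURIComponent('々'); // U+3005
console.log(encodedSe, encodedNoma);

// encodeURI leaves URL syntax characters (':', '/', '?', '&', '=') untouched
// and only percent-encodes the non-ASCII character in the query value.
var encodedURL = encodeURI('http://vgmdb.info/search/albums?format=json&q=瀬');
console.log(encodedURL);
```

The first value comes out as %E7%80%AC, the same byte sequence seen in the PowerShell request.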

Originally I thought that the server was just misinterpreting the provided URL for some reason, or returning no results because I didn't provide any details about language, charset, etc., so I ran both requests in PowerShell and MM again and captured both sessions. It turns out that the request sent to the server by MM already contained the wrong encoding:

Code:

ps => GET /search/albums?format=json&q=%E7%80%AC
MM => GET /search/albums?format=json&q=%E7%C2?%AC

There is a straightforward fix for this: manually encoding the URL and then passing the pre-encoded string to getURLContentAsync yields the correct result:

Code:

var requestURL = encodeURI('http://vgmdb.info/search/albums?format=json&q=瀬');

I can't really claim to understand why it works, because it looks like getURLContentAsync already calls encodeURI internally on the provided URL before passing it to _loadDataFromServer, but for some reason it does. Anyway, it's probably not the biggest problem in practical terms, since the wrong encoding can easily be avoided by encoding the URL before calling getURLContentAsync, but I still wanted to provide some feedback about it as it could create issues down the road.
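
To avoid forgetting the encoding step, the workaround could be wrapped in a small helper. This is just a sketch under assumptions: getEncodedURLContent is a hypothetical name and not part of the MediaMonkey API, and the fetch function is passed in as a parameter so the helper itself does not depend on MediaMonkey:

```javascript
// Hypothetical helper (not part of the MediaMonkey API): encodes the URL,
// then hands it to whatever fetch function is supplied.
function getEncodedURLContent(fetchFn, url, options) {
    // encodeURI percent-encodes non-ASCII characters as UTF-8 but leaves
    // URL syntax characters such as ':', '/', '?', '&' and '=' untouched.
    // Do not pass an already encoded URL: '%' would be re-encoded as '%25'.
    return fetchFn(encodeURI(url), options);
}

// Possible usage inside MediaMonkey, assuming the call from the report:
// getEncodedURLContent(function (u, o) {
//     return app.utils.web.getURLContentAsync(u, o);
// }, requestURL, {headers: headers}).then(function (content) {
//     console.log(content);
// });
```
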
PetrCBR
Posts: 1763
Joined: Tue Mar 07, 2006 5:31 pm
Location: Czech

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by PetrCBR »

Hi. We're using the Indy library, which requires an encoded URL, so encodeURI is required whenever your URL contains any special or Unicode characters.
TIV73
Posts: 229
Joined: Sat Nov 12, 2011 1:31 pm

Re: 2075: Wrong URL encoding when using getURLContentAsync

Post by TIV73 »

Alright, so using encodeURI before passing a URL to getURLContentAsync is not just a workaround but the accepted solution. Thanks for letting me know!
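
One caveat that may matter for search terms entered by users: encodeURI does not escape characters such as '&', '=' or '#', so a term containing them would change the query string. A possible alternative, sketched here with a hypothetical buildSearchURL helper (not part of MediaMonkey or of the vgmdb.info API), is to encode only the search term with encodeURIComponent:

```javascript
// Hypothetical helper: builds the search URL from the report and encodes
// only the search term, so that '&', '=' or '#' inside the term cannot
// break the query string.
function buildSearchURL(term) {
    return 'http://vgmdb.info/search/albums?format=json&q=' + encodeURIComponent(term);
}

console.log(buildSearchURL('瀬'));
console.log(buildSearchURL('a&b'));
```

The resulting string is already encoded, so it should be passed on as is, without another encodeURI call.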