Solving Mystery Pic in <1 Second

fun database yay

28 replies to this topic

#1 Neoquest

Neoquest
  • 1760 posts


Posted 14 July 2015 - 02:59 PM

Inspired by this topic, I decided to write a program that solves the Mystery Pic using database queries. It was pretty fun, and the end result was a program that is capable of solving the MPIC in <1 second.
Please excuse any gross-looking code that you may encounter in this topic. These are scripts that I only planned on running a few times, and I didn't take much care when crafting them.

Tools

  • Python (Pillow, pymongo)
  • MongoDB
  • Time

Step 1: The Images

 

Well, I could hardly download every Neopets image every time I wanted to solve an MPIC, so I started with that. There were about 130,000 in the end, and I'm sure that I'm still missing many.
I used a combination of Dr. Sloth and the JN item DB to obtain these images.

This script scrapes every image url from Dr. Sloth:

Spoiler
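The scraping script is not included in this copy of the thread. As a rough stand-in, here is a minimal sketch of pulling image URLs out of a page's HTML with the standard library. The names `IMAGE_PREFIX`, `ImageUrlCollector` and `extract_image_urls` are hypothetical, the host prefix is taken from the regex posted later in the thread, and this is not necessarily the parsing approach the original script used.

```python
from html.parser import HTMLParser

# Assumed image host prefix (taken from the regex posted later in the thread).
IMAGE_PREFIX = "http://images.neopets.com/"


class ImageUrlCollector(HTMLParser):
    """Collects the src of every <img> tag that points at the image host."""

    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.startswith(IMAGE_PREFIX):
                self.urls.add(value)


def extract_image_urls(page_source):
    """Return the sorted, de-duplicated image URLs found in one HTML page."""
    collector = ImageUrlCollector()
    collector.feed(page_source)
    return sorted(collector.urls)
```

Fetching the pages themselves is left out, since the page layout of the source site is not described in the post.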

 

I coupled this with a few lines on the command line to get a list of all item images and misc images. The JN item DB will happily display all items at once, making the item URLs very easy to scrape.

I then used this script to download all the images. This process wasn't as lengthy as you might think:

Spoiler
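The download script is likewise not shown here. A minimal sketch of a threaded downloader, assuming a plain list of image URLs as input, might look like the following. The function names, the worker count and the choice to name files after the last path segment are all assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlopen


def local_name(url):
    """Use the last path segment of the URL as the file name."""
    return os.path.basename(urlparse(url).path)


def download_one(url, out_dir):
    """Download a single image unless it is already on disk."""
    target = os.path.join(out_dir, local_name(url))
    if os.path.exists(target):
        return target
    with urlopen(url, timeout=30) as response:
        data = response.read()
    with open(target, "wb") as handle:
        handle.write(data)
    return target


def download_all(urls, out_dir, workers=16):
    """Download every URL using a small pool of worker threads."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: download_one(url, out_dir), urls))
```

Naming files by their last path segment means two images with the same file name in different directories would collide, so a real run may need a different naming scheme.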

After all was said and done, I had a directory with 130,000 images in it. This was the longest and most annoying part by far.

Step 2: The Documents
I needed a way to quickly search the database for images, so for each image I generated a BSON document listing every color that appears in it. The plan was to match images that contain the same colors as the MPIC.

Here is the script that I used to generate the docs from images, and upload the generated docs to MongoDB:

Spoiler
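The document-generation script is not reproduced here either. The sketch below shows one way the idea could be implemented: collect the distinct colors of an image as hex strings and store them next to the file name. The `name` field matches the output shown at the end of the post, but the `colors` field name and all function names are assumptions. Pillow is only imported inside `image_colors`, and `upload` expects a pymongo collection object supplied by the caller.

```python
def rgb_to_hex(r, g, b):
    """Format one RGB triple as an upper-case hex string such as '#FF0080'."""
    return '#{:02X}{:02X}{:02X}'.format(r, g, b)


def build_document(name, pixels):
    """Build the per-image document: file name plus its distinct colors."""
    colors = sorted({rgb_to_hex(r, g, b) for r, g, b in pixels})
    return {"name": name, "colors": colors}


def image_colors(path):
    """Return the distinct RGB colors of an image file (requires Pillow)."""
    from PIL import Image

    with Image.open(path) as img:
        rgb = img.convert("RGB")
        width, height = rgb.size
        # maxcolors = pixel count, so getcolors() never gives up and returns None
        counted = rgb.getcolors(maxcolors=width * height)
    return [color for _count, color in counted]


def upload(collection, documents):
    """Insert the generated documents into a pymongo collection."""
    collection.insert_many(documents)
```

Storing each color once per image keeps the documents small, which fits the "about a gig" figure better than storing every pixel would.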

These generated documents take up about a gig across 3 files, which is pretty manageable compared to the 130,000 images from before.

Step 3: The Profit

The final script I had to write is short and sweet. Download the MPIC hint from the supplied URL, get a list of the colors that it uses, and query the database for any images that contain the same colors:

Spoiler
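The final script is also missing from this copy. Assuming each stored document has a `colors` array of hex strings (a hypothetical field name), the lookup could be sketched with MongoDB's `$all` operator, which matches documents whose array contains every listed value. The function names are illustrative, and `collection` is a pymongo collection supplied by the caller.

```python
import time


def build_query(hint_colors):
    """Match documents whose 'colors' array contains every hint color."""
    return {"colors": {"$all": sorted(set(hint_colors))}}


def find_matches(collection, hint_colors):
    """Run the query and return only the _id and name of each match."""
    return list(collection.find(build_query(hint_colors), {"name": 1}))


def timed_search(collection, hint_colors):
    """Print each match followed by the elapsed time."""
    start = time.time()
    for match in find_matches(collection, hint_colors):
        print(match)
    print("Found in {} seconds".format(time.time() - start))
```

Without an index on the array field the server has to scan every document, so an index on `colors` is probably worth trying if queries are slow.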

 

Which results in a short and sweet final output:

{u'_id': ObjectId('55a58719292f980c548598d4'), u'name': u'plush_kyrii_mutant.gif'}
Found in 0.832999944687 seconds


#2 Swar

Swar
  • retired cheater

  • 9280 posts


Posted 14 July 2015 - 03:05 PM

Well this is wonderful :D now the fight will be over who can access the mystery pic page first after a new round starts :p

#3 Strategist

Strategist
  • Sadmin

  • 10012 posts


Posted 14 July 2015 - 03:35 PM

Nice work mate. There goes Waser's claim to fame :p

#4 Michaelhex

Michaelhex
  • 1018 posts


Posted 14 July 2015 - 05:09 PM

+rep for this. Sooo :p *I'm gonna sound so noobish here* I need to install:

  • Python (Pillow, pymongo)
  • MongoDB
  • Time

first. Then run the script in any browser? Is that how it goes?


Edited by michaelhex, 14 July 2015 - 05:10 PM.


#5 karlwithak

karlwithak
  • 47 posts

Posted 14 July 2015 - 05:13 PM

Nice work.  I'm a bit surprised how quickly your query was able to get the correct answer.



#6 Florg

Florg
  • 711 posts


Posted 14 July 2015 - 07:53 PM

Well, now all those who have no idea how to properly do this themselves can say goodbye to ever winning this competition again xD



#7 Waser Lave

Waser Lave

  • 25516 posts


Posted 14 July 2015 - 08:13 PM

That's pretty much how I did my first version. It's all well and good until they give you one to solve with very few colours or where the colours have changed slightly. ;)
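One possible way to soften the "colours have changed slightly" problem, which nobody in the thread actually describes, is to round each channel into coarser buckets before building the color strings, so that near-identical colors map to the same value. The sketch below is only an illustration of that idea; the function names and the bucket size of 8 are arbitrary assumptions.

```python
def quantize(rgb, step=8):
    """Round each channel down to the nearest multiple of `step`."""
    return tuple((channel // step) * step for channel in rgb)


def quantized_hex(rgb, step=8):
    """Hex string of the quantized color, usable as a coarser match key."""
    r, g, b = quantize(rgb, step)
    return '#{:02X}{:02X}{:02X}'.format(r, g, b)
```

Two colors that sit on either side of a bucket boundary (for example channel values 247 and 248) still end up with different keys, and images with very few colours remain ambiguous, so this only reduces the problem.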

#8 anewvision

anewvision
  • 1195 posts


Posted 14 July 2015 - 08:46 PM

Wow, amazing work! Now I'll (almost) always get the item!

 

Waser can still have my rep :rolleyes: even though he doesn't need it. In fact, I'll give him some right now. :lol2:



#9 Nano

Nano
  • a delicious kiwi

  • 325 posts


Posted 14 July 2015 - 10:42 PM

Is step one meant to stop and hang here:

 

19aaba4f031e14efac3978d9ec05409b.png



#10 Neoquest

Neoquest
  • 1760 posts


Posted 14 July 2015 - 11:00 PM

This wasn't meant to be a full working product, you will still need programming knowledge to put all these pieces together and get a queryable DB.

#11 Nano

Nano
  • a delicious kiwi

  • 325 posts


Posted 14 July 2015 - 11:04 PM

I only want to download the images. :p

 

I have no interest in working out the MPic.



#12 Neoquest

Neoquest
  • 1760 posts


Posted 14 July 2015 - 11:08 PM

Well then you may have to restart the process to get it to go all the way through. Why it would hang completely, I don't know. At least some of the threads should be free.

#13 Nano

Nano
  • a delicious kiwi

  • 325 posts


Posted 14 July 2015 - 11:09 PM

Restarted it a few times. And sniffed the network, it just stops requesting from drsloth.



#14 Neoquest

Neoquest
  • 1760 posts


Posted 14 July 2015 - 11:30 PM

You can try ripping out the parsing method I used and write a regex for getting the URLs, because the current one is really bad and I could imagine it causing a problem like that. But I had the code lying around and was too lazy to write a regex.

#15 Kway

Kway
  • Proud to be a Brony

  • 1242 posts


Posted 15 July 2015 - 12:29 AM

You can try ripping out the parsing method I used and write a regex for getting the URLs, because the current one is really bad and I could imagine it causing a problem like that. But I had the code lying around and was too lazy to write a regex.

(zip(*re.compile("(?P<quote>['\"])(http://images\.neopets\.com/.+?)(?P=quote)").findall(pageSource)) or {1:[]})[1]


#16 Neoquest

Neoquest
  • 1760 posts


Posted 15 July 2015 - 01:16 AM

(zip(*re.compile("(?P<quote>['\"])(http://images\.neopets\.com/.+?)(?P=quote)").findall(pageSource)) or {1:[]})[1]

Yes, I know that it's easy. I'm just saying that I was too lazy. :p
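For readers following along, here is the same regex idea unpacked into a small Python 3 function. The pattern matches a quote character, captures an image URL, and requires the same quote character to close it, with the named group written out in full as `(?P<quote>...)`. The names `IMAGE_URL` and `find_image_urls` are hypothetical, and this is a sketch rather than a drop-in replacement for the original parser.

```python
import re

# Group "quote" remembers which quote opened the attribute value;
# (?P=quote) then requires the same character to close it.
IMAGE_URL = re.compile(
    r"(?P<quote>['\"])(http://images\.neopets\.com/.+?)(?P=quote)"
)


def find_image_urls(page_source):
    """Return every quoted image URL in the page, in order of appearance."""
    return [match.group(2) for match in IMAGE_URL.finditer(page_source)]
```

Using `finditer` and picking group 2 avoids the `zip(...) or {1: []}` trick, which relies on `zip` returning a list and so only behaves as intended on Python 2.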

#17 MariahPapaya

MariahPapaya
  • 16 posts

Posted 18 January 2016 - 02:44 PM

How ingenious! This will make it 100000 times easier to get those awards : PPP



#18 ScarDefault

ScarDefault
  • 75 posts


Posted 19 January 2016 - 06:56 AM

Anyone know if there is some way to make this in .php?... I don't understand this language yet :S



#19 juvian

juvian
  • 123 posts


Posted 19 January 2016 - 02:20 PM

After a year I stumble upon this topic lol. Nicely done, good to see someone read my long post haha



#20 Nano

Nano
  • a delicious kiwi

  • 325 posts


Posted 19 January 2016 - 03:28 PM

Anyone know if there is some way to make this in .php?... I don't understand this language yet  :S

 

 

It's possible, not sure it would be worth it though. Unsure if PHP could match Python's processing speed for it.



#21 greatpanda1k

greatpanda1k
  • 4 posts

Posted 19 February 2016 - 05:56 PM

I happened to get stuck at the same place. You just need to add a user agent to the requests; it has nothing to do with parsing.

 

Is step one meant to stop and hang here:

 

19aaba4f031e14efac3978d9ec05409b.png
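As an illustration of the user agent fix, here is a minimal sketch using the standard library's `urllib.request`. The `USER_AGENT` value is a placeholder, the helper names are hypothetical, and the original scripts may have used a different HTTP library.

```python
from urllib.request import Request, urlopen

# Placeholder value; any browser-like string can be substituted.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(url):
    """Create a request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url):
    """Download the body of `url` using the request built above."""
    with urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

Setting a timeout also means a stalled connection raises an error instead of hanging forever, which makes this kind of failure easier to spot.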



#22 Nonexistent

Nonexistent
  • 681 posts


Posted 29 May 2016 - 11:59 AM

@Neoquest

 

I don't know Python so the code is very confusing to me, but for this part

return '#{:02X}{:02X}{:02X}'.format(r, g, b)

are you returning the hex representation of the color as a string?


Edited by Nonexistent, 29 May 2016 - 01:45 PM.


#23 Neoquest

Neoquest
  • 1760 posts


Posted 29 May 2016 - 03:36 PM

@Neoquest

 

I don't know Python so the code is very confusing to me, but for this part

return '#{:02X}{:02X}{:02X}'.format(r, g, b)

are you returning the hex representation of the color as a string?

 

Yep, I'm converting the rgb values to a hex string.
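A short standalone example of that format string, for anyone unfamiliar with it: `{:02X}` prints an integer as upper-case hexadecimal, zero-padded to two digits. The wrapper name `rgb_to_hex` is an assumption, since only the `return` line is quoted in the thread.

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a '#RRGGBB' string."""
    return '#{:02X}{:02X}{:02X}'.format(r, g, b)


print(rgb_to_hex(255, 0, 128))  # #FF0080
print(rgb_to_hex(0, 10, 255))   # #000AFF
```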



#24 Nonexistent

Nonexistent
  • 681 posts


Posted 29 May 2016 - 04:07 PM

Yep, I'm converting the rgb values to a hex string.

So in the BSON document each color would be represented as seven bytes, plus the metadata added in to describe the serialization, and all that comes down to only 1G? Did you filter out all the images that had a great number of color variances, like irl photos, before you processed them?

Cause my color representations are 4 bytes each, the name of the image is represented by 4 bytes, and a delimiter is represented by 4 bytes, no other metadata of any kind, and my processed file still comes down to 2.2G  :/


Edited by Nonexistent, 29 May 2016 - 04:09 PM.


#25 Neoquest

Neoquest
  • 1760 posts


Posted 29 May 2016 - 07:33 PM

So in the BSON document each color would be represented as seven bytes, plus the metadata added in to describe the serialization, and all that comes down to only 1G? Did you filter out all the images that had a great number of color variances, like irl photos, before you processed them?

Cause my color representations are 4 bytes each, the name of the image is represented by 4 bytes, and a delimiter is represented by 4 bytes, no other metadata of any kind, and my processed file still comes down to 2.2G  :/

 

You've gotta have something wonky going on, because just doing the math:

With 1000 colors per image (way way high, I assume it's going to be like 100 realistically)
And 130,000 images

(1000 * 4 + 4 + 4) * 130,000 = 521,040,000 bytes, or about 521 megabytes.

Even if I got something wrong with my interpretation of what you're saying, my overestimate of color count should make your final file size around that size or smaller.
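The estimate above can be checked in a couple of lines, and run backwards to see how many colors per image a 2.2G file would imply under the same layout. The variable names are illustrative, and reading "2.2G" as 2.2 billion bytes is an assumption.

```python
colors_per_image = 1000   # deliberate overestimate from the post
bytes_per_color = 4
bytes_for_name = 4
bytes_for_delimiter = 4
image_count = 130000

per_image = colors_per_image * bytes_per_color + bytes_for_name + bytes_for_delimiter
total = per_image * image_count
print(total)              # 521040000
print(total / 10**6)      # 521.04 (megabytes)

# Working backwards from a 2.2G file (assumed to mean 2.2e9 bytes):
observed_bytes = 2.2e9
implied_colors = (observed_bytes / image_count - bytes_for_name - bytes_for_delimiter) / bytes_per_color
print(round(implied_colors))  # roughly 4,200 colors per image
```

An average that high would suggest the file stores something other than distinct colors, for example one entry per pixel.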




