View Full Version : Thumbnail Scraper for Cartographer's Choice

05-30-2009, 11:58 PM
So... I feel very selfish, bringing this up. This is a fantastic site already. Has anyone ever discussed the idea of a thumbnail scraper for the Cartographer's Choice Forum?

Edit: To clarify, I'm not even sure what all is involved in such an endeavor. It's simply that the thumbnails in the finished maps forum are so cool, but there isn't a similar page for the works we have chosen as exceptional.

05-31-2009, 07:10 AM
That's a good idea. I will see what I can do.

05-31-2009, 08:19 AM
Is this ok now ?

05-31-2009, 09:50 AM
Wow, that was a quick response. It's awesome, TY. :D

05-31-2009, 11:33 AM
A search through Google revealed to me that yes, there are indeed a whole variety of tools and implements that were invented and produced to scrape one's thumbnails, but little that helps me to understand what exactly a thumbnail scraper does. Can one of you be so kind as to explain the concept to the graphically-challenged?

05-31-2009, 11:40 AM
Look here (http://www.cartographersguild.com/showthread.php?t=5781) (new, as of today) and here (http://www.cartographersguild.com/showthread.php?t=4527) and the challenge forums.

05-31-2009, 12:53 PM
A scraper is a program that requests documents (like pages off the web) and scrapes some information off them to build up another document or page. Usually this is for stuff like scraping daily financial data to build up historical finance data.
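As a rough illustration of that idea (not the actual guild scripts, which are Perl), here is a minimal Python sketch. It pulls thread links out of a page and builds a new list page from them. The sample HTML, the thread numbers and the names `ThreadLinkScraper`, `scrape_threads` and `build_index` are all made up for the example; a real scraper would fetch the page over HTTP first.

```python
from html import escape
from html.parser import HTMLParser


class ThreadLinkScraper(HTMLParser):
    """Collects (url, title) pairs for links that look like forum threads."""

    def __init__(self):
        super().__init__()
        self.threads = []
        self._href = None  # href of the thread link we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and "showthread.php?t=" in href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.threads.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_threads(html):
    """Scrape step: keep only the thread links from one page."""
    parser = ThreadLinkScraper()
    parser.feed(html)
    return parser.threads


def build_index(threads):
    """Build step: turn the scraped links into a new HTML list."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in threads
    )
    return f"<ul>\n{items}\n</ul>"


# Made-up sample standing in for a fetched forum listing page.
SAMPLE = """
<a href="showthread.php?t=101">Example map thread A</a>
<a href="member.php?u=1">some member</a>
<a href="showthread.php?t=102">Example map thread B</a>
"""

threads = scrape_threads(SAMPLE)
print(build_index(threads))
```

The member link is skipped because it does not match the thread URL pattern, which is the "scrape off some information" part; everything else on the page is ignored.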

We started doing monthly challenges before I arrived here, but at the time there were about 4 or 5 entries per month. Then as time went by we started to get more people entering each month and it started getting hard to work out the state of all the entrants' works in progress. So I wrote a WIP scraper and published a page of entries. Then this was used to make up the finished maps thread, and now there's a Cartographer's Choice thumbnails page too.

I also make a members' introductory post index, a keyword index, and the CWBP place index, and these are also scraped from the guild threads.

I also do the members map and the Ansium overview image but these are generated with external data and not scraped.

06-01-2009, 09:02 AM
Ummm... I think I may have broken it or something by moving it out of Cartographer's Choice... I would REALLY like to implement some kind of built-in thumbnailer similar to CGTalk's (mentioned in another thread).

I've had a pretty crappy weekend and this is probably going to be a pretty crappy week, so I may be scarce this week, but I'm definitely willing to work some kind of cool thumbnail function.

Redrobes, I'm tempted at this point to give you a cgi-bin ftp account you can host stuff for us on so we're not killing yer bandwidth. What would you require to get all of the awesome stuff you've built for us hosted here with access for you to continue to work on it as necessary?

06-01-2009, 01:49 PM
It shouldn't have broken it, as it just points to a page that I host; it's not on a relative address or anything. Still, I'll look in a mo...

If I had FTP I could host the thumbnail images on the main server, but all of them (finished, choice and all the challenges) are only 70 MB, so from a bandwidth point of view it's not a biggie.

With CGI ability I could have the script run on the server, which would be good because it could possibly fetch the image and resize it to the thumb quicker, and also maybe get the pages faster. But again, since it's only done now and again and only fetches the image if it's changed, that's not going to make a big difference.
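A small Python sketch of the two ideas in that paragraph: skip the download when nothing has changed, and work out the thumbnail size before resizing. The thread does not say how the real script detects a change, so this assumes a change shows up as a different attachment URL for the entry. The names `needs_fetch` and `thumbnail_size`, the 150-pixel limit and the example URLs are all hypothetical; the actual resampling (done with ImageMagick in the real scripts) is left out.

```python
def needs_fetch(entry_id, image_url, seen):
    """Return True if this entry's image should be downloaded again.

    `seen` maps entry id -> image URL recorded on the previous run.
    Assumption: an updated image appears under a different URL.
    """
    if seen.get(entry_id) == image_url:
        return False
    seen[entry_id] = image_url
    return True


def thumbnail_size(width, height, max_side=150):
    """Fit (width, height) inside a max_side square, keeping aspect ratio.

    Uses integer arithmetic (rounding down) and never enlarges small images.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    return (
        max(1, width * max_side // longest),
        max(1, height * max_side // longest),
    )


seen = {}
print(needs_fetch("entry-1", "attachment.php?id=10", seen))  # first run: fetch
print(needs_fetch("entry-1", "attachment.php?id=10", seen))  # unchanged: skip
print(thumbnail_size(2000, 1000))
```

Keeping the "has it changed" check separate from the resize step means an update run over unchanged entries costs almost nothing, which fits the point that server-side speed would not make a big difference.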

The area where it would really help would be to have the indexer and scrapers all running off a cron job so we could have them all updated daily. Having them run on the server and save the attachments there would mean that there's no login or FTP of the results, so it makes sense to do that on the server. The other option is to have a web page with buttons on it where we can stab one to get it to update the thumbs. That's a bit easier than running the script and uploading.

The requirement of the indexer etc. is that the server has the Perl modules for some of the things it uses. One in particular is the ImageMagick one, which is what I use to resample the images. I have the modules locally of course, and I know the ISP for my own site has it installed, so it's quite likely to be available here too.

The members map could be sent to the guild web server directly since it's just running a straight-up cgi-bin thing for the map. No problem there. I think there are some vBulletin members map plugins tho if we wanted to change to that. I kinda like the current one as it's nice and simple. It would be cool to have the link to it as a members-only link too. There's nothing at the mo to stop anyone adding themselves in.

From the point of view of the site I think it would look nicer to have the info hosted from the same server, keeping the same domain name, but there are some small uncertainties about how we can update them all more automatically than we do currently. You should have all these files and abilities without reliance on me anyway.

EDIT -- and to actually answer the question... ho hum... yep, just FTP access to the cgi area, some temp space (which might be in the cgi area) for files in progress, members' locations etc., and some directory to host the scraped info / thumb images.

06-01-2009, 02:50 PM
OK, first of them coming through now...


Anyone have any problems with this ?

Steel General
06-01-2009, 02:55 PM
Nope, came up fairly quickly and without any problems...

06-01-2009, 03:20 PM
Lookin good! I shoulda done this a long time ago...

However... what do you think about utilities.cartographersguild.com instead of /utilities?

Plus... if possible, could you add my Google Analytics code to the pages?

06-01-2009, 03:28 PM
I have changed over the choice and finished pages and am just working on the challenge pages. I have the Ansium tiles and the indexes uploaded but just gotta change the links to them. I have the members map in a cgi-bin, but I think you might have to point something adminy at that to turn it into a real cgi-bin dir. Dunno, I have always been given mine, but then I am cheap and only ever buy the basic package! :D

I'll look into the Google code. Do you know if that changes every time the page is loaded, or is it a fixed bit of HTML to plug in? I.e. do I have to scrape it each time? The thing is, every time somebody uses an index they probably go to a proper page through it anyway, and I bet you're getting more hits because the pages are easier to find now. I'll try, but I have yet to re-implement the scraper scripts to write the pages with the new URLs.

Anyway, lots of challenge pages to fix up. Was there any hope of somebody generating the New Challenge Entry thread numbers for all the really old challenges too? We should go back as far as we're able.

Edit -- oh yeah, and to answer the question (I have a habit of missing that...). I don't think it makes a lot of difference with the domain or sub-page thing. Most of the time these are all just links that are hidden behind text labels and thumb images anyway.

06-01-2009, 03:35 PM
The Google code is fixed... just do a page source on any of these pages and scroll near the bottom and you'll see it; it's commented. Just take that whole block and have it dumped into the HTML at the bottom and Analytics will track hits on it. There shouldn't be anything dynamic about it.
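Since the block is fixed, a page generator only needs to paste it in just before the closing body tag. A minimal Python sketch of that step follows; the function name `add_footer_snippet` and the placeholder snippet are made up, and the real analytics block would be copied from the forum's page source as described above.

```python
def add_footer_snippet(page_html, snippet):
    """Insert a fixed block of HTML just before the last </body> tag.

    If the page has no closing body tag, the snippet is appended instead.
    """
    marker = "</body>"
    idx = page_html.lower().rfind(marker)
    if idx == -1:
        return page_html + "\n" + snippet + "\n"
    return page_html[:idx] + snippet + "\n" + page_html[idx:]


# Placeholder only; the real block comes from the forum page source.
SNIPPET = "<!-- analytics block goes here -->"

page = "<html><body><p>Thumbnail index</p></body></html>"
print(add_footer_snippet(page, SNIPPET))
```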

I'll see what I can do about getting the utilities cgi-bin working. Even if I have to call the host.

06-01-2009, 03:40 PM
It appears to be ok actually...


Hmm, this is great. If Isomage wants to get in on it then he could put up random mapping stuff like his dungeon and current Traveller things. I have a mini map maker I'll bung up too, which is ever so simple.

06-01-2009, 03:47 PM
Well awesome! You've got a blanket statement to set up whatever ya need there then. Just keep me posted if something hugely new goes in there ;)

Imagemagick6 is installed...it should be in the "normal" place...if you don't know where that is, I'll pm it to ya.

Also, later once you get everything up and running right, I'll start the cron thing...and also I should probably put these cool utils on the main menu of the site.

06-01-2009, 03:53 PM
Also...I was looking at permissions...I think the locations file may need write permissions, but I didn't test it...Anyways, it looks like you got it rockin! Pretty sweet! I'm off for the day...I may check in a little later.

06-01-2009, 03:53 PM
Wow, that would be well cool then. It would be a bit of a burden off my shoulders to not have to keep on top of the thumbs all the time. We could probably do more of them if it was less of a manual effort.

Don't suppose you can host the viewingdale webserver of Ansium then... gotta do that somehow. I'll keep thinking about that one. Actually, while I am here: I didn't get a lot of responses to my RPG in the CWBP post. What do you think? Good idea, or might it pull in lots of non-mappers? Sorta could do with a spin-off domain like the Fantasium Alliance for running the RPG bit. Would make a kind of triangle, wouldn't it? Mappers, writers and players.

06-01-2009, 03:55 PM
Also...I was looking at permissions...I think the locations file may need write permissions, but I didn't test it...Anyways, it looks like you got it rockin! Pretty sweet! I'm off for the day...I may check in a little later.

I have write perms set but I also didn't test it. OK... where shall I nuke? How about N Korea since they seem to be asking for it...

06-01-2009, 04:02 PM
BOOOOM - heh heh yeah it seems to be just fine !

Steel General
06-01-2009, 04:08 PM
BOOOOM - heh heh yeah it seems to be just fine !

Savage... :D

06-01-2009, 04:33 PM
Grin !

I have put up my little mini map creator as a CGI script. All you have to do is write some #'s into a text file and it turns that into a neat boxed grid. It's for when you're super lazy like me. Also, you can use a screen grab as the input to RobA's auto-texturing Gimp dungeon creator.

I am sure that isomage and others will do a much better job than this simple one.
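For anyone curious how a tool like that might work, here is a small Python sketch of the general idea: read a text layout, treat each `#` as a floor cell, and draw one outlined box per cell. It is a guess at the technique, not the actual CGI script. The SVG output, the 20-pixel cell size and the names `parse_map` and `render_svg` are assumptions made for the example.

```python
def parse_map(text):
    """Return the set of (column, row) cells marked with '#'."""
    return {
        (col, row)
        for row, line in enumerate(text.splitlines())
        for col, ch in enumerate(line)
        if ch == "#"
    }


def render_svg(cells, cell=20):
    """Draw each marked cell as an outlined square in an SVG image."""
    cols = max((c for c, _ in cells), default=-1) + 1
    rows = max((r for _, r in cells), default=-1) + 1
    rects = [
        f'<rect x="{c * cell}" y="{r * cell}" width="{cell}" '
        f'height="{cell}" fill="white" stroke="black"/>'
        for c, r in sorted(cells)
    ]
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{cols * cell}" height="{rows * cell}">'
    )
    return "\n".join([header, *rects, "</svg>"])


# A tiny made-up layout: a short corridor with a side passage.
LAYOUT = "####\n  # \n  # "

svg = render_svg(parse_map(LAYOUT))
print(svg)
```

Saving the printed text as a `.svg` file and opening it in a browser shows the grid; a screen grab of that could then be used as input to other tools, as described above.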


06-01-2009, 04:47 PM
Sweet! So, all that remains is google analytics code, and cron stuff...which I'll be checking into probably late tonight, or tomorrow while I'm at work.

If you have any instructions for what I need to point the cron to and how often, send 'em over or post 'em here.

06-01-2009, 05:35 PM
Check the Miscellaneous menu up top :D Getting there.

06-01-2009, 05:39 PM
Cool, I can see that this will unburden my sig too...

06-01-2009, 09:10 PM
I have the WIP challenge scraper running on CGI. I never thought it would work like that but I managed to crowbar it a bit. It's odd to see it get the image so darn quickly, and it resizes it pretty fast too. Dunno what kind of iron is running that server but it's quite beefy. So it's a button push now to update the thumbs, which is even less hassle than before, where there was at least the FTP afterward.
So if cron can be applied then it's hands-free until the start of the month's challenge to get the new entries up, and you can be darned sure I'll CGI that bit too 'cos that's currently a pain in the butt as well.

06-01-2009, 09:11 PM
Check the Miscellaneous menu up top :D Getting there.

Any chance of that being under the community menu - like at the bottom with the members bit.

06-05-2009, 09:21 PM
Updated the menus. Miscellaneous menu has the keyword index and the Introductions Index. Community menu now has Member Location map and link to CWBP. CWBP Place index should probably be put on the Wiki and I'll add a link to it on the CWBP forum page.

Just out of curiosity... how does the Keyword Index compare to http://www.cartographersguild.com/tags.php ?

06-06-2009, 08:06 AM
My keyword job just indexes against a list of keywords that I have chosen and displays all threads which have that keyword in the title. It also displays the thread starter, so that if one person keeps posting about something then you know who to ask about it. The trouble with the keyword scraper is that it has to pull in all of the thread titles, so that's about 10-15 pages of thread lists, and then it sets about munging them all. It generates the member intro, keyword, finished maps and choice maps indexes all from the same get tho. Then the finished and choice maps are fetched from the index to make up the thumbs page for each.
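The keyword indexing step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it matches keywords as case-insensitive substrings of the title (the thread does not say how the real script matches), and the thread titles, member names, keyword list and function name `build_keyword_index` are all made up.

```python
def build_keyword_index(threads, keywords):
    """Map each keyword to the (title, starter) pairs whose title contains it.

    `threads` is a list of (title, starter) tuples scraped from thread lists.
    Matching is a case-insensitive substring test (an assumption).
    """
    index = {}
    for keyword in keywords:
        needle = keyword.lower()
        hits = [
            (title, starter)
            for title, starter in threads
            if needle in title.lower()
        ]
        if hits:
            index[keyword] = hits
    return index


# Made-up data standing in for the scraped thread lists.
THREADS = [
    ("Hex grid overlay in Gimp", "member_a"),
    ("City walls tutorial", "member_b"),
    ("Gimp brushes for mountains", "member_a"),
]
KEYWORDS = ["gimp", "tutorial", "photoshop"]

index = build_keyword_index(THREADS, KEYWORDS)
for keyword, hits in index.items():
    print(keyword)
    for title, starter in hits:
        print(f"  {title} (started by {starter})")
```

Keeping the starter next to each title is what lets a reader spot who keeps posting about a topic. Because the index is built from a list already in memory, the other indexes could be produced from the same single fetch of the thread lists.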