| 00:01:32 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 00:04:33 | | Lord_Nightmare (Lord_Nightmare) joins |
| 00:21:50 | | JackThompson quits [Quit: Ping timeout (120 seconds)] |
| 00:22:08 | | JackThompson joins |
| 00:24:17 | | qwertyasdfuiopghjkl joins |
| 00:29:55 | | lennier2 joins |
| 00:30:03 | | lennier1 quits [Ping timeout: 265 seconds] |
| 00:30:05 | | lennier2 is now known as lennier1 |
| 01:02:21 | | ave quits [Client Quit] |
| 01:02:22 | | ave1 (ave) joins |
| 01:02:22 | | ave1 is now known as ave |
| 01:12:26 | | nuc is now known as Somebody2 |
| 01:13:48 | | Somebody2 is now authenticated as Somebody2 |
| 01:20:02 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 01:23:12 | | dm4v quits [Client Quit] |
| 01:23:18 | | qwertyasdfuiopghjkl joins |
| 01:23:36 | | dm4v joins |
| 01:27:53 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 01:32:38 | | qwertyasdfuiopghjkl joins |
| 01:32:41 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 01:36:50 | | dm4v quits [Ping timeout: 296 seconds] |
| 01:37:47 | | qwertyasdfuiopghjkl joins |
| 01:40:55 | | dm4v joins |
| 03:24:43 | | sepro quits [Quit: Bye!] |
| 03:25:05 | | sepro (sepro) joins |
| 04:00:00 | | treora quits [Client Quit] |
| 04:01:13 | | treora joins |
| 04:43:19 | | sonick (sonick) joins |
| 04:49:07 | | dm4v quits [Ping timeout: 265 seconds] |
| 05:24:41 | | march_happy quits [Ping timeout: 268 seconds] |
| 05:25:41 | | march_happy (march_happy) joins |
| 05:31:22 | | dm4v joins |
| 05:44:53 | <@OrIdow6> | JAA: Oh |
| 05:45:26 | <@OrIdow6> | If we do ever solve the TLS thing I wonder if we should keep it a secret |
| 06:10:04 | <monika> | TLS fingerprint spoofing is an open secret at this point |
| 06:10:13 | <monika> | most of the scraping community knows about it |
| 06:13:04 | <monika> | the biggest roadblock would be JS-based client fingerprinting |
| 06:14:10 | <monika> | for some CF properties they do inject a "bot management" script that silently fingerprints your client |
| 06:14:17 | <monika> | discord is a prime example |
| 06:18:56 | | sec^nd quits [Remote host closed the connection] |
| 06:19:28 | | sec^nd (second) joins |
| 06:25:13 | | BlueMaxima_ quits [Client Quit] |
| 06:28:37 | <@OrIdow6> | Why does Cloudflare do it then? |
| 06:28:53 | <@OrIdow6> | I assume "the scraping community" is responsible for a lot of this stuff they're trying to block |
| 06:36:12 | | akaibu joins |
| 06:52:20 | <monika> | well, their business is to "protect" websites from bots and other miscreants |
| 06:52:32 | <monika> | to cloudflare, web archiving is non-human traffic so it's fair game to them |
| 06:52:45 | <monika> | you gotta improvise, adapt, and overcome :D |
| 06:55:49 | <monika> | also, you're quite correct about that assumption |
| 06:56:27 | <monika> | i lurk in the puppeteer-extra discord and tons of people talk about botting ecommerce and sneaker websites |
| 06:57:37 | <monika> | presumably to scalp whatever hot gadget is coming out |
| 06:57:40 | <monika> | quite sad |
| 06:58:27 | | Fiendster joins |
| 07:01:11 | <monika> | here's the current state of JS browser fingerprinting: https://github.com/niespodd/browser-fingerprinting |
| 07:01:19 | <monika> | quite a fascinating read |
| 07:01:33 | | Fiendster quits [Remote host closed the connection] |
| 07:59:28 | | Sluggs quits [Ping timeout: 268 seconds] |
| 08:00:17 | | Sluggs joins |
| 09:07:54 | | mrfooooo quits [Quit: The Lounge - https://thelounge.chat] |
| 09:09:10 | | mrfooooo joins |
| 09:34:38 | | tech_exorcist (tech_exorcist) joins |
| 09:38:26 | | systwi__ (systwi) joins |
| 09:40:04 | | systwi quits [Ping timeout: 240 seconds] |
| 09:40:48 | | Atom-- quits [Ping timeout: 255 seconds] |
| 09:43:23 | | Atom joins |
| 09:54:59 | | akaibu quits [Client Quit] |
| 12:52:56 | | fishingforpie joins |
| 12:53:06 | <fishingforpie> | Hello! |
| 12:53:19 | <fishingforpie> | Is there any way to archive the entirety of a site? |
| 12:53:31 | <fishingforpie> | The site in question is https://gbuc.net. |
| 12:53:52 | <fishingforpie> | Err, http://gbuc.net |
| 12:54:32 | <thuban> | fishingforpie: generally, yes; we use archivebot for that and it looks like it would work well on that site. |
| 12:54:40 | <thuban> | do you have some reason to think it's endangered? |
| 12:55:04 | <fishingforpie> | Well, it's very old. So old in fact that the old player is HTML5 and the new player is Flash. |
| 12:55:21 | <fishingforpie> | Many of the songs and artists on there are only on there. |
| 12:56:26 | <fishingforpie> | It's not particularly endangered, but it has seen better times. |
| 12:56:45 | <theblazehen|m> | Yeah it sure does look ancient... |
| 12:56:57 | <fishingforpie> | It's moreso outdated than endangered. It straight up collapses when you use it with HTTPS. |
| 12:57:06 | <thuban> | oh hm, let me see if archivebot would actually be able to get the media |
| 12:57:17 | <fishingforpie> | It's all hosted on the site. |
| 13:00:54 | <fishingforpie> | It also appears that some of the artwork from some older pages got wiped a year and a half ago. |
| 13:01:00 | <fishingforpie> | http://gbuc.net/modules/bluesbb/thread.php?top=8&thr=501&sty=1&num=l50#p3314 |
| 13:02:19 | <fishingforpie> | Don't worry, just the artwork, the audio files (which sometimes were tagged with the artwork) were fine. |
| 13:02:44 | <thuban> | the html player is a javascript 'link', not a real one, so archivebot wouldn't be able to get it, but if we spidered the site with archivebot we could derive player pages from the results, and feeding that list back into ab would get the audio |
| 13:03:59 | <thuban> | fishingforpie: can you describe the structure of the site? the urls refer to 'myalbum', but it looks like they're actually all individual tracks; is that right? |
| 13:04:06 | <fishingforpie> | I see. |
| 13:04:26 | <fishingforpie> | Yes, it's displayed in individual tracks I believe. |
| 13:04:53 | <fishingforpie> | By album, they may imply a Single Album. |
| 13:05:45 | <thuban> | hm? i don't think i understand |
| 13:06:12 | <fishingforpie> | Single Albums are those releases with one or two tracks. |
| 13:08:00 | <fishingforpie> | Album art is typically stored at http://gbuc.net/modules/myalbum/photos/number.extension (Usually PNG, I think I have seen it as JPG before). |
| 13:10:12 | <fishingforpie> | I can't describe it well. |
| 13:10:31 | <thuban> | ok, i think i got it |
| 13:10:50 | <thuban> | photos are embedded directly in the track info pages and should not require any special processing |
| 13:12:00 | <fishingforpie> | Ah. |
| 13:13:13 | <fishingforpie> | The site also appears to have blogs (rebranded forums it appears) and gbUctube, where there are music videos available. |
| 13:13:45 | <fishingforpie> | gbUctube's player no longer works as far as I am aware; however, downloading the files works just fine. |
| 13:14:53 | <fishingforpie> | Probably because the player uses Flash, yikes. |
| 13:15:45 | <thuban> | there's an actual forum, too |
| 13:16:32 | <fishingforpie> | Oh right. |
| 13:18:24 | <thuban> | unlike the track pages, the video pages actually have a direct download link, so that also won't require special processing |
| 13:18:50 | <fishingforpie> | Yep. |
| 13:26:39 | <fishingforpie> | Hopefully it can somewhat at least be backed up! :) |
| 13:34:41 | <thuban> | fishingforpie: i've left a summary in #archivebot; someone will pop it in the queue when appropriate |
| 13:34:51 | <fishingforpie> | #archivebot |
| 13:35:40 | | Arcorann quits [Ping timeout: 240 seconds] |
| 13:36:24 | <fishingforpie> | When appropriate? |
| 13:37:23 | <thuban> | ab had some downtime recently; i think everything's now running again, but we may be limited in capacity / working through a more urgent backlog |
| 13:38:05 | <fishingforpie> | Ah. |
| 14:10:46 | <@Sanqui> | put into archivebot |
| 14:10:57 | <@Sanqui> | I probably won't have time to do the derivation though, but if somebody provides a list of URLs, we can do it |
| 14:13:13 | | janh joins |
| 14:14:24 | <janh> | oh no, webry blog is shutting down on 2022-12-01 https://support.at.webry.info/202209/article_faq_054.html |
| 14:14:43 | <@Sanqui> | janh: please put it on the deathwatch on the wiki |
| 14:24:56 | <fishingforpie> | Also GBUC is primarily a music site by the way. |
| 14:26:34 | <thuban> | Sanqui: thanks! i'll do the derivation when the job finishes (somebody ping me if i forget) |
| 14:27:05 | <@Sanqui> | awesome! |
| 14:27:29 | <janh> | oops, i searched wrong. it was already on there. sorry =_= |
| 14:27:32 | <@Sanqui> | it's going quickly. |
| 14:27:46 | <@Sanqui> | janh: thanks anyway! |
| 14:37:30 | <TheTechRobo> | Question: Why is archivete.am not a redirect to wiki.archiveteam.org ? |
| 14:41:27 | | mrfooooo6 joins |
| 14:41:27 | | dm4v quits [Client Quit] |
| 14:41:27 | | mrfooooo quits [Client Quit] |
| 14:41:27 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 14:41:27 | | mrfooooo6 is now known as mrfooooo |
| 14:41:31 | | dm4v joins |
| 14:41:39 | | qwertyasdfuiopghjkl joins |
| 14:46:58 | | AK quits [Remote host closed the connection] |
| 14:47:50 | | AK (AK) joins |
| 15:55:58 | | march_happy quits [Read error: Connection reset by peer] |
| 15:57:02 | | march_happy (march_happy) joins |
| 16:48:42 | | systwi__ is now known as systwi |
| 16:59:29 | | sonick quits [Client Quit] |
| 17:06:37 | | tech_exorcist quits [Remote host closed the connection] |
| 17:06:57 | | tech_exorcist (tech_exorcist) joins |
| 17:12:07 | | tech_exorcist quits [Remote host closed the connection] |
| 17:12:52 | | tech_exorcist (tech_exorcist) joins |
| 17:17:47 | <@JAA> | Sanqui needs to be burnt at the stake for using a URL shortener. |
| 17:17:54 | <@JAA> | :-) |
| 17:19:24 | | march_happy quits [Ping timeout: 268 seconds] |
| 17:20:28 | <@JAA> | And it's under 25% shorter than the shortest proper link to the page. lol |
| 17:20:41 | <@Sanqui|m> | I would deserve it. But if there's a URL shortener with some evidenced longevity, it's the original one |
| 18:36:34 | | tech_exorcist quits [Remote host closed the connection] |
| 18:36:58 | | tech_exorcist (tech_exorcist) joins |
| 19:01:59 | | dm4v quits [Client Quit] |
| 19:01:59 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 19:02:05 | | dm4v joins |
| 19:02:27 | | qwertyasdfuiopghjkl joins |
| 19:53:11 | | tech_exorcist quits [Remote host closed the connection] |
| 19:53:41 | | tech_exorcist (tech_exorcist) joins |
| 20:23:24 | | mut4ntmonkey quits [Ping timeout: 255 seconds] |
| 20:32:47 | | Hackerpcs quits [Quit: Hackerpcs] |
| 20:34:55 | | Hackerpcs (Hackerpcs) joins |
| 20:36:03 | | mut4ntmonkey (mutantmonkey) joins |
| 21:00:23 | | user_ quits [Remote host closed the connection] |
| 21:00:34 | | gazorpazorp (gazorpazorp) joins |
| 21:02:03 | | systwi_ quits [Quit: systwi_] |
| 21:02:48 | | systwi_ joins |
| 21:50:17 | | march_happy (march_happy) joins |
| 22:00:08 | <h2ibot> | JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49130&oldid=49121 |
| 22:05:16 | | tzt quits [Ping timeout: 240 seconds] |
| 22:14:36 | | katocala quits [Remote host closed the connection] |
| 22:26:40 | | BlueMaxima joins |
| 22:45:55 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 22:49:16 | | Overlordz joins |
| 22:58:43 | | second (second) joins |
| 23:01:17 | | sec^nd quits [Remote host closed the connection] |
| 23:01:17 | | second is now known as sec^nd |
| 23:02:28 | | Arcorann (Arcorann) joins |
| 23:18:46 | | BlueMaxima joins |
| 23:22:04 | | apache2_ quits [Ping timeout: 240 seconds] |
| 23:53:26 | <h2ibot> | Pokechu22 edited ArchiveBot (-383, remove broken ninjawedding.org URLs in favor of…): https://wiki.archiveteam.org/?diff=49131&oldid=48627 |
| 23:57:45 | | dm4v_ joins |
| 23:58:00 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 23:58:26 | | Lord_Nightmare2 (Lord_Nightmare) joins |
| 23:58:26 | | systwi_ quits [Client Quit] |
| 23:58:26 | | Lord_Nightmare quits [Client Quit] |
| 23:58:27 | | dm4v quits [Client Quit] |
| 23:58:27 | | dm4v_ is now known as dm4v |
| 23:58:27 | | Lord_Nightmare2 is now known as Lord_Nightmare |
| 23:59:28 | | systwi_ joins |