00:01:32Lord_Nightmare quits [Quit: ZNC - http://znc.in]
00:04:33Lord_Nightmare (Lord_Nightmare) joins
00:21:50JackThompson quits [Quit: Ping timeout (120 seconds)]
00:22:08JackThompson joins
00:24:17qwertyasdfuiopghjkl joins
00:29:55lennier2 joins
00:30:03lennier1 quits [Ping timeout: 265 seconds]
00:30:05lennier2 is now known as lennier1
01:02:21ave quits [Client Quit]
01:02:22ave1 (ave) joins
01:02:22ave1 is now known as ave
01:12:26nuc is now known as Somebody2
01:20:02qwertyasdfuiopghjkl quits [Client Quit]
01:23:12dm4v quits [Client Quit]
01:23:18qwertyasdfuiopghjkl joins
01:23:36dm4v joins
01:27:53qwertyasdfuiopghjkl quits [Client Quit]
01:32:38qwertyasdfuiopghjkl joins
01:32:41qwertyasdfuiopghjkl quits [Remote host closed the connection]
01:36:50dm4v quits [Ping timeout: 296 seconds]
01:37:47qwertyasdfuiopghjkl joins
01:40:55dm4v joins
03:24:43sepro quits [Quit: Bye!]
03:25:05sepro (sepro) joins
04:00:00treora quits [Client Quit]
04:01:13treora joins
04:43:19sonick (sonick) joins
04:49:07dm4v quits [Ping timeout: 265 seconds]
05:24:41march_happy quits [Ping timeout: 268 seconds]
05:25:41march_happy (march_happy) joins
05:31:22dm4v joins
05:44:53<@OrIdow6>JAA: Oh
05:45:26<@OrIdow6>If we do ever solve the TLS thing I wonder if we should keep it a secret
06:10:04<monika>TLS fingerprint spoofing is an open secret at this point
06:10:13<monika>most of the scraping community knows about it
06:13:04<monika>the biggest roadblock would be JS-based client fingerprinting
06:14:10<monika>for some CF properties they do inject a "bot management" script that silently fingerprints your client
06:14:17<monika>discord is a prime example
06:18:56sec^nd quits [Remote host closed the connection]
06:19:28sec^nd (second) joins
06:25:13BlueMaxima_ quits [Client Quit]
06:28:37<@OrIdow6>Why does Cloudflare do it then?
06:28:53<@OrIdow6>I assume "the scraping community" is responsible for a lot of this stuff they're trying to block
06:36:12akaibu joins
06:52:20<monika>well, their business is to "protect" websites from bots and other miscreants
06:52:32<monika>to cloudflare, web archiving is non-human traffic so it's fair game to them
06:52:45<monika>you gotta improvise, adapt, and overcome :D
06:55:49<monika>also, you're quite correct about that assumption
06:56:27<monika>i lurk in the puppeteer-extra discord and tons of people talk about botting ecommerce and sneaker websites
06:57:37<monika>presumably to scalp whatever hot gadget is coming out
06:57:40<monika>quite sad
06:58:27Fiendster joins
07:01:11<monika>here's the current state of JS browser fingerprinting: https://github.com/niespodd/browser-fingerprinting
07:01:19<monika>quite a fascinating read
07:01:33Fiendster quits [Remote host closed the connection]
07:59:28Sluggs quits [Ping timeout: 268 seconds]
08:00:17Sluggs joins
09:07:54mrfooooo quits [Quit: The Lounge - https://thelounge.chat]
09:09:10mrfooooo joins
09:34:38tech_exorcist (tech_exorcist) joins
09:38:26systwi__ (systwi) joins
09:40:04systwi quits [Ping timeout: 240 seconds]
09:40:48Atom-- quits [Ping timeout: 255 seconds]
09:43:23Atom joins
09:54:59akaibu quits [Client Quit]
12:52:56fishingforpie joins
12:53:06<fishingforpie>Hello!
12:53:19<fishingforpie>Is there any way to archive the entirety of a site?
12:53:31<fishingforpie>The site in question is https://gbuc.net.
12:53:52<fishingforpie>Err, http://gbuc.net
12:54:32<thuban>fishingforpie: generally, yes; we use archivebot for that and it looks like it would work well on that site.
12:54:40<thuban>do you have some reason to think it's endangered?
12:55:04<fishingforpie>Well, it's very old. So old in fact that the old player is HTML5 and the new player is Flash.
12:55:21<fishingforpie>Many of the songs and artists on there are only on there.
12:56:26<fishingforpie>It's not particularly endangered, but it has seen better times.
12:56:45<theblazehen|m>Yeah it sure does look ancient...
12:56:57<fishingforpie>It's more outdated than endangered. It straight up collapses when you use it with HTTPS.
12:57:06<thuban>oh hm, let me see if archivebot would actually be able to get the media
12:57:17<fishingforpie>It's all hosted on the site.
13:00:54<fishingforpie>It also appears that some of the artwork from some older pages got wiped a year and a half ago.
13:01:00<fishingforpie>http://gbuc.net/modules/bluesbb/thread.php?top=8&thr=501&sty=1&num=l50#p3314
13:02:19<fishingforpie>Don't worry, it was just the artwork; the audio files (which were sometimes tagged with the artwork) were fine.
13:02:44<thuban>the html player is a javascript 'link', not a real one, so archivebot wouldn't be able to get it, but if we spidered the site with archivebot we could derive player pages from the results, and feeding that list back into ab would get the audio
13:03:59<thuban>fishingforpie: can you describe the structure of the site? the urls refer to 'myalbum', but it looks like they're actually all individual tracks; is that right?
13:04:06<fishingforpie>I see.
13:04:26<fishingforpie>Yes, it's displayed in individual tracks I believe.
13:04:53<fishingforpie>By album, they may imply a Single Album.
13:05:45<thuban>hm? i don't think i understand
13:06:12<fishingforpie>Single Albums are those releases with one or two tracks.
13:08:00<fishingforpie>Album art is typically stored at http://gbuc.net/modules/myalbum/photos/number.extension (Usually PNG, I think I have seen it as JPG before).
13:10:12<fishingforpie>I can't describe it well.
13:10:31<thuban>ok, i think i got it
13:10:50<thuban>photos are embedded directly in the track info pages and should not require any special processing
13:12:00<fishingforpie>Ah.
13:13:13<fishingforpie>The site also appears to have blogs (rebranded forums it appears) and gbUctube, where there are music videos available.
13:13:45<fishingforpie>gbUctube's player no longer works as far as I am aware; however, downloading the files works just fine.
13:14:53<fishingforpie>Probably because the player uses Flash, yikes.
13:15:45<thuban>there's an actual forum, too
13:16:32<fishingforpie>Oh right.
13:18:24<thuban>unlike the track pages, the video pages actually have a direct download link, so that also won't require special processing
13:18:50<fishingforpie>Yep.
13:26:39<fishingforpie>Hopefully at least some of it can be backed up! :)
13:34:41<thuban>fishingforpie: i've left a summary in #archivebot; someone will pop it in the queue when appropriate
13:34:51<fishingforpie>#archivebot
13:35:40Arcorann quits [Ping timeout: 240 seconds]
13:36:24<fishingforpie>When appropriate?
13:37:23<thuban>ab had some downtime recently; i think everything's now running again, but we may be limited in capacity / working through a more urgent backlog
13:38:05<fishingforpie>Ah.
14:10:46<@Sanqui>put into archivebot
14:10:57<@Sanqui>I probably won't have time to do the derivation though, but if somebody provides a list of URLs, we can do it
14:13:13janh joins
14:14:24<janh>oh no, webry blog is shutting down on 2022-12-01 https://support.at.webry.info/202209/article_faq_054.html
14:14:43<@Sanqui>janh: please put it on the deathwatch on the wiki
14:24:56<fishingforpie>Also GBUC is primarily a music site by the way.
14:26:34<thuban>Sanqui: thanks! i'll do the derivation when the job finishes (somebody ping me if i forget)
14:27:05<@Sanqui>awesome!
14:27:29<janh>oops, i searched wrong. it was already on there. sorry =_=
14:27:32<@Sanqui>it's going quickly.
14:27:46<@Sanqui>janh: thanks anyway!
14:37:30<TheTechRobo>Question: Why is archivete.am not a redirect to wiki.archiveteam.org ?
14:41:27mrfooooo6 joins
14:41:27dm4v quits [Client Quit]
14:41:27mrfooooo quits [Client Quit]
14:41:27qwertyasdfuiopghjkl quits [Remote host closed the connection]
14:41:27mrfooooo6 is now known as mrfooooo
14:41:31dm4v joins
14:41:39qwertyasdfuiopghjkl joins
14:46:58AK quits [Remote host closed the connection]
14:47:50AK (AK) joins
15:55:58march_happy quits [Read error: Connection reset by peer]
15:57:02march_happy (march_happy) joins
16:48:42systwi__ is now known as systwi
16:59:29sonick quits [Client Quit]
17:06:37tech_exorcist quits [Remote host closed the connection]
17:06:57tech_exorcist (tech_exorcist) joins
17:12:07tech_exorcist quits [Remote host closed the connection]
17:12:52tech_exorcist (tech_exorcist) joins
17:17:47<@JAA>Sanqui needs to be burnt at the stake for using a URL shortener.
17:17:54<@JAA>:-)
17:19:24march_happy quits [Ping timeout: 268 seconds]
17:20:28<@JAA>And it's under 25% shorter than the shortest proper link to the page. lol
17:20:41<@Sanqui|m>I would deserve it. But if there's a URL shortener with some evidenced longevity, it's the original one
18:36:34tech_exorcist quits [Remote host closed the connection]
18:36:58tech_exorcist (tech_exorcist) joins
19:01:59dm4v quits [Client Quit]
19:01:59qwertyasdfuiopghjkl quits [Remote host closed the connection]
19:02:05dm4v joins
19:02:27qwertyasdfuiopghjkl joins
19:53:11tech_exorcist quits [Remote host closed the connection]
19:53:41tech_exorcist (tech_exorcist) joins
20:23:24mut4ntmonkey quits [Ping timeout: 255 seconds]
20:32:47Hackerpcs quits [Quit: Hackerpcs]
20:34:55Hackerpcs (Hackerpcs) joins
20:36:03mut4ntmonkey (mutantmonkey) joins
21:00:23user_ quits [Remote host closed the connection]
21:00:34gazorpazorp (gazorpazorp) joins
21:02:03systwi_ quits [Quit: systwi_]
21:02:48systwi_ joins
21:50:17march_happy (march_happy) joins
22:00:08<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49130&oldid=49121
22:05:16tzt quits [Ping timeout: 240 seconds]
22:14:36katocala quits [Remote host closed the connection]
22:26:40BlueMaxima joins
22:45:55BlueMaxima quits [Read error: Connection reset by peer]
22:49:16Overlordz joins
22:58:43second (second) joins
23:01:17sec^nd quits [Remote host closed the connection]
23:01:17second is now known as sec^nd
23:02:28Arcorann (Arcorann) joins
23:18:46BlueMaxima joins
23:22:04apache2_ quits [Ping timeout: 240 seconds]
23:53:26<h2ibot>Pokechu22 edited ArchiveBot (-383, remove broken ninjawedding.org URLs in favor of…): https://wiki.archiveteam.org/?diff=49131&oldid=48627
23:57:45dm4v_ joins
23:58:00qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
23:58:26Lord_Nightmare2 (Lord_Nightmare) joins
23:58:26systwi_ quits [Client Quit]
23:58:26Lord_Nightmare quits [Client Quit]
23:58:27dm4v quits [Client Quit]
23:58:27dm4v_ is now known as dm4v
23:58:27Lord_Nightmare2 is now known as Lord_Nightmare
23:59:28systwi_ joins