00:00:54decky_e_ quits [Read error: Connection reset by peer]
00:01:19decky_e_ joins
00:03:54kiryu joins
00:03:54kiryu quits [Changing host]
00:03:54kiryu (kiryu) joins
00:48:46jasons (jasons) joins
01:05:42<pabs>only a thousand?
01:05:43<pabs>$ wc -l todo/archive* | tail -n1
01:05:44<pabs> 43585 total
01:06:05<pabs>includes stuff for SWH/codearchiver tho :)
01:10:40BlueMaxima quits [Read error: Connection reset by peer]
01:15:20ScenarioPlanet quits [Ping timeout: 240 seconds]
01:15:20Pedrosso quits [Ping timeout: 240 seconds]
01:15:50TheTechRobo quits [Ping timeout: 240 seconds]
01:16:22Pedrosso joins
01:16:26ScenarioPlanet (ScenarioPlanet) joins
01:16:44TheTechRobo (TheTechRobo) joins
01:24:35<project10>... wtf
01:26:12xarph joins
01:51:30<fireonlive>wat
01:51:32<pabs>mostly proactive stuff, but doesn't count some things in a pad
01:53:00<pabs>and a few thousand unprocessed tabs and a few thousand unopened #hackernews/etc URLs
01:53:31<fireonlive>and now there's #infosec :D
01:53:32<@JAA>lol
01:54:18pseudorizer quits [Client Quit]
01:56:38pseudorizer (pseudorizer) joins
01:58:43<pabs>oh, and of course now I'm monitoring AB for code/forges/repos, so thousands of links there, plus I lost a bunch since it's just terminal output and OOMs kill that
02:03:08qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
02:10:01xarph quits [Ping timeout: 272 seconds]
02:29:43qwertyasdfuiopghjkl quits [Remote host closed the connection]
02:48:30qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
02:51:07jasons quits [Client Quit]
02:51:50<h2ibot>JustAnotherArchivist edited Deathwatch (+0, /* 2024 */ Fix order): https://wiki.archiveteam.org/?diff=51649&oldid=51645
03:08:34Megame quits [Client Quit]
03:13:06kiryu_ joins
03:16:31kiryu quits [Ping timeout: 272 seconds]
03:18:01kiryu__ joins
03:21:20kiryu_ quits [Ping timeout: 240 seconds]
03:33:50prikaza quits [Ping timeout: 240 seconds]
03:35:20prikaza (prikaza) joins
03:36:25eightthree quits [Remote host closed the connection]
03:47:20icedice quits [Ping timeout: 240 seconds]
04:13:24<@Fusl>have some old dvd's i found, would like to get those copied with the exact bitstream as on the dvd itself. `dvdbackup` fails to read some blocks and i'd like to do something like i can do with `ddrescue` where i can do multiple passes. does someone here have experience doing that kinda stuff?
04:17:55<@Fusl>also unsure if here or #archiveteam-ot is a better place for this question
04:22:00<pokechu22>I personally use https://github.com/SabreTools/MPF since that's what's used for redump.org, but that's more targeted at games (I've used it for some dvd-video magazine coverdiscs though)
04:23:04Hackerpcs quits [Client Quit]
04:23:22<@JAA>My srad.jp qwarc run finished. I had previously mostly posted in #archivebot since that's where the discussion originally happened, so quick summary: I fetched /comment/ID, /submission/ID, and the 'parent' of comments. Some of the latter were also run through AB before the announcement that the service would live on (for now).
04:25:06Hackerpcs (Hackerpcs) joins
04:25:34<@JAA>There were some errors, as expected on an aging, slow, partially broken site. By and large, it went okay though.
04:26:09<@JAA>958 items had errors; if anyone wants to investigate those, let me know.
04:26:15<@JAA>pabs: ^
04:26:34<flashfire42>Fusl Redumper
04:27:27<@Fusl>oh, that looks more promising
04:27:28<@JAA>Posts (stories, journals, etc.) can't directly be enumerated as far as I could tell, that's why I went via comments. But I suppose posts without comments are not as important, and we do have the recursive AB crawl.
04:31:25<pabs>JAA: I'll look at those errors
04:33:00<@JAA>pabs: Ok, here they are: https://transfer.archivete.am/inline/oXGDP/srad.jp-errors.txt
04:33:06<@JAA>Should be mostly self-explanatory.
04:33:37<pabs>some of them work fine in browser at least, so maybe just !ao <
04:34:13<pabs>comment/1 still gives an error
04:48:15<pabs>btw, final announcement of srad.jp continuation makes it sound like the site will stay up *for now* but there will be no new stories https://srad.jp/story/24/01/31/1253207/
04:48:46<pabs>and indeed no new stories in feb so far
04:49:06<pabs>"the policy was suddenly changed and they decided to start recruiting new hosts without shutting down the servers."
04:51:41<pabs>JAA: for the comments, looks like most give 500 Internal Server Error, one 404, but a few 200 OK
04:51:49<pabs>(from AB)
05:04:43<fireonlive>no new stories... but maybe new bugfixes? :D
05:06:13<nicolas17>Fusl: if they are DRM'd DVD-Video, it's possible that some data required to decrypt the files is stored in a special area and not accessible to a plain dd
05:07:08<nicolas17>after all they had to prevent a plain bitwise copy from giving you a working DVD
05:09:33<nicolas17>JAA: I have a problem... try the "Announcement" link and download the PDF https://opensource.samsung.com/uploadList?menuItem=mobile&searchValue=Dolfin-Browser_v2.0_OpenSource.zip
05:14:06<nicolas17>it seems to be encrypted/DRM'd... and IA doesn't even let me upload it
05:17:20lflare quits [Ping timeout: 240 seconds]
05:20:07<fireonlive>00000000 3c 21 2d 2d 20 49 4e 43 4f 50 53 20 53 45 43 55 |<!-- INCOPS SECU|
05:20:07<fireonlive>00000010 2d 44 52 4d 20 2d 20 56 65 72 20 31 2e 30 20 2d |-DRM - Ver 1.0 -|
05:20:07<fireonlive>00000020 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 3e ac 12 02 00 24 |---------->....$|
05:20:08<fireonlive>o_O
05:20:35<fireonlive>for *checks notes* Dolfin-Browser_v2.0_OpenSource.zip's announcement pdf
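[editor's sketch] The hexdump above shows the file starting with an "INCOPS SECU-DRM" marker rather than the "%PDF-" signature a normal PDF begins with. A minimal way to tell the two apart, assuming a plain prefix check is enough (the function name is hypothetical, and this is not a claim about how IA validates uploads):

```python
def classify_header(head: bytes) -> str:
    """Classify a file by its first bytes (simple prefix check only)."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Marker taken from the hexdump quoted in the log.
    if head.startswith(b"<!-- INCOPS SECU-DRM"):
        return "secu-drm wrapper"
    return "unknown"


# First 32 bytes exactly as shown in the hexdump above.
DUMPED = bytes.fromhex(
    "3c 21 2d 2d 20 49 4e 43 4f 50 53 20 53 45 43 55"
    "2d 44 52 4d 20 2d 20 56 65 72 20 31 2e 30 20 2d"
)

if __name__ == "__main__":
    print(classify_header(DUMPED))
    print(classify_header(b"%PDF-1.4\n"))
```

For a file on disk, the same check can be applied to `open(path, "rb").read(32)`.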
05:21:48<nicolas17>in fact
05:21:49<nicolas17>https://opensource.samsung.com/uploadList?menuItem=mobile&searchValue=Dolfin-Browser
05:22:17<nicolas17>2.2 is also DRM'd... with a *different tool*?!
05:24:41<fireonlive>"INCOPS SECU-DRM" seems to be mentioned on some... russian hard drive forums?
05:25:01<fireonlive>(or sections of russian forums)
05:25:13<@Fusl>nicolas17: they're self-made dvds, no drm or anything, just plain old dvd-r
05:27:50nic9070 quits [Quit: The Lounge - https://thelounge.chat]
05:28:07<fireonlive>i think it's https://en.fasoo.com/strategies/enterprise-drm/ (via fasoo mentioned on https://wasm.in/threads/incops-secu-drm-ver-1-0.20731/) iow they accidentally uploaded the encrypted version of the file
05:28:18<fireonlive>(or didn't disable their drm thing for that file)
05:28:28<fireonlive>weird lol
05:28:44<nicolas17>Fusl: basic use of ddrescue is "ddrescue /dev/cdrom image.iso image.map", the image.map file is optional but highly recommended, ddrescue will save state there (what was recovered, what failed and is pending retry, etc) letting you resume where it left off later
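[editor's sketch] The multiple-pass workflow Fusl asked about follows from the mapfile: run ddrescue once to grab everything that reads cleanly, then run it again against the same image and mapfile to retry only the missing areas. The helper below only builds the two command lines; it is a sketch, not necessarily what was run in the channel. The function name, the 2048-byte sector size (the usual value for CD/DVD data) and the retry count are assumptions.

```python
import shlex


def ddrescue_passes(device, image, mapfile, sector_size=2048, retries=3):
    """Build ddrescue command lines for a quick first pass and a retry pass."""
    common = ["ddrescue", "-b", str(sector_size)]
    # Pass 1: copy whatever reads cleanly and skip scraping bad areas (-n).
    first = common + ["-n", device, image, mapfile]
    # Pass 2: direct disc access (-d) and retry bad sectors (-r N). The shared
    # mapfile tells ddrescue which areas are still missing.
    second = common + ["-d", "-r", str(retries), device, image, mapfile]
    return [first, second]


if __name__ == "__main__":
    for cmd in ddrescue_passes("/dev/sr0", "image.iso", "image.map"):
        print(shlex.join(cmd))
```

Because both passes share image.map, the second run can be repeated later, or on a different drive, without re-reading sectors that were already recovered.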
05:32:44nic9070 (nic) joins
05:36:36lflare (lflare) joins
05:54:52JohnnyJ joins
05:57:23<h2ibot>Usernam edited List of websites excluded from the Wayback Machine/Partial exclusions (+63): https://wiki.archiveteam.org/?diff=51650&oldid=51612
06:05:48Wohlstand (Wohlstand) joins
06:10:47JohnnyJ quits [Read error: Connection reset by peer]
06:11:26JohnnyJ joins
06:15:56<pabs>https://slate.com/technology/2024/02/quora-what-happened-ai-decline.html
06:16:55JohnnyJ quits [Client Quit]
06:17:12JohnnyJ joins
06:17:52<audrooku|m>Quora has always been hot liquid garbage
06:19:12JohnnyJ quits [Read error: Connection reset by peer]
06:19:54JohnnyJ joins
06:26:42JohnnyJ quits [Read error: Connection reset by peer]
06:27:22JohnnyJ joins
06:31:22qwertyasdfuiopghjkl quits [Remote host closed the connection]
06:55:58qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:03:12JohnnyJ quits [Client Quit]
07:05:07<@Fusl>nicolas17: i tried ddrescue, it keeps telling me `ddrescue: /dev/sr0: Unaligned read error. Is sector size correct?` and then makes the dvd drive sad, telling me there is no disk inserted anymore and refusing to eject the disk when i try to
07:15:43Arcorann (Arcorann) joins
07:26:01kiryu__ quits [Read error: Connection reset by peer]
07:27:36kiryu__ joins
07:34:22Dango360 (Dango360) joins
07:53:02kiryu__ quits [Read error: Connection reset by peer]
07:56:34kiryu__ joins
08:45:18<@Fusl>i was able to get ddrescue to work to dump all those dvds, then used dvdbackup to extract from the image file and then ffmpeg to convert them into mp4's
08:53:03<h2ibot>CreaZyp154 edited URLTeam/Warrior (-81, Removed gray background for is.gd and v.gd as…): https://wiki.archiveteam.org/?diff=51651&oldid=51582
09:23:09<h2ibot>CreaZyp154 edited URLTeam (+154, play.st and playst.cc): https://wiki.archiveteam.org/?diff=51652&oldid=51584
09:24:49Island quits [Read error: Connection reset by peer]
09:29:10<h2ibot>CreaZyp154 edited URLTeam (+54, ubi.li): https://wiki.archiveteam.org/?diff=51653&oldid=51652
09:33:50eightthree joins
09:37:12qwertyasdfuiopghjkl quits [Remote host closed the connection]
09:39:45parfait (kdqep) joins
09:41:35parfait_ quits [Ping timeout: 272 seconds]
10:07:24dxrt joins
10:07:26dxrt quits [Changing host]
10:07:26dxrt (dxrt) joins
10:07:26@ChanServ sets mode: +o dxrt
10:08:11@rewby quits [Ping timeout: 272 seconds]
10:09:40sonick quits [Client Quit]
10:44:05darknavi joins
11:29:42bf_ joins
11:33:13rewby (rewby) joins
11:33:13@ChanServ sets mode: +o rewby
11:37:31nulldata5 (nulldata) joins
11:40:01nulldata quits [Ping timeout: 272 seconds]
11:40:01nulldata5 is now known as nulldata
12:30:09Wohlstand quits [Client Quit]
12:37:39Arcorann quits [Ping timeout: 272 seconds]
12:46:10prikaza quits [Remote host closed the connection]
12:47:32prikaza (prikaza) joins
12:53:48prikaza quits [Remote host closed the connection]
12:54:15prikaza (prikaza) joins
12:54:29prikaza quits [Remote host closed the connection]
12:55:04prikaza (prikaza) joins
12:58:09icedice (icedice) joins
13:13:45darknavi quits [Remote host closed the connection]
13:41:54kiryu_ joins
13:44:20kiryu__ quits [Ping timeout: 240 seconds]
13:46:08kiryu__ joins
13:48:33qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
13:49:50kiryu_ quits [Ping timeout: 240 seconds]
14:14:03bocci (bocci) joins
14:47:32darknavi joins
14:58:15HP_Archivist quits [Ping timeout: 272 seconds]
15:27:06<aninternettroll>hi, a bit offtopic, but does anyone know if on the wayback machine i can see a log of latest urls saved for a given domain? I saved a url today, but i forgot it
15:27:46<aninternettroll>and it's not on https://web.archive.org/web/*/samarbeid.digdir.no* under URLs as far as i can tell
15:37:58Megame (Megame) joins
15:44:57HP_Archivist (HP_Archivist) joins
15:48:59Wohlstand (Wohlstand) joins
15:53:11<@JAA>nicolas17: Ew, fun... Yeah, IA has some measures against that. Might be worth contacting Samsung about it via https://opensource.samsung.com/requestInquiry after ensuring everything else is covered.
16:00:47<@JAA>aninternettroll: I think the prefix search might use a different index, so maybe check again in a day or two.
16:00:48HP_Archivist quits [Read error: Connection reset by peer]
16:01:10<aninternettroll>ok, thanks
16:06:16HP_Archivist (HP_Archivist) joins
16:08:39<that_lurker>https://arstechnica.com/gadgets/2024/02/google-search-kills-off-cached-webpages
16:08:42<that_lurker>o7
16:26:56maxfan8_ quits [Quit: WeeChat 3.3]
16:27:20andrew quits [Ping timeout: 240 seconds]
16:28:18<@arkiver>nicolas17: on IA problems, please ping me as well
16:28:22<@arkiver>what is the problem?
16:29:20<Darken>Could someone archive https://sites.google.com/view/crwfam with archivebot for me, has no coverage at all
16:30:34AlsoHP_Archivist joins
16:30:53andrew (andrew) joins
16:33:09maxfan8 (maxfan8) joins
16:34:20HP_Archivist quits [Ping timeout: 240 seconds]
16:42:25<@JAA>(Done)
16:44:18Wohlstand quits [Client Quit]
16:47:20AlsoHP_Archivist quits [Ping timeout: 240 seconds]
16:57:52nic9070 quits [Client Quit]
17:00:20nic9070 (nic) joins
17:31:24HP_Archivist (HP_Archivist) joins
17:39:50icedice quits [Client Quit]
18:02:10bocci quits [Read error: Connection reset by peer]
18:02:26bocci (bocci) joins
18:30:50bocci quits [Ping timeout: 240 seconds]
18:33:52bocci (bocci) joins
18:38:38Wohlstand (Wohlstand) joins
18:41:49rktk quits [Quit: ZNC - https://znc.in]
18:43:10rktk (rktk) joins
18:49:10<@arkiver>JAA: i believe AB (used to)? processes sitemaps
18:49:28<@JAA>arkiver: Yes
18:49:31<@arkiver>for the cinezen.hk job, it did not find the sitemaps under sitemap.xml it seems (it was not listed in robots.txt, but pretty obvious)
18:49:55<@JAA>It tries /sitemap.xml and extracts the ones from /robots.txt, too.
18:50:15<@arkiver>is the CDATA stuff not supported maybe? https://cinezen.hk/sitemap.xml
18:50:38<@JAA>That seems plausible.
18:50:48<@arkiver>right
18:51:06<@arkiver>do i !a < list with the URLs from the sitemaps manually to handle this?
18:53:13<@JAA>Possibly, !a < has quirks. If none of the URLs have any additional path segment, it should be fine.
18:53:40<@JAA>If you have a list, I can check before submission.
18:56:20<pokechu22>ArchiveBot didn't handle https://ann-britt.se/sitemap.xml correctly due to whitespace before/after the loc elements (but the sub-sitemaps didn't have that issue so I could do an !a < list of those)
18:56:59<pokechu22>I'm pretty sure archivebot's fine with CDATA - the issue there is instead that https://cinezen.hk/sitemap.xml links to https://www.cinezen.hk/addl-sitemap.xml and www is considered offsite from non-www
18:57:27<pokechu22>so !a https://www.cinezen.hk/ should work properly as then https://www.cinezen.hk/sitemap.xml is used and that links to www etc
18:58:08darknavi quits [Remote host closed the connection]
18:58:25<pokechu22>(https://cinezen.hk/ also redirects to https://www.cinezen.hk/ so !a https://cinezen.hk/ wouldn't recurse more than 1 level either)
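[editor's sketch] The www/non-www issue pokechu22 describes can be illustrated by comparing the host of each sitemap <loc> against the host of the seed URL. This is not ArchiveBot's code, just an illustration of the idea. The sample XML is made up except for the addl-sitemap.xml URL quoted above, and the second entry and both function names are hypothetical. It also shows that CDATA and surrounding whitespace are easy to normalise when parsing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample; only the first URL comes from the discussion above.
SAMPLE_SITEMAP = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc><![CDATA[https://www.cinezen.hk/addl-sitemap.xml]]></loc></sitemap>
  <sitemap><loc>
    https://cinezen.hk/hypothetical-sitemap.xml
  </loc></sitemap>
</sitemapindex>"""


def sitemap_locs(xml_text):
    """Return every <loc> value, with surrounding whitespace stripped."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(NS + "loc") if el.text]


def split_by_host(seed_url, locs):
    """Split URLs into (same host as seed, different host from seed)."""
    seed_host = urlsplit(seed_url).hostname
    onsite = [u for u in locs if urlsplit(u).hostname == seed_host]
    offsite = [u for u in locs if urlsplit(u).hostname != seed_host]
    return onsite, offsite


if __name__ == "__main__":
    locs = sitemap_locs(SAMPLE_SITEMAP)
    for seed in ("https://cinezen.hk/", "https://www.cinezen.hk/"):
        print(seed, split_by_host(seed, locs))
```

With the non-www seed, the www sub-sitemap lands in the offsite list, which matches the behaviour described for the original job.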
18:59:16<@arkiver>pokechu22: very interesting, thank you
18:59:38<@arkiver>pokechu22: i put it in
19:01:19<@arkiver>pokechu22: it's working! thank you :)
19:04:44<@JAA>Oh, yeah, that makes sense. :-)
19:06:33<pokechu22>It's confusing because generally https://cinezen.hk/sitemap.xml would also redirect to https://www.cinezen.hk/sitemap.xml but in this case it didn't
19:07:39Lord_Nightmare quits [Client Quit]
19:11:22Lord_Nightmare (Lord_Nightmare) joins
19:13:40<@JAA>The lack of redirection actually makes the issue a bit more obvious though than it would be otherwise.
19:16:58nothere quits [Quit: Leaving]
19:22:19Wohlstand quits [Client Quit]
19:22:47Island joins
19:32:43sepro quits [Quit: Ping timeout (120 seconds)]
19:32:54sepro (sepro) joins
19:35:48SootBector quits [Ping timeout: 255 seconds]
19:38:05SootBector (SootBector) joins
19:42:57SootBector quits [Remote host closed the connection]
19:43:24SootBector (SootBector) joins
19:48:11SootBector quits [Remote host closed the connection]
19:48:41SootBector (SootBector) joins
19:50:01nothere joins
19:52:00SootBector quits [Remote host closed the connection]
19:52:24SootBector (SootBector) joins
19:58:23sepro6 (sepro) joins
20:00:20sepro quits [Ping timeout: 240 seconds]
20:00:20sepro6 is now known as sepro
20:04:00Megame1_ (Megame) joins
20:06:50Megame quits [Ping timeout: 240 seconds]
20:07:13Megame1_ is now known as Megame
20:18:48nic9070 quits [Client Quit]
20:32:32BlueMaxima joins
20:46:52Megame quits [Read error: Connection reset by peer]
20:50:48fishingforsoup quits [Read error: Connection reset by peer]
20:51:30fishingforsoup joins
20:52:22@dxrt quits [Client Quit]
20:52:40dxrt joins
20:52:42dxrt quits [Changing host]
20:52:43dxrt (dxrt) joins
20:52:43@ChanServ sets mode: +o dxrt
21:05:56lizardexile joins
21:11:20lizardexile quits [Client Quit]
21:37:54jasons (jasons) joins
21:41:10bf_ quits [Remote host closed the connection]
21:43:33<h2ibot>Pokechu22 edited Jira (+27, /* Not yet archived */ https://issues.redhat.com/): https://wiki.archiveteam.org/?diff=51654&oldid=51635
21:44:34<h2ibot>Pokechu22 edited Jira (-34, /* Not yet archived */…): https://wiki.archiveteam.org/?diff=51655&oldid=51654
21:45:41nic9070 (nic) joins
21:46:07nicolas17 quits [Ping timeout: 272 seconds]
22:00:41ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
22:00:49ThetaDev joins
22:04:37icedice (icedice) joins
22:17:16nicolas17 joins
22:19:07<nicolas17>arkiver: a pdf from opensource.samsung is DRM'd for some strange reason, and IA seems to check if .pdf files have valid format, so it's not letting me upload it
22:20:24<nicolas17>obviously that file is useless as-is
22:21:52<nicolas17>but if imgur gives us a corrupted .png, that doesn't stop us from preserving it, right? :P
22:33:42<h2ibot>Pedrosso edited Steam (+725, Added WIP Collapsible Archive Table for the…): https://wiki.archiveteam.org/?diff=51656&oldid=51382
22:35:43<h2ibot>Pedrosso edited Steam (+0, /* Archives */ Changed GetDetails to QueryFiles): https://wiki.archiveteam.org/?diff=51657&oldid=51656
22:38:41jasons quits [Ping timeout: 272 seconds]
22:45:44<h2ibot>DigitalDragon edited Current Projects (+147, add Vbox7): https://wiki.archiveteam.org/?diff=51658&oldid=51458
22:45:54bocci quits [Remote host closed the connection]
22:54:20BlueMaxima quits [Read error: Connection reset by peer]
22:57:05prikaza quits [Remote host closed the connection]
23:00:27pi31415 joins
23:01:33<pi31415>I found a site that i would like to archive, and it uses OpenSeadragon to present a ginormous zoomable image. Any wisdom on archiving that?
23:04:18<pokechu22>https://openseadragon.github.io/example-images/duomo/duomo.dzi - hmm, this is also http://schemas.microsoft.com/deepzoom/2008
23:04:45<pokechu22>I wrote some horrible code that handles that when I was doing http://chinesepainting.seattleartmuseum.org/OSCI/ I think
23:08:31<pi31415>The one i am looking at has Size Height="126976" Width="204800"
23:08:34<pokechu22>https://transfer.archivete.am/GOq6k/make_url_list.py - be warned that it's *really* bad code and you'll probably need to modify it to work (among other things it assumes the tile size is 256, while the sample I linked is 254 for some reason, and this also handles stuff other than add_accession_number)
23:08:34<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/GOq6k/make_url_list.py
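[editor's sketch] For readers without access to the linked script: a Deep Zoom (.dzi) pyramid can be enumerated from just the image size and tile size, since each level halves the dimensions of the one above it and tiles are named level/col_row. The code below is an independent minimal sketch, not pokechu22's make_url_list.py. The function names, the default tile size of 256 and the jpg format are assumptions, so read the real values from the .dzi file.

```python
def dzi_levels(width, height, tile_size=256):
    """Yield (level, columns, rows) for every level of a Deep Zoom pyramid."""
    # Highest level is ceil(log2(max dimension)), computed with integers.
    max_level = (max(width, height) - 1).bit_length()
    for level in range(max_level + 1):
        scale = 1 << (max_level - level)
        w = max(1, -(-width // scale))   # ceiling division
        h = max(1, -(-height // scale))
        yield level, -(-w // tile_size), -(-h // tile_size)


def dzi_tile_urls(dzi_url, width, height, tile_size=256, fmt="jpg"):
    """Yield tile URLs of the form <name>_files/<level>/<col>_<row>.<fmt>."""
    base = dzi_url.rsplit(".", 1)[0] + "_files"
    for level, cols, rows in dzi_levels(width, height, tile_size):
        for col in range(cols):
            for row in range(rows):
                yield f"{base}/{level}/{col}_{row}.{fmt}"


if __name__ == "__main__":
    # Size quoted in the channel; the tile size here is an assumption.
    level, cols, rows = list(dzi_levels(204800, 126976))[-1]
    print(level, cols, rows, cols * rows)
```

Tile overlap changes the pixel size of individual tiles but not how many there are, so it is ignored here.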
23:08:54<pi31415>Thanks!
23:11:42<pi31415>this site has auth credentials in the .js code that makes requests to the data
23:13:57nic9070 quits [Client Quit]
23:14:49<h2ibot>Arkiver uploaded File:Vbox7 icon.png: https://wiki.archiveteam.org/?title=File%3AVbox7%20icon.png
23:15:49<pi31415>hum, looks like i can just get a directory listing of the tile files from http :-)
23:16:22<pokechu22>Yeah, that's probably easier :)
23:22:24<pi31415>looks like the auth credentials are for a FileMaker backed REST API that looks up a location in the image keyed on meta-data
23:22:57<pi31415>guess i am SOL regarding that meta-data
23:27:45sonick (sonick) joins
23:31:36pi31415 quits [Client Quit]
23:33:26pi31415 joins
23:41:44jasons (jasons) joins
23:41:52Sluggs_ is now known as Sluggs
23:55:56<h2ibot>Pedrosso edited Steam (+283026, /* Archives */ Added all other steam workshops…): https://wiki.archiveteam.org/?diff=51660&oldid=51657
23:59:01pi31415 quits [Remote host closed the connection]
23:59:48<@JAA>wat