00:23:23wickedplayer494 quits [Remote host closed the connection]
00:27:18damianwebster quits [Remote host closed the connection]
00:28:08wickedplayer494 joins
00:34:04britmob quits [Ping timeout: 258 seconds]
00:41:44britmob joins
00:44:27<namspc>Anyone in here doing anything where being able to prove the time of the grab is important? I'd be happy to help.
00:45:04<@JAA>Not really. Proving integrity would be important, but that's impossible.
00:57:56Mateon2 joins
00:59:30Mateon1 quits [Ping timeout: 250 seconds]
00:59:30Mateon2 is now known as Mateon1
01:01:55<namspc>JAA, You mean proving that what was grabbed is what was being served?
01:02:03<namspc>And you haven't modified it/etc?
01:02:06<@JAA>Yeah
01:02:20<namspc>Mm, it's not impossible per se. But it is more difficult.
01:02:29<namspc>In general the best way to 'prove' that is with witnesses.
01:02:42<namspc>Or, more concretely, you have multiple people grab it and sign the grab.
01:02:43<@JAA>Nope, it's impossible without the origin server cooperating in some way.
01:02:58<namspc>But obviously witnesses can lie.
01:03:05dm4v_ joins
01:03:06<@JAA>Yeah, we often don't have enough time to grab everything once, so that doesn't work.
01:03:10<namspc>Yeah.
01:03:16<namspc>I'm just talking about the general case there.
01:03:22<namspc>For ArchiveTeam's specific use case, not helpful.
01:03:24dm4v quits [Ping timeout: 250 seconds]
01:03:24dm4v_ is now known as dm4v
01:03:24dm4v quits [Changing host]
01:03:24dm4v (dm4v) joins
01:03:52<namspc>One thing that timestamping can do is at least show that you haven't tampered with it past X date, which can be useful in establishing that "well if there was tampering it had to have occurred within this timeframe".
01:04:11<namspc>Which can be useful approximate proof that you haven't been messing with it.
01:04:34<namspc>Nobody can go back and slip things in years after a grab, etc.
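A minimal sketch of what a timestamping step could commit to, assuming the grab is described by a mapping of file names to hex hashes. The function name `manifest_digest`, the manifest line format, and the sample file names and hashes are all hypothetical; submitting the digest to an actual timestamping service is not shown, since the log does not say which one namspc has in mind.

```python
import hashlib


def manifest_digest(file_hashes):
    """Return one SHA-256 hex digest covering a {filename: hex_hash} mapping.

    Entries are sorted by name so the digest does not depend on dict order.
    Assumes file names contain no newline characters.
    """
    h = hashlib.sha256()
    for name in sorted(file_hashes):
        line = f"{file_hashes[name]}  {name}\n"
        h.update(line.encode("utf-8"))
    return h.hexdigest()


# Hypothetical grab: two files with made-up 40-hex-digit hashes.
grab = {"a.warc.gz": "aa" * 20, "b.warc.gz": "bb" * 20}
digest = manifest_digest(grab)
```

If any listed hash changes after the digest has been timestamped, the recomputed digest no longer matches, which is the "no tampering past X date" property described above.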
01:07:40<namspc>And, it hadn't occurred to me before but I could actually just run a bot that goes through items on archive.org and timestamps them, then upload the proof collections to archive.org itself.
01:08:15<namspc>Like I was initially thinking if I wanted to do that I had to ask someone at the archive to help, but I actually don't. So I should probably do that.
01:09:14<namspc>Is there an easy machine readable way to get the hashes of collection items, like an API or something?
01:10:56<namspc>In principle they have to exist since torrents are being offered.
01:12:21<thuban>they're in the _files.xml metadata file (click 'show all' under 'download options' on the details page).
01:12:48<thuban>there's an api: https://archive.org/services/docs/api/metadata.html
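A rough sketch of pulling per-file hashes out of a metadata API response, assuming the JSON has a top-level `files` list whose entries carry `name` and `sha1` fields. The endpoint pattern `https://archive.org/metadata/{identifier}` and those field names are my understanding of the API linked above and should be checked against its documentation; `extract_hashes` and `fetch_metadata` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against the metadata API docs.
METADATA_URL = "https://archive.org/metadata/{identifier}"


def extract_hashes(metadata):
    """Map file name -> SHA-1 for every file entry that lists both fields."""
    return {
        f["name"]: f["sha1"]
        for f in metadata.get("files", [])
        if "name" in f and "sha1" in f
    }


def fetch_metadata(identifier):
    """Fetch and decode the metadata JSON for one item (network call)."""
    url = METADATA_URL.format(identifier=identifier)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Usage would look like `extract_hashes(fetch_metadata("some-item-identifier"))`, where the identifier is a placeholder. Entries without a `sha1` field are skipped rather than treated as errors.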
01:13:35<namspc>Hm, is SHA-1 still secure for files? I know it's generally deprecated for passwords, but those are short.
01:15:32<@JAA>Even MD5 and MD4 are fine. I'm not even sure whether MD2 is broken yet in that regard.
01:15:43<@JAA>Preimage attacks are hard.
01:15:47namspc nods
01:15:52<namspc>Sounds perfect then, thanks.
01:18:18<namspc>https://www.goanywhere.com/blog/2017/03/02/still-using-sha-1-to-secure-file-transfers-its-time-to-say-goodbye
01:18:19<namspc>Hm.
01:19:03<namspc>https://www.computerworld.com/article/3173616/the-sha1-hash-function-is-now-completely-unsafe.html
01:19:26<@JAA>Yes, a lot of people don't understand the difference between collision and preimage resistance.
01:19:42<@JAA>Anyway, this discussion belongs in -ot.
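The distinction JAA draws between collision and preimage resistance can be illustrated with a toy example. This sketch truncates SHA-256 to 16 bits so that both searches finish quickly; it is not an attack on any real hash function, and `toy_hash`, `find_collision` and `find_preimage` are hypothetical names. Finding any two inputs that collide takes on the order of 2^8 tries (birthday bound), while finding an input for one fixed target value takes on the order of 2^16 tries.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> int:
    """First 16 bits of SHA-256, as an integer. Deliberately weak."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def find_collision():
    """Find ANY two distinct messages with the same toy hash."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # two messages, number of tries
        seen[h] = msg


def find_preimage(target: int):
    """Find a message whose toy hash equals one FIXED target value."""
    for i in count():
        msg = b"p" + str(i).encode()
        if toy_hash(msg) == target:
            return msg, i + 1  # message, number of tries
```

Comparing the try counts returned by `find_collision()` and `find_preimage(toy_hash(b"archive"))` shows the gap. For verifying that an archived file matches a published hash, the preimage-style search is the relevant one, because the attacker does not get to choose both files.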
01:27:52Arcorann (Arcorann) joins
01:41:24<Somebody2>namspc: Check out "Internet Archive Census" on the archiveteam wiki -- that's where the previous efforts at this were documented.
01:41:37<Somebody2>and I'm DELIGHTED you are making another effort at it.
01:58:23<namspc>Somebody2, I'm in OT now, we can discuss it there.
02:16:58summerisle quits [Quit: In my vision, I was on the veranda of a vast estate, a palazzo of some fantastic proportion.]
02:17:05summerisle (summerisle) joins
02:30:36Arcorann quits [Ping timeout: 258 seconds]
02:48:27pawbs quits [Quit: My ZNC server died. Probably updating my kernel...]
02:53:06pawbs joins
03:03:25etnguyen03 quits [Client Quit]
03:21:13Sylirana quits [Ping timeout: 244 seconds]
03:21:48qw3rty_ joins
03:22:12Sylirana (Sylirana) joins
03:25:25qw3rty__ quits [Ping timeout: 258 seconds]
04:04:16DogsRNice quits [Read error: Connection reset by peer]
04:16:58benjinsmith joins
04:18:42benjins quits [Ping timeout: 258 seconds]
04:28:02Iki1 joins
04:31:21Iki quits [Ping timeout: 258 seconds]
04:36:22HP_Archivist (HP_Archivist) joins
04:36:58Lord_Nightmare quits [Client Quit]
04:37:25Lord_Nightmare (Lord_Nightmare) joins
04:39:12tzt quits [Ping timeout: 250 seconds]
04:40:06tzt joins
04:42:28aleph quits [Ping timeout: 258 seconds]
04:44:00LordThanatos quits [Ping timeout: 258 seconds]
04:44:09aleph joins
04:45:03pcr leaves
04:47:36LordThanatos joins
05:29:44BlueMaxima quits [Client Quit]
05:31:37Eighty quits [Quit: leaving]
05:39:04Eighty (Eighty) joins
05:45:56dm4v quits [Read error: Connection reset by peer]
06:07:27pcr joins
06:51:39HP_Archivist quits [Ping timeout: 258 seconds]
07:49:31sec^nd quits [Remote host closed the connection]
07:49:56sec^nd (second) joins
08:16:48Atom-- joins
08:17:36Atom quits [Ping timeout: 250 seconds]
08:56:53bobbyb quits [Remote host closed the connection]
09:36:14<@OrIdow6>arkiver: Get a reply on that wiki thing?
10:52:46Iki1 quits [Ping timeout: 258 seconds]
11:15:41Iki1 joins
11:46:49@OrIdow6 quits [Ping timeout: 258 seconds]
11:50:25OrIdow6 (OrIdow6) joins
11:50:25@ChanServ sets mode: +o OrIdow6
12:03:41kiskaWeebChat quits [Ping timeout: 258 seconds]
12:21:04benjinsmith is now known as benjins
12:24:29<@arkiver>OrIdow6: one - they said there is not much time left, there are 1.2 million sites, and mostly without traffic
12:24:34<@arkiver>replied
12:24:38<@arkiver>nothing more
12:32:38BlueMaxima joins
13:08:05BlueMaxima quits [Client Quit]
13:11:42marked quits [Quit: Ping timeout (120 seconds)]
13:14:18marked joins
13:24:59driib joins
15:09:16monoxane quits [Ping timeout: 250 seconds]
15:14:42monoxane (monoxane) joins
15:18:48ThreeHeadedMonkey quits [Ping timeout: 250 seconds]
15:32:23ThreeHeadedMonkey (ThreeHeadedMonkey) joins
15:56:50HackMii (hacktheplanet) joins
15:58:26HackMii_ quits [Remote host closed the connection]
16:05:57dewdrop quits [Ping timeout: 258 seconds]
16:07:27dewdrop (dewdrop) joins
16:35:54godane (godane) joins
16:45:02ddd joins
16:57:59Daloader__ joins
17:05:40HP_Archivist (HP_Archivist) joins
17:18:30LeGoupil joins
17:33:46DogsRNice (Webuser299) joins
17:54:49Vukky (Vukky) joins
17:58:06Vukky quits [Client Quit]
18:06:03spirit joins
18:44:39Wayward quits [Ping timeout: 258 seconds]
18:48:44NIC007a83 joins
18:49:37Ryz quits [Remote host closed the connection]
18:51:09Ryz (Ryz) joins
19:22:37ThreeHeadedMonkey is now known as ThreeHM
19:29:30<JensRex>Reposting a question/topic from #archivebot:
19:29:36<JensRex><JensRex> So, I'd like to have this site archived, because as far as I can tell it's not in IA: http://odensebilleder.dk/billedliste.asp
19:29:46<JensRex><JensRex> But...
19:29:51<JensRex><JensRex> It has to set some stupid session cookie from this page first: http://odensebilleder.dk/billedstart.asp
19:29:57<JensRex><JensRex> Just pressing "Søg" (search) returns all images in the archive.
19:30:02<JensRex><JensRex> Don't know if that's something archivebot is able to handle.
19:30:21<JensRex><JensRex> This site was not made by competent people.
19:30:34<JensRex><JensRex> It’s ~17k historic images of the city Odense. The third largest in Denmark. These people are barely able to write parseable html, who knows what their backup policy is like.
19:34:37ddd quits [Remote host closed the connection]
19:59:32ddd joins
20:23:52<Sylirana>JensRex, I can't speak for archivebot or adding the site to IA, but if you just want the images, it's as simple as iterating over image.service.museum.odense.dk/$ID and there doesn't seem to be any cookie checking there.
20:24:35HP_Archivist quits [Read error: Connection reset by peer]
20:25:03HP_Archivist (HP_Archivist) joins
20:26:52<Sylirana>Something I've noticed: IDs below 128 exist on http://image.service.museum.odense.dk/, but not anywhere on http://odensebilleder.dk/ for some reason.
20:28:16ddd quits [Remote host closed the connection]
20:29:04<JensRex>Yea, I considered crawling it for myself over the weekend.
20:30:23<thuban>unfortunately, the pages (http://odensebilleder.dk/billedvis.asp?billednr=$ID) contain valuable metadata and do have the cookie check. i don't think archivebot can accept cookie configuration at present (see https://github.com/ArchiveTeam/ArchiveBot/issues/416). grab-site can, but grab-site warcs don't get automatically ingested into the wayback machine.
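A minimal sketch of the cookie flow described above, using only the Python standard library. It assumes that requesting billedstart.asp once is enough to obtain the session cookie, as JensRex reports, and that the cookie is then accepted on the billedvis.asp pages. The helper names are hypothetical, and this only fetches raw responses: it does not write WARCs, so it is not a substitute for grab-site or ArchiveBot.

```python
import http.cookiejar
import time
import urllib.request

START_URL = "http://odensebilleder.dk/billedstart.asp"
PAGE_URL = "http://odensebilleder.dk/billedvis.asp?billednr={id}"
IMAGE_URL = "http://image.service.museum.odense.dk/{id}"


def page_url(image_id):
    """Metadata page for one image ID (cookie-checked, per the log)."""
    return PAGE_URL.format(id=image_id)


def image_url(image_id):
    """Direct image URL for one ID (no cookie check observed, per the log)."""
    return IMAGE_URL.format(id=image_id)


def make_session():
    """Build an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def crawl(ids, delay=1.0):
    """Yield (id, page_bytes) for each ID, after priming the session cookie."""
    opener = make_session()
    opener.open(START_URL, timeout=30).close()  # assumed to set the cookie
    for image_id in ids:
        with opener.open(page_url(image_id), timeout=30) as resp:
            yield image_id, resp.read()
        time.sleep(delay)  # be gentle with the server
```

Something like `for image_id, html in crawl(range(128, 138)): ...` would walk a small ID range; the range here is only an example.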
20:31:11<thuban>JAA, do we have anybody whitelisted for wbm who could run this via grab-site?
20:33:08<thuban>Sylirana: that's funny, i don't see any ids below 128. got an example?
20:33:34<Sylirana>http://image.service.museum.odense.dk/ has a bit of information about the image host... Apparently there seem to be different sets there which are accessed through something like http://image.service.museum.odense.dk/FKM/FGV/$ID . But I haven't been able to find another set like "FKM/FGV" anywhere.
20:33:41<Sylirana>http://image.service.museum.odense.dk/100
20:40:56dm4v joins
20:40:58dm4v quits [Changing host]
20:40:58dm4v (dm4v) joins
20:46:32<thuban>huh! 91-108 exist and seem to be archaeological. there are other gaps in the ids returned by this search, some of which don't seem to have a corresponding image, some which do but likewise depict other topics (eg 156 / 944). i wonder whether there are other galleries hiding somewhere... if there's a link somewhere on odensebysmuseer.dk i haven't turned it up
20:52:18HP_Archivist quits [Ping timeout: 258 seconds]
20:53:45<Sylirana>thuban, I can't get 108, are you sure that exists? The first "set" I've found is 88-89 and the next one 91-107.
20:55:56Lilpea quits [Ping timeout: 250 seconds]
20:56:05<thuban>whoops, yes, typo.
20:58:59NIC007a83 quits [Client Quit]
21:02:50KRG` joins
21:03:25KRG quits [Ping timeout: 258 seconds]
21:03:35Lilpea joins
21:11:17LeGoupil quits [Client Quit]
21:22:20BlueMaxima joins
21:33:27nertzy quits [Client Quit]
21:41:22Iki1 quits [Ping timeout: 258 seconds]
21:51:36HP_Archivist (HP_Archivist) joins
21:56:56nertzy (nertzy) joins
21:58:21godane1 joins
21:58:28HP_Archivist quits [Read error: Connection reset by peer]
21:58:39godane quits [Read error: Connection reset by peer]
21:58:55HP_Archivist (HP_Archivist) joins
21:59:04BlueMaxima quits [Client Quit]
21:59:37godane2 joins
22:03:06godane1 quits [Ping timeout: 250 seconds]
22:03:09Aoede quits [Quit: ZNC - https://znc.in]
22:03:29Aoede (Aoede) joins
22:28:45MaxG joins
22:35:02HP_Archivist quits [Ping timeout: 258 seconds]
22:38:03<Sylirana>Other things which are a bit odd about the site: Instead of simply changing the ID parameter, sometimes "flg" is used to go to the next/previous page (and the ID stays the same). And if you change the "antal" parameter to something like 20000, there will be a button for the "next" page on the last image which just leads to an invalid page. I mean it works, but it would also work without those two parameters.
22:38:03jtagcat quits [Quit: Bye!]
22:40:36jtagcat (jtagcat) joins
22:43:48<Sylirana>As for the images, from 0-20000, there are already over 18k image files out of which only 7954 are also on the "main" site, so there are most likely a lot more images than on the main site.
22:56:16godane2 quits [Read error: Connection reset by peer]
22:58:32Iki1 joins
23:18:30Daloader__ quits [Ping timeout: 250 seconds]
23:29:10minus leaves [WeeChat 2.8]