19:05 * marga Margarita Manterola
19:05 * ntyni Niko Tyni
19:06 * smcv Simon McVittie
19:06 * gwolf Gunnar Wolf
19:07 < gwolf> I guess we can feed the logs to the meetbot processor or so...
19:07 * gwolf pingalls ctte..
19:07 <@marga> OdyX said he would likely miss this session...
19:07 <@marga> fil, ?
19:07 <@marga> And the rest of our members seem to not be on the channel...
19:07 < smcv> we've all got so used to the meetings all being cancelled
19:08 <@marga> Alright, let's move to our first topic
19:08 < smcv> it's typical, you wait 3 months for a bus^Wagenda item and three come along at once
19:08 -!- marga changed the topic of #debian-ctte to: #904302 Whether vendor-specific patch series should be permitted in the archive
19:09 <@marga> I think we basically have consensus on this one.
19:09 <@marga> I'm sad that Tollef isn't here, as he seemed to have volunteered to write the proposal for the vote.
19:09 < gwolf> I agree. There are no voices against from my recollection in the list
19:10 <@marga> I think the next steps would be: 1) draft a statement 2) vote on the statement 3) close the issue
19:10 < ntyni> marga's message today pretty much matched my thoughts fwiw, so ack on consensus
19:10 < smcv> marga's mail to the bug says everything I would have said
19:10 <@marga> Thanks :)
19:10 < gwolf> there was one DD arguing against our probable outcome, but I understand someone (fil?) spoke with him at DC18 and basically came to an OK
19:10 <@marga> Thanks smcv for your thorough analysis, I really appreciated that.
19:11 <@marga> Maybe bremner?
19:11 < gwolf> maybe. Don't remember.
19:11 < smcv> hah for a moment I totally forgot that I'd mailed the bug and thought you were endorsing "I agree with marga" as a thorough analysis :-)
19:11 < gwolf> But anyway - I think there is consensus within ctte, and don't think the consensus will be contentious outside of our little group :)
19:12 <@marga> Alright, so, shall we add an action item for Tollef to do the writeup as he seems to have volunteered?
19:12 < ntyni> sure :)
19:12 < gwolf> I'm OK with volunteering Tollef :) In case he cannot or won't do it, I can take the task.
19:12 < smcv> sounds good
19:13 <@marga> #action Mithrandir to draft the resolution so that we can vote on it. gwolf can take it if that's not ok.
19:13 -!- marga changed the topic of #debian-ctte to: #904558 What should happen when maintscripts fail to restart a service
19:13 -!- Mithrandir has joined #debian-ctte
19:14 <@marga> \o/
19:14 < Mithrandir> Hey
19:14 < gwolf> This is way less clear cut IMO... And not having any follow-up kind-of-supports it
19:14 < gwolf> Mithrandir: !!
19:14 < Mithrandir> on phone, so not terribly useful
19:14 <@marga> Mithrandir, we just actioned you in the previous topic:
19:14 <@marga> #action Mithrandir to draft the resolution so that we can vote on it. gwolf can take it if that's not ok.
19:14 < gwolf> Mithrandir: (re:#904302)
19:14 <@marga> (this is the series.vendor issue)
19:14 < Mithrandir> yup
19:14 < Mithrandir> efm
19:14 < Mithrandir> wfm
19:14 <@marga> Awesome.
19:15 < gwolf> great, I'm off the hook ;-)
19:15 < Mithrandir> Does anybody have opinions? I think mine is clear
19:15 <@marga> For this topic, we had less traffic, but I agree with Tollef's mail from last week. i.e. We could recommend good behavior but we shouldn't dictate anything.
19:16 < smcv> I'm curious what you think the good behaviour is
19:16 < ntyni> Mithrandir: in case it's not obvious, we've moved onto #904558 already
19:16 < gwolf> I think Mithrandir's suggestion is clear - But we have to decide what the recommended one would be - Fail open? Fail closed?
19:16 < gwolf> (best is not to fail, of course)
19:16 <@marga> Oh, I do || true on basically all statements I add to a maintscript :)
19:17 <@marga> Failing maintscripts are really a horrible nightmare
19:17 < Mithrandir> ntyni: yup, got that
19:17 < gwolf> marga: the bug is particularly about restarting services, so I guess failing to mkdir or so does not have to follow this.
19:17 <@marga> I know, I'm just saying I might be a bit of an extremist in the "maintscripts shouldn't fail" camp
19:18 < Mithrandir> i kinda lean towards fail early, but balancing that is failing maintscripts tend to leave you at the bottom of a deep hole, so…
19:18 < gwolf> But... There are issues™ in whatever direction we decide to follow. i.e. failing to restart a daemon on a package update will leave the user running a potentially buggy version even having a fixed one installed...
19:18 < smcv> I wonder whether distinguishing between stop;start and "reload harder"/re-exec is useful
19:18 < smcv> otoh systemctl restart always means stop;start
19:19 < gwolf> Yes, failing maintscripts is a terrible headache for users. It often involves editing the maintscript in question, not what I expect just-about-anybody to be able to do
19:19 < Mithrandir> i think being consistent is more important than whether the default is stop or not
19:20 < gwolf> smcv: oh, right - Restart means stop/start. And if the daemon was not running to begin with (due to the user having manually stopped it, say), a restart will leave it running.
19:20 < Mithrandir> gwolf: we expect it to be: fix the daemon startup, dpkg --configure in this case
19:20 < smcv> if the maintscript fails I suppose the key question is what the user is going to do about it
19:20 < Mithrandir> so no maintscript hacking
19:21 < smcv> and whether what they do about it differs from what they'd do if it suppressed the failure and left the package configured-but-non-functional
19:21 < smcv> (with a big fat warning)
19:21 <@marga> One of the problems with failing maintscripts is that it's usually very hard to understand what's failing.  Even if you don't need to edit the maintscript itself, you may need to go read it to understand where it failed.
19:21 <@marga> smcv, where would the warning be visible?
19:21 < gwolf> marga: yes, and that's something that cannot be expected from users in general. Not even knowing where the scripts live
19:21 < Mithrandir> Or just rerun dpkg to get rid of the chaff
19:22 < smcv> marga: insert recurring wish for useful logging here
19:22 <@marga> :)
19:22 < Mithrandir> so, I’m somewhat worried that ignoring failures leads to reboot, then a failed service
19:23 < gwolf> ...I "smell" that this might be too broad of an issue for us to rule
19:23 < smcv> and not ignoring failures leads to what?
19:23 <@marga> Uhm, I'm not sure I follow.  It could also lead to "failed service -> reboot -> working service", depending on the failure
19:23 < smcv> a failed service, a failed dpkg and a confused sysadmin?
19:23 < Mithrandir> in the Debian spirit, can we have a default that is overridable and then we just have to choose the default.
19:24 <@marga> "Maintscripts should try as hard as possible not to fail"?
19:24 < Mithrandir> gwolf: we are asked to advise, not decide.
19:24 < smcv> Mithrandir: only if it doesn't require us to inject yet more shell script complexity into maintscripts
19:24 < gwolf> right
19:24 < gwolf> marga: unset -e
19:24 <@marga> Yeah
19:25 <@marga> "And if they do fail, they should output an actionable message to the user"
19:25 < gwolf> ...I would not like that, though...
19:25 < ntyni> it would be nice to have a generic way to 'fail gracefully' and inform the user that a daemon failed to start during configure
19:25 < gwolf> The question is not about the general initscripts flow, but about service restarts
19:26 < ntyni> I'm thinking of something like a dpkg trigger
19:26 < Mithrandir> Can we do something good here without going into detailed design?
19:26 <@marga> gwolf, I'm not sure it's a separate question, really
19:26 < smcv> perhaps relevant:
19:26 < smcv> why is it so important that we restart services?
19:26 < gwolf> marga: it stems/flows from the specific question, right... But we are asked to advise specifically on what happens in service restarts
19:26 < smcv> answer: if they have security vulns then the old version is bad
19:27 < smcv> but if libssl or libldap or libdbus has security vulns then we don't restart the world
19:27 < Mithrandir> smcv: config updates, ensure new version works
19:27 < gwolf> smcv: Also, if there are behavioural changes (i.e. upstream package update), you want the running code to match what you have on disk
19:27 <@marga> The problem is that "the restart operation fails" already has two possible options of failure: stop failed or start failed.
19:28 < Mithrandir> marga: stop failed happens a lot less frequently with systemd though
19:28 <@marga> Yeah, I guess systemd will take care of making it happen one way or the other
19:28 < smcv> service manager in "quite good at stopping services" shock
19:28 < Mithrandir> I need to go again, I’d like to finish this on email, but please do continue the discussion:-)
19:29 < Mithrandir> feel free to highlight me if there is something in particular I can be useful on
19:30 <@marga> So, let's assume that the service did a restart, the stop succeeded but the start failed... What's to gain from the maintscript failing?
19:30 < smcv> so the wider context here is that the submitter of #780403 agrees with marga that things should fail less hard
19:31 < gwolf> marga: right, it's IMO always better to gracefully finish the install than to have a failed maintscript
19:31 < smcv> #780403 is actually about start, not restart, btw
19:31 < gwolf> Even though that will annoy the sysadmin as it results in a dead service
19:32 <@marga> Yeah, it's not that much different anyway
19:32 < smcv> daniel pocock makes an interesting point in the merged bug 802501 (which is about restarts) that if a service is taking down the daemon to do some offline reconfiguration,
19:33 < smcv> it can stop it in the preinst, which (unlike the postinst) can abort installation without leaving the system in an undefined state
19:34 < smcv> I feel as though that's a really rare case though
19:34 <@marga> Yeah, preinst failing is a different story than postinst failing
19:35 < gwolf> and preinst should be quite more limited in scope than postinst fwiw
19:36 <@marga> And really, I personally see no gain in postinst failing.  Is there any gain at all?
19:36 < smcv> stopping in preinst means you have to communicate to the postinst that it's ok to start the thing, though, if you want the overall effect to be like systemctl try-restart (which is "stop, then start if it was previously running")
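The hand-off smcv describes at 19:36 (stop in preinst, then start in postinst only if the service was previously running) could be sketched roughly as below. This is a minimal sketch, not something proposed in the meeting: the service name foobard.service, the STATE_FILE path and the helper function names are all hypothetical, and a real package would more likely rely on debhelper-generated snippets.

```shell
#!/bin/sh
set -e

# Hypothetical marker file used to tell the postinst that the preinst
# stopped a service that had been running. The path is an assumption.
STATE_FILE="${STATE_FILE:-/run/foobard.was-running}"

# Hypothetical wrappers around the init system; foobard.service is a
# made-up unit name.
service_is_running() { systemctl is-active --quiet foobard.service; }
service_stop() { systemctl stop foobard.service; }
service_start() { systemctl start foobard.service; }

# preinst side: stop the service only if it is running, and remember
# that we did so.
preinst_stop() {
    if service_is_running; then
        : > "$STATE_FILE"
        service_stop
    fi
}

# postinst side: start the service only if the preinst recorded that it
# had been running, so a manually stopped service stays stopped.
postinst_start() {
    if [ -e "$STATE_FILE" ]; then
        rm -f "$STATE_FILE"
        service_start
    fi
}
```

A preinst would call preinst_stop and the matching postinst would call postinst_start; the marker file is the only state shared between the two scripts.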
19:38 < smcv> obligatory controversial opinion: maybe we should be more prepared to require a reboot, and less keen to do surgery while the patient is still awake
19:38 < ntyni> marga: the gain is making sure that the admin notices the daemon failure?
19:38 <@marga> :-/
19:38 <@marga> I guess that shows how bad things are regarding surfacing problems
19:39 < smcv> ntyni: "surprise! your packages are in an undefined state" is not necessarily such a constructive way to signal that?
19:39 < ntyni> it sure isn't
19:39 < gwolf> ntyni: I agree with marga's and smcv's feeling. Of course it grabs the admin's attention. But not for the better!
19:41 < ntyni> I'm not saying it's good practice, just saying that's the only gain I see
19:41 < smcv> sure
19:41 < gwolf> right... But I think we could then unanimously say it's bad to stop in a state the package manager is confused..?
19:42 < gwolf> It's better that the admin notices when foobard is not answering in its usual port...
19:42 < smcv> shouting about it on stderr and in syslog/Journal is always good of course
19:43 <@marga> The more I think about this, the more I think this is a remnant of old times.  We need a better way to communicate to the user that something is not right
19:43 < smcv> systemd seems quite good at making a lot of noise when it can't do what you asked it to
19:43 < ntyni> marga: agreed
19:43 < gwolf> marga: "shouting" is not necessary, but "signaling our init so that it complains" (i.e. "degraded") should be enough
19:44 < smcv> if we assume systemd for a moment
19:44 < smcv> is there a way that daemons *can* fail to restart without it logging that fact?
19:44 <@marga> Not that I'm aware
19:44 < gwolf> Right. I'm talking about my recent experiences :-] But... Logging, stdout, failed return codes upon invocation...
19:44 < smcv> I think it'll always log "systemd[1]: Failed to start whatever."
19:44 < ntyni> maybe we should have an apt hook to run 'systemctl status' after upgrades
19:45 < gwolf> whatever init system you choose, those are the main communication methods
19:45 < ntyni> or something like that
19:45 * gwolf hopes ntyni is joking
19:45 <@marga> It wouldn't help on unattended upgrades, which I expect is the majority of upgrades nowadays
19:45 < ntyni> sure
19:46 < smcv> if your upgrades are unattended then your error reporting also needs to be unattended
19:46 < smcv> logcheck exists
19:46 < smcv> so do nagios and friends
19:46 <@marga> Yup
19:47 < smcv> I'm not sure that fixating on "but what about restarts" is proportionate
19:47 < smcv> daemons can crash any time
19:48 <@marga> This discussion is getting longer than I had originally expected.  It seems to me that we are mostly in agreement that failing maintscripts are generally undesirable, if maybe not in 100% agreement on how undesirable they are... How should we move forward?
19:48 < smcv> it seems like there would be consensus for a statement with some weasel words in it, at least
19:49 * gwolf asks weasel for his words
19:49 < gwolf> marga: I think we have to keep in mind spwhitton's request when he opened this bug...
19:49 < smcv> a service failing to restart should be logged prominently in the system log and the maintainer script's stderr, but should not usually[1] cause the maintainer script to fail, unless there is a really good[2] reason why it must
19:50 <@marga> are 1 and 2 the weasel words?
19:50 < smcv> (this footnote intentionally left blank)
19:50 <@marga> heh
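The wording smcv floats at 19:49 (report a failed restart prominently on the maintainer script's stderr, but do not usually fail the script) might look something like the sketch below. The helper name restart_or_warn and the service name foobard are hypothetical, and this only covers the stderr half of the suggestion; the system log side is left to the init system, which as noted at 19:44 is expected to log the failure itself under systemd.

```shell
#!/bin/sh
set -e

# Hypothetical helper: run the given restart command for a service.
# On failure, print a warning on stderr but return 0, so that a script
# running under "set -e" does not abort and dpkg does not leave the
# package half-configured.
restart_or_warn() {
    service="$1"
    shift
    if "$@"; then
        return 0
    fi
    echo "warning: restarting $service failed;" \
         "the package is configured but the service may not be running" >&2
    return 0
}

# In a real postinst this might be invoked as (unit name is made up):
#   restart_or_warn foobard systemctl restart foobard.service
```

Compared with appending "|| true" to the restart command, this keeps the failure visible to whoever is watching the upgrade output.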
19:50 < gwolf> ...oh - never mind my last sentence - he asks us to _decide_, not to _advise_
19:50 <@marga> Ok, I'll take the action item of writing up something, and send it to the bug.
19:51 < ntyni> he seems to be asking for advice on whether we should decide
19:51 < ntyni> :)
19:51 <@marga> #action Marga to write up a summary of the discussion here and send it to the bug.  Discussion to continue there.
19:51 < gwolf> smcv: I'd go a bit more general than what you suggest - "A service failing to restart should signal the administrator in a prominent but nonintrusive way".?
19:51 < gwolf> (i.e. we don't do design work)
19:51 < smcv> sure, what I have in mind is "what we do now? do that" :-)
19:52 < gwolf> The important part, where we all agree, is that we want dpkg to fail the least possible, and leaving a broken maintscript does not help the user.
19:52 < smcv> yeah
19:52 < ntyni> yes
19:53 < smcv> broken pre*: well if you absolutely must (clue: if you need to ask, you don't)
19:53 < smcv> broken post*: just say no
19:53 < gwolf> :)
19:53 <@marga> :)
19:53 <@marga> Cool, let's move to our third topic
19:53 -!- marga changed the topic of #debian-ctte to: Any interesting things to share from DC18?
19:53 < gwolf> Well, I think it is up to me :)
19:53 <@marga> Yup
19:53 < gwolf> We had our annual ctte bof, which went smoothly but lacked a bit IMO
19:54 < gwolf> (being me the presenter, and presenting slides written by OdyX originally)
19:54 < gwolf> ...There were conversations mainly regarding our two bugs that several of us had, mainly with the submitters, but also with some other interested people
19:54 < gwolf> but all in all, I don't think there's much to report
19:55 < gwolf> ...any questions? :-]
19:55 < ntyni> thank you for handling the bof
19:56 < gwolf> Did any of you follow it?
19:56 < gwolf> Or any IRC conversation that happened during it?
19:56 <@marga> Thanks, I guess that's what I wanted to know.  I haven't actually watched any talks from DebConf
19:57 < ntyni> I did follow it
19:57 < ntyni> the discussion wasn't very lively
19:57 <@marga> :(
19:58 < ntyni> but it was certainly good to have it I think
19:58 < gwolf> nope. Now again, we are at a point where the ctte has been dormant-ish for three months
19:58 < gwolf> I was mostly interested in getting other people to become interested in joining
19:58 < gwolf> but don't think I managed to raise too much enthusiasm
19:58 <@marga> Yeah, that will be a topic for next month's meeting
19:58 <@marga> (we said we would pause recruiting until Sept)
19:59 < gwolf> right
19:59 <@marga> Anyway, we are almost at the hour...
19:59 -!- marga changed the topic of #debian-ctte to: Any additional business?
19:59 < smcv> would it be helpful to have more emphasis on "ask the ctte for advice"?
20:00 < smcv> there are a couple of gnome bugs where my thought is "what would I even do about this"
20:00 < smcv> maybe formally asking the ctte about them would set a good example?
20:01 < gwolf> smcv: could be. We are not drowning in work, as you see ☺ Then again, not every weird issue should be brought to the ctte. Hopefully...
20:02 <@marga> Well, just asking for advice is different from asking for a ruling in a dispute
20:02 < smcv> I'd only consider that for >= important bugs tbh
20:02 <@marga> We've had a bunch of bad experiences last year with disputes going off-track.  We haven't had much advice asking until these two bugs from Sean
20:03 < smcv> yeah that was partly why I thought it might set a good example
20:03 < gwolf> marga: the bad experiences were... modem-manager? and..?
20:03 <@marga> gwolf, uhm... someone else who also orphaned their package rather than engage with the TC...
20:04 < smcv> instead of reserving appeals to the ctte for "I want to overrule this maintainer", try to encourage people to come to the ctte with "I'm considering this workaround for a broken situation but I don't know if I should"
20:04 < gwolf> right. Well, FWIW (and on-the-record, as we are still formally in meeting), I was saddened and surprised to read Guillem's answer to me asking for his stance on #904302
20:05 <@marga> Me too and I was going to bring this as a subject to discuss today if we had time, but we are overtime now, so I think I'd rather table it for next month.
20:05 <@marga> smcv, I think it would be a nice experiment
20:06 < ntyni> Guillem has made his opinion about the ctte clear on previous occasions too
20:06 < smcv> ok, will look at summarizing #896019 and/or #888549 into a form I can ask for advice on
20:07 <@marga> Alright, I think that would be endmeeting, unless someone has some urgent matter to bring up?
20:07 * gwolf shuts up
20:07 < ntyni> nothing from me