meetings/20180815/debian-ctte.2018-08-15.log.txt

   1 19:05 * marga Margarita Manterola
   2 19:05 * ntyni Niko Tyni
   3 19:06 * smcv Simon McVittie
   4 19:06 * gwolf Gunnar Wolf
   5 19:07 < gwolf> I guess we can feed the logs to the meetbot processor or so...
   6 19:07 * gwolf pingalls ctte..
   7 19:07 <@marga> OdyX said he would likely miss this session...
   8 19:07 <@marga> fil, ?
   9 19:07 <@marga> And the rest of our members seem to not be on the channel...
  10 19:07 < smcv> we've all got so used to the meetings all being cancelled
  11 19:08 <@marga> Alright, let's move to our first topic
  12 19:08 < smcv> it's typical, you wait 3 months for a bus^Wagenda item and three come along at once
  13 19:08 -!- marga changed the topic of #debian-ctte to: #904302 Whether vendor-specific patch series should be permitted in the archive
  14 19:09 <@marga> I think we basically have consensus on this one.
  15 19:09 <@marga> I'm sad that Tollef isn't here, as he seemed to have volunteered to write the proposal for the vote.
  16 19:09 < gwolf> I agree. There are no voices against from my recollection in the list
  17 19:10 <@marga> I think the next steps would be: 1) draft a statement 2) vote on the statement 3) close the issue
  18 19:10 < ntyni> marga's message today pretty much matched my thoughts fwiw, so ack on consensus
  19 19:10 < smcv> marga's mail to the bug says everything I would have said
  20 19:10 <@marga> Thanks :)
  21 19:10 < gwolf> there was one DD arguing against our probable outcome, but I understand someone (fil?) spoke with him at DC18 and basically came to an OK
  22 19:10 <@marga> Thanks smcv for your thorough analysis, I really appreciated that.
  23 19:11 <@marga> Maybe bremner?
  24 19:11 < gwolf> maybe. Don't remember.
  25 19:11 < smcv> hah for a moment I totally forgot that I'd mailed the bug and thought you were endorsing "I agree with marga" as a thorough analysis :-)
  26 19:11 < gwolf> But anyway - I think there is consensus within ctte, and don't think the consensus will be contentious outside of our little group :)
  27 19:12 <@marga> Alright, so, shall we add an action item for Tollef to do the writeup as he seems to have volunteered?
  28 19:12 < ntyni> sure :)
  29 19:12 < gwolf> I'm OK with volunteering Tollef :) In case he cannot or won't do it, I can take the task.
  30 19:12 < smcv> sounds good
  31 19:13 <@marga> #action Mithrandir to draft the resolution so that we can vote on it. gwolf can take it if that's not ok.
  32 19:13 -!- marga changed the topic of #debian-ctte to: #904558 What should happen when maintscripts fail to restart a service
  33 19:13 -!- Mithrandir has joined #debian-ctte
  34 19:14 <@marga> \o/
  35 19:14 < Mithrandir> Hey
  36 19:14 < gwolf> This is way less clear cut IMO... And not having any follow-up kind-of-supports it
  37 19:14 < gwolf> Mithrandir: !!
  38 19:14 < Mithrandir> on phone, so not terribly useful
  39 19:14 <@marga> Mithrandir, we just actioned you in the previous topic:
  40 19:14 <@marga> #action Mithrandir to draft the resolution so that we can vote on it. gwolf can take it if that's not ok.
  41 19:14 < gwolf> Mithrandir: (re:#904302)
  42 19:14 <@marga> (this is the series.vendor issue)
  43 19:14 < Mithrandir> yup
  44 19:14 < Mithrandir> efm
  45 19:14 < Mithrandir> wfm
  46 19:14 <@marga> Awesome.
  47 19:15 < gwolf> great, I'm off the hook ;-)
  48 19:15 < Mithrandir> Does anybody have opinions? I think mine is clear
  49 19:15 <@marga> For this topic, we had less traffic, but I agree with Tollef's mail from last week. i.e. We could recommend good behavior but we shouldn't dictate anything.
  50 19:16 < smcv> I'm curious what you think the good behaviour is
  51 19:16 < ntyni> Mithrandir: in case it's not obvious, we've moved onto #904558 already
  52 19:16 < gwolf> I think Mithrandir's suggestion is clear - But we have to decide what the recommended one would be - Fail open? Fail closed?
  53 19:16 < gwolf> (best is not to fail, of course)
  54 19:16 <@marga> Oh, I do || true on basically all statements I add to a maintscript :)
  55 19:17 <@marga> Failing maintscripts are really a horrible nightmare
  56 19:17 < Mithrandir> ntyni: yup, got that
  57 19:17 < gwolf> marga: the bug is particularly about restarting services, so I guess failing to mkdir or so does not have to follow this.
  58 19:17 <@marga> I know, I'm just saying I might be a bit of an extremist in the "maintscripts shouldn't fail" camp
  59 19:18 < Mithrandir> i kinda lean towards fail early, but balancing that is failing maintscripts tend to leave you at the bottom of a deep hole, so…
  60 19:18 < gwolf> But... There are issues™ in whatever direction we decide to follow. i.e. failing to restart a daemon on a package update will leave the user running a potentially buggy version even having a fixed one installed...
  61 19:18 < smcv> I wonder whether distinguishing between stop;start and "reload harder"/re-exec is useful
  62 19:18 < smcv> otoh systemctl restart always means stop;start
  63 19:19 < gwolf> Yes, failing maintscripts are a terrible headache for users. It often involves editing the maintscript in question, not what I expect just-about-anybody to be able to do
  64 19:19 < Mithrandir> i think being consistent is more important than whether the default is stop or not
  65 19:20 < gwolf> smcv: oh, right - Restart means stop/start. And if the daemon was not running to begin with (due to the user having manually stopped it, say), a restart will leave it running.
  66 19:20 < Mithrandir> gwolf: we expect it to be: fix the daemon startup, dpkg --configure in this case
  67 19:20 < smcv> if the maintscript fails I suppose the key question is what the user is going to do about it
  68 19:20 < Mithrandir> so no maintscript hacking
  69 19:21 < smcv> and whether what they do about it differs from what they'd do if it suppressed the failure and left the package configured-but-non-functional
  70 19:21 < smcv> (with a big fat warning)
  71 19:21 <@marga> One of the problems with failing maintscripts is that it's usually very hard to understand what's failing.  Even if you don't need to edit the maintscript itself, you may need to go read it to understand where it failed.
  72 19:21 <@marga> smcv, where would the warning be visible?
  73 19:21 < gwolf> marga: yes, and that's something that cannot be expected from users in general. Not even knowing where the scripts live
  74 19:21 < Mithrandir> Or just rerun dpkg to get rid of the chaff
  75 19:22 < smcv> marga: insert recurring wish for useful logging here
  76 19:22 <@marga> :)
  77 19:22 < Mithrandir> so, I’m somewhat worried that ignoring failures leads to reboot, then a failed service
  78 19:23 < gwolf> ...I "smell" that this might be too broad of an issue for us to rule
  79 19:23 < smcv> and not ignoring failures leads to what?
  80 19:23 <@marga> Uhm, I'm not sure I follow.  It could also lead to "failed service -> reboot -> working service", depending on the failure
  81 19:23 < smcv> a failed service, a failed dpkg and a confused sysadmin?
  82 19:23 < Mithrandir> in the Debian spirit, can we have a default that is overridable and then we just have to choose the default.
  83 19:24 <@marga> "Maintscripts should try as hard as possible not to fail"?
  84 19:24 < Mithrandir> gwolf: we are asked to advise, not decide.
  85 19:24 < smcv> Mithrandir: only if it doesn't require us to inject yet more shell script complexity into maintscripts
  86 19:24 < gwolf> right
  87 19:24 < gwolf> marga: set +e
  88 19:24 <@marga> Yeah
  89 19:25 <@marga> "And if they do fail, they should output an actionable message to the user"
  90 19:25 < gwolf> ...I would not like that, though...
  91 19:25 < ntyni> it would be nice to have a generic way to 'fail gracefully' and inform the user that a daemon failed to start during configure
  92 19:25 < gwolf> The question is not about the general initscripts flow, but about service restarts
  93 19:26 < ntyni> I'm thinking of something like a dpkg trigger
  94 19:26 < Mithrandir> Can we do something good here without going into detailed design?
  95 19:26 <@marga> gwolf, I'm not sure it's a separate question, really
  96 19:26 < smcv> perhaps relevant:
  97 19:26 < smcv> why is it so important that we restart services?
  98 19:26 < gwolf> marga: it stems/flows from the specific question, right... But we are asked to advise specifically on what happens in service restarts
  99 19:26 < smcv> answer: if they have security vulns then the old version is bad
 100 19:27 < smcv> but if libssl or libldap or libdbus has security vulns then we don't restart the world
 101 19:27 < Mithrandir> smcv: config updates, ensure new version works
 102 19:27 < gwolf> smcv: Also, if there are behavioural changes (i.e. upstream package update), you want the running code to match what you have in disk
 103 19:27 <@marga> The problem is that "the restart operation fails" already has two possible options of failure: stop failed or start failed.
 104 19:28 < Mithrandir> marga: stop failed happens a lot less frequently with systemd though
 105 19:28 <@marga> Yeah, I guess systemd will take care of making it happen one way or the other
 106 19:28 < smcv> service manager in "quite good at stopping services" shock
 107 19:28 < Mithrandir> I need to go again, I’d like to finish this on email, but please do continue the discussion:-)
 108 19:29 < Mithrandir> feel free to highlight me if there is something in particular I can be useful on
 109 19:30 <@marga> So, let's assume that the service did a restart, the stop succeeded but the start failed... What's to gain from the maintscript failing?
 110 19:30 < smcv> so the wider context here is that the submitter of #780403 agrees with marga that things should fail less hard
 111 19:31 < gwolf> marga: right, it's IMO always better to gracefully finish the install than to have a failed maintscript
 112 19:31 < smcv> #780403 is actually about start, not restart, btw
 113 19:31 < gwolf> Even though that will annoy the sysadmin as it results in a dead service
 114 19:32 <@marga> Yeah, it's not that much different anyway
 115 19:32 < smcv> daniel pocock makes an interesting point in the merged bug 802501 (which is about restarts) that if a service is taking down the daemon to do some offline reconfiguration,
 116 19:33 < smcv> it can stop it in the preinst, which (unlike the postinst) can abort installation without leaving the system in an undefined state
 117 19:34 < smcv> I feel as though that's a really rare case though
 118 19:34 <@marga> Yeah, preinst failing is a different story than postinst failing
 119 19:35 < gwolf> and preinst should be quite more limited in scope than postinst fwiw
 120 19:36 <@marga> And really, I personally see no gain in postinst failing.  Is there any gain at all?
 121 19:36 < smcv> stopping in preinst means you have to communicate to the postinst that it's ok to start the thing, though, if you want the overall effect to be like systemctl try-restart (which is "stop, then start if it was previously running")
 122 19:38 < smcv> obligatory controversial opinion: maybe we should be more prepared to require a reboot, and less keen to do surgery while the patient is still awake
 123 19:38 < ntyni> marga: the gain is making sure that the admin notices the daemon failure?
 124 19:38 <@marga> :-/
 125 19:38 <@marga> I guess that shows how bad things are regarding surfacing problems
 126 19:39 < smcv> ntyni: "surprise! your packages are in an undefined state" is not necessarily such a constructive way to signal that?
 127 19:39 < ntyni> it sure isn't
 128 19:39 < gwolf> ntyni: I agree with marga's and smcv's feeling. Of course it grabs the admin's attention. But not for the better!
 129 19:41 < ntyni> I'm not saying it's good practice, just saying that's the only gain I see
 130 19:41 < smcv> sure
 131 19:41 < gwolf> right... But I think we could then unanimously say it's bad to stop in a state where the package manager is confused..?
 132 19:42 < gwolf> It's better that the admin notices when foobard is not answering in its usual port...
 133 19:42 < smcv> shouting about it on stderr and in syslog/Journal is always good of course
 134 19:43 <@marga> The more I think about this, the more I think this is a remnant of old times.  We need a better way to communicate to the user that something is not right
 135 19:43 < smcv> systemd seems quite good at making a lot of noise when it can't do what you asked it to
 136 19:43 < ntyni> marga: agreed
 137 19:43 < gwolf> marga: "shouting" is not necessary, but "signaling our init so that it complains" (i.e. "degraded") should be enough
 138 19:44 < smcv> if we assume systemd for a moment
 139 19:44 < smcv> is there a way that daemons *can* fail to restart without it logging that fact?
 140 19:44 <@marga> Not that I'm aware
 141 19:44 < gwolf> Right. I'm talking about my recent experiences :-] But... Logging, stdout, failed return codes upon invocation...
 142 19:44 < smcv> I think it'll always log "systemd[1]: Failed to start whatever."
 143 19:44 < ntyni> maybe we should have an apt hook to run 'systemctl status' after upgrades
 144 19:45 < gwolf> whatever init system you choose, those are the main communication methods
 145 19:45 < ntyni> or something like that
 146 19:45 * gwolf hopes ntyni is joking
 147 19:45 <@marga> It wouldn't help on unattended upgrades, which I expect is the majority of upgrades nowadays
 148 19:45 < ntyni> sure
 149 19:46 < smcv> if your upgrades are unattended then your error reporting also needs to be unattended
 150 19:46 < smcv> logcheck exists
 151 19:46 < smcv> so do nagios and friends
 152 19:46 <@marga> Yup
 153 19:47 < smcv> I'm not sure that fixating on "but what about restarts" is proportionate
 154 19:47 < smcv> daemons can crash any time
 155 19:48 <@marga> This discussion is getting longer than I had originally expected.  It seems to me that we are mostly in agreement that maintscripts failing are generally undesirable, if maybe not in 100% agreement of how undesirable they are... How should we move forward?
 156 19:48 < smcv> it seems like there would be consensus for a statement with some weasel words in it, at least
 157 19:49 * gwolf asks weasel for his words
 158 19:49 < gwolf> marga: I think we have to keep in mind spwhitton's request when he opened this bug...
 159 19:49 < smcv> a service failing to restart should be logged prominently in the system log and the maintainer script's stderr, but should not usually[1] cause the maintainer script to fail, unless there is a really good[2] reason why it must
 160 19:50 <@marga> are 1 and 2 the weasel words?
 161 19:50 < smcv> (this footnote intentionally left blank)
 162 19:50 <@marga> heh
 163 19:50 < gwolf> ...oh - never mind my last sentence - he asks us to _decide_, not to _advise_
 164 19:50 <@marga> Ok, I'll take the action item of writing up something, and send it to the bug.
 165 19:51 < ntyni> he seems to be asking for advice on whether we should decide
 166 19:51 < ntyni> :)
 167 19:51 <@marga> #action Marga to write up a summary of the discussion here and send it to the bug.  Discussion to continue there.
 168 19:51 < gwolf> smcv: I'd go a bit more general than what you suggest - "A service failing to restart should signal the administrator in a prominent but nonintrusive way".?
 169 19:51 < gwolf> (i.e. we don't do design work)
 170 19:51 < smcv> sure, what I have in mind is "what we do now? do that" :-)
 171 19:52 < gwolf> The important part, where we all agree, is that we want dpkg to fail the least possible, and leaving a broken maintscript does not help the user.
 172 19:52 < smcv> yeah
 173 19:52 < ntyni> yes
 174 19:53 < smcv> broken pre*: well if you absolutely must (clue: if you need to ask, you don't)
 175 19:53 < smcv> broken post*: just say no
 176 19:53 < gwolf> :)
 177 19:53 <@marga> :)
 178 19:53 <@marga> Cool, let's move to our third topic
 179 19:53 -!- marga changed the topic of #debian-ctte to: Any interesting things to share from DC18?
 180 19:53 < gwolf> Well, I think it is up to me :)
 181 19:53 <@marga> Yup
 182 19:53 < gwolf> We had our annual ctte bof, which went smoothly but lacked a bit IMO
 183 19:54 < gwolf> (being me the presenter, and presenting slides written by OdyX originally)
 184 19:54 < gwolf> ...There were conversations mainly regarding our two bugs that several of us had, mainly with the submitters, but also with some other interested people
 185 19:54 < gwolf> but all in all, I don't think there's much to report
 186 19:55 < gwolf> ...any questions? :-]
 187 19:55 < ntyni> thank you for handling the bof
 188 19:56 < gwolf> Did any of you follow it?
 189 19:56 < gwolf> Or any IRC conversation that happened during it?
 190 19:56 <@marga> Thanks, I guess that's what I wanted to know.  I haven't actually watched any talks from DebConf
 191 19:57 < ntyni> I did follow it
 192 19:57 < ntyni> the discussion wasn't very lively
 193 19:57 <@marga> :(
 194 19:58 < ntyni> but it was certainly good to have it I think
 195 19:58 < gwolf> nope. Now again, we are at a point where the ctte has been dormant-ish for three months
 196 19:58 < gwolf> I was mostly interested in getting other people to become interested in joining
 197 19:58 < gwolf> but don't think I managed to raise too much enthusiasm
 198 19:58 <@marga> Yeah, that will be a topic for next month's meeting
 199 19:58 <@marga> (we said we would pause recruiting until Sept)
 200 19:59 < gwolf> right
 201 19:59 <@marga> Anyway, we are almost at the hour...
 202 19:59 -!- marga changed the topic of #debian-ctte to: Any additional business?
 203 19:59 < smcv> would it be helpful to have more emphasis on "ask the ctte for advice"?
 204 20:00 < smcv> there are a couple of gnome bugs where my thought is "what would I even do about this"
 205 20:00 < smcv> maybe formally asking the ctte about them would set a good example?
 206 20:01 < gwolf> smcv: could be. We are not drowning in work, as you see ☺ Then again, not every weird issue should be brought to the ctte. Hopefully...
 207 20:02 <@marga> Well, just asking for advice is different than asking for a ruling in a dispute
 208 20:02 < smcv> I'd only consider that for >= important bugs tbh
 209 20:02 <@marga> We've had a bunch of bad experiences last year with disputes going off-track.  We haven't had much advice asking until these two bugs from Sean
 210 20:03 < smcv> yeah that was partly why I thought it might set a good example
 211 20:03 < gwolf> marga: the bad experiences were... modem-manager? and..?
 212 20:03 <@marga> gwolf, uhm... someone else who also orphaned their package rather than engage with the TC...
 213 20:04 < smcv> instead of reserving appeals to the ctte for "I want to overrule this maintainer", try to encourage people to come to the ctte with "I'm considering this workaround for a broken situation but I don't know if I should"
 214 20:04 < gwolf> right. Well, FWIW (and on-the-record, as we are still formally in meeting), I was saddened and surprised to read Guillem's answer to me asking for his stance on #904302
 215 20:05 <@marga> Me too and I was going to bring this as a subject to discuss today if we had time, but we are overtime now, so I think I'd rather table it for next month.
 216 20:05 <@marga> smcv, I think it would be a nice experiment
 217 20:06 < ntyni> Guillem has made his opinion about the ctte clear on previous occasions too
 218 20:06 < smcv> ok, will look at summarizing #896019 and/or #888549 into a form I can ask for advice on
 219 20:07 <@marga> Alright, I think that would be endmeeting, unless someone has some urgent matter to bring up?
 220 20:07 * gwolf shuts up
 221 20:07 < ntyni> nothing from me